Re: grep for metacharacters



hi tim,

well, it turns out i misspoke when i said i was able to match the tab
character -- i happened to have "t" in the line (and only the line) with the
tab character. thus, i also cannot seem to get grep or egrep to match \t, \f,
etc.

in running "info grep" and "info regex" i can see no reference to support for
these escaped special characters. worse, i can see no reference to specifying
octal or hex codes for characters (\011 or \x09), which might have been
another way to specify these special characters.

so, rather shockingly, it seems like Linux's grep/egrep DON'T support these
special escape sequences...

two workarounds come to mind:

if you want to stick with grep, use the -f option. i can put a single tab or
formfeed character in a file and match based on that:

grep -f tabchar.txt myfile.txt
grep -f ffchar.txt myfile.txt

not especially "neat", but it works, works in scripts, and avoids any issues
of having to further escape the pattern.
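
for reference, here is a minimal sketch of how those pattern files could be
created without typing control characters by hand. it assumes a printf that
expands \t and \f (the POSIX one does); the scratch directory and the sample
contents of myfile.txt are made up for illustration.

```shell
# work in a scratch directory so nothing real gets overwritten (assumption)
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# hypothetical sample data: two lines with a tab, one with a formfeed, one plain
printf 'a\tb\nplain line\nc\td\npage\fbreak\n' > myfile.txt

# each pattern file holds one literal character followed by a newline
printf '\t\n' > tabchar.txt
printf '\f\n' > ffchar.txt

# -f reads the patterns from the file, -c counts matching lines
tab_lines=$(grep -c -f tabchar.txt myfile.txt)
ff_lines=$(grep -c -f ffchar.txt myfile.txt)

echo "lines with a tab:      $tab_lines"
echo "lines with a formfeed: $ff_lines"
```

with this sample data the counts should come out as 2 and 1.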

another option is to switch to a program that has the appropriate regex
support, such as perl:

perl -ne '/\011/ and print;' myfile.txt

prints every line of myfile.txt that contains a tab.
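
one more variation that should also work from a script, sketched on the
assumption of a POSIX shell: let printf produce the literal character, keep it
in a variable, and pass that variable to grep as the pattern. the variable
names and the sample file contents are only for illustration.

```shell
# work in a scratch directory so nothing real gets overwritten (assumption)
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# hypothetical sample data: tabs on lines 1 and 3, CRs on lines 1 and 4,
# a formfeed on line 4
printf 'a\tb\r\nplain\nc\td\npage\fbreak\r\n' > myfile.txt

# printf expands the escapes; command substitution only strips trailing
# newlines, so the control character itself survives in the variable
tab=$(printf '\t')
ff=$(printf '\f')
cr=$(printf '\r')

# quote the variable so the shell passes the character through untouched
tab_lines=$(grep -c "$tab" myfile.txt)
ff_lines=$(grep -c "$ff" myfile.txt)
cr_lines=$(grep -c "$cr" myfile.txt)

echo "tab: $tab_lines  formfeed: $ff_lines  cr: $cr_lines"
```

grep never sees a backslash here, only the character itself, so it does not
matter whether grep understands \t or not.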

hope this helps.

doug


On Wed, 12 Sep 2007 07:23:08 -0400, Tim Boyer <tim@xxxxxxxxxxxxxx> wrote:
On Wed, 12 Sep 2007 02:34:43 +0000 (UTC), Doug Morse <morse@xxxxxxxxx> wrote:

very strange. have you tried quoting the regex:

grep -c "\t"

you probably know this, but just in case: if you're running this grep
from, say, a bash script, you might have to escape the escape
character:

grep -c \\t

and this can go on ad nauseam, e.g., a subshell in a script might need:

`grep -c \\\\t`
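
to see what the shell actually hands to grep in each of these forms, printf
'%s' can echo its argument back unchanged. a small sketch, assuming a POSIX
shell; the variable names are only for illustration.

```shell
# each variable captures exactly the argument the command would receive
unquoted=$(printf '%s' \t)           # shell removes the backslash
escaped=$(printf '%s' \\t)           # \\ collapses to one backslash
dquoted=$(printf '%s' "\t")          # double quotes keep \t as is
squoted=$(printf '%s' '\t')          # single quotes keep everything
in_backticks=`printf '%s' \\\\t`     # backticks eat one level of backslashes

printf 'unquoted:     %s\n' "$unquoted"
printf 'escaped:      %s\n' "$escaped"
printf 'dquoted:      %s\n' "$dquoted"
printf 'squoted:      %s\n' "$squoted"
printf 'in backticks: %s\n' "$in_backticks"
```

the unquoted form passes a bare t, and every other form passes a backslash
followed by t. none of them passes an actual tab character, so whether a tab
gets matched depends entirely on what grep itself does with \t.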

also, i should point out that i didn't read your original post closely
enough, and i need to correct one thing in it. "grep -c \t" will
count the number of LINES having a tab character, NOT the number of
tab characters (as you wrote).
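
the difference between the two counts can be seen with a small sketch like the
following. it assumes a POSIX shell with tr and wc; sample.txt and its contents
are made up, and the tab is passed to grep as a literal character so the result
does not depend on how grep treats \t.

```shell
# work in a scratch directory so nothing real gets overwritten (assumption)
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# hypothetical sample: 3 tab characters spread over 2 of the 3 lines
printf 'a\tb\tc\nplain\nd\te\n' > sample.txt

tab=$(printf '\t')

# grep -c counts matching LINES
line_count=$(grep -c "$tab" sample.txt)

# to count the characters themselves, delete everything that is not a tab
# and count the bytes that remain (tr -d ' ' strips wc's padding, if any)
char_count=$(tr -cd '\t' < sample.txt | wc -c | tr -d ' ')

echo "lines containing a tab: $line_count"
echo "tab characters:         $char_count"
```

for this sample the line count should be 2 and the character count 3.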

lastly, i just ran "grep -c \t" and "egrep -c \t" on RHEL4 with a file
containing tabs and it worked just fine. seems like it's gotta be a
backslash escaping issue, or perhaps your file doesn't have any tabs?



It's one I constructed just for this - it's six lines of ^M, ^L, tabs, and
numbers. And egrep in RHEL5 doesn't see _any_ of them:

[root@dg printouts]# egrep -c \t testfile.txt
0
[root@dg printouts]# egrep -c \\t testfile.txt
0
[root@dg printouts]# egrep -c "\t" testfile.txt
0

Now, RH support gave me a way to do it - if I'm looking for, say, page breaks
in that file, type at the command line a ^v^L to insert a page break character
in my 'egrep'. And _that_ works:

[root@dg printouts]# egrep -c ^L testfile.txt
6
[root@dg printouts]# egrep -c ^M testfile.txt
12

But you can't do it in a script, and it won't work for, say, tabs or CRs.


