Re: regex, negations, grep, find and replace (a few questions)



pk wrote:

(I'm changing some parts of the message due to spam filters on the
server rejecting the message: substitute "**" with "tt", and "@@"
with "a ")

jameshanley39@xxxxxxxxxxx wrote:

Here is a regex to match a link. Testing the regex using echo and
piping it to grep:
$ echo "abc<@@href=\"h**p://www.blah.com\">click here</a>" | grep -P '<@@href="h**p://www.*>.*</a>'

a)
A problem is that grep works line by line. So
- it includes the abc before the link

Not if you use -o.



thanks for that one, it is perfect.
note- grep still works line by line though!
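
A minimal sketch of the difference -o makes, using the sample line from
the start of the thread. The obfuscated "<@@" and "h**p" are written out
in full here, and plain grep is used instead of grep -P on the
assumption that this simple pattern needs no Perl syntax. Note that -o
is an extension (GNU and BSD grep have it), not something POSIX
guarantees.

```shell
# Sample line from the thread, with the spam-filter substitutions undone.
line='abc<a href="http://www.blah.com">click here</a>'

# Without -o, grep prints the whole matching line, including the leading "abc".
whole=$(printf '%s\n' "$line" | grep '<a href="http://www.*>.*</a>')

# With -o, grep prints only the part of the line that matched the pattern.
matched=$(printf '%s\n' "$line" | grep -o '<a href="http://www.*>.*</a>')

printf 'without -o: %s\n' "$whole"
printf 'with -o:    %s\n' "$matched"
```

The line is still read and matched one line at a time; -o only changes
what gets printed.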

- if the pattern/link were broken over 2 lines, it would not match
(other tests suggest that)

I really don't want it to include the abc at the beginning of the
line where the pattern matches.

Is there anything like grep that does not have this problem/feature
? or where that problem/feature can be turned off?

Use -o (man grep). For the multi-line problem, just remove the
newline characters before piping the text to grep, eg

echo $text | tr -d '\n' | grep 'your_pattern'


that is really interesting, thanks..
that -o is ideal (and that tr command also shows me how to convert
windows-->unix , since it just involves removing the CR '\r')

Note: that command would return all of $text (or the whole file), since
you left the -o off grep. But I only know about -o because you mentioned
it! I think you intended to include it.
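
Putting the two suggestions together might look like the sketch below.
The two-line input is a made-up example, the link is written without the
spam-filter substitutions, and "$text" is quoted so the shell passes the
newlines through unchanged. The last command is the same tr idea applied
to carriage returns.

```shell
# Hypothetical input: one link broken across two lines.
text='abc<a href="http://www.blah.com">click
here</a>def'

# Line by line, neither line holds the whole link, so grep finds nothing.
# (|| true keeps the no-match exit status from stopping a script run with set -e.)
split=$(printf '%s\n' "$text" | grep -o '<a href="http://www.*>.*</a>' || true)

# Delete the newlines first, then let grep -o extract just the link.
# Side effect: the words on either side of the line break are glued together.
joined=$(printf '%s\n' "$text" | tr -d '\n' | grep -o '<a href="http://www.*>.*</a>')

# Same idea for Windows --> Unix line endings: delete every CR.
unix=$(printf 'one\r\ntwo\r\n' | tr -d '\r')

printf 'split:  [%s]\n' "$split"
printf 'joined: [%s]\n' "$joined"
```

If gluing the words together matters, tr '\n' ' ' (replace each newline
with a space) is an alternative to deleting them.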



BTW, your pattern is not correct. First, it does not match links
whose url does not start with "h**p://www". Second, due to the
"greedy" match behavior of regexes (in grep as well as sed), if you
had a text with more than one link on a line, it would match from the
beginning of the first link to the end of the last. Something like
this would probably be a better pattern:

'<@@href="h**p://www[^>]*">[^<]*</a>'
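
To see the greedy behaviour concretely, here is a small sketch with a
made-up line holding two links (the a.com and b.com URLs are
placeholders, and the spam-filter substitutions are undone). It runs
both patterns through grep -o.

```shell
# Hypothetical line containing two links.
line='x<a href="http://www.a.com">one</a>y<a href="http://www.b.com">two</a>z'

# Original pattern: ".*" is greedy, so one match runs from the start of
# the first link to the end of the last one.
greedy=$(printf '%s\n' "$line" | grep -o '<a href="http://www.*>.*</a>')

# Suggested pattern: [^>]* and [^<]* cannot cross a tag boundary, so each
# link is matched on its own and grep -o prints one link per output line.
tight=$(printf '%s\n' "$line" | grep -o '<a href="http://www[^>]*">[^<]*</a>')

printf 'greedy:\n%s\n' "$greedy"
printf 'tight:\n%s\n' "$tight"
```

The suggested pattern still only matches links whose URL starts with
"http://www", as noted above.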


I will give sed a go eventually.

I actually dealt with this find/replace or extraction problem by using
the regex I mentioned, but with grep -o, writing the output to a new
file. Then, since it was indented html code, I used an old feature of
MS Word, which is probably in Linux's OpenOffice too. I made the font
Courier (a fixed width font). I held ALT and highlighted a rectangle
around what I wanted to select, and removed the characters I did not
want from each link in one go. I made the font size tiny and the zoom
high, to avoid word wrap.

I am sure that method will not last me forever; I will get to know sed
(and awk), despite the interesting workarounds one can use to avoid
them! (DOS users used to do 'bouncing off the command prompt' to
prepend text to the beginning of lines. I had heard of sed years ago.)
It is just that thus far I have managed without it, as I did now!



<snip>