Re: regex, negations, grep, find and replace (a few questions)
- From: "jameshanley39@xxxxxxxxxxx" <jameshanley39@xxxxxxxxxxx>
- Date: 26 Nov 2007 14:45:27 GMT
pk wrote:
(I'm changing some parts of the message due to spam filters on the
server rejecting the message: substitute "**" with "tt", and "@@"
with "a ")
jameshanley39@xxxxxxxxxxx wrote:
Here is a regex to match a link. Testing the regex using echo and
piping it:
$ echo "abc<@@href=\"h**p://www.blah.com\">click here</a>" | grep -P '<@@href="h**p://www.*>.*</a>'
a)
A problem is that grep works line by line. So
- it includes the abc before the link
Not if you use -o.
Thanks for that one, it is perfect.
Note: grep still works line by line, though!
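A minimal sketch of what -o changes, using the example line from the question (written with the plain "http" and "<a href" forms rather than the spam-filter substitutions). It uses grep -E instead of -P, on the assumption that this pattern needs no Perl-only features; -o is supported by GNU and BSD grep, though it is not in POSIX. The function names are illustrative only.

```shell
#!/bin/sh
# Sketch: compare grep with and without -o on the example line.
# Uses -E (extended regex) instead of -P; this pattern needs no Perl features.
pattern='<a href="http://www.*>.*</a>'
line='abc<a href="http://www.blah.com">click here</a>'

# Without -o, grep prints the whole matching line, "abc" included.
whole_line() { printf '%s\n' "$1" | grep -E "$pattern"; }

# With -o, grep prints only the text the pattern matched.
only_match() { printf '%s\n' "$1" | grep -oE "$pattern"; }

whole_line "$line"   # abc<a href="http://www.blah.com">click here</a>
only_match "$line"   # <a href="http://www.blah.com">click here</a>
```

Each match printed by -o goes on its own line, which is what makes it convenient for building a list of links.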
- if the pattern/link were broken over 2 lines, it would not match
(other tests suggest that)
I really don't want it to include the abc at the beginning of the
line where the pattern matches.
Is there anything like grep that does not have this problem/feature,
or where that problem/feature can be turned off?
Use -o (man grep). For the multi-line problem, just remove the
newline characters before piping the text to grep, e.g.:
echo "$text" | tr -d '\n' | grep 'your_pattern'
That is really interesting, thanks.
That -o is ideal (and that tr command also shows me how to convert
Windows line endings to Unix ones, since it just involves removing the CR, '\r').
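A small sketch of that CRLF-to-LF conversion with tr. The function name and file arguments are hypothetical. Note that tr -d '\r' deletes every carriage return in the input, not only those at the ends of lines, which is usually but not always what you want.

```shell
#!/bin/sh
# Sketch: convert Windows (CRLF) line endings to Unix (LF) by deleting
# every carriage return. $1 is the input file, $2 the output file
# (hypothetical names; tr reads stdin and writes stdout, so it cannot
# edit a file in place).
crlf_to_lf() {
  tr -d '\r' < "$1" > "$2"
}

# Example usage with placeholder file names:
# crlf_to_lf windows.txt unix.txt
```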
Note: that command would return all of $text (or the whole file), since you
left the -o out of the grep. But I only know about -o because you mentioned
it! I think you intended to include it.
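A sketch of the join-then-extract approach with the -o included and $text quoted, using a hypothetical link whose text is split across two lines (plain "http" and "<a href" forms). Deleting the newlines outright glues the two halves together ("clickhere"); translating each newline to a space with tr '\n' ' ' is an alternative that keeps words apart. The function names are illustrative.

```shell
#!/bin/sh
# Sketch: join a link that was split across lines, then extract it with grep -o.
pattern='<a href="http://www.*>.*</a>'
text='abc<a href="http://www.blah.com">click
here</a> def'

# As suggested above: delete every newline, then keep only the match.
join_delete() { printf '%s\n' "$1" | tr -d '\n' | grep -oE "$pattern"; }

# Alternative: turn each newline into a space so words stay separate.
join_space() { printf '%s\n' "$1" | tr '\n' ' ' | grep -oE "$pattern"; }

join_delete "$text"   # <a href="http://www.blah.com">clickhere</a>
join_space "$text"    # <a href="http://www.blah.com">click here</a>
```

Quoting "$text" also stops the shell from word-splitting it or expanding any * or ? it contains as file-name globs.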
BTW, your pattern is not correct. First, it does not match links
whose URL does not start with "h**p://www". Second, because regex
matching is "greedy" (.* matches as much text as it can), if you had a
text with more than one link, it would match from the beginning of the
first link to the end of the last. Something like this would probably
be a better pattern:
'<@@href="h**p://www[^>]*">[^<]*</a>'
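A sketch of both problems on one hypothetical line holding two links, the second without "www" (plain "http" and "<a href" forms). The greedy pattern returns a single match spanning both links; the negated-class pattern suggested above returns only the first link; the any_url variant, an illustration not taken from the post, drops the "http://www" requirement and finds both.

```shell
#!/bin/sh
# Sketch: greedy ".*" versus negated character classes.
# Hypothetical input: two links on one line, the second without "www".
line='<a href="http://www.one.com">one</a> and <a href="https://two.org/x">two</a>'

# Greedy: ".*" matches as much as it can, so the one match spans both links.
greedy()  { printf '%s\n' "$1" | grep -oE '<a href="http://www.*>.*</a>'; }

# Negated classes: [^>]* and [^<]* cannot run past the end of a tag,
# so each link is matched on its own (but "http://www" is still required).
negated() { printf '%s\n' "$1" | grep -oE '<a href="http://www[^>]*">[^<]*</a>'; }

# Illustrative variant: accept any quoted href value.
any_url() { printf '%s\n' "$1" | grep -oE '<a href="[^"]*">[^<]*</a>'; }

greedy "$line"    # the whole line, as one match
negated "$line"   # <a href="http://www.one.com">one</a>
any_url "$line"   # both links, one per line
```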
I will give sed a go eventually.
I actually dealt with this find/replace or extraction problem by using
the regex I mentioned, but with grep -o, writing the matches to a new file.
Then, since it was indented HTML code, I used an old feature of MS Word,
which is probably in Linux's OpenOffice too. I set the font to Courier
(a fixed-width font), held ALT and dragged out a rectangle around what
I wanted to select, and removed the characters I did not want from every
link in one go. I made the font size tiny and the zoom high, to avoid
word wrap.
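For comparison, a sketch of doing the same kind of column edit with grep -o and sed instead of a word processor. It assumes, hypothetically, that the goal was to keep only the URL from each link; the post does not say which characters were removed. The function and file names are placeholders.

```shell
#!/bin/sh
# Sketch: extract links with grep -o (one per line), then let sed strip
# everything except the href value.
# ASSUMPTION (hypothetical): only the URL of each link is wanted.
links_to_urls() {
  # $1: the HTML file to read.
  grep -oE '<a href="[^"]*">[^<]*</a>' "$1" |
    sed 's/^<a href="\([^"]*\)">.*$/\1/'
}

# Example usage, writing the result to a new file (placeholder names):
# links_to_urls page.html > urls.txt
```

Because grep -o has already put each link at the start of its own line, indentation in the HTML does not matter, unlike with a rectangular selection.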
I am sure that method will not last me forever; I will get to know sed
(and awk), despite the interesting workarounds one can use to avoid
them! (DOS users used to do 'bouncing off the command prompt' to
prepend text to the beginning of lines. I had heard of sed years ago.)
It's just that thus far I have managed without it, as I did now!
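As an example of the job that DOS trick was used for, a sketch of prepending text to every line with sed, plus the equivalent in awk. The prefix and function names are illustrative. In the sed version, a prefix containing /, & or \ would need escaping, since sed treats those specially in the s command; the awk version passes the prefix with -v, which still interprets backslash escapes.

```shell
#!/bin/sh
# Sketch: prepend a fixed string to every line read from stdin.

# sed: "^" matches the empty string at the start of each line,
# so replacing it inserts the prefix there.
prepend_sed() { sed "s/^/$1/"; }

# awk: print the prefix followed by the whole input line ($0).
prepend_awk() { awk -v p="$1" '{ print p $0 }'; }

printf 'one\ntwo\n' | prepend_sed '> '   # "> one" then "> two"
printf 'one\ntwo\n' | prepend_awk '> '   # same output
```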
<snip>