Re: sed and extended-character-set
- From: Bill Marcum <marcumbill@xxxxxxxxxxxxx>
- Date: Mon, 28 Jan 2008 14:23:38 -0500
On 2008-01-28, Koppe74 <koppe74@xxxxxxxxx> wrote:
I'm running Mandriva 2007 community (but have gotten the same
problem with other distros too), and have a problem with 'sed'.
I'm trying to use sed's s (substitute) command to manipulate
parts of HTML documents, but run into a snag whenever I
encounter the specialized left and right single and double quote
characters that the extended (Windows?) character sets use.
'sed' obviously chokes on these characters, and is unable
to match anything on lines containing them. As the places
where these characters occur are usually covered by
a regexp matching any character (.*), it shouldn't matter, but
it obviously does. The lines are just left unaltered by 'sed'.
My question is therefore...
*Are there any environment variables I can set that will make
'sed' work with an extended character set? If there are, to
what value should they be set (I'm using Western European
coding -- ISO-????-1)?
LANG=C
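A rough sketch of what that setting changes, assuming GNU sed (the
sample line and the s command are made up for illustration). In the C
locale sed treats every byte as one character, so .* can span the
Windows-1252 quote bytes; in a UTF-8 locale those bytes are invalid
sequences that . refuses to match. LC_ALL overrides LANG when it is
set, so the sketch sets LC_ALL for the one command.

```shell
# Hypothetical sample line: 0x93 and 0x94 are the Windows-1252 curly
# double quotes, written as octal escapes so this script stays ASCII.
# In the C locale each byte counts as one character, so .* matches
# across the quote bytes and the substitution goes through.
result=$(printf '<p>\223quoted\224 text</p>\n' |
    LC_ALL=C sed 's|<p>.*</p>|<p>replaced</p>|')

printf '%s\n' "$result"
```

Setting the variable on the command line like this affects only that
one sed invocation, not the rest of the shell session.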
*Is there a command I can use to change/prune away these
quotes -- perhaps the 'tr' command? How?
You can use iconv or recode to convert the quotes to ASCII quote
characters, but to do this you must know or guess the character set in
which the document is written (windows-1252 is probably the most common,
but there is also windows-1251 (Cyrillic), or utf-8, or whatever is the
Windows equivalent of iso-8859-2).
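As a hedged illustration, assuming the document really is
windows-1252 (the sample text below is invented): the //TRANSLIT
suffix is a GNU iconv extension and the characters it picks can vary
by implementation and locale, so a byte-wise tr mapping is shown as
well, since tr was asked about.

```shell
# Invented sample with Windows-1252 curly quotes:
# 0x91/0x92 are the single quotes, 0x93/0x94 the double quotes.
sample=$(printf 'He said \223it\222s fine\224\n')

# Option 1: let iconv transliterate to ASCII. //TRANSLIT is a GNU
# extension, so fall back to a message where it is unavailable.
printf '%s\n' "$sample" | iconv -f WINDOWS-1252 -t ASCII//TRANSLIT ||
    echo 'iconv transliteration not available here'

# Option 2: map the four quote bytes directly with tr in the C locale.
# This is only correct if the input is windows-1252.
ascii=$(printf '%s\n' "$sample" | LC_ALL=C tr '\221\222\223\224' "''\"\"")

printf '%s\n' "$ascii"
```

The tr route is blind to the encoding: on a utf-8 or windows-1251
document the same byte values mean something else, which is why the
character set has to be known or guessed first.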