Re: sed and extended-character-set
- From: Peter J Ross <pjr@xxxxxxxxxxxxxxx>
- Date: Tue, 29 Jan 2008 01:26:55 +0000
In comp.os.linux.misc on Mon, 28 Jan 2008 14:23:38 -0500, Bill Marcum
<marcumbill@xxxxxxxxxxxxx> wrote:
On 2008-01-28, Koppe74 <koppe74@xxxxxxxxx> wrote:
I'm running Mandriva 2007 community (but have gotten the same
problem with other distros too), and have a problem with 'sed'.
I'm trying to use sed's s (substitute) command to manipulate
parts of HTML documents, but I run into a snag whenever I
encounter the special left and right single and double quote
characters that the extended (Windows?) character sets use.
'sed' obviously chokes on these characters, and is unable
to match anything on lines containing them. Since the places
where these characters occur are usually covered by a regexp
matching any character (.*), it shouldn't matter, but it
obviously does. The lines are just left unaltered by 'sed'.
My question is therefore...
*Are there any environment variables I can set that will make
'sed' work with an extended character set? If there are, to
what value should they be set? (I'm using Western European
coding -- ISO-????-1.)
LANG=C
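A rough Python model of why LANG=C helps. It assumes (the thread does not say so) that the shell runs in a UTF-8 locale while the HTML file was saved as windows-1252. In that case the curly quotes are the single bytes 0x93 and 0x94, which are not valid UTF-8. GNU sed's "." does not match invalid multibyte sequences, so ".*" cannot run across them. In the C locale every byte counts as one character. Note that LC_ALL and LC_CTYPE, if set, take precedence over LANG, so "LC_ALL=C sed ..." is the more robust form. The sample line below is hypothetical.

```python
import re

# Hypothetical sample: a line as a windows-1252 editor would save it.
line = "<p>\u201cHello\u201d</p>".encode("cp1252")  # b'<p>\x93Hello\x94</p>'


def valid_utf8(raw: bytes) -> bool:
    """Model of a UTF-8 locale: the bytes must form valid UTF-8 characters."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def c_locale_match(raw: bytes):
    """Byte-oriented regex, like sed under LANG=C: '.' matches any single byte."""
    m = re.fullmatch(rb"<p>(.*)</p>", raw)
    return m.group(1) if m else None


print(valid_utf8(line))      # False: 0x93/0x94 are not valid UTF-8
print(c_locale_match(line))  # b'\x93Hello\x94'
```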
*Is there a command I can use to change/prune away these
quotes -- perhaps the 'tr' command? How?
You can use iconv or recode to convert the quotes to ASCII quote
characters, but to do this you must know or guess the character set in
which the document is written (windows-1252 is probably the most common,
but there is also windows-1251 (Cyrillic), or UTF-8, or whatever is the
Windows equivalent of ISO-8859-2).
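A minimal Python sketch of both approaches to the tr question, assuming the file really is windows-1252, where the curly quotes are bytes 0x91 to 0x94. tr_quotes does a byte-for-byte swap in the same spirit as a tr invocation such as tr '\221\222\223\224' "''\"\"". That swap is only correct for single-byte windows-1252 input. ascii_quotes decodes with whatever character set you have guessed first, which is closer to what iconv or recode do. Both function names are hypothetical helpers for illustration.

```python
# Byte values of the windows-1252 curly quotes and their ASCII stand-ins.
CP1252_QUOTES = bytes([0x91, 0x92, 0x93, 0x94])
ASCII_QUOTES = b"''\"\""

# bytes.maketrans builds a 256-entry table, like the two sets given to tr.
BYTE_TABLE = bytes.maketrans(CP1252_QUOTES, ASCII_QUOTES)

# Unicode code points for the same four quotes, for the decode-first path.
UNICODE_TABLE = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def tr_quotes(raw: bytes) -> bytes:
    """Byte-for-byte replacement; only safe if the input is windows-1252."""
    return raw.translate(BYTE_TABLE)


def ascii_quotes(raw: bytes, encoding: str) -> str:
    """Decode with the guessed encoding, then replace the curly quotes."""
    return raw.decode(encoding).translate(UNICODE_TABLE)


print(tr_quotes(b"\x93Hi\x94 it\x92s"))
print(ascii_quotes("\u201cHi\u201d".encode("utf-8"), "utf-8"))
```

If the guess is wrong (say the file is actually UTF-8), the byte table silently does nothing useful, which is why knowing the character set matters.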
I wonder if this would help.
<http://osdir.com/ml/editors.sed.user/2005-12/msg00003.html>
--
PJR :-)