Re: sed and extended-character-set



On 2008-01-28, Koppe74 <koppe74@xxxxxxxxx> wrote:


I'm running Mandriva 2007 community (but have gotten the same
problem with other distros too), and have a problem with 'sed'.

I'm trying to use sed's s (substitute) command to manipulate
parts of HTML documents, but run into a snag whenever I
encounter the specialized left and right single- and double-quote
characters that the extended (Windows?) character sets use.

'sed' obviously chokes on these characters, and is unable
to match anything on lines containing them. As the places
where these characters occur are usually covered by a regexp
matching any character (.*), it shouldn't matter, but
it obviously does. The lines are just left unaltered by 'sed'.

My question is therefore...
*Are there any environment variables I can set that will make
'sed' work with an extended character set? If so, to
what value should they be set (I'm using Western European
coding -- ISO-????-1)?

LANG=C
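
A minimal sketch of what that looks like in practice. The sample line and the
substitution are made up for illustration. It uses LC_ALL rather than LANG
because LC_ALL overrides LANG and LC_CTYPE if either happens to be set already;
in the C locale sed treats the input as plain bytes, so `.` can match the
quote bytes:

```shell
# Hypothetical test line containing windows-1252 curly double quotes
# (bytes 0x93 and 0x94, written here as octal escapes \223 and \224).
result=$(printf '<p>\223quoted\224 text</p>\n' |
    LC_ALL=C sed 's/<p>.*<\/p>/<p>replaced<\/p>/')

# With the C locale in effect, .* spans the quote bytes and the line is rewritten.
printf '%s\n' "$result"
```

Setting the variable on the command line like this affects only that one sed
invocation, so the rest of the session keeps its normal locale.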

*Is there a command I can use to change/prune away these
quotes -- perhaps the 'tr' command? How?

You can use iconv or recode to convert the quotes to ASCII quote
characters, but to do this you must know or guess the character set in
which the document is written (windows-1252 is probably the most common,
but there is also windows-1251 (Cyrillic), or utf-8, or windows-1250,
the Windows counterpart of iso-8859-2).
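
As a rough sketch, assuming the document really is windows-1252 (the sample
text and the `sample` helper are made up for illustration): iconv can convert
the whole stream to UTF-8, and since 'tr' was asked about, it can map the four
curly-quote bytes 0x91-0x94 directly to ASCII quotes when run in the C locale:

```shell
# Hypothetical sample input, assumed to be windows-1252:
# bytes 0x93 and 0x94 (octal 223 and 224) are the curly double quotes.
sample() { printf 'say \223hi\224\n'; }

# Option 1: convert the stream to UTF-8 so locale-aware tools can read it.
utf8=$(sample | iconv -f WINDOWS-1252 -t UTF-8)

# Option 2: map the curly single quotes (0x91, 0x92) to ' and the
# curly double quotes (0x93, 0x94) to " byte by byte.
ascii=$(sample | LC_ALL=C tr '\221\222\223\224' "''\"\"")
printf '%s\n' "$ascii"
```

The tr mapping is only right for windows-1252; in windows-1251, for example,
the same byte values are different characters, so check the encoding first.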
