Re: manpages display a few strange characters



In article <Xns9A8C6517270736650A1FC0D7811DDBC81@xxxxxxxxxxxxx>,
Rahul wrote:

If I turned UTF-8 encoding on in SecureCRT the manpage displayed fine.
But that messed up other stuff: I have some commands that display
German translations on my console and to get those umlauts right I was
using export LANG=de_DE (set before the script runs and reset to
en_US.UTF-8 afterwards).

The idea behind the LANG environment variable is that programs
should respect the setting as much as possible. In the case of
"LANG=en_US.UTF-8" that means an application should display
its messages in US English if available, or else choose a
language that is more or less similar, for example UK English.
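Incidentally, there is no need to export LANG and reset it afterwards:
a variable assignment written in front of a single command applies to
that command only. The sketch below uses 'sh -c' as a hypothetical
stand-in for your script. Keep in mind that LC_ALL, if set, overrides
both LANG and the individual LC_* variables, and that on many systems a
bare "de_DE" selects ISO-8859-1 rather than UTF-8.

```shell
#!/bin/sh
# Sketch: scope a locale to one command instead of export + reset.
# 'sh -c ...' is a hypothetical stand-in for the real script; the
# demonstration only inspects the variable, so the locales need not
# be installed.
LANG=en_US.UTF-8
export LANG

# Only this one child process sees the German setting ...
LANG=de_DE sh -c 'printf "child:  LANG=%s\n" "$LANG"'

# ... while the interactive shell keeps its own value.
printf 'parent: LANG=%s\n' "$LANG"
```

This prints "child:  LANG=de_DE" followed by "parent: LANG=en_US.UTF-8".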

Of course, the UTF-8 part of LANG should also be respected,
but unfortunately there are still some applications around
that completely disregard LANG (and LC_ALL). One reasonably
easy workaround is to wrap such a program in a shell function
that pipes its output through recode. If, for example, you
have a command named 'foo' that is limited to producing
output in ISO-8859-15 (aka latin9) encoding, you could
define a shell function:

foo () { /usr/local/bin/foo "$@" | recode l9..u8; }

Each time you run 'foo' the latin9 output is translated
to UTF-8 by recode (a GNU utility). The error messages
of 'foo' will not be affected, because they are written
to stderr, which the pipe does not capture. Assuming
those occur only infrequently, the above could be an
acceptable temporary solution until an upgrade is
available.
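If recode is not installed, the same wrapper can be sketched with
iconv, which most systems ship. 'legacy_foo' below is a hypothetical
stand-in for a program that can only write latin9; the 'foo_all'
variant is an assumption of how one might also convert the error
messages, by folding stderr into the pipe.

```shell
#!/bin/sh
# 'legacy_foo' is a hypothetical stand-in for a program that can
# only write ISO-8859-15 (latin9); replace it with the real binary.
legacy_foo () {
    printf 'Gr\374\337e\n'          # "Grüße" in latin9, on stdout
    printf 'Fehler: \344\n' >&2     # error message in latin9, on stderr
}

# Convert stdout only; stderr passes through unconverted,
# just like the recode version.
foo () { legacy_foo "$@" | iconv -f ISO-8859-15 -t UTF-8; }

# Variant: merge stderr into the pipe so error messages are
# converted too, at the cost of mixing the two streams.
foo_all () { legacy_foo "$@" 2>&1 | iconv -f ISO-8859-15 -t UTF-8; }
```

The trade-off of 'foo_all' is that a caller can no longer redirect
errors separately from normal output.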

Getting $LANG right drives me nuts! I think I've got it, until
I hit the next program that messes it up again!

Just a *little* patience. All programs will support LANG,
.... in 50 or 60 years :-)

BTW, what are the merits / demerits of UTF-or-not-to-UTF?

Unicode is a so-called 'wide character' standard. Instead of
only 7 or 8 bits, characters need up to 21 bits (code points
run from U+0000 to U+10FFFF; the original ISO 10646 design
allowed up to 31 bits). This is enough to include characters
from all the world's writing systems. So if you write a
document in English but want to quote Chinese text, Unicode
makes it possible. Another example is text with mathematical
formulae; many math symbols are available in Unicode.
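To see which code point a character has, one can re-encode it as
UTF-32BE, where every character is a single 4-byte big-endian number.
This is a small sketch; it assumes the script file is saved as UTF-8
and that iconv knows the encoding names UTF-8 and UTF-32BE (glibc and
GNU libiconv do). The function name 'codepoint' is made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the Unicode code point of one character.
codepoint () {
    # UTF-32BE (explicit byte order, so no BOM) gives 4 bytes per character
    hex=$(printf '%s' "$1" | iconv -f UTF-8 -t UTF-32BE | od -A n -t x1 | tr -d ' \n')
    # Shell arithmetic parses the hex; %04X pads to at least 4 digits
    printf 'U+%04X\n' "$((0x$hex))"
}

for ch in 'A' 'ä' '∑' '中'; do
    printf '%s -> %s\n' "$ch" "$(codepoint "$ch")"
done
```

This prints U+0041, U+00E4, U+2211 and U+4E2D: Latin, a German
umlaut, a math symbol and a Chinese character in one character set.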

For storage (e.g. in a file) or transport (e.g. over a network)
several Unicode transformation formats have been defined
(UTF-8, UTF-16, UTF-32). UTF-8 is the most compact one for
Latin-based languages, and therefore also the most widespread.
In UTF-8, ASCII characters take one byte each and most other
Latin letters (such as the German umlauts) take two; most
Chinese, Japanese and Korean characters take three, and
characters outside the Basic Multilingual Plane take four.
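The byte counts are easy to check, since 'wc -c' counts bytes rather
than characters. A minimal sketch, assuming the script file is saved
as UTF-8; 'utf8_len' is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: number of bytes UTF-8 needs for a string.
utf8_len () {
    # wc -c counts bytes; $(( )) strips the padding some wc versions add
    echo $(( $(printf '%s' "$1" | wc -c) ))
}

# ASCII letter, umlaut, euro sign, CJK character, and a math letter
# outside the Basic Multilingual Plane (U+1D538)
for ch in 'A' 'ä' '€' '中' '𝔸'; do
    printf '%s: %s byte(s)\n' "$ch" "$(utf8_len "$ch")"
done
```

This prints 1, 2, 3, 3 and 4 bytes respectively. Note that the euro
sign, a single byte in latin9, needs three bytes in UTF-8.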

Regards,
Marcel
