Character set issues



Karl Auer wrote in gmane.linux.ubuntu.user
about: Re: Macros in OpenOffice 2

On Sat, 2006-04-22 at 01:05 -0500, Tommy Trussell wrote:
Viel Spaß!

[The above line is readable as expected here, fwiw]

I'm butting in here to say the special characters showed up fine here
-- I'm using gmail. Whatever email program you are using may be set to
a different character set by default. I just looked at the headers of
his message, and his character set is charset=iso-8859-1

Selecting iso-8859-1 didn't help, but selecting UTF-8 did.

I have a hunch that my newsreader (slrn) is actually lying about my
articles being ISO-8859-1, but unfortunately, as far as I'm aware,
that's as advanced as it gets (at least configuration-wise). It's a case
of choosing either this or some other now-deprecated character set;
Unicode isn't an option. The slrn FAQ suggests that slrn will support
Unicode when slang 2.0 is released, which implies it can't yet.


I had previously dist-upgraded to Breezy and had no problems reading
European characters in news with slrn, but then I suffered a disk
problem that meant that I had to reinstall the OS from scratch.

Strangely, since then, all non-ASCII characters in ISO-8859-1 articles
have been replaced by hexadecimal character codes, and I've also noticed
that, as UTF-8 posting becomes more commonplace, characters in articles
posted in *UTF-8* *are* readable in slrn (including not just European
letters, but Japanese/Chinese as well). However, this only seems to work
if articles are sent as raw text, unencoded: articles which are
base64-encoded display as the base64-encoding, as slrn can't decode that
(because it's a text newsreader, and news wasn't designed for non-text
articles).
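
For illustration only, here is a rough Python sketch of the extra step a
reader has to perform on a base64-encoded article: undo the transfer
encoding first, then decode the resulting bytes with the declared
character set. The sample text and the helper name decode_body are
made up for this example; this is not how slrn or any particular
newsreader is implemented.

```python
import base64

# Hypothetical article body: UTF-8 text wrapped in base64 transfer encoding.
original = "Viel Spaß!"
encoded_body = base64.b64encode(original.encode("utf-8")).decode("ascii")


def decode_body(body: str, charset: str) -> str:
    """Undo the base64 transfer encoding, then decode with the given charset."""
    raw_bytes = base64.b64decode(body)
    return raw_bytes.decode(charset)


# A reader that skips the base64 step shows encoded_body as-is (plain ASCII).
print(encoded_body)
# A reader that performs both steps recovers the text.
print(decode_body(encoded_body, "utf-8"))
```

Both steps are needed: the base64 layer only turns bytes into ASCII, and
says nothing about which character set those bytes are in.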

I guess the full reinstall of Breezy must have kicked in some Unicode
support somewhere in the system that a normal upgrade hadn't previously
included. I don't understand why this is screwing with
correctly-identified ISO-8859-1 articles, though: presumably my terminal
is treating _everything_ as UTF-8 whether the source likes it or not. I
guess this is a risk of decoupling character display (terminal) from
file reading (newsreader), until everything becomes Unicode-aware and
compliant?
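
A small Python sketch of that guess, assuming the display side insists
on UTF-8: a genuine ISO-8859-1 byte for a non-ASCII character is not
valid UTF-8, so a strict decoder rejects it and a lenient one can only
show some escape for it. The backslashreplace error handler is just
Python's way of producing a hex escape; what slrn or the terminal
really do internally is not something this sketch claims to reproduce.

```python
# "ß" is the single byte 0xDF in ISO-8859-1.
latin1_bytes = "Viel Spaß!".encode("iso-8859-1")

# A strict UTF-8 decoder rejects the lone 0xDF byte.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder can substitute a hexadecimal escape for the bad byte.
shown = latin1_bytes.decode("utf-8", errors="backslashreplace")

print(strict_ok)
print(shown)
```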

I'm therefore surmising that my terminal (GNOME terminal) is handling
UTF-8 OK, that my editor (vim) is handling UTF-8 OK (and, I guess, must
be saving files as such, otherwise the article would have been correctly
readable by others as ISO-8859-1?), and because Unicode is
backwards-compatible, even if slrn can't handle UTF-8 itself, the
characters that it 'thinks' it 'displays' as ISO-8859-1 are being
correctly recognised and converted "at display time" by my terminal.
Newsreaders or mailers with fuller character set support see the article
claiming to be ISO-8859-1, and try to display it as such, resulting in
mangled characters as that is not what they are?
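
That hypothesis can be reproduced in a few lines of Python. The sketch
assumes the article body really was saved as UTF-8 while the header
claimed ISO-8859-1; the variable names are illustrative only.

```python
text = "Viel Spaß!"

# Assumption: the editor saved the article as UTF-8 ("ß" becomes 0xC3 0x9F).
utf8_bytes = text.encode("utf-8")

# A reader that trusts the ISO-8859-1 label turns each byte into one character.
as_labelled = utf8_bytes.decode("iso-8859-1")

# A reader (or terminal) that treats the bytes as UTF-8 sees the intended text.
as_actual = utf8_bytes.decode("utf-8")

# The damage is reversible as long as no bytes were dropped on the way.
repaired = as_labelled.encode("iso-8859-1").decode("utf-8")

print(repr(as_labelled))
print(as_actual)
print(repaired)
```

The mislabelled reading has one character more than the original, because
the two UTF-8 bytes of "ß" are shown as two separate ISO-8859-1
characters, the second of which is an unprintable control code.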


--
| David M, __________| replyto email valid <365 days | en, fr, (de) |
| Edinburgh, Scotland. | but on-list replies preferred | ________ |
Please trim quoted text & interleave reply comments for readability. <




