Re: [kde] Character sets / encoding



Hi Anne,

On 2009-09-10 14:26, Anne Wilson wrote:
On Wednesday 09 September 2009 21:35:05 James Tyrer wrote:
[...]
The >=128 glyphs which I commonly use are: äëïöüñ. Since I am sending
this email in ISO 8859-1, these characters will not appear correctly if
viewed with UTF-8.

I have found that the only solution to this problem is to set the code
page for incoming mail to either ISO 8859-1 or IBM cp 1252.

Not sure what's happening, James. If the characters you typed were umlauted,
as they seem to be, then they are reading correctly on this netbook (I'll
check on another machine later). Here KMail is set to use the following:

utf-8
utf-8 (locale)
us-ascii
iso-8859-1

Now whether that means that if one doesn't fit it falls back to the next one, I
don't know. What do you think?

The real problem with charsets and encodings is that you always have to tell
the interpreting program (browser, mail/news reader, ... whichever program
wants to show the bytes from the net in a readable form) which charset (and
encoding) was actually used to encode the message, so that it can choose
the matching decoder.
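
As a minimal sketch of what "telling the program" means in practice, here is how
a reader could pick the decoder from the header, using Python's standard email
module. The raw message below is a made-up example (the body bytes are
'äëïöüñ' in ISO 8859-1); it is not taken from any real mail, and this is not
how KMail itself is implemented.

```python
from email import message_from_bytes

# Hypothetical raw message: the body holds the bytes for 'äëïöüñ'
# as ISO 8859-1, and the header declares exactly that charset.
raw = (
    b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"\xe4\xeb\xef\xf6\xfc\xf1\r\n"
)

msg = message_from_bytes(raw)

# The declared charset comes straight from the Content-Type header.
charset = msg.get_content_charset()

# get_payload(decode=True) returns the body as raw bytes; the declared
# charset then selects the matching decoder.
body = msg.get_payload(decode=True).decode(charset)

print(charset)
print(body.strip())
```

Without the charset parameter, get_content_charset() returns None and the
reader is back to guessing.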

If this information is not given, the program can only guess, and everybody
knows that computers are not good at that. How would a computer know what the
string 'äëïöüñ' from James should look like if he hadn't specified the encoding
in the header? (Open the source of his mail and you will see the following
line: Content-Type: text/plain; charset="iso-8859-1".) Without it, the computer
could (for example) have guessed that those bits were supposed to mean "潆秭"
("eddy billion" in Chinese)... OK, I admit, I cheated a bit on this one: it
wouldn't have been a valid byte sequence for a GBK decoder, which any sane
guessing algorithm would have detected... but still, I think you get the point.
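
To make the guessing problem concrete, here is a small Python sketch that feeds
the same six bytes to several decoders. The helper name decode_attempts is my
own invention for this illustration; the codec names are the ones shipped with
Python. A decoder that rejects the bytes is recorded as None.

```python
def decode_attempts(data, charsets):
    """Decode the same bytes with each charset; None means 'not valid here'."""
    results = {}
    for name in charsets:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# The bytes of 'äëïöüñ' when encoded as ISO 8859-1.
data = b"\xe4\xeb\xef\xf6\xfc\xf1"

results = decode_attempts(data, ["iso-8859-1", "cp1252", "utf-8", "gbk"])
for name, text in results.items():
    print(name, "->", text if text is not None else "(invalid byte sequence)")
```

ISO 8859-1 and cp1252 agree on these particular bytes, while a strict UTF-8
decoder refuses them outright. That refusal is the kind of hint a guessing
algorithm can use, but it only rules candidates out; it cannot tell which of
the remaining ones the sender meant.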

So, people, use Unicode (the "universal charset") encoded as UTF-8 for
everything - and maybe in a few years we can all forget about all this
charset/encoding mess :)

Patrick.

P.S.: I used Unicode/UTF-8 in this mail (and of course it's specified in the
mail's header), otherwise it wouldn't even have been possible to put both
Chinese characters and umlauts in one mail.
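
For the curious, the point of the P.S. can be checked with a few lines of
Python. The helper can_encode is a hypothetical name used only for this sketch;
it simply tries the encoding and reports whether it worked.

```python
def can_encode(text, charset):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


umlauts = "\u00e4\u00eb\u00ef\u00f6\u00fc\u00f1"  # äëïöüñ
chinese = "\u6f46\u79ed"                          # 潆秭
mixed = umlauts + " " + chinese

print(can_encode(umlauts, "iso-8859-1"))  # umlauts alone fit in ISO 8859-1
print(can_encode(mixed, "iso-8859-1"))    # the Chinese characters do not
print(can_encode(mixed, "utf-8"))         # UTF-8 covers both
```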

--
Key ID: 0x86E346D4 http://patrick-nagel.net/key.asc
Fingerprint: 7745 E1BE FA8B FBAD 76AB 2BFC C981 E686 86E3 46D4

Attachment: signature.asc
Description: OpenPGP digital signature

___________________________________________________
This message is from the kde mailing list.
Account management: https://mail.kde.org/mailman/listinfo/kde.
Archives: http://lists.kde.org/.
More info: http://www.kde.org/faq.html.
