Re: FC4 and accentued characters



jeanpca@xxxxxxx wrote:

I work on FC4 [2.6.11-1.1369_FC4] and my system is set to English.
My i18n file looks like this:
cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
            ^^^^^

(which will only work if you're reading this in a fixed font...)

OK. This e-mail is written in what's known as "US-ASCII". "US-ASCII"
only supports the characters on an American keyboard. It uses character
values up to 127. Each character is usually stored in an eight-bit byte
(or octet), which can store values up to 255.
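As a small illustration of that 127 limit, here is a minimal Python sketch (Python is my choice for the examples in this reply, not something from the original question; the helper name fits_in_ascii is made up). It checks that plain English text stays within 0..127 and that an accented letter cannot be encoded as US-ASCII at all.

```python
def fits_in_ascii(text):
    """Return True if every character of text is a US-ASCII character."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


plain = "Plain US-ASCII text."
ascii_bytes = plain.encode("ascii")

# One octet per character, and no value above 127.
highest_value = max(ascii_bytes)
one_octet_each = len(ascii_bytes) == len(plain)

print(highest_value, one_octet_each)
print(fits_in_ascii("cafe"), fits_in_ascii("café"))
```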

Then people started wanting to use accents ... and Greek letters ... and
Russian letters ... and all sorts of other symbols. So they created ways
to use the remaining values, from 128 up to 255.

Unfortunately, there were way more than 128 different characters that
different nationalities wanted to use. So we ended up with dozens of
ways of extending ASCII. The ISO 8859-1 variant was most popular for
Western European languages -- until the Euro symbol was created. And it
still wasn't possible to include Greek and Russian in the same document.
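Both limitations can be checked with Python's built-in codecs. This is only a sketch; the helper name encodable and the sample strings are my own. It shows that ISO 8859-1 has no Euro symbol (ISO 8859-15 added one), and that neither the Greek part (8859-7) nor the Cyrillic part (8859-5) can hold a text that mixes the two alphabets.

```python
def encodable(text, charset):
    """Return True if text can be represented in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


euro_in_latin1 = encodable("€", "iso-8859-1")    # original Western European set
euro_in_latin9 = encodable("€", "iso-8859-15")   # later revision with the Euro

mixed = "αβγ жзи"  # Greek and Russian letters in one string
mixed_in_greek = encodable(mixed, "iso-8859-7")
mixed_in_cyrillic = encodable(mixed, "iso-8859-5")
mixed_in_utf8 = encodable(mixed, "utf-8")

print(euro_in_latin1, euro_in_latin9)
print(mixed_in_greek, mixed_in_cyrillic, mixed_in_utf8)
```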

And Chinese and Japanese users had to have their own standards anyway --
they have thousands of different characters.

So another standard was created -- Unicode. Unicode was originally
encoded as *two* octets -- with up to 65536 different characters. That
turned out to be (a) not enough for all the world's different languages,
and (b) rather complex to handle.
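To make point (a) concrete, here is a rough sketch. The example character (a musical G clef, U+1D11E) is my own choice, and I am using UTF-16 to stand in for the two-octet scheme, which is an assumption about what "two octets" maps to today. A character numbered above 65535 cannot fit in a single two-octet unit, so UTF-16 has to spend two units on it.

```python
TWO_OCTET_LIMIT = 2 ** 16  # 65536 distinct values in two octets

e_acute = "\u00e9"        # é, well inside the two-octet range
g_clef = "\U0001D11E"     # musical G clef, outside that range

clef_fits = ord(g_clef) < TWO_OCTET_LIMIT

# UTF-16 (big-endian, no byte-order mark) uses one 2-octet unit for é
# but needs a pair of units, 4 octets in total, for the clef.
e_acute_octets = len(e_acute.encode("utf-16-be"))
clef_octets = len(g_clef.encode("utf-16-be"))

print(clef_fits, e_acute_octets, clef_octets)
```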

UTF-8 is a different way of encoding Unicode. US-ASCII letters are
encoded as one octet, just as they always have been. Accented letters,
and letters from other character sets, take up between two and four
octets.
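A quick way to see those lengths for yourself, as a minimal Python sketch (the sample characters are my own picks):

```python
samples = {
    "e": "e",                 # US-ASCII letter
    "e-acute": "\u00e9",      # é
    "euro": "\u20ac",         # €
    "g-clef": "\U0001D11E",   # musical symbol beyond the first 65536 values
}

# Number of octets each character occupies once encoded as UTF-8.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, length in utf8_lengths.items():
    print(f"{name}: {length} octet(s)")
```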

And there is the promise of one standard for the whole world, and
everything being sweetness and light, and that anything that can be
written can be shown on any computer screen around the world.

In practice, UTF-8 is about as good as you can get.

On this system, when I create an accented char from my keyboard, it is
written in two words:

Technical niggle -- "word" has a separate, different, technical meaning
in this context. I think you mean "byte" or "octet".

That's a two-octet UTF-8 character.

0000000 303 251 e e e \n
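
The dump above can be checked by hand. Assuming it came from something like "od -c" (the original message doesn't say), 303 and 251 are octal values, i.e. 0xC3 0xA9, and a rough Python sketch shows that this pair decodes to a single e-acute in UTF-8:

```python
# The two octets from the dump, written in octal as od prints them.
dumped = bytes([0o303, 0o251])

as_hex = dumped.hex()             # the same two octets in hexadecimal
decoded = dumped.decode("utf-8")  # one character, not two

# The whole dumped line: the two-octet character, three plain "e"s, a newline.
whole_line = (dumped + b"eee\n").decode("utf-8")

print(as_hex, decoded, len(decoded))
print(repr(whole_line))
```
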
If I display this file via my web server or send it by mail (PHP), I get
some strange chars

OK -- in this case you *need* to read up about MIME encoding and
content-type and charset headers. These are needed in any case for your
viewers / recipients to be able to understand accents, whether you send
them as a traditional ISO-8859 encoding or as UTF-8.
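As an illustration of what such headers look like, here is a minimal sketch using Python's standard email package rather than PHP. The addresses, subject and body text are placeholders of mine. The point is only that the message ends up carrying a Content-Type header that names its charset. For a web page, the equivalent is an HTTP response header of the same form, shown here as a plain string.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.org"      # placeholder address
msg["To"] = "recipient@example.org"     # placeholder address
msg["Subject"] = "Accent test"

# Ask for the body to be labelled (and encoded) as UTF-8.
msg.set_content("Un café, s'il vous plaît.", charset="utf-8")

content_type_header = msg["Content-Type"]
declared_charset = msg.get_content_charset()

# What a web server would send for an HTML page in UTF-8.
http_header = "Content-Type: text/html; charset=UTF-8"

print(content_type_header)
print(declared_charset)
```

Printing str(msg) shows the complete set of MIME headers the library generates, which is a handy thing to compare against what your own script sends.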

Because some, but not all, of your recipients will understand them the
way you meant them. Others will use different character sets as standard
and see something completely different. They might have a Greek letter
at the same "code point".
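
Here is a small Python sketch of that effect (the charset choices are my own examples). The single octet that means e-acute in ISO 8859-1 turns into a Greek or a Russian letter under other ISO 8859 parts, and a two-octet UTF-8 e-acute read as ISO 8859-1 turns into two "strange chars".

```python
e_acute = "\u00e9"  # é

# One octet, 0xE9, when stored as ISO 8859-1.
latin1_octets = e_acute.encode("iso-8859-1")

# The same octet, interpreted by recipients with other defaults.
seen_as_greek = latin1_octets.decode("iso-8859-7")     # Greek small iota
seen_as_cyrillic = latin1_octets.decode("iso-8859-5")  # Cyrillic small shcha

# Two octets as UTF-8, misread as ISO 8859-1: two characters appear.
utf8_misread = e_acute.encode("utf-8").decode("iso-8859-1")

print(latin1_octets, seen_as_greek, seen_as_cyrillic, utf8_misread)
```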

You need some way of convincing your recipients' computers that you are
sending data in *this* particular character encoding. And once you've
got that working, you might just as well go for the UTF-8 standard and
be able to send and receive in all sorts of different languages. And
MIME encodings are the way to do this.

Hope this helps,

James.

--
E-mail: james@     | [Bradford Cathedral] took 194 years to complete. A
aprilcottage.co.uk | construction period of nearly two centuries may seem
                   | ridiculous to us, but of course builders were a lot
                   | quicker in those days. -- "ISIHAC", BBC Radio 4

--
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list


