Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: Linus Torvalds (torvalds_at_osdl.org)
Date: 02/17/04

    Date:	Tue, 17 Feb 2004 12:59:34 -0800 (PST)
    To: John Bradford <john@grabjohn.com>
    
    

    On Tue, 17 Feb 2004, John Bradford wrote:
    >
    > Why not:

    I'll start with the first one. That already kills the rest.

    > * State that filenames are strings of 32-bit words. UCS-4 should be
    > the preferred format for storing text in them, but storing legacy
    > encodings in the low 8 bits is acceptable (but a Bad Thing for new
    > installations).

    UCS-4 is as braindamaged as UCS-2 was, and for all the same reasons.

    It's bloated, non-expandable, and not backwards compatible.

    In contrast, UTF-8 doesn't measurably expand any normal text that didn't
    need it, is backwards compatible in the major ways that matter, and can be
    extended arbitrarily.
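The size and compatibility claims can be checked with a minimal Python 3 sketch (not from the original thread). It encodes a few filenames with Python's built-in `utf-8` and `utf-32-le` codecs, treating UTF-32 as a stand-in for UCS-4 (the two coincide for code points up to U+10FFFF). The helper `byte_sizes` and the sample names are hypothetical, chosen for illustration.

```python
def byte_sizes(name: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UCS-4 bytes) for a filename.

    UTF-32-LE (no byte-order mark) stands in for UCS-4 here.
    """
    return len(name.encode("utf-8")), len(name.encode("utf-32-le"))


# Hypothetical sample names: pure ASCII, Latin-1 range, and Cyrillic with a '/'.
names = ["readme.txt", "na\u00efve.txt", "docs/\u0444\u0430\u0439\u043b"]

for name in names:
    utf8 = name.encode("utf-8")
    ucs4 = name.encode("utf-32-le")
    print(f"{name!r}: UTF-8 {len(utf8)} bytes, UCS-4 {len(ucs4)} bytes")
    # A NUL byte terminates a C string, so UCS-4 names cannot pass through
    # byte-oriented interfaces such as POSIX path arguments unchanged.
    print("  NUL in UTF-8:", b"\x00" in utf8, "| NUL in UCS-4:", b"\x00" in ucs4)
    # Every byte of a UTF-8 multibyte sequence is >= 0x80, so a 0x2F byte
    # can only ever be the ASCII '/' path separator.
    assert utf8.count(b"/") == name.count("/")
```

For `readme.txt` this reports 10 bytes for UTF-8 against 40 for UCS-4: UTF-8 leaves ASCII bytes untouched, while UCS-4 quadruples them and follows every ASCII character with three zero bytes.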

    UCS-4 has _zero_ advantages over UTF-8.

    Please. Give it up. Anybody who thinks that _any_ other encoding format
    than UTF-8 is valid is just _wrong_.

    (Now, I'll grant that a lot of people don't like Unicode, so I'll allow
    that maybe you'd want to use the UTF-8 _encoding_scheme_ for some other
    mapping, but I don't see that that is worth the pain any more. Unicode may
    be a horrible enumeration, but in the end all font encodings are arbitrary
    anyway, so the Unicode haters might as well start giving up).

    In short: even if you hate Unicode with a passion, and refuse to touch it
    and think standards are worthless, you should still use the same
    transformation that UTF-8 does to your idiotic character set of the day.
    Because the _transform_ makes sense regardless of character set encoding.
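That the transform is independent of the character set can be illustrated with a minimal Python 3 sketch. The function `utf8_transform` is hypothetical (not a kernel or library API): it takes a bare integer, with no assumption about which character that number names, and applies the UTF-8 bit layout. It follows the original layout of RFC 2279, which allowed up to six bytes and values below 2**31; RFC 3629 later restricted UTF-8 proper to U+10FFFF (at most four bytes). For values in that range, surrogates excluded, the output should match Python's own `utf-8` codec.

```python
def utf8_transform(value: int) -> bytes:
    """Encode a non-negative integer using the UTF-8 bit layout.

    Uses the original 1-to-6-byte layout (values below 2**31, as in
    RFC 2279). The input is just a number, so any character-set
    numbering could be fed through the same transform.
    """
    if value < 0 or value >= 1 << 31:
        raise ValueError("value out of range for the 31-bit UTF-8 layout")
    if value < 0x80:
        return bytes([value])  # 0xxxxxxx: ASCII passes through unchanged
    # Pick the shortest length n (2..6) whose payload fits; an n-byte
    # sequence carries 5*n + 1 payload bits.
    n = 2
    while value >= 1 << (5 * n + 1):
        n += 1
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (value & 0x3F))  # continuation byte: 10xxxxxx
        value >>= 6
    lead = (0xFF << (8 - n)) & 0xFF  # n leading 1 bits, then a 0 bit
    out.append(lead | value)         # remaining 7 - n payload bits
    return bytes(reversed(out))      # lead byte first
```

For example, `utf8_transform(0xE9)` returns `b'\xc3\xa9'`, the same bytes as `'é'.encode('utf-8')`. ASCII stays single bytes, the lead byte announces the sequence length, and every continuation byte has the form `10xxxxxx`; none of this depends on Unicode's particular assignment of numbers to characters.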

                    Linus

