Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: Marc (pcg_at_goof.com)
Date: 02/17/04

    Date:	Tue, 17 Feb 2004 08:14:48 +0100
    To: Linus Torvalds <torvalds@osdl.org>
    
    

    On Mon, Feb 16, 2004 at 02:40:25PM -0800, Linus Torvalds <torvalds@osdl.org> wrote:
    > Try it with a regular C locale. Do a simple
    >
    > echo > åäö

    Just for your information, though: you can't even input these characters
    in a C locale, since your libc (and/or Xlib) is unable to handle them
    (lots of ISO C functions will barf on this one). The C locale is 7-bit
    only.

    > Which, if you think about it, is 100% EXACTLY equivalent to what a UTF-8
    > program should do when it sees broken UTF-8.

    The problem is that the very common C language makes it a pain to do
    this in i18n programs. The multibyte functions and iconv will not accept
    these byte sequences, so programs wanting to do what you expect need to
    re-implement most if not all of the character handling of your typical
    libc.
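
To illustrate the shape of the problem, here is a minimal sketch. It uses Python's strict UTF-8 decoder as a stand-in for the libc multibyte functions, so it is an analogy rather than the libc behaviour itself. The byte string and the helper names `strict_decode` and `escaped_for_display` are made up for the example.

```python
# Three non-ASCII bytes that do not form valid UTF-8 (hypothetical filename).
raw_name = b"\xe5\xe4\xf6"


def strict_decode(name: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        # The standard decoder refuses the input outright.
        return None


def escaped_for_display(name: bytes) -> str:
    """Hand-written fallback: show each undecodable byte as a \\xNN escape."""
    return name.decode("utf-8", errors="backslashreplace")


rejected = strict_decode(raw_name)
shown = escaped_for_display(raw_name)
print(rejected, shown)
```

The point of the sketch is only that the strict path gives the program nothing to work with, so any handling of such names has to be written separately from the normal text path.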

    Yes, it's possible....

    > The two cases are 100% equivalent. We've gone through this before. There
    > is a bit of pain involved, but it's not something new, or something
    > fundamentally impossible. It's very straightforward indeed.

    The "bit" is enormous, as you can't use your libc for text processing
    anymore.

    Yes, it works in non-i18n programs, but right now most programs are
    getting i18n support, which means they will all fail to properly handle
    non-locale characters.

    -- 
          -----==-                                             |
          ----==-- _                                           |
          ---==---(_)__  __ ____  __       Marc Lehmann      +--
          --==---/ / _ \/ // /\ \/ /       pcg@goof.com      |e|
          -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
        The choice of a GNU generation                       |
                                                             |
