Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)

From: Eduard Bloch (edi_at_gmx.de)
Date: 02/14/04

    Date:	Sat, 14 Feb 2004 16:24:14 +0100
    To: John Bradford <john@grabjohn.com>
    
    

    #include <hallo.h>
    * John Bradford [Thu, Feb 12 2004, 07:08:06PM]:

    > Well, as long as every userspace implementation gets it correct, we'll
    > be OK. Personally, I doubt they all will, especially those that
    > convert from legacy encodings to Unicode, although quite possibly the
    > above scenario with combining characters is not likely to happen for
    > filenames. Or is it? What about copying a file from a filesystem
    > with a UTF-8 encoding to a filesystem with a legacy encoding, and then
    > back again?
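The round trip in the quoted question can be sketched in Python. This is a minimal illustration that assumes the copy tool normalizes names to NFC before encoding them to Latin-1, because a decomposed "e" followed by U+0301 COMBINING ACUTE ACCENT has no Latin-1 encoding at all. `roundtrip_via_latin1` is a hypothetical helper, not an existing tool.

```python
import unicodedata

def roundtrip_via_latin1(name: str) -> str:
    """Simulate copying a file to a Latin-1 filesystem and back again.

    Hypothetical helper. Assumes the copy tool normalizes to NFC first,
    because a decomposed sequence has no Latin-1 encoding.
    """
    legacy = unicodedata.normalize("NFC", name).encode("latin-1")  # onto the legacy FS
    return legacy.decode("latin-1")  # back onto the Unicode FS

original = "cafe\u0301.txt"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
copied_back = roundtrip_via_latin1(original)

# The name looks the same to a user, but the stored UTF-8 bytes differ:
print(original.encode("utf-8"))     # b'cafe\xcc\x81.txt'
print(copied_back.encode("utf-8"))  # b'caf\xc3\xa9.txt'
print(original == copied_back)      # False
```

A name containing a character that Latin-1 lacks, such as U+20AC EURO SIGN, would instead raise `UnicodeEncodeError` under this assumption, so it could not be copied to the legacy filesystem at all.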

    I have always wondered why there is no "iocharset" option for Unix-like
    filesystems. IMO there could be an easy migration path to UTF-8 for
    existing installations:

     - convert all filenames to UTF-8 (or any other Unicode encoding)
     - mount the FS with "iocharset=UTF-8,charset=latin1" (for current
       Latin-1 users). Users can keep using their Latin-1 names while the
       names are stored in Unicode on disk (this is what currently
       happens with VFAT, a very nice solution IMHO)
     - when enough applications are ready for multibyte encodings, remove
       the charset/iocharset workaround and make people use .UTF-8 locales
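Step 1, and the name translation the proposed mount options would perform, can be sketched in Python. This is a hypothetical illustration, not an existing tool or kernel interface: the helper names are made up, and the sketch assumes every legacy name is Latin-1. Names that already decode as valid UTF-8 (including plain ASCII) are left alone, a heuristic that can misfire on the rare Latin-1 byte sequences that happen to be valid UTF-8.

```python
import os

def latin1_to_utf8(name: bytes) -> bytes:
    """Re-encode a Latin-1 file name as UTF-8 (never fails: every byte is Latin-1)."""
    return name.decode("latin-1").encode("utf-8")

def utf8_to_latin1(name: bytes) -> bytes:
    """What a hypothetical "charset=latin1" layer would show for an on-disk name.

    Raises UnicodeEncodeError for characters Latin-1 lacks (e.g. U+20AC).
    """
    return name.decode("utf-8").encode("latin-1")

def is_valid_utf8(name: bytes) -> bool:
    """True if the name already decodes as UTF-8 (ASCII or already converted)."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def convert_tree(root: bytes) -> None:
    """Step 1: rename every Latin-1 entry below root to its UTF-8 form."""
    # Walk bottom-up, so a directory is renamed only after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if is_valid_utf8(name):  # nothing to convert
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, latin1_to_utf8(name))
            if os.path.lexists(new):  # never overwrite an existing entry
                raise FileExistsError(new)
            os.rename(old, new)
```

For example, `latin1_to_utf8(b"r\xe9sum\xe9.txt")` yields `b"r\xc3\xa9sum\xc3\xa9.txt"`, and `utf8_to_latin1` maps it back to the original bytes, which is what lets Latin-1 users keep their names in step 2.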

    The ultimate solution for steps 2 and 3, though, would be the
    Microsoft-like way:

     - convert the filenames in libc (from $locale to UTF-8), depending on
       which locale the user has set

    This sounds like cheating, but it would be the most flexible approach
    and the most compatible with encoding-ignoring applications.
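The libc variant can be sketched in Python as a wrapper around `os.open`. This is a hypothetical illustration of the idea, not actual glibc behavior: `to_fs_name`, `from_fs_name` and `translated_open` are made-up names, and the locale charset is passed in explicitly instead of being read from the environment.

```python
import os

def to_fs_name(path: bytes, locale_charset: str) -> bytes:
    """Convert a path given in the user's locale charset to on-disk UTF-8."""
    return path.decode(locale_charset).encode("utf-8")

def from_fs_name(name: bytes, locale_charset: str) -> bytes:
    """Convert an on-disk UTF-8 name for display in the user's locale.

    Characters the locale cannot represent become "?", so such a name can
    be listed but not opened again through this layer.
    """
    return name.decode("utf-8").encode(locale_charset, errors="replace")

def translated_open(path: bytes, flags: int, locale_charset: str, mode: int = 0o666) -> int:
    """Like os.open(), but translates the path first, as the libc wrapper would."""
    return os.open(to_fs_name(path, locale_charset), flags, mode)
```

Passing the charset explicitly keeps the sketch deterministic; a real implementation would take it from the current locale (for example via `nl_langinfo(CODESET)`). With `"iso8859-15"`, the byte `0xA4` is the euro sign, so `to_fs_name(b"\xa4.txt", "iso8859-15")` stores `b"\xe2\x82\xac.txt"` on disk, while a Latin-1 user listing the same directory would see `b"?.txt"`.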

    Eduard.

    -- 
    We are nothing; what we seek is everything.
    		-- Johann Christian Friedrich Hölderlin
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at  http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at  http://www.tux.org/lkml/
    
