Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)
From: Marc (pcg_at_goof.com)
Date: 02/17/04
- Previous message: Carl Thompson: "Re: hard lock using combination of devices"
- In reply to: Linus Torvalds: "Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)"
- Next in thread: Helge Hafting: "Re: UTF-8 practically vs. theoretically in the VFS API"
- Reply: Helge Hafting: "Re: UTF-8 practically vs. theoretically in the VFS API"
- Reply: Linus Torvalds: "Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Tue, 17 Feb 2004 08:14:48 +0100 To: Linus Torvalds <torvalds@osdl.org>
On Mon, Feb 16, 2004 at 02:40:25PM -0800, Linus Torvalds <torvalds@osdl.org> wrote:
> Try it with a regular C locale. Do a simple
>
> echo > едц
Just for your info, though. You can't even input these characters in a C
locale, since your libc (and/or xlib) is unable to handle them (lots of SO
C functions will barf on this one). C is 7 bit only.
> Which, if you think about is, is 100% EXACTLY equivalent to what a UTF-8
> program should do when it sees broken UTF-8.
The problem is that the very common C language makes it a pain to use
this in i18n programs. multibyte functions or iconv will no accept
these, so programs wanting to do what you are expecting to do need to
re-implement most if not all of the character handling of your typical
libc.
Yes, it's possible....
> The two cases are 100% equivalent. We've gone through this before. There
> is a bit of pain involved, but it's not something new, or something
> fundamentally impossible. It's very straightforward indeed.
The "bit" is enourmous, as you can't use your libc for text processing
anymore.
Yes, it works in non-i18n programms, but right now most programs get
i18n support, which means they will all fail to properly handle
non-locale characters.
--
-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / pcg@goof.com |e|
-=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
The choice of a GNU generation |
|
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Previous message: Carl Thompson: "Re: hard lock using combination of devices"
- In reply to: Linus Torvalds: "Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)"
- Next in thread: Helge Hafting: "Re: UTF-8 practically vs. theoretically in the VFS API"
- Reply: Helge Hafting: "Re: UTF-8 practically vs. theoretically in the VFS API"
- Reply: Linus Torvalds: "Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|
|