Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)

From: Robin Rosenberg (robin.rosenberg.lists_at_dewire.com)
Date: 02/12/04

  • Next message: Robert White: "RE: ps/2 mouse problem with KVM switch"
    To: John Bradford <john@grabjohn.com>
    Date:	Thu, 12 Feb 2004 23:29:11 +0100
    
    

    On Thursday 12 February 2004 22.13, you wrote:
    > I know what you're saying, there is only one way to encode the data
    > correctly. I totally agree with that.
    >
    > However, we both know that UTF-8 provides escapes from the 7-bit
    > encoding, and although it goes against the standard to encode 7-bit
    > characters using such sequences, in the real world don't you think
    > that there will be a lot of decoders which decode the multi-byte
    > sequence back, rather than report an error? This is not something
    > that will be happening in the kernel - it will be up to userspace to
    > do it, so there may well be many different implementations.

    Oh, I wasn't thinking of fixing *every* application out there, but making
    the kernel api's convert between the user locale and the file system locale,
    thus restricting the problems to places that can be fixed.

    An alternative would be glibc since it's used by most apps, but then there
    could be funny and inefficient interactions with filesystems that already
    do the job. The "future" common case would be utf-utf conversion for all
    native file systems, i.e. no work.

    [... ]

    > I don't think that the issue with combining characters is likely to be
    > an issue, I only mentioned it as an example. As you pointed out a
    > single accented character, and a two character combination are
    > distinct, and converting the combination to the corresponding single
    > character in a filename would definitely be wrong, in my opinion.
    > However, that doesn't mean that software won't do it.

    Some applications break if I put any non-ascii characters, but they few
    enough that I can afford the loss. Most shell scripts break if I even have
    a space in a filename. This shouldn't be any worse than that. The space
    issue is really serious (but I don't think that can be fixed other than teaching
    people to program properly, and possibly improving bash's knowledge of the
    difference between a space and argument separator).

    -- robin
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Robert White: "RE: ps/2 mouse problem with KVM switch"

    Relevant Pages

    • Re: output ampersand using XML::Twig
      ... XML data, then it receives unescaped utf8 strings from the parser ... first 2 solutions) being to get the unicode character for   ... turn off XML escapes for the element content ... {# use an Encode output filter that encodes (using decimal ...
      (comp.lang.perl.modules)
    • Re: editing perl script through TEXTAREA
      ... not converted to the corresponding HTML entity. ... then browsers do strange things with them. ... As the Perl Encode ... character which cannot be represented in the prevailing character ...
      (comp.lang.perl.misc)
    • Re: unicode
      ... 'ascii' codec can't encode character u'\u9999' in ... it looks like when I try to display the string, ... If you try to print a Unicode string, then Python will attempt to first ... encode it using the default encoding for that file. ...
      (comp.lang.python)
    • Re: POSTing Chinese characters
      ... For the example string I mention, simply encode as ... the client locale could be anywhere... ... > The basic idea of %-encoding is to treat character encoding as a sequence ...
      (microsoft.public.inetserver.iis)
    • Re: URLScan and SQL Injection
      ... possible ways to encode a particular character. ... you want to disallow just one character: ... case-insensitive match, so that if you reject .asp, it doesn't matter if the ... > canonicalize the request for urlscan to scan? ...
      (microsoft.public.inetserver.iis.security)