Re: [Patch] Support UTF-8 scripts

From: "Martin v. Löwis" (martin_at_v.loewis.de)
Date: 09/17/05

    Date:	Sat, 17 Sep 2005 14:25:53 +0200
    To: Martin Mares <mj@ucw.cz>
    
    

    Martin Mares wrote:
    > This makes no sense. For a script, the shell does not care about the encoding
    > at all.

    I'm not (only) talking about /bin/sh. I'm primarily talking about
    /usr/bin/python, /usr/bin/perl, and /usr/bin/wish. In all these
    languages, the interpreter *does* care about the encoding.

    1. In Python, the syntax

       u"some data"

       denotes a Unicode literal (stored internally either in UCS-2 or
       UCS-4); the literals are converted from the source encoding to
       the internal representation. This requires knowledge of the source
       encoding.

    2. In Tcl, all strings are internally represented in UTF-8, and
       converted from the source encoding (which currently is inferred
       from the locale of the process executing the script).

    3. In Perl, 'use utf8' declares that the encoding of the script is
       UTF-8, meaning that non-ASCII can be used in string literals,
       identifiers, and regular expressions.
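
The common point in the three cases above is that the interpreter has to turn source bytes into characters, and the result depends on which encoding it assumes. A minimal sketch in present-day Python 3 (the helper name `decode_literal` is hypothetical and is not part of any interpreter):

```python
def decode_literal(raw: bytes, source_encoding: str) -> str:
    """Decode the bytes of a string literal using the assumed source encoding."""
    return raw.decode(source_encoding)

# The two bytes a UTF-8 editor writes for the single character U+00E9.
raw = b"\xc3\xa9"

as_utf8 = decode_literal(raw, "utf-8")      # one character, U+00E9
as_latin1 = decode_literal(raw, "latin-1")  # two characters, U+00C3 U+00A9

print(len(as_utf8), len(as_latin1))
```

The same bytes yield one character under UTF-8 and two under Latin-1, which is why the interpreter needs to be told (or to detect) the source encoding.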

    > Also, currently, people use zillions of encodings, most of which have no
    > signature, so introducing a signature for UTF-8 does not win anything.

    This specific patch does win something: it allows executing scripts
    that start with <utf8 signature>#!

    This is useful e.g. for Python, which recognizes the UTF-8 signature
    as declaring the source encoding of the Python module to be UTF-8.
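
As an illustration only, the check being discussed can be sketched in user space as follows. This is not the kernel patch itself; the function name `starts_with_shebang` is hypothetical, and the only library facility assumed is `codecs.BOM_UTF8`, the three-byte UTF-8 signature EF BB BF from the Python standard library.

```python
import codecs

def starts_with_shebang(head: bytes) -> bool:
    """Return True if the data starts with '#!', optionally preceded by the UTF-8 signature."""
    if head.startswith(codecs.BOM_UTF8):
        # Skip the three signature bytes before looking for the interpreter line.
        head = head[len(codecs.BOM_UTF8):]
    return head.startswith(b"#!")

plain = b"#!/usr/bin/python\nprint('hi')\n"
signed = codecs.BOM_UTF8 + plain

print(starts_with_shebang(plain), starts_with_shebang(signed))
```

A loader that only compares the first two bytes against `#!` would reject the second form, which is the case the patch is meant to cover.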

    > In the future, most people will probably use only UTF-8, so the signature
    > carries no information.

    In the future, the signature *will* carry no information. But the future
    is, well, in the future.

    I just can't understand why (some) people are so opposed to this patch.
    It is a really trivial, straightforward change. It introduces no
    policy, just a feature: you can put the UTF-8 signature in your script
    file, if you want to (and your scripting language supports it). By
    no means does it force you to put the UTF-8 signature in all your
    script files, let alone all your text files.

    Regards,
    Martin

