What's the internal charset of Linux?

From: Yin Ming (yinming_at_mdc-ds.com)
Date: 10/30/04

  • Next message: Bob Taylor: "Re: What's the internal charset of Linux?"
    Date: Sat, 30 Oct 2004 09:50:34 +0800
    To: General Red Hat Linux discussion list <redhat-list@redhat.com>
    
    

    Hi, all

    As a non-English user, I've had many problems with charsets. Thank god
    they are perfectly solved in RH9, but I wonder how the problem arose
    in the core.

    I don't want to dive deeply into the core source; I just want to know
    how Linux handles strings, and why those "???" and "*(*@#&$(@"
    sequences appeared in the past.

    I know there are various ways to handle strings. The worst one is
    treating characters as 8-bit (or even 7-bit) chars. For Chinese and
    other multi-byte languages, one character is then split into two or
    more bytes, and each byte is displayed as a strange, unrelated symbol.

    Another way is MBCS, as some versions of Windows use. Characters are
    stored as multi-byte sequences, and the OS remembers their charset and
    displays them in the corresponding fonts.
    This approach cannot handle multiple charsets at one time, I think,
    unless you convert strings from other charsets into the default one.
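
    A rough sketch of that limitation, assuming Python 3, with GBK as a
    hypothetical system default and Big5 as the "other" charset:

```python
default_charset = "gbk"  # assumed system default for this sketch

# Text arriving from a Big5 source; the bytes carry no charset label.
big5_bytes = "中文".encode("big5")

# Reading Big5 bytes under the GBK default yields the wrong characters.
misread = big5_bytes.decode(default_charset, errors="replace")

# Converting into the default charset first keeps the text intact.
converted = big5_bytes.decode("big5").encode(default_charset)

print(misread != "中文", converted.decode(default_charset))
```

    The conversion only works for characters that the default charset
    contains, which is why mixing several charsets stays awkward.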

    The better way, I think, is Unicode: use the correct charset to
    convert the multi-byte input into Unicode strings, handle these
    Unicode strings in the core, and then convert them back into external
    multi-byte form before output.
    This approach only has to maintain a default I/O charset, used for
    decoding and encoding at I/O.
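
    The idea could be sketched like this, assuming Python 3, whose str
    type is already Unicode. The helper names read_text and write_text
    are hypothetical, made up for this illustration:

```python
def read_text(raw: bytes, charset: str) -> str:
    """Decode external bytes into a Unicode string at the input boundary."""
    return raw.decode(charset)


def write_text(text: str, io_charset: str) -> bytes:
    """Encode a Unicode string at the output boundary.

    Characters the target charset cannot represent become '?'.
    """
    return text.encode(io_charset, errors="replace")


# Inputs in two different charsets can coexist once they are Unicode.
chinese = read_text("中文".encode("gbk"), "gbk")
korean = read_text("한글".encode("euc-kr"), "euc-kr")
combined = chinese + " " + korean

utf8_out = write_text(combined, "utf-8")   # round-trips without loss
ascii_out = write_text(combined, "ascii")  # every non-ASCII char turns into '?'

print(utf8_out.decode("utf-8"), ascii_out)
```

    The last line shows one possible source of the "???" output: the
    text is fine internally, but the output charset cannot represent it.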

    So, in the core, the type of string should be wchar_t.

    Right? How does Linux handle strings?

    
