Re: utf8 vs iso8859-1 speed/responsiveness

From: Hideki Hiura (hiura_at_openi18n.org)
Date: 06/30/04

    Date: Tue, 29 Jun 2004 18:32:31 -0700 (PDT)
    To: fedora-list@redhat.com, webmaster@margo.bijoux.nom.br
    
    

    > From: Pedro Fernandes Macedo <webmaster@margo.bijoux.nom.br>
    > Not exactly... The issue is the input file.
    > In RH 9 , the input file probably was iso8859-1 and then it was
    > processed without any conversion , as the whole system was using
    > iso8859-1. Now in fedora , the input file has to be converted by the
    > app to UTF8 , so this extra step means that FC will be a bit slower

    In general this is true, but in this case, not exactly :-).

    Glibc's internal encoding is UTF-32/UCS-4, and the internal encoding
    of modern toolkits, and thus of the major desktop apps built on them
    (OOo as well), is UTF-8. That was already the case on RH9.

    Even if you set the locale to *.iso8859-1 and keep file contents in
    iso8859-1 encoding on RH9, whenever an app opens a file, the contents
    are converted to UTF-8 on reading for processing, and converted back
    to iso8859-1 when the app saves it, even though it appeared as if the
    whole system was using iso8859-1.
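
    The round trip described above can be sketched roughly as follows.
    This is only an illustration in Python, not how any particular
    toolkit does it: load() and save() are hypothetical helpers, the
    sample string is made up, and the UTF-8 internal form is taken from
    the description above.

```python
# Minimal sketch of "convert on read, convert back on save".
# load() and save() are hypothetical names, not a real toolkit API.

def load(raw: bytes, file_encoding: str) -> bytes:
    """Simulate an app opening a file: file bytes -> UTF-8 internal form."""
    return raw.decode(file_encoding).encode("utf-8")


def save(internal: bytes, file_encoding: str) -> bytes:
    """Simulate an app saving: UTF-8 internal form -> file encoding."""
    return internal.decode("utf-8").encode(file_encoding)


# Made-up sample text that fits in iso8859-1.
text = "café São Paulo"

on_disk = text.encode("iso-8859-1")      # what the file holds
internal = load(on_disk, "iso-8859-1")   # what the app works on
written = save(internal, "iso-8859-1")   # what goes back to disk

# The user only ever sees iso8859-1 bytes on disk, although the app
# processed UTF-8 in between.
print(written == on_disk)
```

    From the outside the file never changes encoding, which is why it
    looks as if no conversion were taking place.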

    This architecture is unchanged from RH9 to FC2, so encoding
    conversion happens on the fly everywhere.

    So regardless of whether you run RH9 or FC2, and regardless of the
    locale you set (*.UTF-8, *.iso8859-1, or whatever), files encoded in
    UTF-8 at least take less work for I/O than files in other encodings,
    because they need no conversion on reading or saving.
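
    To make that last point concrete, here is a small Python sketch,
    assuming an app whose internal form is UTF-8. The helper
    to_internal() is hypothetical and only models the decision; a real
    app would normally still validate UTF-8 input, which the sketch
    does as well.

```python
from typing import Tuple


def to_internal(raw: bytes, file_encoding: str) -> Tuple[bytes, bool]:
    """Return (utf8_bytes, transcoded) for file contents.

    Hypothetical helper: models an app whose internal form is UTF-8.
    """
    if file_encoding.lower().replace("_", "-") in ("utf-8", "utf8"):
        # Already in the internal form: validate only, no transcoding.
        raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
        return raw, False
    # Any other encoding has to be transcoded to UTF-8 first.
    return raw.decode(file_encoding).encode("utf-8"), True


sample = "naïve"  # made-up sample text

utf8_result, utf8_transcoded = to_internal(sample.encode("utf-8"), "UTF-8")
latin_result, latin_transcoded = to_internal(
    sample.encode("iso-8859-1"), "iso-8859-1"
)

print(utf8_transcoded, latin_transcoded)
```

    Both inputs end up as the same internal bytes; only the iso8859-1
    one pays for the extra conversion step.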

    Regards,

    --
    hiura@{freestandards.org,OpenI18N.org,li18nux.org,unicode.org,sun.com} 
    Chair, OpenI18N.org/The Free Standards Group          http://www.OpenI18N.org
    Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA   eFAX: 509-693-8356
    
