Re: [PATCH] console UTF-8 fixes



Hi,


I just wanted to give my opinion on things...

(and enable utf8 to read this properly)

On Apr 7 2007 11:24, Egmont Koblinger wrote:

I strongly disagree. First of all, you're changing the semantics of a
13-year-old API. The semantics of the Linux console is that by
specifying U+FFFD SUBSTITUTION GLYPH in your unicode table, you have
specified the fallback glyph.

OK, I'm not against using U+FFFD for missing glyphs. In the meantime I
think it's still a good idea to clearly separate the two cases in the code
(that is, the case of an invalid sequence from the case of a missing glyph),
but we can still use the same replacement character in both cases. I'll
send an updated patch after Easter if that sounds good to you.
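
To illustrate the separation discussed above, here is a minimal userspace
sketch in Python (not the kernel patch itself). It keeps "invalid sequence"
and "missing glyph" as distinct cases while printing the same replacement
character for both. FONT_GLYPHS and classify() are hypothetical names, and
the "font" is assumed to cover printable ASCII only.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, used for both failure cases

# Assumption: a toy font that can draw printable ASCII and nothing else.
FONT_GLYPHS = {chr(cp) for cp in range(0x20, 0x7F)}


def classify(data: bytes):
    """Return a list of (kind, char) pairs for a UTF-8 byte stream.

    kind is 'ok', 'missing' (valid code point, no glyph in the font)
    or 'invalid' (byte that is not part of well-formed UTF-8).
    """
    result = []
    # surrogateescape maps each undecodable byte to U+DC80..U+DCFF,
    # which lets us tell invalid input apart from valid code points.
    for ch in data.decode("utf-8", errors="surrogateescape"):
        if "\udc80" <= ch <= "\udcff":
            result.append(("invalid", REPLACEMENT))
        elif ch in FONT_GLYPHS:
            result.append(("ok", ch))
        else:
            result.append(("missing", REPLACEMENT))
    return result


if __name__ == "__main__":
    # 'a', one stray byte 0xFF, then U+00FB (valid, but not in the toy font)
    for kind, ch in classify(b"a\xff\xc3\xbb"):
        print(kind, repr(ch))
```

The two failure branches could later diverge (different glyph, different
attribute) without touching the decoding step.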

I am quite ok with the way things are right now.

- vc displays <?> for illegal sequences

- vc displays e.g. "U" (latin capital U) in place of Û (latin capital
U with circumflex) when the latter is not available in the font
(as determined by the unicode map; I do use a unicode map, because I
use a 4096-byte cp437 "DOS" font, which requires one)

- vc displays <?> for sequences it does not know how to print

- xterm displays <?> for illegal sequences

- xterm seems to display <?> on undefined glyphs (U+DFFF for ex.,
using the "Unicode Best" font from the xterm menu)

- xterm seems to display nothing on undefined glyphs (U+E000 for ex.,
"Unicode Best" again)

What's worse, you've hard-coded the uses of specific visual
representations. That is completely unacceptable.

Now that we've dropped the idea of "dot" for missing glyphs, the other thing

[...]

Sorry, I wasn't clear enough and I think you misunderstood me. The symbol I
choose for fallback is still '?' (the ASCII question mark), I just invert
the color attributes of the cell where this is printed. This way it becomes
visually distinguishable from the literal question mark. Using the current
kernel you just cannot know whether the character printed is a real question
mark, or a replacement glyph. Still, should you strongly disagree with this
decision, the color inverting part can easily be removed.

Please, no dot, and no inverse color.
Imagine someone had the following bitmap for <unknown glyph/illegal sequence>:

################
################
################
####........####
##....####....##
##....####....##
########....####
######....######
######....######
################
######....######
######....######
################
################
################
################

Then inverting that again would be susceptible to confusion with
the regular '?' at 0x3F.

(cp437 for example maps unknown/illegal to 0xFE which happens to be the
block graphic '■', but YMMV depending on font.)
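
The confusion described above can be checked mechanically. The following
Python sketch takes the 16x16 bitmap from this mail, assumes '#' means a lit
pixel and '.' an unlit one, and inverts it. INVERSE_QUESTION and invert()
are names made up for this illustration.

```python
# The fallback glyph from above: a filled cell with a '?' cut out of it.
INVERSE_QUESTION = [
    "################",
    "################",
    "################",
    "####........####",
    "##....####....##",
    "##....####....##",
    "########....####",
    "######....######",
    "######....######",
    "################",
    "######....######",
    "######....######",
    "################",
    "################",
    "################",
    "################",
]


def invert(rows):
    """Swap lit ('#') and unlit ('.') pixels, as inverse video would."""
    swap = str.maketrans("#.", ".#")
    return [row.translate(swap) for row in rows]


if __name__ == "__main__":
    # Prints an ordinary-looking question mark drawn with '#'.
    print("\n".join(invert(INVERSE_QUESTION)))
```

The printed result is a plain '?' shape on an empty background, which is why
inverting the attributes of such a glyph would make it look like 0x3F.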

I think I've (mostly) described it above. Set everything to UTF-8, load a
latin2 font (containing 256 glyphs, e.g. "setfont lat2-16"), make an
application print U+00FB (alt + numpad 251 is one trivial way), you'll see
an "u with double accent", though the symbol to be displayed is "u with
circumflex". This isn't present in the current font, so the replacement
character should appear, not a different letter.

I blame your latin2 unicode map. (See above about 'Û'.)
It should perhaps display a regular 'u' if it cannot display 'û',
but definitely not 'ü' (which is not called a double accent, btw).
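
As a rough illustration of the "fall back to the base letter" idea, here is
a Python sketch that uses Unicode canonical decomposition. This is only an
assumption about how a map could be generated; the console itself simply
follows whatever the loaded unicode map says. base_letter_fallback() and
its `available` parameter are hypothetical.

```python
import unicodedata

REPLACEMENT = "\ufffd"


def base_letter_fallback(ch, available):
    """Pick what to show for ch, given the set of drawable characters.

    Order of preference: the character itself, then its base letter
    (first code point of the canonical decomposition), then U+FFFD.
    """
    if ch in available:
        return ch
    base = unicodedata.normalize("NFD", ch)[0]
    if base != ch and base in available:
        return base
    return REPLACEMENT


if __name__ == "__main__":
    ascii_only = {chr(cp) for cp in range(0x20, 0x7F)}
    for ch in ("û", "Û", "u"):
        print(repr(ch), "->", repr(base_letter_fallback(ch, ascii_only)))
```

Note that this never yields a different accented letter: the result is the
exact glyph, the unaccented base letter, or the replacement character.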

To be able to do CJK you need something like Kon anyway. This feels
like bloat.

I don't want CJK support. All that I want is to be able to edit English
words within a file that contains mixture of English and CJK, with a text
editor like vim or joe.

+1 for this one :)

xterm## echo "韓国と日本にようこそ!" >/tmp/foobar.txt
vc## cat /tmp/foobar.txt

currently gets things not so right, because multibyte characters are not
displayed with as many <?> as they are wide.
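
What "as many <?> as they are wide" would mean can be sketched in Python
with the East Asian Width property from the standard unicodedata module.
This is a simplification (it ignores combining and zero-width characters),
and columns() and placeholder_line() are names invented for this example.

```python
import unicodedata


def columns(ch):
    """Cells a character occupies: 2 for wide/fullwidth, otherwise 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def placeholder_line(text, available, placeholder="?"):
    """Replace each undrawable character by one placeholder per cell."""
    return "".join(
        ch if ch in available else placeholder * columns(ch)
        for ch in text
    )


if __name__ == "__main__":
    ascii_only = {chr(cp) for cp in range(0x20, 0x7F)}
    text = "韓国と日本にようこそ!"
    print(len(text), "characters,",
          len(text.encode("utf-8")), "bytes,",
          sum(columns(ch) for ch in text), "columns")
    print(placeholder_line(text, ascii_only))
```

With one placeholder per cell, the rest of the line keeps the same column
positions it would have in a terminal that can draw the wide characters.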


Jan