Re: [Kind of OT] Why's this look like gibberish to me?



"Douglas A. Tutty" <dtutty@xxxxxxxxxxxxx> writes:

> What gets me is when a man page is written in english and "'" gets
> translated as "?", as in can?t or "'" is a square white blob (on a
> regular VT). Why couldn't whoever wrote it in english have used the
> standard english "'" glyph instead of a UTF thingy?

The problem isn't the manpage author, it's your setup.

Specifically, you're using a locale that sports UTF-8 encoding, but
you're using a terminal/font combination that is not capable of
correctly rendering UTF-8-encoded common typographical symbols used
for English language text, like the right single quote / apostrophe.
If you use a locale based on ASCII encoding instead, those manpages
will render more correctly (for example, substituting the unsightly
ASCII vertical apostrophe for its more urbane cousin or writing (C) in
place of the copyright symbol). See the bottom of this post if LANG=C
isn't good enough for you.
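
To make the "can?t" effect concrete, here is a minimal Python sketch of what
happens to the typographic apostrophe (U+2019) under different encodings. It
only models the byte-level mismatch; it does not claim to reproduce exactly
what "man" or your terminal does internally.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK is the typographic apostrophe.
text = "can\u2019t"

# In a UTF-8 locale the apostrophe goes out as three bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)        # b'can\xe2\x80\x99t'

# ASCII has no slot for U+2019; a consumer that gives up substitutes "?".
ascii_bytes = text.encode("ascii", errors="replace")
print(ascii_bytes)       # b'can?t'

# A terminal that assumes one byte per character sees three unrelated
# characters instead of one apostrophe (shown escaped via ascii()).
misread = utf8_bytes.decode("latin-1")
print(ascii(misread))
```

The middle case is the "?" from the complaint above; the last case is roughly
where blobs and stray accented letters come from.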

Unlike some people here, I couldn't give a σθιτ if you, S. Keeling, or
anyone else wants to use UTF-8 or not---I'm not on any crusade---but
an environment variable setting of "LANG=en_US.UTF-8" is basically an
announcement to applications that your terminal is UTF-8 capable. You
don't have to run a UTF-8-capable terminal if you don't want to, but
you shouldn't lie to your applications and then whine about those damn
foreigners writing manpages incorrectly (just a joke, just a joke).

In truth, if you look at the manpage source, you'll probably find that
the manpage authors *have* used the ASCII "'" character for
apostrophes and right single quotes. That's because this is the
encoding convention used in the typesetting language "roff" in which
manpages are written. You write `stuff like this' knowing that a
correctly configured manpage rendering pipeline will convert those
ASCII backticks and apostrophes into the correct English typographical
symbols (if the manpage is being printed or being displayed on a
sophisticated terminal) or at least do the best it can (if it's being
delivered to an ASCII-only terminal). If manpage writers were really
on the ball, they'd use \(lqleft and right double-quotes\(rq too, but
you don't see too much of that.
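
As a rough illustration of the convention just described, here is a toy
Python model of the quote-handling step. The function name "render_quotes"
and both lookup tables are made up for this sketch; groff's real logic is far
more involved, so treat this as a picture of the idea, not of the
implementation.

```python
# Toy model only: maps the roff-style quote conventions to output glyphs.
TYPOGRAPHIC = {
    "\\(lq": "\u201c",  # left double quote
    "\\(rq": "\u201d",  # right double quote
    "`": "\u2018",      # left single quote
    "'": "\u2019",      # right single quote / apostrophe
}
ASCII_FALLBACK = {
    "\\(lq": '"',
    "\\(rq": '"',
    "`": "`",
    "'": "'",
}

def render_quotes(source: str, utf8_capable: bool) -> str:
    """Replace quote tokens using the table that suits the output device."""
    table = TYPOGRAPHIC if utf8_capable else ASCII_FALLBACK
    out = source
    for token, glyph in table.items():
        out = out.replace(token, glyph)
    return out

# ascii() escapes non-ASCII output so this prints safely on any terminal.
print(ascii(render_quotes("`stuff like this'", utf8_capable=True)))
print(render_quotes("\\(lqleft and right\\(rq", utf8_capable=False))
```

The point is that the same ASCII source yields either output, and which one
you get depends on what the pipeline believes about your terminal.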

To clarify further, there's nothing English about "'". If it's
anything, it's ASCII, not English. I'm not sure that the ASCII
standard actually specifies what printable characters, including "'",
are supposed to look like, but in most fonts with ASCII-compatible
encoding, the "'" character is rendered as an undirected,
typewriter-style apostrophe, like a vertical tickmark, and I believe
this is pretty much universally accepted as the "correct" rendering of
this character, among those who care about these things. In
particular, it is *not* the character used in typeset English text as
an apostrophe or right single quote. It's rarely used in English text
at all, except in historically ASCII contexts like email and computer
plain text files. It's about as un-English as you can get. It's very
ASCII, though.
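
If you want to see that these really are two different characters, a short
Python sketch using the standard "unicodedata" module will print the code
point and official Unicode name of each. The helper name "describe" is just
for this example.

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, whether it fits in 7-bit ASCII)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ord(ch) < 128)

for ch in ("'", "\u2019"):
    print(describe(ch))
# ('U+0027', 'APOSTROPHE', True)
# ('U+2019', 'RIGHT SINGLE QUOTATION MARK', False)
```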

Anyway, to really take a stand on this UTF-8 crap and announce to the
world that 7 bits were good enough for cavemen so, by God, they're
good enough for you too, you can simply use a preexisting ASCII-only
locale (like LANG=C) or you can generate one. Add this line to
"/etc/locale.gen":

en_US ANSI_X3.4-1968

run "/usr/sbin/locale-gen" as root, and find some way to set
"LANG=en_US" or "LC_ALL=en_US". ANSI_X3.4-1968 is another name for
ASCII, so your new "en_US" locale shouldn't bother you with heretical
characters. Some applications will still give up and print a "?" for
non-ASCII characters, but "man" should do an excellent job displaying
a pure ASCII rendering of your manpages for you.
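
For a quick sanity check after setting things up, something like the
following Python sketch should do. It shows that Python's codec registry
treats "ANSI_X3.4-1968" as an alias for ASCII, and reports the encoding
implied by whatever locale the environment currently selects. The helper
name "current_codeset" is an assumption of this example, and the value it
reports depends entirely on your system.

```python
import codecs
import locale

# The charmap name from the locale.gen line resolves to Python's ASCII codec.
print(codecs.lookup("ANSI_X3.4-1968").name)   # ascii

def current_codeset() -> str:
    """Return the encoding implied by LANG / LC_ALL in this environment."""
    try:
        # Adopt the locale named by the environment variables.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The named locale is not installed; keep the default "C" locale.
        pass
    return locale.getpreferredencoding(False)

print(current_codeset())
```

Run it once with your usual settings and once with "LANG=en_US" (after
generating the locale) and compare the second line of output.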

--
Kevin Buhr <buhr+debian@xxxxxxxxxxx>

