Re: [kde-linux] Encoding questions (Chusslove Illich)




On Monday 09 June 2008 08:14 am, Emanoil Kotsev
wrote:
The encoding for the merriam-webster page seems to
be
iso8859-1.

The site is definitely iso8859-1 encoded.


One thing I noticed the other day, but forgot to
mention: Yes, there is
something on that page that seems to say the page is
encoded in iso8859-1:

<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1" />

But elsewhere on the same page there are lines that
suggest that at least some
part of it might be encoded in utf-8:

google_afs_ie = 'utf8'; //
select input encoding scheme
google_afs_oe = 'utf8'; //
select output encoding scheme

These are properties/arguments of the googleads
function. I guess they are used to convert the encoding
of the data into the one you are using.


My guess is that the definition is fetched from some
database and displayed
using utf-8. (On the other hand, maybe the utf-8 is
only for google ads or
similar displayed on that page?) (A further guess
is that, if the
pronunciation key were displayed in iso-8859-1 ...

Well, wait--the one clue I have is that if I C&P the
definition from konqueror
to kate, with kate changed to a font that can
display the correct glyphs (the
upside down e, for example), the pronunciation key
is displayed correctly in
kate. Would that work if the encoding on the
konqueror page was iso-8859-1,
or only if it was utf-8? I'm not sure, and don't
desperately need to know at
the moment. ;-)

Just for reference, here is a C&P of the
pronunciation "key" from one m-w page

(http://www.merriam-webster.com/dictionary/intelligent):

Pronunciation: \in-ˈte-lə-jənt\

I guess I just wanted to note that there is some
uncertainty, at least in my
mind, as to whether the definition on the m-w.com
pages is encoded in
iso-8859-1 or utf-8. If it is encoded in
iso-8859-1, could it be displayed
properly if C&P'd into kate?


Look at the source code of the page and you'll find the
secret:

<dt class="pron">Pronunciation:</dt>

<dd class="pron">
<span class="pronchars">\in-<span
class="unicode">&#712;</span>te-l&#601;-j&#601;nt\</span>
</dd>

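This means they use numeric character references, the
W3C-recommended way of writing Unicode characters in HTML.
The references themselves are plain ASCII, so they survive
in a page declared as iso-8859-1, and the browser turns
them into the real Unicode characters, which is why the
C&P into kate works.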
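
For anyone who wants to check this outside a browser, here is a minimal Python sketch. It assumes the pronunciation markup quoted above with the tags stripped; the variable names are made up for illustration. It shows that the raw markup fits in iso-8859-1, while the characters it stands for do not.

```python
import html

# Pronunciation markup from the page source, tags stripped (illustrative copy).
raw = "\\in-&#712;te-l&#601;-j&#601;nt\\"

# The numeric references are pure ASCII, so they encode fine as iso-8859-1.
raw_bytes = raw.encode("iso-8859-1")

# Resolve the references into real Unicode characters, as a browser would.
decoded = html.unescape(raw)

# The resolved characters (U+02C8, U+0259) are outside iso-8859-1.
try:
    decoded.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# Show the non-ASCII code points and whether they fit in iso-8859-1.
print([hex(ord(c)) for c in decoded if ord(c) > 127])
print(fits_latin1)
```

So the page can honestly declare iso-8859-1 and still show the upside-down e: the bytes on the wire are ASCII, and only the rendered text needs Unicode.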

Welcome to the encodings hell!

regards






___________________________________________________
This message is from the kde-linux mailing list.
Account management: https://mail.kde.org/mailman/listinfo/kde-linux.
Archives: http://lists.kde.org/.
More info: http://www.kde.org/faq.html.


