Re: Document encoding
- From: Dances With Crows <danSPANceswithTRAPcrows@xxxxxxxxx>
- Date: Mon, 28 May 2007 11:31:42 -0500
sebzzz@xxxxxxxxx staggered into the Black Sun and said:
> I didn't know that when I save a simple text document, it [gets saved]
> with an encoding.
New to computers?
> I encountered problems when I was designing [a] web page with Nvu
> telling Nvu that my documents were in iso-8859-1, and [then I] changed
> the charset to utf-8 after that in [another] editor. All the special
> characters [were] missing [after that].
> I came to the conclusion that depending on what encoding the file is
> saved [in], there [are] special codes inside the file invisible to us
Who's "us", white man? :-) It's easy to tell what encoding is being
used if you're using a real text editor, or a hex editor. utf-8 is
going to be the winner in the long run for various reasons, not least
that any plain ASCII file is already valid UTF-8 and that it can
represent every Unicode char. But encoding is less important if you're
using HTML or XML, since non-ASCII chars in those formats can be written
as entities like &eacute; (HTML) or numeric references like &#233;
(HTML and XML) anyway.
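The point about entities can be checked with Python's standard library. This is a small sketch, not anything from the original post: the `xmlcharrefreplace` error handler writes every non-ASCII char as a numeric reference, so the output is pure ASCII and the declared charset stops mattering for it. The helper name `to_ascii_markup` is made up for illustration.

```python
import html

def to_ascii_markup(text: str) -> bytes:
    # Hypothetical helper: encode as pure ASCII, writing every non-ASCII
    # char as a numeric character reference (&#NNN;) instead of failing.
    return text.encode("ascii", errors="xmlcharrefreplace")

print(to_ascii_markup("café"))       # b'caf&#233;'
# Named and numeric references both decode back to the same char.
print(html.unescape("caf&#233;"))    # café
print(html.unescape("caf&eacute;"))  # café
```

The numeric form is the one to reach for in XML, which predefines only five named entities (&lt;, &gt;, &amp;, &quot;, &apos;).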
> What are the differences between iso-8859-1 and utf-8?
ISO-8859-1 defines chars 0-127 and 160-255, one byte per char; it's the
Western Europe code page and has practically every char you need to
write in western European languages. UTF-8 defines ... well, almost
every char that exists. ASCII chars (0-127) are stored as the same
single bytes they are in ASCII; every other char is stored as a sequence
of 2 to 4 bytes, a lead byte that says how long the sequence is followed
by 1 to 3 continuation bytes. Check out utf-8 on PickyWeedia for the
full scoop.
> How can I check the encoding of a file?
/usr/bin/file generally works.
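file(1) guesses from the bytes it sees. Plain ASCII and well-formed UTF-8 are easy to recognize, and almost any other pattern of high bytes is probably a legacy 8-bit encoding. The Python sketch below shows that heuristic only, not how file(1) is actually implemented. The `guess_encoding` function is hypothetical, and it cannot tell ISO-8859-1 apart from other 8-bit code pages.

```python
import sys

def guess_encoding(data: bytes) -> str:
    # Rough heuristic in the spirit of file(1), not a replacement for it.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Random high bytes almost never form valid UTF-8 sequences,
        # so a clean decode is strong evidence the file is UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Any byte string decodes as ISO-8859-1, so this is only a fallback
        # guess; it could just as well be another 8-bit code page.
        return "iso-8859-1 (probably)"

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, guess_encoding(f.read()))
```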
> How can I change the encoding of a file? What encoding should I use to
> write text files?
/usr/bin/recode can convert among tons of text encodings. UTF-8 is
probably the most future-proof format.
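For the one conversion that matters here, ISO-8859-1 to UTF-8, the job recode does can be sketched in a few lines of Python. This assumes you already know the source really is ISO-8859-1, because the decode step will happily "succeed" on any bytes. Both function names are illustrative.

```python
import sys
from pathlib import Path

def latin1_to_utf8(data: bytes) -> bytes:
    # Every byte is a valid ISO-8859-1 char, so this decode never fails;
    # the result is only *correct* if the input really was Latin-1.
    return data.decode("iso-8859-1").encode("utf-8")

def convert_file(src: str, dst: str) -> None:
    # Work on raw bytes so existing line endings (\n or \r\n) are untouched.
    Path(dst).write_bytes(latin1_to_utf8(Path(src).read_bytes()))

if __name__ == "__main__" and len(sys.argv) == 3:
    convert_file(sys.argv[1], sys.argv[2])
```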
> What about those CR LF end of line [markers]? What does this mean?
Unix: \n (LF, byte 0x0A) means EOL
old Mac (before OS X): \r (CR, byte 0x0D) means EOL (not used in the modern world AFAICT)
DOS: \r\n (CR followed by LF) means EOL
....this is mostly historical, unless you have a program that only
expects a certain type of EOL and will barf if it sees the other.
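One way to see which convention a file uses is to count the byte patterns directly. Here is a small Python sketch; the `count_eols` name and its category labels are made up here. It works on raw bytes so nothing gets translated on the way in.

```python
def count_eols(data: bytes) -> dict[str, int]:
    # Count DOS (CR LF), Unix (lone LF) and old-Mac (lone CR) line endings.
    crlf = data.count(b"\r\n")
    return {
        "dos": crlf,
        "unix": data.count(b"\n") - crlf,  # LFs not preceded by a CR
        "mac": data.count(b"\r") - crlf,   # CRs not followed by an LF
    }

print(count_eols(b"one\r\ntwo\nthree\rfour"))
# {'dos': 1, 'unix': 1, 'mac': 1}
```

A file with nonzero counts in more than one category is often exactly what makes a picky program barf.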
> What should be used, and what difference does it make in a document?
In general, \r\n is better, because while many Unix programs cope with
DOS EOL, DOS programs may not handle Unix EOL properly.
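If you follow that advice and want one convention throughout a file, converting is a single substitution over the raw bytes. Tools such as unix2dos exist for this. The Python sketch below (`normalize_eols` is a hypothetical name) treats CR LF as one line ending before it looks for lone CRs or LFs, so existing DOS endings are not doubled.

```python
import re

def normalize_eols(data: bytes, eol: bytes = b"\r\n") -> bytes:
    # The alternation tries CR LF first, then a lone CR, then a lone LF.
    # A function replacement keeps `eol` literal (no backslash processing).
    return re.sub(rb"\r\n|\r|\n", lambda m: eol, data)

print(normalize_eols(b"a\nb\r\nc\rd"))         # b'a\r\nb\r\nc\r\nd'
print(normalize_eols(b"a\r\nb\r\n", b"\n"))    # b'a\nb\n'
```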
--
You have me mixed up with more creative ways of being stupid.
--MegaHAL, trained on random gibberish
Matt G|There is no Darkness in Eternity/But only Light too dim for us to see