Re: determine default filename encoding

From: Peter T. Breuer (ptb_at_lab.it.uc3m.es)
Date: 12/29/04


Date: Wed, 29 Dec 2004 20:31:35 +0100

cent <centREMOVE@u.washington.edu> wrote:
>
> > If you told us what you were talking about someone might be able to give
> > you an answer. "default filename encoding" makes no sense here. filenames
> > are not encoded.
>
>
> Thanks for the reply, "default filename encoding" is terminology I got from
> Suse tech support for the way filename characters are represented, like UTF8

This makes no sense either. Characters are characters - just bytes.
What you THINK those bytes mean to you is something else, and what they
LOOK like when you print them on the screen is another, and so is what
byte you get when you TYPE a symbol on your keyboard.

But once you have the bytes, they get put in a file name just as they
are. You then see them later as you like/choose to see them. It's
unimportant and irrelevant. The only exception I could think of is if
you had a FS which declares a specific coding for ALL filenames in it
(some msdos filesystem?), thus giving you a hint how the filenames are
_intended_ to be seen. Then if that coding differed from the LANG coding
you are currently using for typing and seeing bytes, maybe some
translation of symbol X in your LANG coding into symbol X in the fs
coding could be made on the fly as you write or read filenames? But why
would the same visual symbol have different codes in different codings?
Don't they take care that that does not happen? Oh well, I suppose it
could.
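
For what it is worth, whether one visual symbol can have different codes
is easy to check. A minimal Python sketch, assuming the standard codec
names latin-1, utf-8 and mac_roman stand in for the codings under
discussion (the thread does not say which ones are actually in play),
and using the degree symbol from the original question:

```python
# Encode one symbol, the degree sign, under three different codings
# and collect the resulting bytes. The codec choice is an assumption.
symbol = "\u00b0"  # DEGREE SIGN

codes = {name: symbol.encode(name) for name in ("latin-1", "utf-8", "mac_roman")}

for name, raw in codes.items():
    print(f"{name:10} {raw.hex()}")
```

The three byte strings come out different, so the same visual symbol
can indeed carry different codes in different codings.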

Anyway, I don't see what the problem is - if you want to visualize
filenames that were written in encoding X as they were written, then
visualize them in encoding X. Otherwise put up with the weird
representation in your local encoding B. I don't see how you can
generally expect to do meaningful translation!
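
To illustrate the "weird representation": a short Python sketch in
which one stored byte string is viewed under two codings. The file name
and the choice of mac_roman and latin-1 are made up for the example.

```python
# A file name as it sits on disk: just bytes. The value is hypothetical.
raw_name = b"temp 20\xa1C.txt"

# Viewed in the coding it was written in (assumed here to be Mac Roman).
as_written = raw_name.decode("mac_roman")

# Viewed in a different local coding: same bytes, different symbol.
as_seen_locally = raw_name.decode("latin-1")

print(as_written)
print(as_seen_locally)
```

The bytes never change; only the symbol shown for byte 0xA1 does (a
degree sign in one view, an inverted exclamation mark in the other).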

> or latin1. Since filenames can be represented with different character sets

Eh? What? That does not parse. Filenames are filenames - a string of
characters. The printed representation of those characters is something
else and is up to you!

> it is very important when displaying filenames containing extended
> characters like the degree symbol which is a legal character for Macintosh

No - it's not important. It matters not one little bit. It's only a
question of how the symbol appears on your screen, and that is a
question of your personal taste and what language encoding you have set
for printout as to how the symbol is presented. Factually it's not of
interest to anyone.

> files through the Netatalk protocol. In case it is important, I am
> currently using Suse Enterprise v9 with Netatalk 1.6.4-51. The files I am

What's netatalk? Something to do with Novell? Or apple? The latter, I
guess.

Anyway, I don't understand what you are on about. Characters are
characters. How you choose to print them is up to you.

> trying properly represent are on a hard drive that came from a system that
> was Redhat ES 2.1 using Netatalk 1.6.4. I am working with the convmv
> utility.

What? Never heard of "convmv"!

I suspect netatalk is something like apple nfs/smb. Network protocols
make more obvious but do not create the problem that two different
people can see the same characters in different ways, if they choose.
Unless they know what coding the other is using, they won't see what
the other intended. So they need to communicate the encoding. If you
think they need to "translate" into a common encoding, and then out
again at the other side, in order to resolve the problem (the "common"
encoding being what the filename is "written" in), you would seem to
be labouring under several misapprehensions. That sounds like you are
imagining that it is like translating from local cpu order to network
byte order and out again at the other side.

No it isn't. You can't represent symbols of coding A in some common
coding that can then be converted to coding B. They don't have
codes for each other's symbols! All you can do is make sure that
either

   (1) the intended (what it was written using) charset is held on the
   FS, or

   (2) you use a previously agreed meta-coding to register the charset
   used as PART of the file name - that's a metacode, not a code.
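
The "no codes for each other's symbols" case can be sketched as
follows. This is only an illustration in Python; the helper name
translate_name is hypothetical and the codecs are examples, not
anything Netatalk or convmv is known to use.

```python
from typing import Optional


def translate_name(raw: bytes, src: str, dst: str) -> Optional[bytes]:
    """Re-encode a file name from coding src to coding dst.

    Returns None when the name holds a symbol that dst has no code for.
    """
    text = raw.decode(src)  # interpret the bytes as they were written
    try:
        return text.encode(dst)
    except UnicodeEncodeError:
        return None


# The degree sign exists in both Mac Roman and Latin-1, so this works.
ok = translate_name(b"20\xa1C", "mac_roman", "latin-1")

# The euro sign (0x80 in cp1252) has no code in Latin-1, so this does not.
failed = translate_name(b"5\x80", "cp1252", "latin-1")

print(ok, failed)
```

Translation goes through only when both codings happen to contain the
symbol; otherwise there is nothing to translate it to.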

Now, (1) would be what one would use for MSDOS systems. I suspect you
are talking about (2), and you want us to encode the LANG used (A or B)
when writing the filename as part of the filename, using say utf8
metacoding (which is ascii, no? I don't know!).

Then what one could do is also use (1) to indicate when you don't have
to say in the filename what code is being used. It would be the
"default" encoding. That would only be a useful optimization, though,
since nothing would stop one from starting every filename with
<ascii> (or whatever the trick sequence is :-) if one wanted to!

<ascii>This is a filename with an </ascii> <european>&umlaut;</european>

Or whatever things like that look like!

However, what you are actually on about I don't know. This is just
conjecture sparked by your cryptic comments.

Peter


