Re: [opensuse] uncompressing zip files and accented characters



Philipp Thomas wrote:
* Dave Howorth (dhoworth@xxxxxxxxxxxxxxxxx) [20100713 12:50]:

Exactly, as far as I know filenames are stored in the filesystem as
octets.

Correct so far.
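
The octet view is easy to check for yourself. Here is a minimal sketch in Python, assuming a Linux filesystem such as ext4 or tmpfs that accepts arbitrary bytes in names; the file name is made up for illustration:

```python
import os
import tempfile

# Hypothetical name: "café.txt" encoded as ISO-8859-1, which is not valid UTF-8.
raw_name = b"caf\xe9.txt"

with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)
    # Passing bytes paths makes Python hand the octets to the kernel untouched.
    with open(os.path.join(tmp_bytes, raw_name), "wb"):
        pass
    # A bytes argument to os.listdir() returns the names as bytes, undecoded.
    stored_names = os.listdir(tmp_bytes)

print(stored_names)
```

The name that comes back is the same byte string that went in, even though it cannot be decoded as UTF-8.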

There's no notion of characters or encodings.

That's not correct.

OK, I hold my hand up. Please just give me a reference to the place
where the encodings are defined so I can learn.

Neither does the kernel care what the octet sequence represents.

Wrong! File system drivers like ntfs or vfat explicitly use specific
encodings.

Hmm, so Microsoft break software layering abstractions. Why does that
not surprise me? Isn't ntfs-3g a user-level driver though?

Talk of encoding in the filenames themselves is muddled thinking.

I tend to disagree.

These articles are quite old so I thought at first my beliefs were just
out of date:
<http://lwn.net/Articles/71472/>
<http://www.win.tue.nl/~aeb/linux/lk/lk-6.html>

But this one is dated 2010-05-23:
<http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html>

"Yet because you can’t know the character encoding of a given filename,
in theory you can’t display filenames at all today. Why? Because then
you don’t know how to translate the bytes of a filename into displayable
characters (!)."
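
To make the quoted point concrete, here is a small sketch in Python (the byte string is an invented example) showing the same octets turning into different text depending on which encoding the reader assumes:

```python
# Hypothetical filename octets: "été.txt" as written by a UTF-8 application.
raw_name = "\u00e9t\u00e9.txt".encode("utf-8")

# Decoded with the encoding the writer used, the name comes back intact.
as_utf8 = raw_name.decode("utf-8")

# Decoded as ISO-8859-1, every byte still maps to some character, so there
# is no error, just the wrong text.
as_latin1 = raw_name.decode("iso-8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

Nothing in the bytes themselves says which of the two readings is the intended one.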

As has been suggested, convmv is a way to do that.

Convmv may be able to convert the file name on disk, but it won't
change unzip's display.

Indeed. Setting the appropriate environment, specifically locale, in
which to run unzip is the way to do that.
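
As a rough sketch of what that could look like when scripted from Python: the helper names locale_env and list_archive are hypothetical, the locale you pass must be one that is actually installed on your system, and whether the listing then displays correctly depends on your unzip build.

```python
import os
import subprocess


def locale_env(locale_name):
    """Return a copy of the current environment with LC_ALL overridden."""
    env = dict(os.environ)
    env["LC_ALL"] = locale_name
    return env


def list_archive(archive_path, locale_name):
    """Run `unzip -l` on the archive under the given locale; return raw output."""
    result = subprocess.run(
        ["unzip", "-l", archive_path],
        env=locale_env(locale_name),
        capture_output=True,
        check=False,
    )
    # The output is returned as bytes; decoding it is left to the caller,
    # who knows which encoding the chosen locale implies.
    return result.stdout
```

For example, list_archive("archive.zip", "en_GB.ISO-8859-1") would run the listing under a Latin-1 locale; both the archive name and the locale name here are placeholders. Only the child process sees the changed locale, so the rest of the session is unaffected.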

Cheers, Dave

PS I'm not trying to argue that the filename architecture is the best
way to design it, just what it is.


