[opensuse] A SOLUTION for delivering archived files to Windows users with proper charset conversion/handling



We cannot avoid it: some day you need to send an archive to a Windows
user, or you are on Windows and need to send an archive to a Linux user.
In most cases your Linux system uses a different charset than your
Windows recipient, because Linux usually uses UTF-8 and no known version
of Windows uses UTF-8 on the file system by default. So you need to
choose an archive format that can deliver the files without corrupting
the file names. Unfortunately there is only one known way to do this on
Linux and only one known way to do this on Windows; both are explained
later in this email. FYI: the famous ZIP format, at least as produced by
the tools I tested, cannot do it.

(This is after searching for all possibilities for weeks and discussing
it on this list...)

The only known way to deliver archived files to a Windows user with
proper charset conversion/handling is to use my patched version of
lcab (originally from http://coding.wooyayhoopla.be/lcab/). My patched
version only works correctly if your current locale uses UTF-8. (The
patch has already been sent to the maintainer of lcab. It is not well
programmed because I am a Perl idiot; I just wish to solve problems.)
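
For readers wondering what such a patch has to achieve: the CAB format
lets each file entry (CFFILE) carry an attribute bit, _A_NAME_IS_UTF
(0x80), which says that the stored name is UTF-encoded rather than in a
local code page. The patch itself is not included in this mail, so the
following is only a minimal Python sketch of that idea and not the
actual lcab change. The function name cffile_entry and its default
arguments are made up for illustration, and the sketch assumes the name
is stored as UTF-8 bytes.

```python
import struct

# Attribute bits from the CAB format's CFFILE entry.
A_ARCH = 0x20          # "modified since last backup"
A_NAME_IS_UTF = 0x80   # the name bytes are UTF-encoded, not a local code page


def cffile_entry(name, size, offset=0, folder=0, dos_date=0, dos_time=0):
    """Build one CFFILE record (hypothetical helper, for illustration only).

    Layout: cbFile (u4), uoffFolderStart (u4), iFolder (u2), date (u2),
    time (u2), attribs (u2), then the NUL-terminated name.
    """
    attribs = A_ARCH
    try:
        raw_name = name.encode("ascii")
    except UnicodeEncodeError:
        # Assumption: non-ASCII names are written as UTF-8 and flagged.
        raw_name = name.encode("utf-8")
        attribs |= A_NAME_IS_UTF
    header = struct.pack("<IIHHHH", size, offset, folder,
                         dos_date, dos_time, attribs)
    return header + raw_name + b"\x00"


if __name__ == "__main__":
    entry = cffile_entry("文件.txt", size=1234)
    attribs = struct.unpack_from("<H", entry, 14)[0]
    print(hex(attribs), entry[16:])
```

A plain ASCII name leaves the flag clear, so archives that contain only
ASCII names come out the same as before.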

The only known way to deliver archived files from Windows to Linux
with proper charset conversion is to use WinRAR, as detailed below.

P.S. Someone might suggest running convmv before packing the archive.
This only works if you know the character set of the recipient's
Windows computer. Here I assume you don't know the recipient's Windows
version and want to deliver the files safely whatever charset it uses,
which is the case I often meet.
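
To see why converting in advance depends on the recipient's character
set, here is a small Python sketch. The file name and the two code pages
(GBK for a Simplified Chinese Windows, cp437 for a US one) are example
choices of mine, not something taken from the tests below. The helper
seen_by is likewise only illustrative.

```python
def seen_by(raw_name: bytes, codepage: str) -> str:
    """Return the name a system assuming `codepage` would show for these bytes."""
    return raw_name.decode(codepage, errors="replace")


name = "文件.txt"

# What a UTF-8 Linux system writes into a format that stores raw bytes.
utf8_bytes = name.encode("utf-8")
# What you get after converting the name to GBK first (the convmv approach).
gbk_bytes = name.encode("gbk")

utf8_on_cp437 = seen_by(utf8_bytes, "cp437")  # unconverted name, US Windows
gbk_on_gbk = seen_by(gbk_bytes, "gbk")        # converted name, Chinese Windows
gbk_on_cp437 = seen_by(gbk_bytes, "cp437")    # converted name, US Windows

if __name__ == "__main__":
    print("original:       ", name)
    print("UTF-8 as cp437: ", utf8_on_cp437)
    print("GBK as GBK:     ", gbk_on_gbk)
    print("GBK as cp437:   ", gbk_on_cp437)
```

Only the case where the conversion target matches the recipient's code
page gives back the original name; the other two show junk characters.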

Detail:

I have tested a lot of different formats:
                                                   WATCH FOR 'YES' IN THIS FIELD
                                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Format  Created by     Opened by    Compressible?  Filename readable after unpacking?
zip     7-zip_win32    FileRoller   yes            no
zip     FileRoller     7-zip_win32  yes            no
rar     WinRAR 3.6     FileRoller   yes            yes!
rar     FileRoller     WinRAR 3.6   yes            no
iso     mkisofs -J     WinRAR 3.6   no             partial
jar     FileRoller     7-zip        yes            no
tar     GnuTar         WinRAR 3.6   no             no
tar     7-zip          GnuTar       no             no
cab     IExpress       cabextract   yes            no
cab     lcab(patched)  Explorer     no             yes!

Note about the partially readable filename:

Although the created ISO image has the Joliet extension and all
filenames are stored as big-endian UCS-2 (UTF-16BE), WinRAR
(incorrectly) produces junk text for the first two ideographs, while
the following ideographs are correct. This should be WinRAR's bug.

The same ISO image, once burnt to a CD-R, works fine (no junk
filenames) on Windows.

Note on the software being used:
* WinRAR 3.6 is actually WinRAR 3.60-Beta.
* 7-zip_win32 is Windows software, an open-source alternative to
  WinRAR. It can unpack zip/tar/rar/7z archives and can create
  zip/tar/7z archives.
* IExpress is a Microsoft utility pre-installed on most Windows
  computers. It can create CAB archives.
* FileRoller is SuSE 10.2's default GNOME archive manager.
* Of all the archive formats above, TAR and ISO are non-compressing
  formats, i.e. they only archive (collect files into one file). To
  compress them you can use gzip.

So the conclusion is:

1. To send an archive to a Windows user, the only known way to ensure
   that file names are NOT corrupted by the locale difference is to
   make the archive with my patched version of lcab. Unfortunately
   there is no easy graphical user interface for this, and the archive
   is not compressed. If compression is needed, compress the CAB file
   with gzip; Windows users who have WinRAR installed can open it. It
   shouldn't be difficult at all to add DEFLATE compression to lcab
   using zlib to produce compressed CAB archives, provided someone has
   the time and knowledge to do it.
2. To send an archive from Windows to a SuSE Linux user, the only
   known way to ensure that file names are not corrupted by the locale
   difference is to make a RAR archive with WinRAR (only tested with
   v3.60-beta).
3. None of the known Linux archive formats can handle character-set
   conversion. Only 3 archive formats have known support for Unicode
   and are thus able to handle character-set conversion;
   unfortunately all are non-free formats. They are:
   i.   Joliet ISO image format, which is a Microsoft format.
        Unfortunately it opens only partially correctly with WinRAR.
   ii.  Microsoft CAB format. Unfortunately CAB archives made on
        Windows using IExpress do not use Unicode file names.
   iii. RAR format. Unfortunately RAR archives made on Linux do not
        support character conversion.
4. The 7z format was not tested because most Windows and Linux users
   cannot directly create/open 7z archives without installing special
   software, so it cannot be used as a general solution.
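
Point 1 of the conclusion mentions gzip as a workaround for the missing
compression. For anyone who wants to script that step, here is a
minimal Python sketch using only the standard library. The function
name gzip_file and the path files.cab are hypothetical, and producing
the CAB file itself with lcab is not shown.

```python
import gzip
import shutil


def gzip_file(src_path: str) -> str:
    """Compress src_path into src_path + '.gz' and return the new path.

    The original file is left in place.
    """
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


if __name__ == "__main__":
    # Hypothetical path: an uncompressed CAB made beforehand with lcab.
    print(gzip_file("files.cab"))
```

Note that gzip only wraps the single CAB file, so the file names inside
the CAB are not touched by this step.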

--
Zhang Weiwu
Real Softservice
http://www.realss.com
+86 592 2091112



