Fixing international characters with Samba and Apache




Living in Sweden, I have finally configured my Debian server to
handle the Swedish alphabet (with å, ä, ö) correctly. Below is
how I did it, for anyone who's interested.

***

PROBLEM 1: Filenames on a Samba share with correct å/ä/ö in
Windows do not display correctly in Debian.

Solution: Switch to UTF-8 encoding in Debian (more and more
tools use this as the default for filenames, e.g. Gnome
Nautilus).
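
To illustrate why the names look wrong, here is a minimal Python
sketch of the mismatch. It assumes the two sides disagree on
whether the filename bytes are UTF-8 or ISO-8859-1; the function
names are my own, not part of Samba or Debian.

```python
def misread_as_latin1(name: str) -> str:
    """Encode a name as UTF-8, then decode the bytes as ISO-8859-1.

    This mimics a tool that assumes ISO-8859-1 while the bytes on
    disk are really UTF-8.
    """
    return name.encode("utf-8").decode("iso-8859-1")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

With this, misread_as_latin1("å") gives the two characters "Ã¥",
and a name stored as ISO-8859-1 bytes fails the is_valid_utf8
check, which is roughly what a UTF-8-only tool runs into.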

Set UTF-8 by assigning LANG to a UTF-8-based locale for all
processes, e.g.:
/etc/environment:
export LANG=sv_SE.utf8

If you want to keep English as the language in Debian (rather
than switching to your local language), also add this:
export LC_MESSAGES=POSIX

(If needed, create /etc/environment and source it from
/etc/profile.)

If a suitable UTF-8 locale is not available on your system then
add it. Check available locales:

# locale -a

Add the locale to the "gen" file (the first field is the locale
name, the second its character set):
/etc/locale.gen:
...
sv_SE.UTF-8 UTF-8

Then run:

# locale-gen
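
If you prefer to check the result in a script, the following
Python sketch filters the text printed by "locale -a" for UTF-8
variants of one locale. The function name and the sample output
are made up for illustration; only the "name.codeset" line format
is taken from what "locale -a" prints.

```python
from typing import List


def utf8_locales(locale_a_output: str, language: str = "sv_SE") -> List[str]:
    """Return the lines of `locale -a` output that are UTF-8 locales
    for the given language/territory (e.g. sv_SE.utf8)."""
    found = []
    for line in locale_a_output.splitlines():
        name = line.strip()
        base, _, codeset = name.partition(".")
        # Accept both spellings: "utf8" and "UTF-8".
        if base == language and codeset.lower().replace("-", "") == "utf8":
            found.append(name)
    return found


# Hypothetical sample of `locale -a` output, for illustration only.
SAMPLE = "C\nC.UTF-8\nPOSIX\nsv_SE\nsv_SE.iso88591\nsv_SE.utf8\n"
```

To use it on a real system, pass in the output of
subprocess.run(["locale", "-a"], capture_output=True,
text=True).stdout instead of SAMPLE. An empty result means the
locale still needs to be generated.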

***

PROBLEM 2: Apache's automatic file listings (autoindex) display
å/ä/ö incorrectly.

Solution: Let Apache use UTF-8 as default encoding.

Set default encoding:
/etc/apache2/apache2.conf:
...
AddDefaultCharset UTF-8

***

PROBLEM 3: Some clients send request URLs incompatible with
UTF-8. An interesting (and confusing) example is the combination
of Internet Explorer (IE6) and Adobe Reader when opening a PDF
file. First, Apache receives a GET request with a correctly
formed UTF-8 URL, but after that there is a second GET request
with raw (not URL-encoded!) 8-bit characters in the ISO-8859-1
encoding. The latter request of course fails.

Solution: Use Apache mod_rewrite to convert illegal characters
to valid URL-encoded UTF-8 (which is the convention to use for
URLs).

Enable mod_rewrite:

# cd /etc/apache2/mods-enabled/
# ln -s ../mods-available/rewrite.load

For lowercase å/ä/ö add these rewrites:
/etc/apache2/httpd.conf:
RewriteEngine On
RewriteRule (.*)å(.*) $1%C3%A5$2
RewriteRule (.*)ä(.*) $1%C3%A4$2
RewriteRule (.*)ö(.*) $1%C3%B6$2
(for some reason I haven't been able to get uppercase Å/Ä/Ö
working...)
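
What the three rules do character by character can be sketched
more generally in Python. This is only an illustration of the
conversion, not something Apache runs; the function name is my
own, and it assumes that any path that is not valid UTF-8 is
ISO-8859-1.

```python
from urllib.parse import quote


def to_utf8_url_path(raw_path: bytes) -> str:
    """Turn a request path with raw 8-bit bytes into URL-encoded UTF-8."""
    try:
        # Already UTF-8 (or plain ASCII, possibly with %XX escapes).
        text = raw_path.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else is ISO-8859-1, where every byte
        # maps to exactly one character.
        text = raw_path.decode("iso-8859-1")
    # Keep "/" and existing "%XX" escapes as they are; non-ASCII
    # characters are percent-encoded as UTF-8.
    return quote(text, safe="/%")
```

For example, the raw ISO-8859-1 byte 0xE5 (å) comes out as %C3%A5,
the same replacement as in the first RewriteRule. The uppercase
letters Å/Ä/Ö are the bytes 0xC5/0xC4/0xD6 in ISO-8859-1 and map
to %C3%85/%C3%84/%C3%96.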

Make sure that å/ä/ö in the rules are saved in the ISO-8859-1
encoding, as this needs to match exactly what arrives in the
request. You can check this with an octal dump:

# od -t c /etc/apache2/httpd.conf
...
0001120 R u l e ( . * ) 345 ( . * ) $
0001140 1 % C 3 % A 5 $ 2 \n R e w r i t
0001160 e R u l e ( . * ) 344 ( . * )
0001200 $ 1 % C 3 % A 4 $ 2 \n R e w r i
0001220 t e R u l e ( . * ) 366 ( . * )

(notice the "eight-bit" characters 345/344/366)
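
To see where the numbers in the dump come from, this small Python
sketch computes, for one character, the octal value of its
ISO-8859-1 byte and its URL-encoded UTF-8 form. The function name
is my own.

```python
from urllib.parse import quote


def rule_bytes(ch: str):
    """Return (octal ISO-8859-1 byte, URL-encoded UTF-8) for one character."""
    latin1 = ch.encode("iso-8859-1")  # exactly one byte for å/ä/ö
    return format(latin1[0], "o"), quote(ch)
```

rule_bytes("å") returns ("345", "%C3%A5"), which matches the 345
in the dump and the replacement in the first rule. If the rules
had been saved as UTF-8 instead, od would show two 8-bit bytes
(303 245) for å rather than one.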

***

Good luck with your own configuration!
Mike Wilson
--
View this message in context: http://www.nabble.com/fixing-international-characters-with-samba-and-apache-tf3870912.html#a10966956
Sent from the Debian User mailing list archive at Nabble.com.


