Re: unix filename restriction

From: Basile Starynkevitch [news] (basile-news_at_starynkevitch.net)
Date: 09/24/04


Date: Fri, 24 Sep 2004 16:38:28 +0000 (UTC)

On 2004-09-24, ilias <kuku@ruku.mu> wrote:
> Is there any restriction on the characters used in a filename in unix
> systems?

In addition to the previous answers, I would suggest restricting
yourself (if possible) to English letters A-Z a-z, digits 0-9,
underscore _ and the dot . (and perhaps other punctuation like comma,
percent % and tilde ~), and having a conventional "extension" after the
dot. All these are conventions. A filename may contain any character
except the slash and the NUL character; the slash is the directory
separator.

You could use other characters, but it might annoy people. For
example, some people don't know how to safely remove a file named -rf
(yes: the hyphen -, then the lowercase letter r, then the lowercase
letter f). Other people might have trouble removing a file whose name
consists of two control characters (such as a carriage return and a
line feed).

If your files are supposed to be visible through a usual FTP or HTTP
server, the above guidelines are almost mandatory.

Yes, I do suggest avoiding the space character in filenames.

> The maximum size is 255 characters?

For a single filename, use pathconf(dirpath, _PC_NAME_MAX) (255 bytes
on most Linux filesystems); for a whole path, use
pathconf(dirpath, _PC_PATH_MAX). But note that some old utilities (like
some old variants of tar) limit path names to 100 characters, and (more
importantly) that some filesystems impose their own limits (e.g. a FAT
filesystem).

And after about 20 years of software practice (mostly developing), I
have never met a *human*-defined path of more than 80 chars (most of the
absolute paths I deal with are much shorter, usually 20-40 chars
long). I do know that some software generates longer paths.

If you want to be very sure, limit your filenames to the DOS limit of 8
chars, a dot, and at most 3 chars of extension. If you want to be more
liberal, consider limiting them to 14 chars (the limit of old System V
filesystems). In real practice, a limit of 80 chars is more realistic.

As usual, be liberal in what you accept (so accept long paths with up
to 4096 bizarre characters) but be strict in what you generate (so
prefer to generate absolute paths shorter than 80 or 100 chars, with only
letters, digits, dots and slashes).

On Linux, the 4096 limit is PATH_MAX, defined in
/usr/include/linux/limits.h (programs normally get it through <limits.h>).

Above all, your maximum path length should at least be a compile-time
configurable limit. Better still, use run-time limits (i.e. the sysconf
or pathconf calls).

-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France

