Re: detecting file types
- From: Joe Pfeiffer <pfeiffer@xxxxxxxxxxx>
- Date: 08 Feb 2006 15:07:16 -0700
Grant Edwards <grante@xxxxxxxx> writes:
> On 2006-02-08, Joe Pfeiffer <pfeiffer@xxxxxxxxxxx> wrote:
>>> I read it quickly, but in fact this was not what I was looking
>>> for. I needed a way to detect, in a program made in C/C++, if
>>> a file is a binary one or a simple ascii one.
>>
>> There is no quick and reliable way to do it. All you can really do is
>> scan the file looking for non-printing characters, and if you find
>> enough of them decide it's not ASCII (do you really mean ASCII, by the
>> way, or an eight-bit extension like ISO-8859-1?);
>
> If he wants to allow something like ISO-8859-1, then he's going
> to need to build a table containing the file's byte
> distribution frequencies and do a "fuzzy" compare to the
> distributions of known language/charset pairs. Not a
> particularly easy/simple thing to do.
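
The table-building half of that is the simple part; a rough sketch in C is below. The function names byte_frequencies and distribution_distance are made up for illustration, and the sum-of-absolute-differences measure is only one possible stand-in for the "fuzzy" compare. The reference distributions for known language/charset pairs are not shown; they would have to be measured from sample text, which is where the real work lies.

```c
#include <assert.h>
#include <stddef.h>

/* Fill freq[0..255] with the relative frequency of each byte value
   in buf. An empty buffer yields all zeros. (Hypothetical helper.) */
void byte_frequencies(const unsigned char *buf, size_t len, double freq[256])
{
    size_t counts[256] = {0};

    assert(freq != NULL);
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;

    for (int b = 0; b < 256; b++)
        freq[b] = len ? (double)counts[b] / (double)len : 0.0;
}

/* Sum of absolute differences between two frequency tables:
   0.0 for identical tables, 2.0 for tables with no byte in common.
   This is an assumed choice of "fuzzy" measure, not a standard one. */
double distribution_distance(const double a[256], const double b[256])
{
    double sum = 0.0;

    for (int i = 0; i < 256; i++) {
        double d = a[i] - b[i];
        sum += d < 0 ? -d : d;
    }
    return sum;
}
```

A caller would compute the table for the file, compare it against each reference table, and pick the closest one, probably rejecting the file if even the best match is too far away.
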
Not easy at all -- but lots of people say "ASCII" these days when they
don't really mean it.
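
For the strict seven-bit case, the scan-and-count approach might look like the sketch below. The names looks_like_ascii and file_looks_like_ascii, the 4096-byte sample size, and the max_bad_ratio threshold are all assumptions for illustration; nothing above fixes how many non-printing characters count as "enough".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Printable 7-bit ASCII plus the usual text whitespace controls.
   Written out by hand so the result does not depend on the locale. */
static int is_text_byte(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7E) ||
           c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

/* Returns 1 if the share of non-text bytes in buf is at most
   max_bad_ratio (an assumed tuning knob), 0 otherwise.
   An empty buffer is treated as text. */
int looks_like_ascii(const unsigned char *buf, size_t len, double max_bad_ratio)
{
    size_t bad = 0;

    assert(buf != NULL || len == 0);
    if (len == 0)
        return 1;

    for (size_t i = 0; i < len; i++)
        if (!is_text_byte(buf[i]))
            bad++;

    return (double)bad / (double)len <= max_bad_ratio;
}

/* Applies the heuristic to the first 4096 bytes of a file.
   Returns 1 or 0 as above, or -1 if the file cannot be read. */
int file_looks_like_ascii(const char *path, double max_bad_ratio)
{
    unsigned char sample[4096];
    FILE *fp = fopen(path, "rb");   /* binary mode: no newline translation */

    if (fp == NULL)
        return -1;

    size_t n = fread(sample, 1, sizeof sample, fp);
    int read_failed = ferror(fp);
    fclose(fp);

    if (read_failed)
        return -1;
    return looks_like_ascii(sample, n, max_bad_ratio);
}
```

It is only a guess about the file: a binary file whose first block happens to be text will pass, and a text file in any eight-bit charset will fail.
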
>> or, you can use the "system" call from inside your program to
>
> Or he can trust that the user knows what he's doing and just
> process the file he's been told to. ;)
Yeah -- "do the contents of this file conform to the syntax my program
is expecting?" is both a lot simpler and a lot more useful than "is
this file a binary file or a text file?"
Joseph J. Pfeiffer, Jr., Ph.D. Phone -- (505) 646-1605
Department of Computer Science FAX -- (505) 646-1002
New Mexico State University http://www.cs.nmsu.edu/~pfeiffer