Re: detecting file types



On 2006-02-08, Joe Pfeiffer <pfeiffer@xxxxxxxxxxx> wrote:

>> I read it quickly, but in fact this was not what I was looking
>> for. I needed a way to detect, in a program made in C/C++, if
>> a file is a binary one or a simple ascii one.
>
> There is no quick and reliable way to do it. All you can really do is
> scan the file looking for non-printing characters, and if you find
> enough of them decide it's not ASCII (do you really mean ASCII, by the
> way, or an eight-bit extension like ISO-8859-1?);

If he wants to allow something like ISO-8859-1, then he's going
to need to build a table containing the file's byte
distribution frequencies and do a "fuzzy" compare to the
distributions of known language/charset pairs. Not a
particularly easy/simple thing to do.
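
The scan for non-printing characters, together with the byte-frequency
table, could be sketched roughly as follows. This is a heuristic under
stated assumptions, not a reliable detector: the names count_bytes,
classify_hist and classify_file are hypothetical, the 1% threshold is
arbitrary, and bytes with bit 7 set are merely reported as "eight-bit
text". The fuzzy comparison against known language/charset distributions
is not attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
enum text_kind { KIND_ASCII, KIND_EIGHT_BIT_TEXT, KIND_BINARY };

/* Add the bytes of buf to a 256-entry frequency table. */
void count_bytes(const unsigned char *buf, size_t len, size_t hist[256])
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        hist[buf[i]]++;
}

/* Classify from the table. Control bytes other than tab, newline,
 * carriage return and form feed count as non-printing. The 1%
 * threshold is an arbitrary assumption; any NUL byte means binary. */
enum text_kind classify_hist(const size_t hist[256])
{
    size_t total = 0, control = 0, high = 0;

    for (int b = 0; b < 256; b++) {
        total += hist[b];
        if (b >= 0x80)
            high += hist[b];
        else if ((b < 0x20 && b != '\t' && b != '\n' &&
                  b != '\r' && b != '\f') || b == 0x7f)
            control += hist[b];
    }
    if (hist[0] > 0 || control * 100 > total)
        return KIND_BINARY;
    return high > 0 ? KIND_EIGHT_BIT_TEXT : KIND_ASCII;
}

/* Read a whole file and classify it; returns -1 if it cannot be read. */
int classify_file(const char *path)
{
    unsigned char buf[4096];
    size_t hist[256] = {0};
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        count_bytes(buf, n, hist);
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : (int)classify_hist(hist);
}
```

classify_file("somefile") returns one of the enum values, or -1 on a
read error. The same 256-entry table would be the input to a frequency
comparison, should one be added.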

> or, you can use the "system" call from inside your program to
> execute file.

Or he can trust that the user knows what he's doing and just
process the file he's been told to. ;)
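
For the option of running file from inside the program, here is a
minimal sketch. It uses POSIX popen() rather than system(), because
system() only returns the exit status and the program needs to read
what file prints. The function name describe_with_file is hypothetical,
and the sketch assumes the file utility is installed and on the PATH.
Paths containing a single quote are simply refused, to keep the shell
quoting trivial.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Run "file -b -- '<path>'" and copy the first line of its output
 * into out. Returns 0 on success, -1 on any failure. */
int describe_with_file(const char *path, char *out, size_t outlen)
{
    char cmd[1024];

    assert(path != NULL && out != NULL && outlen > 0);

    /* Refuse paths that would break the single-quoted argument. */
    if (strchr(path, '\'') != NULL)
        return -1;

    int n = snprintf(cmd, sizeof cmd, "file -b -- '%s'", path);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;

    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;

    char *line = fgets(out, (int)outlen, p);
    int status = pclose(p);
    if (line == NULL || status != 0)
        return -1;

    out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

The caller can then look for the word "text" in the result, for example
with strstr(out, "text"), since file normally includes that word in its
description of text files. The exact wording of the description varies
between versions of file, so treat the match as a heuristic.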

--
Grant Edwards, grante at visi.com
Yow! Four thousand different MAGNATES, MOGULS
& NABOBS are romping in my gothic solarium!!