Re: detecting file types
- From: Grant Edwards <grante@xxxxxxxx>
- Date: Wed, 08 Feb 2006 21:46:35 -0000
On 2006-02-08, Måns Rullgård <mru@xxxxxxxxxxxxx> wrote:
Grant Edwards <grante@xxxxxxxx> writes:
On 2006-02-08, dagecko <dagecko@xxxxxxx> wrote:
I read it quickly, but in fact this was not what I was looking
for. I needed a way to detect, in a program made in C/C++, if
a file is a binary one or a simple ascii one.
1) Do a bitwise or of all the bytes in the file.
2) If bit 7 is set in the result, it's not an ASCII file.
3) If bit 7 is not set in the result, it _might_ be an ASCII
file. Or it might be a binary file that doesn't have any
bytes with bit 7 set.
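
The three steps above could be sketched in C roughly as follows. This is only an illustration: the function names or_bytes and file_might_be_ascii and the 4096-byte chunk size are made up for the example, not taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* OR together every byte in buf. Bit 7 of the result is set if any
   byte was outside the 7-bit ASCII range. (Hypothetical helper.) */
unsigned char or_bytes(const unsigned char *buf, size_t len)
{
    unsigned char acc = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        acc |= buf[i];
    return acc;
}

/* Returns 1 if the file might be ASCII (no byte has bit 7 set),
   0 if it is definitely not ASCII, -1 if it could not be read. */
int file_might_be_ascii(const char *path)
{
    unsigned char chunk[4096];   /* arbitrary buffer size */
    unsigned char acc = 0;
    size_t n;
    int failed;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        acc |= or_bytes(chunk, n);
    failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;
    return (acc & 0x80) ? 0 : 1;
}
```

A return value of 1 only means "might be ASCII", exactly as in step 3: a binary file with no high-bit bytes passes this test too.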
If you know what language the ASCII is supposed to be, you
could look at the frequency distributions of individual
characters to give you a better idea if a file is really ASCII
or if it's a degenerate binary file.
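
One very crude way to look at character frequencies is to build a byte histogram and check what fraction of the bytes are letters or spaces. The sketch below assumes English-like text and an ASCII execution character set. The names byte_histogram and letter_space_fraction are invented for the example, and any cut-off applied to the fraction would have to be tuned by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Count how often each byte value occurs in buf. */
void byte_histogram(const unsigned char *buf, size_t len, size_t hist[256])
{
    assert(buf != NULL || len == 0);
    for (int i = 0; i < 256; i++)
        hist[i] = 0;
    for (size_t i = 0; i < len; i++)
        hist[buf[i]]++;
}

/* Fraction of the len bytes that are ASCII letters or spaces.
   Ordinary English text should score high; a "degenerate" binary
   file that merely avoids bit 7 will usually score much lower. */
double letter_space_fraction(const size_t hist[256], size_t len)
{
    size_t hits = hist[' '];
    if (len == 0)
        return 0.0;
    for (int c = 'a'; c <= 'z'; c++)
        hits += hist[c];
    for (int c = 'A'; c <= 'Z'; c++)
        hits += hist[c];
    return (double)hits / (double)len;
}
```

A real classifier would compare the whole histogram against expected letter frequencies for the language in question. This only shows where such a comparison would start.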
Looking for non-printable characters < 0x20 is also a good idea.
Those non-printable characters are all perfectly legal ASCII.
However, if what he's looking for is a typical ASCII _text_
file, then I wouldn't expect to find too many non-printable
characters other than form-feed, line-feed, carriage-return,
horizontal-tab, and maybe backspace.
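
As a rough sketch of that check, the C below counts control characters below 0x20 that are not in the short list above. The names is_common_text_control and count_unusual_controls are made up for the example, and how many such bytes count as "too many" is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 for the control characters one expects in a text file:
   backspace, horizontal tab, line feed, form feed, carriage return. */
int is_common_text_control(unsigned char c)
{
    return c == '\b' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Count bytes below 0x20 that are not in the expected set. */
size_t count_unusual_controls(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c < 0x20 && !is_common_text_control(c))
            count++;
    }
    return count;
}
```

A count of zero is consistent with a plain text file. A count that is large relative to the file size suggests binary data even when no byte has bit 7 set.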