Re: How to search several directories for duplicate files?
- From: Andy Axnot <andy1@xxxxxxxxxxxxxxxxx>
- Date: Mon, 09 Oct 2006 13:25:50 GMT
On Sun, 08 Oct 2006 22:06:54 -0400, Dan Espen wrote:
Andy Axnot <andy1@xxxxxxxxxxxxxxxxx> writes:
I wish to search several directories for duplicate files. This could...
involve several thousands of files.
It works by first finding identical sized files and then running md5sum
on those of the same filesize. I have no idea how 'samefile' works.
Does anyone have any experience with these or other utilities or
scripts?
Not here.
Any thoughts on the likelihood of errors using size and md5sum vs cmp or
something similar?
The odds of md5 giving a false positive are very low. After finding the
dups with md5, running cmp to verify can't hurt.
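For anyone who wants to script that final check: below is a minimal sketch of the "verify with cmp" step, assuming Python is available. It uses the standard library's filecmp module in place of cmp itself. The function name confirm_duplicates is only illustrative, and it expects a non-empty list of paths that md5 has already flagged as identical. Accidental md5 collisions between unrelated files are extremely unlikely, but deliberately constructed colliding files do exist, so a byte-for-byte compare is what settles the question.

```python
import filecmp


def confirm_duplicates(paths):
    """Return True if every file in `paths` has the same bytes as the first one.

    `paths` is assumed to be a non-empty list of files that already share
    an md5 sum; this is only the final byte-for-byte confirmation.
    """
    first, *rest = paths
    # shallow=False forces a comparison of file contents instead of
    # trusting os.stat() data such as size and modification time.
    return all(filecmp.cmp(first, other, shallow=False) for other in rest)
```

Since this only runs on files that already matched on size and md5, it adds one extra read per suspected duplicate rather than a compare of everything against everything.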
Any info or advice on time required with large files or large numbers of
files?
The time it takes would depend on the number of same-sized files. Doing
the size comparisons would be very fast. The md5 is going to require a
read of the whole file but then it can be compared very quickly to other
files. If you tried to cmp each file to every other file of the same
size, that could be very slow.
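The size-then-md5 approach described above can be sketched roughly as follows. This is not the code of either utility mentioned in the thread, just one way to do it in Python, and the names md5_of and find_duplicates are made up for illustration. Each file that shares a size with another file is read once to compute its md5, so n same-sized files cost n reads, where comparing every pair with cmp would take up to n*(n-1)/2 comparisons.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directories):
    """Return a list of groups (sorted path lists) of files with equal size and md5."""
    # Pass 1: bucket regular files by size. This only needs stat data,
    # so no file contents are read yet.
    by_size = defaultdict(list)
    for top in directories:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with at least one other.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling find_duplicates with a list of directory paths returns one list per set of suspected duplicates. Files with a unique size are never opened at all, which is where most of the savings come from.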
Is a script too slow for something like this?
All the time is in reading the files to get the md5 sum. A script isn't
going to slow it down.
OK, thanks much for your input. I'll run some tests on increasingly
large test samples to see if times are reasonable. Whatever reasonable
is :-)
Andy