Re: find script?



On Thu, 07 Aug 2008 15:06:01 -0500, Moe Trin typed this message:

On Wed, 06 Aug 2008, in the Usenet newsgroup linux.redhat, in article
<ck1k94l4661d5np0e8akgc2ofofc9p4kfe@xxxxxxx>, P1 wrote:

You're the second person to say that this sounds like homework.
Maybe it does, but I've never taken a Linux class; I've been a Windows
admin for many years and am just starting to scratch the surface of Linux.

If you look through the archives of this group and, more usefully, of the
Usenet group 'comp.unix.shell', you'll see this situation quite frequently. My
neighbor is an instructor at a nearby college, teaching three courses.
The first is CS-101A Intro To UNIX, and the homework given is often
artificial, but is intended to expand the skills of the student. By the
eighth week, some of the problems are posed to cause the student to
think of rather long "one-liners".

Most accounts work fine, but a few hundred don't and I found that they
don't because the INBOX.MBX file case doesn't match the INBOX.IDX file
case, so the system can't find the index for the mailbox file and hence
shows the mailbox as empty. I need to find all directories where these
two files have a different case so that I can fix that problem.

'manually' fixing the problem is a good technique if you are not very
experienced - you avoid automatically shooting yourself in the foot.

OK - the tool you are looking for is 'uniq' (man uniq)

uniq prints the unique lines in a sorted file, discarding all but
one of a run of matching lines. It can optionally show only lines
that appear exactly once, or lines that appear more than once.
uniq requires sorted input because it compares only consecutive
lines.
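A quick illustration of why the input has to be sorted, using a made-up three-line sample: the same lines go through 'uniq -u' once unsorted and once sorted, plus 'uniq -d' for the opposite filter.

```shell
#!/bin/sh
# Made-up sample: "b" appears twice, but not on consecutive lines.
sample='b
a
b'

# Unsorted: no two adjacent lines match, so uniq -u keeps all three.
unsorted=$(printf '%s\n' "$sample" | uniq -u)

# Sorted: the two "b" lines become adjacent, so only "a" is left.
sorted=$(printf '%s\n' "$sample" | sort | uniq -u)

# uniq -d is the opposite filter: one copy of each repeated line ("b").
repeated=$(printf '%s\n' "$sample" | sort | uniq -d)

printf 'unsorted:\n%s\nsorted:\n%s\nrepeated:\n%s\n' "$unsorted" "$sorted" "$repeated"
```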

and the problem then becomes one of creating this list. Now, your
original post says the files are in the same directories, which suggests
we don't have to test the path for differences:

/this/part/is/the/same/INDEX.MBX
/this/part/is/the/same/iNdEx.IDX

OK - first tool is 'find' and you tell it to start searching at the
appropriate directory level - perhaps /var/spool/mbox (you don't mention
the arrangement of the file system), searching for files whose names end
in .MBX or .IDX (printing each matching path, starting with the search
directory you gave it, is the default action):

find /var/spool/mbox -type f \( -name \*.IDX -o -name \*.MBX \)
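To try that find command without touching real mail, a throwaway tree can stand in for /var/spool/mbox. The user directories (alice, bob) and the extra notes.txt are hypothetical; the find expression is the one above with only the starting point changed, and LC_ALL=C sort is added just to make the listing order predictable.

```shell
#!/bin/sh
# Throwaway tree standing in for /var/spool/mbox (hypothetical layout).
top=$(mktemp -d /tmp/mboxtestXXXXXX)
mkdir -p "$top/alice" "$top/bob"
touch "$top/alice/INBOX.MBX" "$top/alice/INBOX.IDX"
touch "$top/bob/INBOX.MBX" "$top/bob/inbox.IDX"
touch "$top/bob/notes.txt"    # matches neither -name test

# \( ... \) groups the two -name tests so -type f applies to both;
# -name is case-sensitive, so the extensions must be exactly .IDX/.MBX.
found=$(find "$top" -type f \( -name \*.IDX -o -name \*.MBX \) | LC_ALL=C sort)
printf '%s\n' "$found"

rm -rf "$top"
```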

If the _pairs_ of files were NOT located in the same directory, but
rather something like

/path/to/index/files/INDEX.MBX
/path/to/mailbox/files/iNdEx.IDX

you need only tell 'find' to start searching at the appropriate common
point (/path/to) and add a '-printf "%f\n"' option (a GNU find extension)
on the end of the 'find' command to have it print only 'filename.extension'
rather than the full '/path/to/directory/filename.extension' names.
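The -printf variant can be checked the same way. This assumes GNU find (-printf is not part of POSIX find), and the index/ and mailbox/ directories are hypothetical stand-ins for the separate-directory layout just described.

```shell
#!/bin/sh
# Requires GNU find: -printf is a GNU extension.
top=$(mktemp -d /tmp/mboxtestXXXXXX)
mkdir -p "$top/index/files" "$top/mailbox/files"    # hypothetical layout
touch "$top/mailbox/files/INBOX.MBX" "$top/index/files/iNbOx.IDX"

# %f is the file name with its leading directories removed; \n ends
# each record. (A space before \n would be appended to every name.)
names=$(find "$top" -type f \( -name \*.IDX -o -name \*.MBX \) -printf '%f\n' | LC_ALL=C sort)
printf '%s\n' "$names"

rm -rf "$top"
```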

Pipe the output to 'cut' to eliminate the extension:

cut -d'.' -f1

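and then do a caseless sort (so that case variants of a name end up next
to each other), and pass the result to 'uniq -u', which still compares
case-sensitively and so reports names whose case differs: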

sort -f | uniq -u
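Here is that sort/uniq stage on its own, fed with hypothetical paths as they would look after cut. The thing to notice is that 'sort -f' only brings case variants together; 'uniq -u' (without -i) still compares case-sensitively, so a mismatched pair survives while a matching pair is dropped. A mailbox with no index at all (carol, also made up) is reported too. LC_ALL=C is added only to make the order predictable.

```shell
#!/bin/sh
# Hypothetical paths after cut has removed the .MBX/.IDX extensions.
stems='/var/spool/mbox/bob/inbox
/var/spool/mbox/alice/INBOX
/var/spool/mbox/carol/INBOX
/var/spool/mbox/bob/INBOX
/var/spool/mbox/alice/INBOX'

# sort -f folds case while sorting, so bob/INBOX and bob/inbox end up
# adjacent; uniq -u treats them as different lines and prints both.
# The identical alice pair is dropped; the lone carol entry is kept.
report=$(printf '%s\n' "$stems" | LC_ALL=C sort -f | uniq -u)
printf '%s\n' "$report"
```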

So the "one-liner" actually looks like

find /var/spool/mbox -type f \( -name \*.IDX -o -name \*.MBX \) | cut -d'.' -f1 | sort -f | uniq -u

That's all one line. Now, hit the man pages for those four commands and
see what I've done. One _other_ thing I'd check is whether there are any
case errors in the file extensions:

find /var/spool/mbox -type f \( -iname \*.IDX -o -iname \*.MBX \) |
grep -v '\.IDX$' | grep -v '\.MBX$'


Ordinarily I wouldn't comment, but... the two 'grep -v' filters above
would only eliminate *.IDX and *.MBX, whereas I think the OP actually
wanted to include *Mbx, *MBX, *mBX, *mbX and so on.


Also, the cut -d'.' -f1 would produce /home/moetrim/file,
/home/moetrim/file1, /home/moetrim/file2/not, etc. for the files
/home/moetrim/file.is.longer.MBx, /home/moetrim/file1.not.that.1.mbx
and /home/moetrim/file2/not.MbX,
whereas I think the OP wanted just
/home/moetrim/
/home/moetrim/file2

which looks for those same file extensions in a caseless manner, then
culls out those whose extensions are entirely upper case, leaving only
the miscased ones.
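The extension check can be exercised on a throwaway tree too. The user and file names are made up, and -iname assumes GNU (or BSD) find. The grep patterns in this sketch are written as '\.IDX$' and '\.MBX$' (escaped dot, anchored at the end of the line) so that only the extension itself is tested, rather than "IDX" preceded by any character anywhere in the path.

```shell
#!/bin/sh
top=$(mktemp -d /tmp/mboxtestXXXXXX)
mkdir -p "$top/dave"                                  # hypothetical user
touch "$top/dave/INBOX.MBX" "$top/dave/INBOX.idx"    # index extension miscased
touch "$top/dave/Trash.Mbx" "$top/dave/Trash.IDX"    # mailbox extension miscased

# -iname matches the extensions in any case; the two grep -v filters
# then drop the all-upper-case ones, leaving only the miscased files.
odd=$(find "$top" -type f \( -iname \*.IDX -o -iname \*.MBX \) |
      grep -v '\.IDX$' | grep -v '\.MBX$' | LC_ALL=C sort)
printf '%s\n' "$odd"

rm -rf "$top"
```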

A minor caution - running these commands MAY take a lot of resources,
depending on how large an area you have to search. Try to schedule this
for when the system isn't overly busy.
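On the dotted-path concern raised earlier in the thread: 'cut -d'.' -f1' cuts at the first dot anywhere in the path, so under a directory such as john.doe (hypothetical) both files collapse to the same stem and the mismatch is missed. The sketch below swaps in sed to strip only the final extension, then reduces each reported path to its directory, which is closer to the list of directories the OP asked for. It is one possible variant, not the method from the post; like the original, it also lists a directory whose MBX has no IDX (or vice versa).

```shell
#!/bin/sh
top=$(mktemp -d /tmp/mboxtestXXXXXX)          # no dot in this path
mkdir -p "$top/john.doe" "$top/jane"          # hypothetical users
touch "$top/john.doe/INBOX.MBX" "$top/john.doe/Inbox.IDX"   # mismatched
touch "$top/jane/INBOX.MBX" "$top/jane/INBOX.IDX"           # matching

# Original one-liner: cut turns both john.doe files into "$top/john",
# so uniq -u sees an identical pair and reports nothing.
naive=$(find "$top" -type f \( -name \*.IDX -o -name \*.MBX \) |
        cut -d'.' -f1 | LC_ALL=C sort -f | uniq -u)

# Variant: the first sed removes only the last ".ext" of the file name;
# the second sed strips the file name so each directory is listed once.
dirs=$(find "$top" -type f \( -name \*.IDX -o -name \*.MBX \) |
       sed 's/\.[^./]*$//' | LC_ALL=C sort -f | uniq -u |
       sed 's|/[^/]*$||' | LC_ALL=C sort -u)

printf 'naive: [%s]\ndirs:  [%s]\n' "$naive" "$dirs"
rm -rf "$top"
```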

In Windows, this didn't matter because there is no case sensitivity, but
obviously in Linux this broke the mailbox.

Can't imagine why the authors decided mixing case would be a good idea
just because you can get away with it - but whatever.

Gordano support is trying to figure out how to fix this programmatically
also, but they're slow as hell so I was trying a different avenue...

I hate to tell you this, but we have a cooperative program with several
universities/colleges providing "work experience" positions - the
"summer intern" type of job. For the *nix-related positions, I expect
the interns to be able to whip out that 'one-liner' in under five
minutes. Graduates of that 'CS-101A Intro' class would be _aware_ of the
solution, but it might take them an hour or two to get it right. By the
way, this is only one of several ways the problem could have been solved
- that's a problem and a feature of *nix.

Old guy
