Re: The argument for fs assistance in handling archives (was: silent semantic changes with reiser4)

From: Christer Weinigel (christer_at_weinigel.se)
Date: 09/06/04

  • Next message: Emmanuel Fleury: "Re: [Transmeta hardware] Update of the CMS under Linux ?"
    To: Jamie Lokier <jamie@shareable.org>
    Date:	06 Sep 2004 17:55:21 +0200
    
    

    Jamie Lokier <jamie@shareable.org> writes:

    > Christer Weinigel wrote:
    > > Can be done with dnotify/inotify and a cache daemon keeping track of
    > > mtime. Yes, this will need a kernel change to make sure mtime always
    > > changes when the file changes, but it does not require anything else.
    >
    > - Can the daemon keep track of _every_ file on my disk like this?
    > That's more than a million files, and about 10^5 directories.
    > dnotify would require the daemon to open all the directories.
    > I'm not sure what inotify offers.

    I don't think dnotify/inotify handle subdirectories well yet, so I
    suppose that they would have to be extended.

    > - What happens at reboot - I guess the daemon has to call stat()
    > on every file to verify its indexes? Have you any idea how long
    > it takes to call stat() on every file in my home directory?

    The daemon saves state before it shuts down and reloads the state
    after a reboot. You have to make sure that it is started first and
    stopped last during the boot process. How would a kernel plugin
    handle things that happen before or after the plugin module has been
    loaded? It's the same problem.
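
A minimal sketch of the save-and-reload idea described above, assuming the
daemon keeps a simple path-to-mtime index. The names snapshot, save_state,
load_state and stale_paths are hypothetical, as is the JSON state file. The
stale_paths check is only a fallback for an unclean shutdown; it needs the
full stat() walk asked about in the quoted question, which the "started
first, stopped last" ordering is meant to avoid.

```python
import json
import os


def snapshot(root):
    """Walk root and record each file's mtime in nanoseconds."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                index[path] = os.stat(path).st_mtime_ns
            except OSError:
                pass  # the file vanished during the walk
    return index


def save_state(index, state_file):
    """Write the index atomically so a crash cannot leave a torn file."""
    tmp = state_file + ".tmp"
    with open(tmp, "w") as f:
        json.dump(index, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, state_file)


def load_state(state_file):
    """Reload the saved index; an absent file means nothing is indexed yet."""
    try:
        with open(state_file) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def stale_paths(saved, current):
    """Paths that are new, modified, or deleted relative to the saved index."""
    changed = {p for p, mtime in current.items() if saved.get(p) != mtime}
    deleted = set(saved) - set(current)
    return changed | deleted
```

On a clean shutdown the daemon would call save_state() last, and on boot
load_state() first; stale_paths(loaded, snapshot(root)) would only be run
when the saved state cannot be trusted.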

    > - The ordering problem: I write to a file, then the program
    > returns. System is very busy compiling. 2 minutes later, I
    > execute a search query. The file I wrote two minutes ago doesn't
    > appear in the search results. What's wrong?
    >
    > Due to scheduling, the daemon hasn't caught up yet. Ok, we can
    > accept that's just hard life. Sometimes it takes a while for
    > something I write to appear in search results.
    >
    > But! That means I can't use these optimised queries as drop-in
    > replacements for calling grep and find, or for making Make-like
    > programs run faster (by eliminating parsing and stat() calls).
    > That's a shame, it would have been nice to have a mechanism that
    > could transparently optimise programs that do calculations....

    Sure you can. First of all, you can just wait for the daemon to
    finish indexing every file it has been notified of changes to.
    This is no different from you having to wait for the kernel to finish
    indexing the files. Or are you suggesting that the kernel should stop
    all other processes until the indexing is done?
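
The "wait for the daemon to finish" argument can be sketched as a barrier in
front of every query. This is an illustration only: the Indexer class, its
notify() entry point and the word-set index are hypothetical, and it assumes
the notifier has already queued a change by the time the writing program
returns, which is the kernel-side guarantee this thread is debating.

```python
import queue
import threading


class Indexer:
    def __init__(self):
        self._pending = queue.Queue()   # changes notified but not yet indexed
        self._index = {}                # path -> set of words in the file
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def notify(self, path, content):
        """Called by the (hypothetical) change notifier for each modified file."""
        self._pending.put((path, content))

    def _worker(self):
        # Background indexing; may lag behind when the system is busy.
        while True:
            path, content = self._pending.get()
            with self._lock:
                self._index[path] = set(content.split())
            self._pending.task_done()

    def search(self, word):
        # Barrier: block until every change queued so far has been indexed,
        # so the query never sees a stale index.
        self._pending.join()
        with self._lock:
            return sorted(p for p, words in self._index.items() if word in words)
```

Only the caller of search() waits; other processes keep running while the
worker catches up.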

    > Do you see what I'm getting at? There's building some nice GUI
    > and search engine like functionality, where changes made by one
    > program _eventually_ show up in another (i.e. not synchronously).
    >
    > That's easy.
    >
    > And then there's optimising things like grep, find, perl, gcc,
    > make, httpd, rsync, in a way that's semantically transparent, but
    > executes faster _as if_ they had recalculated everything they
    > need to every time. That's harder.

    If you have a good notify, it's not harder.

    > No, not 3, 4 or 6. For correct behaviour those require synchronous
    > query results. Think about 6, where one important cached query is
    > "what is the MD5 sum of this file", and another critical one, which
    > can only work through indexing, is "give me the name of any file whose
    > MD5 sum matches $A_SPECIFIC_MD5". Trusting the async results for
    > those kinds of queries from your daemon would occasionally result in
    > data loss due to race conditions. So you wouldn't trust the async
    > results, and you fail to get those CPU-saving and bandwidth-saving
    > optimisations.

    So how do you calculate the MD5 sum of a file that is in the process
    of being modified? It's not possible to do that unless you block all
    other access to that file and recalculate the MD5 sum after each
    write. With a notifier that tells the daemon that it has stale data
    and needs to reprocess the file, it's no different.
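
A rough sketch of the stale-data notification described above, applied to the
two MD5 queries from the quoted text. Md5Cache, invalidate() and find() are
hypothetical names, and the sketch assumes invalidate() is called by a
notifier for every modification. It does not address a file that is written
while it is being read.

```python
import hashlib


class Md5Cache:
    def __init__(self):
        self._sums = {}      # path -> cached hex digest
        self._stale = set()  # paths the notifier has reported as modified

    def invalidate(self, path):
        """Called by the (hypothetical) notifier when path changes."""
        self._stale.add(path)

    def md5(self, path):
        """Return the MD5 of path, recomputing only if it is new or stale."""
        if path in self._stale or path not in self._sums:
            # Clear the flag before reading, so a notification that arrives
            # during the read marks the entry stale again.
            self._stale.discard(path)
            with open(path, "rb") as f:
                self._sums[path] = hashlib.md5(f.read()).hexdigest()
        return self._sums[path]

    def find(self, digest):
        """Names of known files whose current MD5 matches digest."""
        return sorted(p for p in list(self._sums) if self.md5(p) == digest)
```

Without the invalidate() call the cache keeps returning the old sum, which is
why the scheme depends on the notifier firing for every change.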

      /Christer

    -- 
    "Just how much can I get away with and still go to heaven?"
    Freelance consultant specializing in device driver programming for Linux 
    Christer Weinigel <christer@weinigel.se>  http://www.weinigel.se
