Re: Re: more git updates..

From: Andrea Arcangeli (andrea_at_suse.de)
Date: 04/13/05

    Date:	Wed, 13 Apr 2005 01:40:05 +0200
    To: Linus Torvalds <torvalds@osdl.org>
    
    

    On Tue, Apr 12, 2005 at 02:21:58PM -0700, Linus Torvalds wrote:
    > The full .git archive for 199 versions of the kernel (the 2.6.12-rc2 one
    > and a test-run of 198 patches from Andrew) is 111MB. In other words,
    > adding 198 "full" new kernels only grew the archive by 9MB (that's all
    > "actual disk usage" btw - the files themselves are smaller, but since they
    > all end up taking up a full disk block..)

    reiserfs can do tail packing, plus the disk block is meaningless when
    fetching the data from the network which is the real cost to worry about
    when synchronizing and downloading (disk cost isn't a big deal).

    The pagecache cost sounds like a very minor one too, since you don't need
    the whole data in ram, not even all dentries need to be in cache. This
    is one of the reasons why you don't need to run readdir, and why you can
    discard the old trees anytime.

    At the rate of 9M for every 198 changeset checkins, that means I'll have
    to download 2.7G _uncompressible_ (i.e. already compressed with a bad
    per-file ratio due to the too-small files) for a whole pack including all
    changesets, without accounting for the original 111MB of the original tree,
    with rsync -z of git. That compares with 514M _compressible_ with CVS
    format on-disk, and with ~79M of the CVS-network download with rsync -z of
    the CVS repository (assuming default gzip compression level).
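The point about small files compressing badly one at a time can be illustrated with a toy comparison. This is only a sketch: it assumes each object is deflated on its own with zlib, and the sample blobs and the helper names (`make_sample_blobs`, `per_file_compressed_size`, `coalesced_compressed_size`) are hypothetical, not taken from git or from the measurements above.

```python
import zlib


def make_sample_blobs(count=200):
    """Build many tiny, similar 'files' (hypothetical sample data)."""
    template = "static int helper_%d(void)\n{\n\treturn %d;\n}\n"
    return [(template % (i, i)).encode("ascii") for i in range(count)]


def per_file_compressed_size(blobs):
    """Total bytes when every blob is compressed independently."""
    return sum(len(zlib.compress(blob)) for blob in blobs)


def coalesced_compressed_size(blobs):
    """Bytes when all blobs are concatenated and compressed as one stream."""
    return len(zlib.compress(b"".join(blobs)))


if __name__ == "__main__":
    blobs = make_sample_blobs()
    print("raw bytes:            ", sum(len(b) for b in blobs))
    print("compressed per file:  ", per_file_compressed_size(blobs))
    print("compressed coalesced: ", coalesced_compressed_size(blobs))
```

Each independently compressed blob pays its own stream overhead and cannot share redundancy with its neighbours, and a second pass such as rsync -z has little left to squeeze out of data that is already deflated.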

    What BKCVS provided with 79M of rsync -z is now provided with 2.8G of
    rsync -z, with a network-bound slowdown of -97.2%. Similar slowdowns
    should be expected for synchronizations over time while fetching new
    blobs etc...

    Ok, BKCVS has fewer than 60000 checkins due to the linearization and
    coalescing of pulls that couldn't be represented losslessly in CVS, so
    the network-bound slowdown is less than -97.2%; my math is
    approximate, but the order of magnitude should remain the same.
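The back-of-the-envelope numbers above can be reproduced roughly as follows. This is a sketch under stated assumptions: the total of 60000 changesets is inferred from the "less than 60000 checkins" remark, the -97.2% figure is read here as the relative size of the CVS transfer versus the git transfer, and all constant and function names are made up for illustration.

```python
# Figures quoted in the thread (megabytes).
GROWTH_MB = 9.0               # archive growth measured for the sample
CHANGESETS_PER_SAMPLE = 198   # changesets in that sample
INITIAL_ARCHIVE_MB = 111.0    # size of the full .git archive
CVS_RSYNC_MB = 79.0           # rsync -z download of the CVS repository

# Assumption: roughly 60000 changesets in the full history.
ASSUMED_TOTAL_CHANGESETS = 60_000


def projected_growth_mb(total_changesets):
    """Linear extrapolation of the 9M-per-198-changesets growth rate."""
    return total_changesets / CHANGESETS_PER_SAMPLE * GROWTH_MB


def projected_download_mb(total_changesets):
    """Extrapolated growth plus the initial archive."""
    return INITIAL_ARCHIVE_MB + projected_growth_mb(total_changesets)


def relative_change_percent(small_mb, large_mb):
    """Size of the smaller transfer relative to the larger one, in percent."""
    return (small_mb / large_mb - 1.0) * 100.0


if __name__ == "__main__":
    growth = projected_growth_mb(ASSUMED_TOTAL_CHANGESETS)
    total = projected_download_mb(ASSUMED_TOTAL_CHANGESETS)
    change = relative_change_percent(CVS_RSYNC_MB, total)
    print(f"growth: {growth:.0f} MB, total: {total:.0f} MB, change: {change:.1f}%")
```

With a smaller changeset count the extrapolated total shrinks proportionally, which is why the order of magnitude matters more than the exact percentage.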

    Clearly one can write an ad-hoc network protocol instead of using
    rsync/wget, but the server will need quite a bit of cpu and ram to do a
    checkout/update/sync efficiently to unpack all data and create all
    changesets to gzip and transfer.

    Anyway, git's simplicity and the robustness of its immutable hashes
    certainly make it an ideal interim format (and it may even be a very
    practical local live format on-disk, except for the backups); I'm only
    unsure if it's a
    wise idea to build an SCM on top of the current git format or if it's
    better to use something like SCCS or CVS to coalesce all diffs of a
    single file together and to save space and make rsync -z very efficient
    too (or an approach like arch and darcs that stores changesets per file,
    i.e. patches).

