Compression to fit a space...

From: Fredderic (ciredderf_at_sumirpi.is_backwards_at.com.au)
Date: 09/02/04


Date: Fri, 03 Sep 2004 04:39:41 +1000

Greetings peoples...

I'm trying to write a program which needs to fill a particular space
with a tar.gz (or even better, a tar.bz2) of a collection of files. The
files it needs to pack in are mostly log files, and the space it needs to
fill is usually about 20-30 MB left by a CD image that's being built by
another package.

Most importantly, I need to be able to fill as close to but no more than a
given space with a tar.gz containing as many of a set of files as I can
get to fit.

My current implementation simply builds a tar one file at a time,
compressing it after each file and checking the size of the resultant
compressed version. However these logs compress quite substantially, so
the uncompressed tar could conceivably grow quite large if I were to feed
it an empty CD instead of one that was mostly used (the important logs get
put on first, followed by as many others as will fit -- the others are
included just for convenience and don't really matter if they don't make
it in). Not to mention how incredibly slow the process is.
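A minimal Python sketch of that current approach, assuming the files are already in memory as `(name, bytes)` pairs in priority order, and that a file which doesn't fit is skipped while later files are still tried. The helper names `build_tar_gz` and `fit_by_recompressing` are hypothetical. It makes the cost visible: every candidate recompresses everything accepted so far, so the total work grows roughly quadratically with the number of files.

```python
import io
import tarfile


def build_tar_gz(members: list[tuple[str, bytes]]) -> bytes:
    """Build a complete .tar.gz in memory from (name, data) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def fit_by_recompressing(files: list[tuple[str, bytes]], limit: int):
    """Try each file in priority order; keep it only if the whole archive,
    recompressed from scratch, still fits in `limit` bytes."""
    accepted: list[tuple[str, bytes]] = []
    archive = build_tar_gz(accepted)  # empty archive if nothing fits
    for name, data in files:
        # Recompresses every accepted file again: this is the slow part.
        candidate = build_tar_gz(accepted + [(name, data)])
        if len(candidate) <= limit:
            accepted.append((name, data))
            archive = candidate
    return [n for n, _ in accepted], archive
```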

I was thinking of building a tar archive myself, one file at a time:
generate the necessary tar header, attach the file, then compress the
whole chunk. If it will fit, tack it onto the end of the archive. gunzip
will decompress each part as it encounters it in the file, outputting them
all as a single stream which can be fed to tar for extraction. But a
collection of individually gzipped chunks concatenated together won't be as
small as a gzipped tar, and may in fact be bigger in the case of a large
number of small files.
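The proposed scheme can be sketched in Python with the standard `tarfile` and `gzip` modules: each tar entry (a ustar header from `TarInfo.tobuf()` plus the data padded to a 512-byte boundary) is gzipped as its own member, and the members are concatenated. The function names here are hypothetical. Because gzip readers treat concatenated members as one stream, `tarfile` can read the result directly. The `single_stream()` comparison shows where the size penalty comes from: each member pays its own gzip header and trailer (at least 18 bytes) and starts over with an empty compression dictionary, so similar small logs can't share matches.

```python
import gzip
import io
import tarfile

BLOCK = 512
END_OF_ARCHIVE = b"\0" * (2 * BLOCK)  # two zero blocks end a tar stream


def tar_member(name: str, data: bytes) -> bytes:
    """Raw tar bytes for one file: a ustar header plus data padded to 512."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    header = info.tobuf(format=tarfile.USTAR_FORMAT)
    return header + data + b"\0" * (-len(data) % BLOCK)


def per_file_members(files: list[tuple[str, bytes]]) -> bytes:
    """Gzip each tar entry separately and concatenate the members."""
    parts = [gzip.compress(tar_member(n, d)) for n, d in files]
    parts.append(gzip.compress(END_OF_ARCHIVE))
    return b"".join(parts)


def single_stream(files: list[tuple[str, bytes]]) -> bytes:
    """The same tar bytes compressed as one gzip stream, for comparison."""
    raw = b"".join(tar_member(n, d) for n, d in files) + END_OF_ARCHIVE
    return gzip.compress(raw)


def list_names(blob: bytes) -> list[str]:
    """Decompress every member as one stream and list the tar entries."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()
```

With a few hundred small, similar log files, `per_file_members()` typically comes out noticeably larger than `single_stream()`, even though both decompress to identical bytes.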

Archive programs don't really have that problem, because they just split
the last file across as many disks as it needs. I, however, need each
tar.gz to be complete on its own. So I need to know, as I'm building it,
how big it will be if I add a given file. And the best way to do that is
if I can "rewind" the archival process any time I add a "too big" file.
Does anyone know of a compression library where this is possible, or of a
better way of doing what I need to do...
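One way to get that "rewind" in Python is zlib's `Compress.copy()`, which duplicates the compressor's internal state. The sketch below (the class `FittingTarGz` and its methods are hypothetical) checkpoints the compressor before each file, compresses the file into the copy, and then finishes a second throwaway copy to learn exactly how large the archive would be if it ended there. If that exceeds the limit, the copy is discarded and the committed state is untouched. The result is a single gzip stream, so it avoids the per-member overhead of concatenation. Two assumptions: this covers gzip only (the standard library's `bz2.BZ2Compressor` has no `copy()` method), and two zero blocks are treated as an acceptable end-of-archive marker, without padding to a full tar record.

```python
import io
import tarfile
import zlib

BLOCK = 512
END_OF_ARCHIVE = b"\0" * (2 * BLOCK)  # two zero blocks end a tar stream


def tar_member(name: str, data: bytes) -> bytes:
    """Raw tar bytes for one file: a ustar header plus data padded to 512."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    header = info.tobuf(format=tarfile.USTAR_FORMAT)
    return header + data + b"\0" * (-len(data) % BLOCK)


class FittingTarGz:
    """Append tar entries to one gzip stream, refusing any entry that would
    push the finished archive past `limit` bytes."""

    def __init__(self, limit: int, level: int = 9):
        self.limit = limit
        # wbits=31 (16 + 15) makes zlib emit a gzip header and trailer.
        self.comp = zlib.compressobj(level, zlib.DEFLATED, 31)
        self.out = bytearray()  # compressed bytes committed so far
        self.names: list[str] = []

    def _finished_size(self, comp, pending: int) -> int:
        # Finish a throwaway copy to learn the exact final archive size.
        probe = comp.copy()
        tail = probe.compress(END_OF_ARCHIVE) + probe.flush()
        return len(self.out) + pending + len(tail)

    def try_add(self, name: str, data: bytes) -> bool:
        trial = self.comp.copy()  # checkpoint: the "rewind" point
        chunk = trial.compress(tar_member(name, data))
        if self._finished_size(trial, len(chunk)) > self.limit:
            return False  # drop the trial; self.comp is unchanged
        self.comp = trial  # commit the new state
        self.out += chunk
        self.names.append(name)
        return True

    def finish(self) -> bytes:
        self.out += self.comp.compress(END_OF_ARCHIVE) + self.comp.flush()
        return bytes(self.out)


def list_names(blob: bytes) -> list[str]:
    """Read back the entry names, to check the archive is complete."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()
```

Each `try_add()` compresses only the new file plus a short finishing step on a copy, instead of recompressing the whole archive; the trade-off is the memory for one or two extra copies of the zlib state per call.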

Any help would be muchly appreciated.

Fredderic
