Re: dd question



On 10Dec2010 14:28, stan <gryt2@xxxxx> wrote:
| On Fri, 10 Dec 2010 03:11:25 +0000 (UTC)
| "Amadeus W.M." <amadeus84@xxxxxxxxxxx> wrote:
| > I have a binary file with data. Each block of 48 bytes is a record. I
| > want to extract the first 8 bytes within each record. I'm thinking
| > this should be possible with dd, but gawk, perl - anything goes. It
| > just has to be fast, because the data files are ~ 1Gb.
| >
| > I can do this in C++ but I was just wondering if it can be done with
| > existing well tested tools.
|
| The binary aspect makes it tricky. If they were EOL delimited records,
| lots of tools could do this.
|
| Here's a python function, not checked though. It does require that you
| have enough memory to slurp the file into memory. Put it in a file,
| edit for the filenames, and run it as python <filename>. I guess it
| should take less than a minute, but not sure, should be fine for one
| off.
|
| def extract (filename1 = None, filename2 = None):
|     if filename1 != None and filename2 != None:

I'd not bother with this check - it is a special purpose function that
will not be misused, and if it _is_ misused it will fail silently, which
is not good.

|         infile = open (filename1, "rb")
|         slurp = infile.read () # at least as much memory as the file size
|         infile.close ()
|         outfile = open (filename2, "wb")
|         while len (slurp) > 0:
|             record = slurp [:48] # extract a record
|             first8 = record [:8] # slice off first 8 positions
|             outfile.write (first8) # write them out, no separator
|             slurp = slurp [48:] # chop them off the file

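This step is Very Expensive: every slurp[48:] copies the whole remainder
of the string, so the loop's cost grows with the square of the file
size. Don't reallocate a 1GB string every 48 bytes, just pull out the
pieces you need.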
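
If you do want to keep the slurped data in memory, one way to avoid the
repeated copying is to step an offset through it and slice out only the
8 bytes you need each time. A rough sketch of that idea; the function
name and its record_size/keep parameters are my own, not part of the
code above:

```python
def first8_slices(data, record_size=48, keep=8):
    """Yield the first `keep` bytes of each `record_size`-byte record.

    `data` is the whole file contents already read into memory.
    Each slice copies only `keep` bytes; `data` itself is never rebuilt.
    """
    for offset in range(0, len(data), record_size):
        yield data[offset:offset + keep]
```

Each pass through the loop touches 8 bytes instead of the rest of the
file, but it still needs the whole file in memory.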

|         outfile.close ()
|
| extract (filename1 = "your input filename with path",
|          filename2 = "your output filename with path")

Untested example:

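import sys

def get8of48(fp):
    # Read one 48-byte record at a time and yield its first 8 bytes.
    while True:
        chunk = fp.read(48)
        if len(chunk) == 0:
            break
        yield chunk[:8]
        # A chunk that is not exactly 48 bytes means a truncated record.
        if len(chunk) != 48:
            sys.stderr.write("warning: short read from %s (%d bytes)\n" % (fp, len(chunk)))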

for chunk8 in get8of48(open("your filename here", "rb")):
    ... do something with chunk8, the 8-byte chunk ...

Shorter and faster and using less memory.
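
For the original task (writing the 8-byte pieces back out with no
separator) the same read-a-record-at-a-time loop can feed an output file
directly. Another untested-in-anger sketch: copy_prefixes and its
parameters are names I made up, and I have chosen here to skip a
truncated final record rather than write part of it, which you may or
may not want:

```python
import sys

def copy_prefixes(infp, outfp, record_size=48, keep=8):
    """Copy the first `keep` bytes of every complete record to outfp.

    infp and outfp are binary file objects. Returns the number of
    complete records copied. A short final record is reported on
    stderr and skipped (an assumption; adjust to taste).
    """
    count = 0
    while True:
        chunk = infp.read(record_size)
        if not chunk:
            break
        if len(chunk) != record_size:
            sys.stderr.write("warning: short final record (%d bytes)\n" % len(chunk))
            break
        outfp.write(chunk[:keep])
        count += 1
    return count
```

Open the input with "rb" and the output with "wb" and pass both file
objects in. Memory use stays at one record plus whatever buffering the
file objects do.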

Cheers,
--
Cameron Simpson <cs@xxxxxxxxxx> DoD#743
http://www.cskk.ezoshosting.com/cs/

The general consensus on covered [litter] boxes is that they are a Good
Thing, and having bought one myself now and tried it out for a couple of
weeks, I agree. No more litter sprayed halfway across the city.
- krw@xxxxxxxxxxxx (Kenneth Wood)


