Re: need fastest way to write 2gig array to disk file
From: Eric Taylor (et1_at_rocketship1.com)
Date: 09/12/05
- Next message: Eric Taylor: "Re: need fastest way ... found a problem on the disk"
- Previous message: Eric Taylor: "Re: need fastest way to write 2gig array to disk file"
- In reply to: Basile Starynkevitch [news]: "Re: need fastest way to write 2gig array to disk file"
- Next in thread: Eric Taylor: "Re: need fastest way ... found a problem on the disk"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sun, 11 Sep 2005 16:51:03 -0700
Actually, part of my problem is that the program which needs to
do this is already using nearly 3.8 gigs of r/w memory. The rest
is program code and libraries, plus the stack. We do need the entire
32 bit address space. By doing plain writes, which go to the page cache,
I effectively get to use approx 8 gigs of memory w/o doing anything
especially clever. But in order to use mmap, I would have to map and
unmap over and over, since I probably have, say, 25 megs of address space
to play with. Will Linux let me map/unmap w/o forcing me to wait for the
unmap to flush to disk? I really can't afford to wait for that.
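The map/unmap-over-and-over scheme described above can be sketched as a sliding MAP_SHARED window over the output file. This is a minimal sketch assuming POSIX mmap(); the helper name mmap_window_copy and its window parameter are hypothetical. It only shows the mechanics and does not settle whether munmap() makes the caller wait for write-back, which is the open question here.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit builds */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes from src into the file at path
 * through a MAP_SHARED window of at most `window` bytes, mapping and
 * unmapping one window at a time. `window` must be a multiple of the
 * page size, because every mmap() file offset has to be page-aligned.
 * Returns 0 on success, -1 on error. */
int mmap_window_copy(const char *path, const void *src, size_t len,
                     size_t window)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || window == 0 || window % (size_t)page != 0) {
        errno = EINVAL;
        return -1;
    }

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0640);
    if (fd < 0)
        return -1;
    /* Size the file first: touching mapped pages past EOF raises SIGBUS. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    const char *p = src;
    for (size_t off = 0; off < len; off += window) {
        size_t n = len - off < window ? len - off : window;
        assert(off % (size_t)page == 0);
        void *map = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED,
                         fd, (off_t)off);
        if (map == MAP_FAILED) {
            close(fd);
            return -1;
        }
        memcpy(map, p + off, n);
        /* Releases this address range; the modified data stays with the
         * file. Whether this call waits for write-back is the question
         * asked above. */
        munmap(map, n);
    }
    return close(fd);
}
```

With only about 25 megs of free address space, the window would be some page-multiple that fits in it; the loop then never holds more than one window mapped at a time.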
I somewhat simplified the problem, for the sake of making a request.
I probably should have just stated my case. So, here goes:
In order to position memory exactly in my 32 bit virtual address space,
I use assembly allocation statements since C code cannot allocate
more than 2 gigs in one contiguous chunk:
asm(" .comm hugebss,0xf4000000,0x1000" );
This works because of the hugebss patch, which has been incorporated
into the Linux loader. Knowing where this memory needs to be located,
I can guarantee that about 3.6 gigs of memory will be available to me
in the center of the virtual address space. I need this to implement a
checkpoint of a simulation. I have no control over how the simulation
was coded; I am replacing its checkpoint code so we can use all
the memory (it would die at 1 gig from a poor use of sbrk).
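For readers unfamiliar with the asm(".comm ...") trick above, here is a small sketch of how a block reserved that way is reached from C. It assumes GCC or Clang targeting ELF (e.g. Linux), where the third .comm operand is the alignment in bytes; the symbol hugebss_demo and its 16 MiB size are hypothetical stand-ins for the post's 0xf4000000-byte hugebss, and the sketch does not reproduce the loader patch's placement guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reserve 16 MiB of zero-filled, 4096-byte-aligned .bss under a fixed
 * symbol name with a top-level GNU asm .comm directive (ELF targets).
 * The post uses 0xf4000000 bytes; this sketch uses a small size. */
__asm__(".comm hugebss_demo,0x1000000,0x1000");

#define HUGEBSS_DEMO_SIZE ((size_t)0x1000000)

/* C sees the block as an ordinary array of unknown bound. */
extern unsigned char hugebss_demo[];

/* Returns nonzero if the block starts on a 4 KiB boundary. */
int hugebss_demo_aligned(void)
{
    return ((uintptr_t)hugebss_demo & 0xfffu) == 0;
}

/* Write to both ends of the block to show the whole range is usable. */
void hugebss_demo_touch(void)
{
    assert(hugebss_demo_aligned());
    hugebss_demo[0] = 1;
    hugebss_demo[HUGEBSS_DEMO_SIZE - 1] = 2;
}
```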
This simulation runs for a week or two and must be checkpointed
every 30 minutes, sometimes at even shorter intervals. I broke the
checkpoint up into 2 gig files, hence my original request. It is OK that
the files haven't been flushed yet when the writes return, as that
can happen while the simulation continues. No other disk I/O is
done, and all the machines are dual processors, so the simulation can
crunch while the write cache flushes.
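The write-to-cache checkpoint described above, split into files of at most 2 gigs, might look like the following. This is a minimal sketch assuming plain POSIX write(); checkpoint_split and its "<prefix>.N" naming are hypothetical. It deliberately skips fsync() so the kernel can flush in the background, and it loops on short writes because Linux transfers at most just under 2 GiB per write() call (see write(2)).

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* allow files past 2 GiB on 32-bit builds */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all n bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: dump len bytes at base into "<prefix>.0",
 * "<prefix>.1", ..., each at most max_file bytes. No fsync(): write()
 * normally returns once the data is copied into the page cache.
 * Returns the number of files written, or -1 on error. */
int checkpoint_split(const char *prefix, const void *base, size_t len,
                     size_t max_file)
{
    const char *p = base;
    int nfiles = 0;
    char name[4096];

    if (max_file == 0) {
        errno = EINVAL;
        return -1;
    }
    for (size_t off = 0; off < len; off += max_file, nfiles++) {
        size_t n = len - off < max_file ? len - off : max_file;
        assert(n > 0 && n <= max_file);
        snprintf(name, sizeof name, "%s.%d", prefix, nfiles);
        int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0640);
        if (fd < 0)
            return -1;
        if (write_all(fd, p + off, n) < 0) {
            close(fd);
            return -1;
        }
        if (close(fd) < 0)
            return -1;
    }
    return nfiles;
}
```

For a 3.6 gig region with max_file set to 2 gigs this would produce two files; how long each write() takes then depends on how much dirty data the kernel is willing to hold.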
Our systems all have over 12 gigs, and we've depended on these writes
going straight to the cache, so we can checkpoint all of memory (3.6 gigs)
in 1 minute or less. Well, we could before RHEL4 with its 2.6 kernel.
Now it's acting strange and I can't quite figure out why. I suspect
it's something that can be tuned. And that's what I really need help doing.
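If the slowdown does come from 2.6 writeback tuning (not established in this thread), two knobs worth checking are /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio. Roughly, both are percentages of memory: background write-back starts at the first, and a process that is itself writing gets throttled into doing write-back at the second. The hypothetical helper below just reads such a value so it can be logged next to checkpoint timings.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read an integer tunable from /proc/sys/vm.
 * Returns -1 if the file is missing or does not hold an integer. */
long read_vm_tunable(const char *name)
{
    char path[256];
    long value = -1;

    assert(name != NULL);
    snprintf(path, sizeof path, "/proc/sys/vm/%s", name);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, read_vm_tunable("dirty_ratio") and read_vm_tunable("dirty_background_ratio") could be printed before each checkpoint.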
"Basile Starynkevitch [news]" wrote:
> On 2005-09-09, Eric Taylor <et1@rocketship1.com> wrote:
> > I have a two gigabyte array. From a nothing special C program,
> > what is the fastest way to write this to a disk file?
>
> You might consider using mmap on this array. Basically, instead of
> allocating the array with malloc, you open a file for writing, then
> mmap it, then msync; something like (untested code, and you really
> should add all the error checks)
>
> #include <fcntl.h>
> #include <sys/mman.h>
> #include <unistd.h>
> // errored() and compute_your_big_array() stand for your own code
>
> // the array size in byte has to be a multiple of the page size
> #define BIGSIZE (16384*1024) /* number of doubles in your array */
> int fd = open("your_big_file", O_RDWR|O_CREAT, 0640);
> if (fd<0) errored("failed to open");
> size_t arraysize = sizeof(double)*BIGSIZE;
> // grow the file if needed by truncating it
> if (ftruncate(fd, arraysize) < 0) errored("failed to ftruncate");
> // map the array in memory
> void* arrayaddr = mmap((void*)0, arraysize, PROT_READ|PROT_WRITE,
>                        MAP_SHARED, fd, (off_t)0);
> if (arrayaddr==MAP_FAILED) errored("failed to mmap");
> double* array = arrayaddr;
> // compute & fill the array
> compute_your_big_array(array, BIGSIZE);
> // ask for write-back; with MS_ASYNC the call returns without waiting
> msync(arrayaddr, arraysize, MS_ASYNC);
> // unmap memory and close file
> munmap(arrayaddr, arraysize);
> close(fd);
>
> Read a good book on Linux or Posix system programming, and the man pages for
> open(2), mmap(2), msync(2), munmap(2), close(2) system calls
>
> Regards
>
> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basile(at)starynkevitch(dot)net
> 8, rue de la Faïencerie, 92340 Bourg La Reine, France