Re: How to know when file data has been flushed into disk?



This answered all my questions! Many thanks! Will check the phase 2 code.

Xin


On 4/7/06, Zach Brown <zab@xxxxxxxxx> wrote:

If a program access data like this:

1. open the file
2. write a lot of data into this file

You don't say if this is an extending write or overwriting existing file
data. I'm going to assume extending writes so that data=ordered kicks in.

3. close the file

So my questions are:
1. How will the file system be notified after all data has been
flushed into disk?

Look at phase 2 in journal_commit_transaction(). The kjournald thread
issues the writeback of the file data by walking t_sync_datalist and
then waits for the writeback to complete by using wait_on_buffer()
before committing the transaction.

2. Unlike data=journal mode, in data=order mode, the data could be
lost if system crashes when data is being flushed to disk. When system
reboots, does journal contains the old meta data for undo?

No, ext3 isn't roll-backward. It doesn't store the *old* data in the
journal and undo the change if it fails halfway through. It's
roll-forward. It stores the *new* data in the journal and replays
complete transactions in the journal that weren't moved out to their
final place on disk at the time of the crash.

So if the machine reboots during the writeback phase then the
transaction won't be committed yet and recovery won't replay that
transaction from the journal. From the metadata's point of view the
file extension will never have happened.

3. Does sys_close() have to be blocked until all data and metadata
are committed?

No, and neither does sys_getpid() :)

to take subsequent operation. However, data flush could be failed. In
this case, file system seems to mislead the application. Is this true?

No. The application has no grounds for assuming that a successful
close() has synced previous operations to disk. It's simply not part of
the API.

If so, any solutions?

The application should rely on tools like fsync(), fdatasync(), O_SYNC,
mount -o sync, etc. Whatever suits it best.

- z

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: How to know when file data has been flushed into disk?
    ... You don't say if this is an extending write or overwriting existing file ... How will the file system be notified after all data has been ... before committing the transaction. ... lost if system crashes when data is being flushed to disk. ...
    (Linux-Kernel)
  • Re: New filesystem for Linux
    ... transaction number n, C and D get a different one, n+1 if nothing else ... And when you sync, with one write of memory_cct to disk, you make old entries permanently invalid and new entries permanently valid. ... transaction without committing the first is not possible. ...
    (Linux-Kernel)
  • disktab for VPC Disk
    ... I'm trying to optimize the disk access of NEXTSTEP on Virtual PC 5.0.4. ... Maximum size of NeXT file system is 2GB. ... programs in the initial 8 sectors of the device. ... The number of cylinders per cylinder group in a file system. ...
    (microsoft.public.mac.virtualpc)
  • Re: [GIT PULL] Ext3 latency fixes
    ... which unplugged the write queue after every page write. ... In guarded mode the on disk i_size is not updated until after the data ... The end_io handler puts the buffer onto the per-sb list of guarded ... One big gotcha is that we starting a transaction while a page is ...
    (Linux-Kernel)
  • Re: partition naming - newbie
    ... hardware address that is assigned to each disk, tape, or CD-ROM. ... Using the prtconf command ... In addition to managing these directories, the devfsadm command also ... Berkeley fast file system. ...
    (comp.unix.solaris)