[2.6.7-bk patch] Update Documentation/filesystems/Locking

From: Anton Altaparmakov (aia21_at_cam.ac.uk)
Date: 06/09/04

    Date:	Wed, 9 Jun 2004 10:12:08 +0100 (BST)
    To: Andrew Morton <akpm@osdl.org>, Linus Torvalds <torvalds@osdl.org>
    
    

    Hi Andrew, hi Linus,

    As I discovered while working on NTFS and as agreed by Andrew, a
    filesystem's ->writepage() implementation nowadays must run either
    redirty_page_for_writepage() or the combination of set_page_writeback()/
    end_page_writeback(). Failure to do so leaves the page itself marked
    clean while it is still tagged as dirty in the radix tree
    (PAGECACHE_TAG_DIRTY). This incoherency can lead to hard-to-debug
    problems in the filesystem, such as dirty inodes at umount and lost
    written data.
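
For illustration only (this is not part of the patch), the rule above can be sketched as a small userspace model in C. Everything prefixed with toy_ is hypothetical: the struct, the helper names, and the three modes are invented for this sketch and merely mimic what the kernel functions of similar name are understood to do. The sketch assumes that the caller clears the page's dirty flag before calling ->writepage() while leaving the radix tree tag set, and that set_page_writeback() drops the dirty tag when the page flag is clean.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of one page cache page. NOT kernel code. */
struct toy_page {
    bool locked;        /* models the page lock */
    bool dirty;         /* models the per-page dirty flag */
    bool writeback;     /* models the per-page writeback flag */
    bool tag_dirty;     /* models PAGECACHE_TAG_DIRTY in the radix tree */
    bool tag_writeback; /* models the writeback tag in the radix tree */
};

/* Assumed caller behaviour: lock the page and clear its dirty flag,
 * but leave the radix tree dirty tag untouched. */
static void toy_prepare_for_writepage(struct toy_page *p)
{
    p->locked = true;
    p->dirty = false;
}

/* Models redirty_page_for_writepage(): flag and tag both become dirty. */
static void toy_redirty_page_for_writepage(struct toy_page *p)
{
    p->dirty = true;
    p->tag_dirty = true;
}

/* Models set_page_writeback(): besides marking writeback, it drops the
 * dirty tag if the page flag is clean (an assumption of this sketch). */
static void toy_set_page_writeback(struct toy_page *p)
{
    p->writeback = true;
    p->tag_writeback = true;
    if (!p->dirty)
        p->tag_dirty = false;
}

/* Models end_page_writeback(): only valid on a page under writeback. */
static void toy_end_page_writeback(struct toy_page *p)
{
    assert(p->writeback);
    p->writeback = false;
    p->tag_writeback = false;
}

static void toy_unlock_page(struct toy_page *p)
{
    assert(p->locked);
    p->locked = false;
}

enum toy_mode { TOY_REDIRTY, TOY_WRITE, TOY_BUGGY };

/* Three possible ->writepage() behaviours; only the first two obey the rule. */
static int toy_writepage(struct toy_page *p, enum toy_mode mode)
{
    switch (mode) {
    case TOY_REDIRTY:
        toy_redirty_page_for_writepage(p);
        toy_unlock_page(p);
        break;
    case TOY_WRITE:
        toy_set_page_writeback(p);
        toy_unlock_page(p);
        /* Real code would submit I/O here; the completion handler
         * (or writepage itself if no I/O is submitted) ends writeback. */
        toy_end_page_writeback(p);
        break;
    case TOY_BUGGY:
        /* Neither redirty nor set/end writeback: the tag stays stale. */
        toy_unlock_page(p);
        break;
    }
    return 0;
}

/* The page is coherent when its dirty flag agrees with its dirty tag. */
static bool toy_page_coherent(const struct toy_page *p)
{
    return p->dirty == p->tag_dirty;
}
```

In this model, a page that goes through TOY_REDIRTY or TOY_WRITE ends with its dirty flag and dirty tag in agreement, whereas TOY_BUGGY leaves a clean page that is still tagged dirty, which is the incoherency described above.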

    Please apply the below patch which updates
    Documentation/filesystems/Locking to reflect this requirement.

    Best regards,

            Anton

    -- 
    Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
    Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
    Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
    WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
    --- bklinux-2.6/Documentation/filesystems/Locking.old	2004-06-09 09:34:23.808663656 +0100
    +++ bklinux-2.6/Documentation/filesystems/Locking	2004-06-09 09:57:52.315538064 +0100
    @@ -203,20 +203,34 @@ currently-in-progress I/O.
     If the filesystem is not called for "sync" and it determines that it
     would need to block against in-progress I/O to be able to start new I/O
    -against the page the filesystem shoud redirty the page (usually with
    -__set_page_dirty_nobuffers()), then unlock the page and return zero.
    +against the page the filesystem should redirty the page with
    +redirty_page_for_writepage(), then unlock the page and return zero.
     This may also be done to avoid internal deadlocks, but rarely.
     If the filesytem is called for sync then it must wait on any
     in-progress I/O and then start new I/O.
     The filesystem should unlock the page synchronously, before returning
    -to the caller.  If the page has write I/O underway against it,
    -writepage() should run SetPageWriteback() against the page prior to
    -unlocking it.  The write I/O completion handler should run
    -end_page_writeback() against the page.
    +to the caller.
    -That is: after 2.5.12, pages which are under writeout are *not* locked.
    +Unless the filesystem is going to redirty_page_for_writepage(), unlock the page
    +and return zero, writepage *must* run set_page_writeback() against the page,
    +followed by unlocking it.  Once set_page_writeback() has been run against the
    +page, write I/O can be submitted and the write I/O completion handler must run
    +end_page_writeback() once the I/O is complete.  If no I/O is submitted, the
    +filesystem must run end_page_writeback() against the page before returning from
    +writepage.
    +
    +That is: after 2.5.12, pages which are under writeout are *not* locked.  Note,
    +if the filesystem needs the page to be locked during writeout, that is ok, too,
    +the page is allowed to be unlocked at any point in time between the calls to
    +set_page_writeback() and end_page_writeback().
    +
    +Note, failure to run either redirty_page_for_writepage() or the combination of
    +set_page_writeback()/end_page_writeback() on a page submitted to writepage
    +will leave the page itself marked clean but it will be tagged as dirty in the
    +radix tree.  This incoherency can lead to all sorts of hard-to-debug problems
    +in the filesystem like having dirty inodes at umount and losing written data.
     	->sync_page() locking rules are not well-defined - usually it is called
     with lock on page, but that is not guaranteed. Considering the currently