Re: 2.6.19 file content corruption on ext3



Linus Torvalds wrote:
[ Replying to myself - a sure sign that I don't get out enough ]

On Sun, 17 Dec 2006, Linus Torvalds wrote:

So I don't actually see any serialization at all that would keep a random page from being paged back in.


We do actually serialize, but we do it _after_ the page has already been mapped back. Ie we do it for the dirty page case at rthe end of do_wp_page() and do_no_page() when we do the "set_page_dirty_balance()", but that's potentially too late - we've already mapped the page read-write into the address space.

I can't see how that's exactly a problem -- so long as the page does not
get reclaimed (it won't, because we have a ref on it) then all that matters
is that the page eventually gets marked dirty.

That said, this means that only threaded apps should ever trigger any problems, which would seem to make it unlikely that this is the issue.

But Andrew: I don't think it's necessarily true that "try_to_free_buffers()" callers have all unmapped the page.

That seems to be true for vmscan.c (ie the shrink_page_list -> try_to_release_page -> try_to_release_buffers callchain), but what about the other callchains (through filesystems, or through "pagevec_strip()" or similar?) That pagevec_strip() is called from shrink_active_list(), I don't see that unmapping the pages..

Right. But would it really matter whether they are currently mapped or
not, given that we agree it may become mapped at any point?

I think the problem Andrew identified is real.

The issue is the disconnect between the pte dirtiness and a filesystem
bringing buffers clean. But I disagree with his fix, because we don't
actually want to just throw out that pte dirtiness information: we're
just trying to get the PG_dirty bit into synch with what the buffers are
telling us, not actually clean or dirty anything, as such.

Can we clear the page dirty bit, then run set_page_dirty afterwards, if
any dirty ptes are found?

The other thing we might be able to do is to skip doing the
clear_page_dirty if the page is uptodate. This feels more hackish but it
might be faster?

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: 2.6.19 file content corruption on ext3
    ... On Mon, 18 Dec 2006, Linus Torvalds wrote: ... to re-add the dirty bit. ... Please read the FAQ at http://www.tux.org/lkml/ ...
    (Linux-Kernel)
  • Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
    ... On Wed, 20 Dec 2006, Linus Torvalds wrote: ... You can always dirty them. ... If your point was that the filesystem had better be able to take care of ... The dirty state is being transferred to the new page. ...
    (Linux-Kernel)
  • Re: [PATCH] ppc64: Fix possible race with set_pte on a present PTE
    ... Linus Torvalds wrote: ... Note that the reason we can do the "dirty and accessed bit both ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)
  • Re: VM: Fix nasty and subtle race in shared mmaped page writeback
    ... and I think that at least on SMP, we had a race with another CPU doing the "mark page dirty if it was dirty in the PTE" at the same time. ... Ie the race, I think, existed where that crappy comment was. ... But that much older race would only trigger on SMP. ... Send instant messages to your online friends http://au.messenger.yahoo.com - ...
    (Linux-Kernel)
  • Re: 2.6.19 file content corruption on ext3
    ... Andrew Morton wrote: ... but THE VERY SAME FUNCTION actually very much on purpose DOES drop the dirty bit for when it's not in the page tables. ... If a write-fault races with a read-fault and the write-fault loses, ... Send instant messages to your online friends http://au.messenger.yahoo.com - ...
    (Linux-Kernel)