Re: [RFC] ext3/jbd race: releasing in-use journal_heads

From: Jan Kara (jack_at_suse.cz)
Date: 03/09/05

  • Next message: Mauricio Lin: "Re: oom with 2.6.11"
    Date:	Wed, 9 Mar 2005 14:28:38 +0100
    To: "Stephen C. Tweedie" <sct@redhat.com>
    
    

    > On Tue, 2005-03-08 at 15:12, Jan Kara wrote:
    >
    > > Isn't also the following scenario dangerous?
    > >
    > > __journal_unfile_buffer(jh);
    > > journal_remove_journal_head(bh);
    >
    > It depends. I think the biggest problem here is that there's really no
    > written rule protecting this stuff universally. But in testing, we just
    > don't see this triggering.
    >
    > Why not? We actually have a raft of other locks protecting us in
    > various places. As long as there's some overlap in the locks, we're
    > OK. But because it's not written down, not everybody relies on the same
    > set of locks; it's only because journal_unmap_buffer() was dropping the
    > jh without enough locks that we saw a problem there.
    >
    > So the scenario:
    >
    > __journal_unfile_buffer(jh);
    > journal_remove_journal_head(bh);
    >
    > *might* be dangerous, if called carelessly; but in practice it works out
    > OK. Remember, that journal_remove_journal_head() call still takes the
    > bh_journal_head lock and still checks b_transaction with that held.
      Hmm. I see for example a place at jbd/commit.c, line 287 (which you
    did not change in your patch) which does this and doesn't seem to be
    protected against journal_unmap_buffer() (but maybe I miss something).
    Not that I'd find that race probable but in theory...

    > I think it's time I went and worked out *why* it works out OK, though,
    > so that we can write it down and make sure it stays working! And we may
    > well be able to simplify the locking when we do this; collapsing the two
    > bh state locks, for example, may help improve the robustness as well as
    > improving performance through removing some redundant locking
    > operations.
    >
    > Fortunately, there are really only three classes of operation that can
    > remove a jh. There are the normal VFS/VM operations, like writepage,
    > try_to_free_buffer, etc; these are all called with the page lock. There
    > are metadata updates, which take a liberal dose of buffer, bh_state and
    > journal locks.
    >
    > Finally there are the commit/checkpoint functions, which run
    > asynchronously to the rest of the journaling and which don't relate to
    > the current transaction, but to previous ones. This is the danger area,
    > I think. But as long as at least the bh_state lock is held, we can
    > protect these operations against the data/metadata update operations.
      I agree that only the metadata/data updates can race with checkpoint/commit
    because other combinations are already serialized by 'higher-level'
    locks. But I think we agree that the simplier (and 'local') the locking rules
    are the less probable the bugs are..

                                                                            Honza

    -- 
    Jan Kara <jack@suse.cz>
    SuSE CR Labs
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at  http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at  http://www.tux.org/lkml/
    

  • Next message: Mauricio Lin: "Re: oom with 2.6.11"

    Relevant Pages

    • Re: [RFC] ext3/jbd race: releasing in-use journal_heads
      ... I think the biggest problem here is that there's really no ... written rule protecting this stuff universally. ... As long as there's some overlap in the locks, ...
      (Linux-Kernel)
    • Re: [PATCH] x86: sparse_irq need spin_lock in alloc
      ... It would be nice to add a comment explaining what they are protecting, ... ok - i moved the locks next to the data structure they protect (the free ... int count = 0; ... while (cfg) { ...
      (Linux-Kernel)
    • Re: deadlocks
      ... Most software I've worked on is not multiple identical threads working with the same locks. ... modify A 2.acquire modify B 3.acquire modify C 1.release at this point another thread can modify A <- potential damage 3.release 2.release ... The potential damage is because the A should be protected. ... But 3 and 2 are not protecting A. ...
      (comp.lang.java.programmer)
    • Re: deadlocks
      ... >complex than several identical threads using the same locks. ... But 3 and 2 are not protecting A. ... I wouldn't call that a problem with releasing ...
      (comp.lang.java.programmer)
    • Re: Problems WITH IPAQ 6340 (6300 series)
      ... Mine locks up but not with this frequency - but certainly at least once a ... My biggest problem is that people say that the call quality is really poor ... > Ben Blaney ...
      (microsoft.public.pocketpc.phone_edition)