[PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)

From: Steven Rostedt (rostedt_at_goodmis.org)
Date: 03/18/05

  • Next message: Ingo Molnar: "Re: Real-Time Preemption and RCU"
    Date:	Fri, 18 Mar 2005 04:23:24 -0500 (EST)
    To: Andrew Morton <akpm@osdl.org>
    
    

    Andrew,

    Since I haven't gotten a response from you, I'd figure that you may have
    missed this, since the subject didn't change. So I changed the subject to
    get your attention, and I've resent this. Here's the patch to get rid of
    the the lame schedule that was in fs/jbd/commit.c. Let me know if this
    patch is appropriate.

    Thanks,

    -- Steve

    On Thu, 17 Mar 2005, Steven Rostedt wrote:

    >
    >
    > On Wed, 16 Mar 2005, Andrew Morton wrote:
    >
    > > > I guess one way to solve this is to add a wait queue here (before
    > > > schedule()), and have the one holding the lock to wake up all on the
    > > > waitqueue when they release it.
    > >
    > > yup. A patch against mainline would be appropriate, please.
    > >
    >
    > Hi Andrew,
    >
    > Here's the patch against 2.6.11. I tested it, by adding (after making the
    > patch) global spinlocks for jbd_lock_bh_state and jbd_lock_bh_journalhead.
    > That way I have same scenerio as with Ingo's kernel, and I turned on
    > NEED_JOURNAL_STATE_WAIT. I'm still running that kernel so it looks like
    > it works. Making those two locks global causes this deadlock on kjournal
    > much quicker, and I don't need to run on an SMP machine (since my SMP
    > machines are currently being used for other tasks).
    >
    > Some comments on my patch. I only implement the wait queue when
    > bit_spin_trylock is an actual lock (thus creating the problem). I didn't
    > want to add this code if it was needed (ie. !(CONFIG_SMP &&
    > CONFIG_DEBUG_SPINLOCKS)). So in bit_spin_trylock, I define
    > NEED_JOURNAL_STATE_WAIT if bit_spin_trylock is really a lock. When
    > NEED_JOURNAL_STATE_WAIT is set, then the wait queue is set up in the
    > journal code.
    >
    > Now the question is, should we make those two locks global? It would help
    > Ingo's cause (and mine as well). But I don't know the impact on a large
    > SMP configuration. Andrew, since you have a 16xSMP machine, could you (if
    > you have time) try out the effect of that. If you do have time, then I'll
    > send you a patch that goes on top of this one to change the two locks into
    > global spin locks.
    >
    > Ingo, where do you want to go from here? I guess we need to wait on what
    > Andrew decides.
    >
    > -- Steve
    >
    >

    diff -ur linux-2.6.11.orig/fs/jbd/commit.c linux-2.6.11/fs/jbd/commit.c
    --- linux-2.6.11.orig/fs/jbd/commit.c 2005-03-02 02:38:25.000000000 -0500
    +++ linux-2.6.11/fs/jbd/commit.c 2005-03-17 03:40:06.000000000 -0500
    @@ -80,15 +80,33 @@

     /*
      * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
    - * held. For ranking reasons we must trylock. If we lose, schedule away and
    - * return 0. j_list_lock is dropped in this case.
    + * held. For ranking reasons we must trylock. If we lose put ourselves on a
    + * state wait queue and we'll be woken up when it is unlocked. Then we return
    + * 0 to try this again. j_list_lock is dropped in this case.
      */
     static int inverted_lock(journal_t *journal, struct buffer_head *bh)
     {
             if (!jbd_trylock_bh_state(bh)) {
    + /*
    + * jbd_trylock_bh_state always returns true unless CONFIG_SMP or
    + * CONFIG_DEBUG_SPINLOCK, so the wait queue is not needed there.
    + * The bit_spin_locks in jbd_lock_bh_state need to be removed anyway.
    + */
    +#ifdef NEED_JOURNAL_STATE_WAIT
    + DECLARE_WAITQUEUE(wait, current);
                     spin_unlock(&journal->j_list_lock);
    - schedule();
    + add_wait_queue_exclusive(&journal_state_wait,&wait);
    + set_current_state(TASK_UNINTERRUPTIBLE);
    + /* Check to see if the lock has been unlocked in this short time */
    + if (jbd_is_locked_bh_state(bh))
    + schedule();
    + set_current_state(TASK_RUNNING);
    + remove_wait_queue(&journal_state_wait,&wait);
                     return 0;
    +#else
    + /* This should never be hit */
    + BUG();
    +#endif
             }
             return 1;
     }
    diff -ur linux-2.6.11.orig/fs/jbd/journal.c linux-2.6.11/fs/jbd/journal.c
    --- linux-2.6.11.orig/fs/jbd/journal.c 2005-03-02 02:37:49.000000000 -0500
    +++ linux-2.6.11/fs/jbd/journal.c 2005-03-17 03:47:40.000000000 -0500
    @@ -80,6 +80,11 @@
     EXPORT_SYMBOL(journal_try_to_free_buffers);
     EXPORT_SYMBOL(journal_force_commit);

    +#ifdef NEED_JOURNAL_STATE_WAIT
    +EXPORT_SYMBOL(journal_state_wait);
    +DECLARE_WAIT_QUEUE_HEAD(journal_state_wait);
    +#endif
    +
     static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

     /*
    diff -ur linux-2.6.11.orig/include/linux/jbd.h linux-2.6.11/include/linux/jbd.h
    --- linux-2.6.11.orig/include/linux/jbd.h 2005-03-02 02:38:19.000000000 -0500
    +++ linux-2.6.11/include/linux/jbd.h 2005-03-17 03:48:18.000000000 -0500
    @@ -324,6 +324,20 @@
             return bh->b_private;
     }

    +#ifdef NEED_JOURNAL_STATE_WAIT
    +/*
    + * The journal_state_wait is a wait queue that tasks will wait on
    + * if they fail to get the jbd_lock_bh_state while holding the j_list_lock.
    + * Instead of spinning on schedule, the task now adds itself to this wait queue
    + * and will be woken up when the jbd_lock_bh_state is released.
    + *
    + * Since the bit_spin_locks are only locks under CONFIG_SMP and
    + * CONFIG_DEBUG_SPINLOCK, this wait queue is only needed in those
    + * cases.
    + */
    +extern wait_queue_head_t journal_state_wait;
    +#endif
    +
     static inline void jbd_lock_bh_state(struct buffer_head *bh)
     {
             bit_spin_lock(BH_State, &bh->b_state);
    @@ -342,6 +356,13 @@
     static inline void jbd_unlock_bh_state(struct buffer_head *bh)
     {
             bit_spin_unlock(BH_State, &bh->b_state);
    +#ifdef NEED_JOURNAL_STATE_WAIT
    + /*
    + * There may be a task sleeping, and waiting to be woken up
    + * when this is unlocked.
    + */
    + wake_up(&journal_state_wait);
    +#endif
     }

     static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
    diff -ur linux-2.6.11.orig/include/linux/spinlock.h linux-2.6.11/include/linux/spinlock.h
    --- linux-2.6.11.orig/include/linux/spinlock.h 2005-03-02 02:38:09.000000000 -0500
    +++ linux-2.6.11/include/linux/spinlock.h 2005-03-17 03:39:13.024466071 -0500
    @@ -527,6 +527,9 @@
      *
      * Don't use this unless you really need to: spin_lock() and spin_unlock()
      * are significantly faster.
    + *
    + * FIXME: These are evil and need to be removed. They are currently only
    + * used by the journal code of ext3.
      */
     static inline void bit_spin_lock(int bitnum, unsigned long *addr)
     {
    @@ -557,6 +560,13 @@
     {
             preempt_disable();
     #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
    + /*
    + * This is only used by the journal code of ext3 and if this
    + * is set then we need to tell the journal code that it needs
    + * a wait queue to keep kjournald from spinning on a lock.
    + */
    +#define NEED_JOURNAL_STATE_WAIT
    +
             if (test_and_set_bit(bitnum, addr)) {
                     preempt_enable();
                     return 0;
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Ingo Molnar: "Re: Real-Time Preemption and RCU"

    Relevant Pages