Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)

From: Steven Rostedt (rostedt_at_goodmis.org)
Date: 03/18/05

  • Next message: Robin Holt: "Re: vm_dirty_ratio seems a bit large."
    Date:	Fri, 18 Mar 2005 07:10:12 -0500 (EST)
    To: Andrew Morton <akpm@osdl.org>
    
    

    On Fri, 18 Mar 2005, Andrew Morton wrote:
    > Steven Rostedt <rostedt@goodmis.org> wrote:
    > >
    > > I wanted to keep the wait logic out when it wasn't a problem. Basically,
    > > the problem only occurs when bit_spin_trylock is defined as an actual
    > > trylock. So I put in a define there to enable the wait queues. I didn't
    > > want to waste cycles checking the wait queue in jbd_unlock_bh_state when
    > > there would never be anything on it. Heck, I figured why even have the
    > > wait queue wasting memory if it wasn't needed. So that added the
    > > ifdeffery complexity.
    >
    > No, that code's just a problem. For ranking reasons it's essentially doing
    > this:
    >
    > repeat:
    > cond_resched();
    > spin_lock(j_list_lock);
    > ....
    > if (!bit_spin_trylock(bh)) {
    > spin_unlock(j_list_lock);
    > schedule();
    > goto repeat;
    > }
    >

    Yep, that I understand.

    > Now imagine that some other CPU holds the bit_spin_lock and is spinning,
    > trying to get the spin_lock(). The above code assumes that the schedule()
    > and cond_resched() will take "long enough" for the other CPU to get the
    > spinlock, do its business then release the locks.
    >
    > So all the schedule() is really doing is "blow a few cycles so the other
    > CPU can get in and grab the spinlock". That'll work OK on normal SMP but I
    > suspect that on NUMA setups with really big latencies we could end up
    > starving the other CPU: this CPU would keep on grabbing the lock. It
    > depends on how the interconnect cache and all that goop works.
    >
    > So what to do?
    >
    > One approach would be to spin on the bit_spin_trylock after having dropped
    > j_list_lock. That'll tell us when the other CPU has moved on.
    >

    This is probably the best for mainline, since, as you mentioned, the
    abover code is just bad.

    > Another approach would be to sleep on a waitqueue somewhere. But that
    > means that jbd_unlock_bh_state() needs to do wakeups all the time - costly.
    >

    That's the approach that my patch made.

    > Another approach would be to simply whack an msleep(1) in there. That
    > might be OK - it should be very rare.
    >

    This approach is not much better than the current implementation.

    > Probably the first approach would be the one to use. That's for mainline.
    > I don't know what the super-duper-RT fix would be. Why did we start
    > discussing this anyway?
    >
    > Oh, SCHED_FIFO. kjournald doesn't run SCHED_FIFO, but someone may decide
    > to make it do so. But even then I don't see a problem for the mainline
    > kernel, because this CPU's SCHED_FIFO doesn't stop the other CPU from
    > running.
    >

    So this comes down to just a problem with Ingo's PREEPMT_RT. This means
    that the latency of kjournald, even without SCHED_FIFO will be large. If
    it preempts a process that has one of these bit spinlocks, (Ingo's RT
    kernel takes out the preempt_disable in them), then the kjournal thread
    will spin till its quota is free, causing problems for other processes.
    Even a process with a higher priority than kjournal if it blocks on one of
    the other locks that kjournal can have while attempting to get the bit
    locks.

    I know Ingo wants to get his patch eventually into the mainline without
    too much drag. But this problem needs to be solved in the mainline to
    accomplish this.

    What do you recommend?

    -- Steve

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Robin Holt: "Re: vm_dirty_ratio seems a bit large."

    Relevant Pages

    • Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list
      ... Now imagine that some other CPU holds the bit_spin_lock and is spinning, ... do its business then release the locks. ... But even then I don't see a problem for the mainline ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • Re: RFC for a new Scheduling policy/class in the Linux-kernel
      ... at least in terms of supporting priority ... inheritance for pthread mutexes. ... CPU to avoid bus contention on the global spin-lock variable. ... but such global locks had a lot more overhead than ...
      (Linux-Kernel)
    • Re: [PATCH] netfilter: use per-cpu recursive lock (v11)
      ... access and read_can_lockplus per cpu locks in the fastpath? ... this cpu already called local_read_lock, ... void global_write_lock ... atomic operation per lock/unlock pair in fastpath (this was the reason we ...
      (Linux-Kernel)
    • Re: Locking fundamentals
      ... It's a bad idea to use locks if you can achieve the same thing locklessly, ... Simple statistics are about the only place they're safe to use without very careful thinking. ... They can be used instead of locks when you know your data will only be accessed on one CPU, and you only need to protect it from interrupt handlers. ... they work alright if the thread is dedicated to a single purpose and written entirely with that synchronization model in mind, or if the period of pinning is brief. ...
      (freebsd-hackers)
    • Re: [patch 1/4] x86: FIFO ticket spinlocks
      ... only 2 cores (on the same CPU) could get the ... The other 6 cores on the system were ... the locks also drop the locks and do quite a bit of work before taking ... not mean that the process is letting go of the cacheline. ...
      (Linux-Kernel)