Re: About a change to the implementation of spin lock in 2.6.12 kernel.

multisyncfe991_at_hotmail.com
Date: 07/14/05

    To: <linux-kernel@vger.kernel.org>
    Date:	Thu, 14 Jul 2005 09:21:51 -0700
    
    

    Hi Willy,

    I think at least I can remove the LOCK instruction when the lock is already
    held by someone else and enter the spinning wait directly, right?

    0:  cmpb $0, slp
        jle 2f          # lock not available: spin without locking the bus
    1:  lock; decb slp  # lock the bus and atomically decrement
        jns 3f          # sign bit clear: jump forward to 3
    2:  pause           # spin - wait
        cmpb $0, slp    # spin - compare to 0
        jle 2b          # spin - go back to 2 if <= 0 (locked)
        jmp 1b          # unlocked; go back to 1 to try to lock again
    3:                  # we have acquired the lock

    But according to the Lockmeter report, lock acquisition succeeds 99.8% of
    the time, so maybe this will not change much.
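
For readers more comfortable with C than with i386 assembly, the test-before-lock idea in the listing above can be sketched with C11 atomics. This is only an illustration under assumptions, not the kernel's code: the names ttas_lock_t, ttas_init, ttas_trylock, ttas_lock and ttas_unlock are hypothetical, the lock word is an atomic_int rather than a byte, and the spin loop omits the pause hint because standard C has no portable equivalent.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type: 1 means free, <= 0 means held (as with "slp"). */
typedef struct {
    atomic_int slp;
} ttas_lock_t;

static void ttas_init(ttas_lock_t *l)
{
    atomic_init(&l->slp, 1);
}

/* One acquisition attempt; returns 1 on success, 0 on failure. */
static int ttas_trylock(ttas_lock_t *l)
{
    /* Label 0: plain read first, so a held lock costs no locked operation. */
    if (atomic_load_explicit(&l->slp, memory_order_relaxed) <= 0)
        return 0;
    /* Label 1: atomic decrement, the counterpart of "lock; decb".
     * We own the lock only if the decremented value is non-negative (jns). */
    int old = atomic_fetch_sub_explicit(&l->slp, 1, memory_order_acquire);
    return old - 1 >= 0;
}

static void ttas_lock(ttas_lock_t *l)
{
    while (!ttas_trylock(l)) {
        /* Label 2: spin on plain reads until the lock looks free. */
        while (atomic_load_explicit(&l->slp, memory_order_relaxed) <= 0)
            ; /* the assembly version executes "pause" here */
    }
}

static void ttas_unlock(ttas_lock_t *l)
{
    /* Only the holder calls this, so the lock word must read as held. */
    assert(atomic_load_explicit(&l->slp, memory_order_relaxed) <= 0);
    /* A release store is enough; no read-modify-write is needed. */
    atomic_store_explicit(&l->slp, 1, memory_order_release);
}
```

The initial read only skips the atomic operation when the lock is already held. The acquisition itself still goes through the atomic decrement, which is what keeps two CPUs from both believing they own the lock.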
    Thanks,

    Liang

    ----- Original Message -----
    From: "Willy Tarreau" <willy@w.ods.org>
    To: <multisyncfe991@hotmail.com>
    Cc: <linux-kernel@vger.kernel.org>
    Sent: Wednesday, July 13, 2005 10:16 PM
    Subject: Re: About a change to the implementation of spin lock in 2.6.12
    kernel.

    > Hi,
    >
    > On Wed, Jul 13, 2005 at 07:20:06PM -0700, multisyncfe991@hotmail.com
    > wrote:
    >> Hi,
    >>
    >> I found that _spin_lock uses a LOCK instruction to make the following
    >> operation "decb %0" atomic. As you know, the LOCK instruction alone takes
    >> almost 70 clock cycles to finish, and this adds a lot of cost to
    >> _spin_lock. However, _spin_unlock does not use the LOCK instruction; it
    >> uses "movb $1,%0" instead, since 4-byte writes on 4-byte aligned
    >> addresses are atomic.
    >
    > _spin_unlock does not need locked operations because when it is run, the
    > code is already known to be the only one to hold the lock, so it can
    > release it without checking what others do.
    >
    >> So I want to rewrite the _spin_lock defined in spinlock.h
    >> (/linux/include/asm-i386) as follows to reduce the overhead of _spin_lock
    >> and make it more efficient.
    >
    > It does not work. You cannot write an inter-cpu atomic test-and-set with
    > several unlocked instructions.
    >
    >> #define spin_lock_string \
    >> "\n1:\t" \
    >> "cmpb $0,%0\n\t" \
    >> "jle 2f\n\t" \
    >
    > ==> here, another thread or CPU can get the lock simultaneously.
    >
    >> "movb $0, %0\n\t" \
    >> "jmp 3f\n" \
    >> "2:\t" \
    >> "rep;nop\n\t" \
    >> "cmpb $0, %0\n\t" \
    >> "jle 2b\n\t" \
    >> "jmp 1b\n" \
    >> "3:\n\t"
    >>
    >> Compared with the original version shown below, the LOCK instruction is
    >> removed. I rebuilt the Intel e1000 Gigabit driver with this _spin_lock.
    >> There is about a 2% throughput improvement.
    >> #define spin_lock_string \
    >> "\n1:\t" \
    >> "lock ; decb %0\n\t" \
    >> "jns 3f\n" \
    >> "2:\t" \
    >> "rep;nop\n\t" \
    >> "cmpb $0,%0\n\t" \
    >> "jle 2b\n\t" \
    >> "jmp 1b\n" \
    >> "3:\n\t"
    >>
    >> Do you think I can get better performance if I dig further?
    >>
    >> Any ideas will be greatly appreciated,
    >
    > Well, of course you can improve performance with those methods, but you
    > lose the guarantee that you are the only one holding the lock, and that's bad.
    >
    > another similar method to get a lock in some very controlled environment
    > is as follows :
    >
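    > 1: cmp $0, %0        # wait until the lock word reads as free
    >    jne 1b
    >    mov $CPUID, %0    # claim it by writing our own CPU id
    >    membar            # make the write visible to other CPUs
    >    cmp $CPUID, %0    # check that our id is still there
    >    jne 1b            # someone else overwrote it: start over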
    >
    > This only works with same-speed CPUs and interrupts disabled. But in
    > today's environments, this is very risky (hyperthreaded CPUs, etc.).
    > However, this is often OK for more deterministic CPUs such as
    > microcontrollers.
    >
    > Regards,
    > Willy
    >
    > -
    > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html
    > Please read the FAQ at http://www.tux.org/lkml/
    >

