Re: [PATCH 1/1] (v3) SYSVIPC - Fix the ipc structures initialization



On Mon, Oct 27, 2008 at 4:42 PM, Nadia Derbey <Nadia.Derbey@xxxxxxxx> wrote:
On Mon, 2008-10-27 at 12:04 +0100, Nadia Derbey wrote:
On Mon, 2008-10-27 at 11:32 +0100, cboulte@xxxxxxxxx wrote:
On Mon, Oct 27, 2008 at 8:28 AM, <Nadia.Derbey@xxxxxxxx> wrote:

This patch is a fix for Bugzilla bug
http://bugzilla.kernel.org/show_bug.cgi?id=11796.

To summarize, a simple testcase is concurrently running an infinite loop on
msgctl(IPC_STAT) and a call to msgget():

while (1)
msgctl(id, IPC_STAT) 1
|
|
|
2 id = msgget(key, IPC_CREAT)
|
|
|

In the interval [1-2], the id doesn't exist yet.

In that test, the problem is the following:
When we are calling ipc_addid() from msgget() the msq structure is not
completely initialized. So idr_get_new() is inserting a pointer into the
idr tree, and the structure which is pointed to has, among other fields,
its lock uninitialized.

Since msgctl(IPC_STAT) is looping while (1), idr_find() returns the
pointer as soon as it is inserted into the IDR tree. And ipc_lock()
calls spin_lock(&mqs->lock), while we have not initialized that lock
yet.

This patch moves the spin_lock_init() before the call to ipc_addid().
It also sets the "deleted" flag to 1 in the window between msg structure
allocation and msg structure locking in ipc_addid().


Regards,
Nadia


Signed-off-by: Nadia Derbey <Nadia.Derbey@xxxxxxxx>

---
ipc/util.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

Index: linux-2.6.27/ipc/util.c
===================================================================
--- linux-2.6.27.orig/ipc/util.c 2008-10-23 15:20:46.000000000 +0200
+++ linux-2.6.27/ipc/util.c 2008-10-24 17:48:33.000000000 +0200
@@ -266,6 +266,17 @@ int ipc_addid(struct ipc_ids* ids, struc
if (ids->in_use >= size)
return -ENOSPC;

+ spin_lock_init(&new->lock);
+
+ /*
+ * We have a window between the time new is inserted into the idr
+ * tree and the time it is actually locked.
+ * In order to be safe during that window set the new ipc structure
+ * as deleted: a concurrent ipc_lock() will see it as not present
+ * until the initialization phase is complete.
+ */
+ new->deleted = 1;
+
err = idr_get_new(&ids->ipcs_idr, new, &id);
if (err)
return err;
@@ -280,10 +291,11 @@ int ipc_addid(struct ipc_ids* ids, struc
ids->seq = 0;

new->id = ipc_buildid(id, new->seq);
- spin_lock_init(&new->lock);
- new->deleted = 0;
rcu_read_lock();
spin_lock(&new->lock);
+
+ new->deleted = 0;
+
return id;
}


--


Still got the lock... I'm using a 4 cpus node: Intel Xeon @ 2.8GHz...
don't know if it has an impact.
???
The bad new, is that it becomes unreprodicible on my side.
For my part, I've got 2 2.8 GHz Xeon CPUs.

Will review the code once more.

Thanks!
Nadia

The only way I found to have no lock, it's to spin_lock the ipc
_before_ inserting it into the idr.

Best regards, c.


I agree with you that it's more logical and correct to take the lock
before inserting the ipc structure (i.e. making it visible to readers).

But I wanted to understand what's wrong with
1. new->lock init
2. new->deleted = 1
3. insert(new)

I've been looking at the code again and again and the only thing I see
could have happened, is that instructions have been reordered and the
insertion done before the lock actually being initialized.
This means that a memory barrier is missing (this would explain why your
fix works: the spin_lock acts as a barrier).
But this memory barrier is supposed to be invoked by
rcu_assign_pointer() in idr_get_new(). So may be there's a problem with
the idr code.
Before going into a review of this code, I'd like to confirm what I'm
saying, doing the following (I'm sorry to ask you do it, but I can't
reproduce the problem in my side anymore): would you mind adding a
smp_wmb() just before the idr_get_new in ipc_addid() and tell me if this
solves the problem.
(BTW, I didn't ask you before, but I guess you're getting the same call
trace?)

Regards,
Nadia

--
Nadia Derbey <Nadia.Derbey@xxxxxxxx>



I tried this patch:
Index: bug-sysv/ipc/util.c
===================================================================
--- bug-sysv.orig/ipc/util.c 2008-10-27 09:21:44.000000000 +0100
+++ bug-sysv/ipc/util.c 2008-10-27 19:04:33.000000000 +0100
@@ -266,6 +266,19 @@ int ipc_addid(struct ipc_ids* ids, struc
if (ids->in_use >= size)
return -ENOSPC;

+ spin_lock_init(&new->lock);
+
+ /*
+ * We have a window between the time new is inserted into the idr
+ * tree and the time it is actually locked.
+ * In order to be safe during that window set the new ipc structure
+ * as deleted: a concurrent ipc_lock() will see it as not present
+ * until the initialization phase is complete.
+ */
+ new->deleted = 1;
+
+ smp_wmb();
+
err = idr_get_new(&ids->ipcs_idr, new, &id);
if (err)
return err;
@@ -280,10 +293,11 @@ int ipc_addid(struct ipc_ids* ids, struc
ids->seq = 0;

new->id = ipc_buildid(id, new->seq);
- spin_lock_init(&new->lock);
- new->deleted = 0;
rcu_read_lock();
spin_lock(&new->lock);
+
+ new->deleted = 0;
+
return id;
}

And got the lock (the node is still usuable but I guess it's because
only 1 cpu out of 4 is locked):

[ 400.393024] INFO: trying to register non-static key.
[ 400.397005] the code is fine but needs lockdep annotation.
[ 400.397005] turning off the locking correctness validator.
[ 400.397005] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1
[ 400.397005]
[ 400.397005] Call Trace:
[ 400.397005] [<ffffffff80257055>] static_obj+0x60/0x77
[ 400.397005] [<ffffffff8025af59>] __lock_acquire+0x1c8/0x779
[ 400.397005] [<ffffffff8025b59f>] lock_acquire+0x95/0xc2
[ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 400.397005] [<ffffffff8045117d>] _spin_lock+0x2d/0x5a
[ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 400.397005] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
[ 400.397005] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
[ 400.397005] [<ffffffff803002c3>] sys_msgctl+0x188/0x461
[ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
[ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
[ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 400.397005] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
[ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 400.397005] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
[ 400.397005]
[ 464.933003] BUG: soft lockup - CPU#2 stuck for 61s! [sysv_test2:4207]
[ 464.933006] Modules linked in: ipv6 nfs lockd nfs_acl sunrpc button
battery ac loop dm_mod md_mod usbkbd usbhid hid ff_memless mptctl
evdev tg3 libphy iTCO_wdt e752x_edac edac_core uhci_hcd rng_core
shpchp i2c_i801 pci_hotplug i2c_core ehci_hcd reiserfs edd fan thermal
processor thermal_sys mptspi mptscsih sg mptbase scsi_transport_spi
sr_mod cdrom ata_piix libata dock sd_mod scsi_mod [last unloaded:
freq_table]
[ 464.933006] irq event stamp: 2136363
[ 464.933006] hardirqs last enabled at (2136363):
[<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 464.933006] hardirqs last disabled at (2136361):
[<ffffffff8023ea01>] __do_softirq+0xa3/0xf7
[ 464.933006] softirqs last enabled at (2136362):
[<ffffffff8020d9bc>] call_softirq+0x1c/0x28
[ 464.933006] softirqs last disabled at (2136357):
[<ffffffff8020d9bc>] call_softirq+0x1c/0x28
[ 464.933006] CPU 2:
[ 464.933006] Modules linked in: ipv6 nfs lockd nfs_acl sunrpc button
battery ac loop dm_mod md_mod usbkbd usbhid hid ff_memless mptctl
evdev tg3 libphy iTCO_wdt e752x_edac edac_core uhci_hcd rng_core
shpchp i2c_i801 pci_hotplug i2c_core ehci_hcd reiserfs edd fan thermal
processor thermal_sys mptspi mptscsih sg mptbase scsi_transport_spi
sr_mod cdrom ata_piix libata dock sd_mod scsi_mod [last unloaded:
freq_table]
[ 464.933006] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1
[ 464.933006] RIP: 0010:[<ffffffff8033dc6b>] [<ffffffff8033dc6b>]
_raw_spin_lock+0x98/0x100
[ 464.933006] RSP: 0018:ffff880145473e48 EFLAGS: 00000206
[ 464.933006] RAX: 00000000000000cb RBX: 000000001830d3f9 RCX:
00000000ffffffff[ 464.933006] RDX: 0000018500000000 RSI:
ffffffff8053d176 RDI: 0000000000000001[ 464.933006] RBP:
0000000000000000 R08: 0000000000000002 R09: 0000000000000000[
464.933006] R10: 0000000000000000 R11: ffffffff8033a6fe R12:
0000000000000000[ 464.933006] R13: ffffffff8033a6fe R14:
ffffffff8020c7ee R15: 0000000000000002[ 464.933006] FS:
00007f40899b86d0(0000) GS:ffff88014707f508(0000)
knlGS:0000000000000000
[ 464.933006] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 464.933006] CR2: 00007f408974aae0 CR3: 0000000143003000 CR4:
00000000000006e0[ 464.933006] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000[ 464.933006] DR3:
0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400[
464.933006]
[ 464.933006] Call Trace:
[ 464.933006] [<ffffffff8033dc6b>] _raw_spin_lock+0x98/0x100
[ 464.933006] [<ffffffff8045119e>] _spin_lock+0x4e/0x5a
[ 464.933006] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 464.933006] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 464.933006] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
[ 464.933006] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
[ 464.933006] [<ffffffff803002c3>] sys_msgctl+0x188/0x461
[ 464.933006] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
[ 464.933006] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 464.933006] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
[ 464.933006] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 464.933006] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 464.933006] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
[ 464.933006] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 464.933006] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
[ 464.933006]

I checked it with two different distributions: Debian Lenny and Sles 10 SP 1.

Regards, c.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: [PATCH 1/1] (v3) SYSVIPC - Fix the ipc structures initialization
    ... a simple testcase is concurrently running an infinite loop on ... So idr_get_newis inserting a pointer into the ... This patch moves the spin_lock_initbefore the call to ipc_addid. ... return err; ...
    (Linux-Kernel)
  • [PATCH] ipv6: get rid of ipip6_prl_lock
    ... That use of p looks like it is completely local to the for loop, ... not a full and tested patch:) ... could exit with success without the kzalloc ever being called. ... return err; ...
    (Linux-Kernel)
  • [x86_64 MCE] [RFC] mce.c race condition (or: when evil hacks are the only options)
    ... The race requires a large number of machine checks to be occurring in order ... In the normal case, the rest would get cleaned up by the subsequent loop, ... fact is waiting for all CPUs to be done, which could take up to a tick -- or ... I've come up with a patch that does this, ...
    (Linux-Kernel)
  • Re: Problem with inotify
    ... > Thanks for writing that patch, ... > inotify-test before unmounting results in a clean unmount. ... My analysis was that there is an infinite loop and this is what ... > loop when unmounting with inotify watches active. ...
    (Linux-Kernel)
  • Re: [PATCH 00/17] per device dirty throttling -v7
    ... Patches are against 2.6.22-rc4-mm2 ... writes in a setup involving a loop dev ... ext3 over loop over ext3 ... Without the patch, in all the cases I've seen deadlocks or long ...
    (Linux-Kernel)