Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()



On Thu, Oct 18, 2007 at 02:57:59PM -0400, George G. Davis wrote:
On Wed, Oct 17, 2007 at 02:51:57PM -0400, George G. Davis wrote:
---
Not sure if this is the correct fix but it does resolve the hangs we're
observing in posix_locks_deadlock().

Please disregard the previous patch, it's not quite right - causes occasional
segfaults and clearly did not retain the posix_same_owner() checks implemented
in the original code. Here's a new version which I believe retains the
intent of the original code:

diff --git a/fs/locks.c b/fs/locks.c
index 7f9a3ea..e012b27 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -702,14 +702,12 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
{
struct file_lock *fl;

-next_task:
if (posix_same_owner(caller_fl, block_fl))
return 1;
list_for_each_entry(fl, &blocked_list, fl_link) {
if (posix_same_owner(fl, block_fl)) {
- fl = fl->fl_next;
- block_fl = fl;
- goto next_task;
+ if (posix_same_owner(caller_fl, fl))
+ return 1;
}
}
return 0;

It may take multiple steps to identify a deadlock. With the above
you'll miss deadlocks like

process 1 is requesting a lock held by process 2
process 2 is blocking on a lock held by process 3
process 3 is blocking on a lock held by process 1.

Could you give more details about how you're causing
posix_locks_deadlock to hang? Is there a simple test-case you can post?

--b.



I'm not sure about those "fl = fl->fl_next; block_fl = fl;" statements,
first, the order of those statements seems reversed to me. Otherwise,
I think the intent was to advance the "fl" for loop variable to the next
entry in the list but it doesn't work out that way at all - the for
loop restarts from the beginning - this is where we get into an
infinite loop condition. Whether the test case I posted before is
valid or not, I reckon it shouldn't be possible for non-root Joe user
to contrive a test case which can hang the system as we're observing
with that test case. The above patch fixes the hang.

Comments greatly appreciated...

--
Regards,
George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: [RFC][PATCH] Fix hang in posix_locks_deadlock()
    ... observing in posix_locks_deadlock. ... I think the intent was to advance the "fl" for loop variable to the next ... The above patch fixes the hang. ...
    (Linux-Kernel)
  • Re: rpc.lockd brokenness (2)
    ... I'm unable to verify if the patch ... I do have a lockd hang, ... Can anybody else verify if the system hangs on pidfile_open when: ...
    (freebsd-stable)
  • Re: Better news
    ... SO hang in there. ... I quit long-term once before for 5 yrs using the patch. ... Been on the verge of quitting again for awhile now (aren't we ... DO need to clean for other reasons though! ...
    (alt.support.arthritis)
  • Re: Better news
    ... SO hang in there. ... I quit long-term once before for 5 yrs using the patch. ... Been on the verge of quitting again for awhile now (aren't we ... Mike has no school today and I'm taking the day off to do the floor. ...
    (alt.support.arthritis)
  • Re: 2.6.23-rc6-mm1 -- mkfs stuck in D
    ... that patch does indeed seem to have fixed it. ... there does appear to be some other occasional slight slowdown - ... mm per-device dirty threshold fix ... Fix occasional hang when a task couldn't get out of balance_dirty_pages: ...
    (Linux-Kernel)