Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?

From: Christopher Li (ext3-user_at_chrisli.org)
Date: 10/27/03

    Date:	Mon, 27 Oct 2003 01:20:02 -0500
    To: Neil Brown <neilb@cse.unsw.edu.au>
    
    

    On Mon, Oct 27, 2003 at 05:06:16PM +1100, Neil Brown wrote:
    >
    > The related kjournald is at:
    > kjournald Call Trace: [sleep_on+75/124]
    > [journal_commit_transaction+357/4044] [do_IRQ+221/236]
    > [.text.lock.sched+131/471] [kjournald+326/540]
    > [commit_timeout+0/12] [arch_kernel_thread+40/56]
    >
    > This sleep_on is at line 87 in commit.c (journal_commit_transaction)
    > where it is waiting for t_updates to be 0. At this point,
    > t_state is T_LOCKED, so presumably those nfsd threads above are
    > waiting on kjournald. But what is kjournald really waiting for?

    kjournald is waiting for the current pending transaction to stop.

    > My first thought was the two nfsd threads in:
    > nfsd Call Trace: [sleep_on+75/124]
    > [log_wait_commit+74/136] [journal_stop+408/432]
    > [journal_force_commit+78/128] [ext3_force_commit+66/112]
    > [ext3_sync_file+128/144] [nfsd_sync_dir+49/72]
    > [nfsd_unlink+455/480] [nfsd_proc_remove+122/140]
    > [nfsd_dispatch+207/406] [svc_process+655/1264]
    > [nfsd+566/944] [arch_kernel_thread+40/56]
    >
    > that are waiting on j_wait_done_commit. However they are doing that
    > from journal_stop *after* journal_stop has decremented t_updates, so
    > it doesn't seem likely that kjournald is waiting on that.

    That is right.

    >
    > Outside of nfsd, there is an rquotad program (locally written, not the
    > standard one) that is :
    >
    > rquotad Call Trace: [sleep_on+75/124]
    > [start_this_handle+205/368] [journal_start+149/196]
    > [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
    > [update_atime+75/80] [do_generic_file_read+1158/1172]
    > [generic_file_read+147/400] [file_read_actor+0/224]
    > [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
    > [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
    > [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
    > [do_compat_quotactl+417/688] [resolve_dev+89/108]
    > [sys_quotactl+166/275] [system_call+51/56]
    >
    > So it is trying to start a transaction to update the atime on the
    > quota file, and has a lock on some quota structures thanks to
    > "read_dqblk".

    This one is waiting for the journal commit to finish, which seems
    harmless to me.

    >
    > At the same time, "sync" is running:
    >
    > sync Call Trace: [__down+109/208] [__down_failed+8/12]
    > [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
    > [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
    > [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56]
    >
    > and has started an ext3 transaction (in ext3_sync_dquot) and is trying
    > to get the lock that rquotad has.

    That seems wrong to me. It should get the lock before it starts the
    transaction. For the same reason that you can't call lock_page inside
    a journal transaction, it is a lock-ranking error. BTW, it seems that
    in the current bk tree, truncate still does lock_page inside a journal
    transaction.
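
    The ranking rule above can be illustrated with a toy checker. This is
    a minimal sketch in Python, not kernel code: the names RANK,
    check_order, "dquot_lock" and "journal_handle" are hypothetical, and
    the ranks simply encode the assumption that the quota lock must be
    taken before a journal handle is opened.

```python
# Assumed ranking for illustration: lower rank must be acquired first.
RANK = {"dquot_lock": 1, "journal_handle": 2}

def check_order(acquisitions):
    """Return the first out-of-rank pair as (held, wanted), or None."""
    held = []
    for name in acquisitions:
        for h in held:
            # Holding a higher-ranked resource while asking for a
            # lower-ranked one is a ranking violation.
            if RANK[h] > RANK[name]:
                return (h, name)
        held.append(name)
    return None

# The order described for sync: transaction first, then the lock.
print(check_order(["journal_handle", "dquot_lock"]))
# The suggested order: lock first, then the transaction.
print(check_order(["dquot_lock", "journal_handle"]))
```

    Under these assumed ranks, the first call reports a violation and the
    second does not.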

    >
    > Presumably the transaction that sync has started is keeping t_updates
    > greater than 0, thus preventing kjournald from progressing, and thus
    > preventing anyone else, including rquotad, from starting a new
    > transaction. Hence a deadlock.

    That is right.
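
    The cycle can be written down as a small wait-for graph. This is only
    a toy model in Python built from the traces quoted in this thread; the
    dictionary waits_for and the helper find_cycle are hypothetical names,
    and each task is assumed to block on exactly one other task.

```python
# Each task maps to the task it is blocked on, as read from the traces.
waits_for = {
    "sync": "rquotad",       # wants the quota lock that rquotad holds
    "rquotad": "kjournald",  # start_this_handle sleeps until the commit ends
    "kjournald": "sync",     # commit waits for t_updates == 0; sync holds a handle
    "nfsd": "kjournald",     # log_wait_commit: blocked, but not in the cycle
}

def find_cycle(graph, start):
    """Follow wait edges from start; return the cycle reached, or None."""
    path = []
    node = start
    while node in graph:
        if node in path:
            # We came back to a task already on the path: that tail is the cycle.
            return path[path.index(node):]
        path.append(node)
        node = graph[node]
    return None

print(find_cycle(waits_for, "sync"))
```

    In this model nfsd is a victim of the cycle rather than a member of
    it, which matches the observation that the nfsd threads have already
    decremented t_updates.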

    >
    > My guess is that ext3_sync_dquot doesn't need ext3_journal_start at
    > all but that isn't a well-informed guess.

    I think you want ext3_sync_dquot to be atomic across a power failure.
    The journal handle can be obtained from ext3_current_journal_handle,
    which is used by writepage etc.

    Chris

