Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.43-00

From: Steven Rostedt (rostedt_at_goodmis.org)
Date: 04/03/05

  • Next message: Arun Srinivas: "sched /HT processor"
    To: Ingo Molnar <mingo@elte.hu>
    Date:	Sat, 02 Apr 2005 19:06:52 -0500
    
    
    

    On Sat, 2005-04-02 at 17:09 -0500, Steven Rostedt wrote:
    > On Sat, 2005-04-02 at 15:44 -0500, Steven Rostedt wrote:
    > > On Sat, 2005-04-02 at 22:35 +0200, Ingo Molnar wrote:

    > I tried the 2.6.12-rc1-RT-V0.7.43-06 kernel again, and I still have the
    > serial, to do sysrq. The console is using X which locks (even all the
    > ctrl-alt-X functions) and the ssh session that I run the test stops
    > after the processes try to grab the locks. It doesn't reply to ping. But
    > the sysrq from the serial shows constantly:
    >
    > s test3_rt: 2269 [cf304690, 89] (not blocked)
    > s test3_rt: 2270 [cf304050, 88] (not blocked)
    > s test3_rt: 2271 [cef86cb0, 87] (not blocked)
    > s test3_rt: 2272 [cef86670, 86] (not blocked)
    > R test3_rt: 2273 [cf28a050, 85] (not blocked)
    >
    >
    > So it seems that it's in a deadlock somewhere. Since the 43-06 gets much
    > further, this seems to be another problem. I'm not going to look at this
    > problem anymore, since it doesn't show up in the lastest.
    >
    > I'll run a few more tests to see if I can narrow things down on 43-06.

    Once again I spoke too soon. I got the same lockup with 43-06, it just
    took a little more to do so. All I did differently was to put the test
    into a while loop on the command line:

    while : ; do test3_rs ; done

    On the second iteration, it locked up in the same place as 36-02. But
    this is a different issue than what I showed with a NFS mounted system.
    That was a BUG where as this has something in a spinning loop somewhere.
    The attached is the gzipped minicom capture. After the system locked
    up, I was still able to get the serial input and do a few task dumps. I
    did several and it clearly shows that the system is locked. What you
    constantly see is:

    R test3_rt: 2263 [cbd6b9b0, 116] (not blocked)
    S test3_rt: 2264 [cc06d8d0, 89] (not blocked)
    S test3_rt: 2265 [cc06c090, 88] (not blocked)
    R test3_rt: 2266 [cc06ccb0, 87] (not blocked)
    D test3_rt: 2267 [cc1286c0, 86] blocked on: [ce127650] {(struct
    semaphore *)(&inode->i_sem)}
    .. held by: test3_rt: 2268 [ce822760, 85]
    ... acquired at: vfs_readdir+0x53/0xa0
    D test3_rt: 2268 [ce822760, 85] (not blocked)

    Process 2263 started the run and has just released the semaphore, it was
    preempted before getting to the wait and although still running, it is
    not really part of the picture.

    Process 2267 is blocked on 2268 which is sleeping at
    fs/jbd/transaction.c:636 which is in do_get_write_access waiting (I
    believe) for a transaction to finish. Probably for kjournald to run,
    since it is also in the running state.

    But the problem seems to be with 2266 where as it's trace shows:

    test3_rt [cc06ccb0]R C0371AD6 [on rq] 6824 2266 2263 2267 2265 (NOTLB)
    cadebc48 00000046 c04eea30 c0371ad6 00000000 00000001 00000016 c0371aa1
           c0102d00 00000000 00000000 11de7305 071e89c0 00000071 cc06ccb0 cc06cde8
           ffffffff cc06ccb0 cfae68d4 cadebc5c c0371ad6 10000000 cabc71ac cfae68d4
    Call Trace:
     [<c0371ad6>] preempt_schedule_irq+0x46/0x70 (20)
     [<c0102d00>] need_resched+0x20/0x24 (-7260)
    ---------------------------
    | preempt count: 00000002 ]
    | 2-level deep critical section nesting:
    ----------------------------------------
    .. [<c03710df>] .... __schedule+0x4f/0x840
    .....[<c0371ad6>] .. ( <= preempt_schedule_irq+0x46/0x70)
    .. [<c0371179>] .... __schedule+0xe9/0x840
    .....[<c0371ad6>] .. ( <= preempt_schedule_irq+0x46/0x70)

    Which is not very helpful.

    So it is probably stuck in some spinning "yield" loop, which was the
    reason I was writing this test to begin with! It's most likely also
    waiting for kjournald to do some work, and is starving it in a schedule
    or yield loop never actually going to sleep letting kjournald do the
    real work.

    -- Steve

    
    

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/



  • Next message: Arun Srinivas: "sched /HT processor"

    Relevant Pages

    • Re: 2.6.16-rc6-rt1
      ... includes releasing _all_ locks at each ... Or a higher priority task change the structures along the way... ... The basic loop is: ...
      (Linux-Kernel)
    • Re: Populate an ImageList from another thread
      ... Now, as for what happens when I run the code you posted: I do not find that the application locks up, nor do I get an hourglass cursor and no repainting. ... The other thing about the code is that it's in a fairly tight loop. ... Because this is timing related, it's possible that if your hardware is significantly different from mine, that could explain a perceived difference in responsiveness. ...
      (microsoft.public.dotnet.languages.csharp)
    • bad karma with loop?
      ... I use this Count Occurrences public sub with modifications ... in several VBA macros I've written. ... it or it locks up if I try to run any macro (including the ... basic one, below) containing the loop. ...
      (microsoft.public.word.vba.general)
    • Re: cant boot : Spurious ACK with kernel 2.6.19
      ... locks in a loop with the message: "atkbd.c: Spurious ACK on ... Some program might be trying access hardware ...
      (Linux-Kernel)
    • Persistent Schema-Stability (Sch-S) Locks
      ... the CPU usage on the box ... >This proc does a few selects into local variables then ... >loop, performs several selects. ... >multiple locks and has on several occasions resulted in ...
      (microsoft.public.sqlserver.programming)