RE: process starvation with 2.6 scheduler




I have verified that the starved tasks are on the runqueue (prio_array_t
array[0]; active points to array[0]). Their timestamp and last_ran values
indicate that they have not run for a while.

The network traffic is of the request/response type.

Client (on an external box) 3 ports ---- 3 cables ---- 3 ports Emulated Host

The netperf clients run on an external box; the emulated host (ppc440) runs
the servers. A client sends a request to a server, the server returns the
reply, and only then does the next request from the client go to the server.
There are 3 clients and 3 servers, one client-server pair for each connection
(3 connections: 3 ports on external box -- 3 connections
-- 3 ports on emulated host).

Since the traffic is request/response in nature and the packets reach
user space (netserver) before turning around, I do not think a slow CPU is the issue.
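
The strict one-request, one-reply pattern described above can be sketched in a few lines. This is only an illustration, not netperf code: the names `echo_server`, `recv_exact` and `run_transactions` are hypothetical, and a local `socketpair` stands in for the TCP connection between the external box and the emulated host.

```python
import socket
import threading


def recv_exact(conn, size):
    """Read exactly `size` bytes, since recv() may return short reads."""
    buf = b""
    while len(buf) < size:
        chunk = conn.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def echo_server(conn, transactions, size):
    """Stand-in for a server process: one request in, one reply out."""
    try:
        for _ in range(transactions):
            request = recv_exact(conn, size)
            conn.sendall(request)  # reply; the next request arrives only after this
    except ConnectionError:
        pass  # client went away (for example after a timeout)


def run_transactions(transactions=100, size=1, timeout=5.0):
    """Run a ping-pong exchange and return the number of completed transactions."""
    client, server = socket.socketpair()
    client.settimeout(timeout)  # a stalled server shows up as a client timeout
    worker = threading.Thread(target=echo_server, args=(server, transactions, size))
    worker.start()
    completed = 0
    try:
        for _ in range(transactions):
            client.sendall(b"x" * size)
            recv_exact(client, size)
            completed += 1
    finally:
        client.close()   # unblocks the server thread if it is still waiting
        worker.join()
        server.close()
    return completed
```

Because at most one request is outstanding per connection, a server that never gets to run stalls its client completely, which matches the timeouts seen on the two idle ports.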

-----Original Message-----
From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of Stephen Hemminger
Sent: Tuesday, June 06, 2006 9:56 AM
To: linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: process starvation with 2.6 scheduler

On Tue, 06 Jun 2006 10:01:58 +0200
Mike Galbraith <efault@xxxxxx> wrote:

(please line wrap)

On Mon, 2006-06-05 at 12:48 -0700, Kallol Biswas wrote:
Hello,
We have a process starvation problem with our 2.6.11 kernel running on a ppc-440 based system.

We have a storage SOC based on PPC-440. The SOC is emulated on Palladium, a system emulator from Cadence. The emulated system runs at 400 kHz. It has three Ethernet ports, which are connected to the outside lab network through a speed bridge.

The netperf server, netserver, runs on the emulated system (2.6.11 kernel on Palladium). The netperf clients run on a Linux x86 box.

If netperf request/response (TCP_RR) traffic is run on all three ports, after some time only one port remains active. The applications (netperf clients) on the other two ports wait for a long time and eventually time out.

The netserver code has been instrumented. For one of the starved netserver processes, we found that the server received the TCP_RR request from the netperf client on the Linux x86 box and issued a send() call to return the reply, but send() never returns.

With an ICE connected to the Palladium (emulator) I have dumped the kernel data structures of the starved process and the active process.


For Active Process:
Time slice: 84
Policy: SCHED_NORMAL
Dynamic priority: 118
Static priority: 120
Preempt count: 0x20100
Flags: 0
State: 0 (TASK_RUNNING)
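
As a rough aid for reading the priority fields in these dumps: in the 2.6 O(1) scheduler the dynamic priority of a SCHED_NORMAL task is its static priority minus an interactivity bonus. The sketch below assumes the stock constants (a maximum bonus of 10, priorities 100 to 139 for non-real-time tasks) and has not been checked against this particular 2.6.11 tree; the function names are made up for the example.

```python
MAX_BONUS = 10      # assumed stock value: MAX_USER_PRIO (40) * PRIO_BONUS_RATIO (25) / 100
MAX_RT_PRIO = 100   # lowest priority number a SCHED_NORMAL task can have
MAX_PRIO = 140      # one past the highest priority number


def dynamic_prio(static_prio, current_bonus):
    """Dynamic priority for a sleep-average bonus level in the range 0..MAX_BONUS."""
    bonus = current_bonus - MAX_BONUS // 2      # maps 0..10 onto -5..+5
    prio = static_prio - bonus                  # a lower number means higher priority
    return max(MAX_RT_PRIO, min(prio, MAX_PRIO - 1))


def implied_bonus(static_prio, dyn_prio):
    """Bonus implied by a dump (only meaningful when no clamping occurred)."""
    return static_prio - dyn_prio
```

Under these assumptions, `implied_bonus(120, 118)` is 2 for the active process, while a task with dynamic priority equal to its static priority has a bonus of 0.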

For Starved Process:
Time slice: 77
Policy: SCHED_NORMAL
Dynamic priority: 120
Static priority: 120
Preempt count: 0x10000000 (PREEMPT_ACTIVE is set)
Flags: 0
State: 0 (TASK_RUNNING)
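
The two preempt_count values can be split into their fields with a small helper. The bit layout here is an assumption (8 preempt bits, 8 softirq bits, 12 hardirq bits, as in the generic 2.6 hardirq.h) and may differ on this ppc tree; the PREEMPT_ACTIVE mask is taken from the annotation in the dump above, and `decode_preempt_count` is a name invented for this sketch.

```python
PREEMPT_BITS = 8    # assumed generic layout: bits 0-7
SOFTIRQ_BITS = 8    # assumed generic layout: bits 8-15
HARDIRQ_BITS = 12   # assumed generic default: bits 16-27
PREEMPT_ACTIVE = 0x10000000  # value given in the dump annotation

SOFTIRQ_SHIFT = PREEMPT_BITS
HARDIRQ_SHIFT = SOFTIRQ_SHIFT + SOFTIRQ_BITS


def decode_preempt_count(count):
    """Split a raw preempt_count into its counters under the assumed layout."""
    return {
        "preempt": count & ((1 << PREEMPT_BITS) - 1),
        "softirq": (count >> SOFTIRQ_SHIFT) & ((1 << SOFTIRQ_BITS) - 1),
        "hardirq": (count >> HARDIRQ_SHIFT) & ((1 << HARDIRQ_BITS) - 1),
        "preempt_active": bool(count & PREEMPT_ACTIVE),
    }


active = decode_preempt_count(0x20100)
starved = decode_preempt_count(0x10000000)
```

If the assumed layout holds, the active task's value has non-zero softirq and hardirq fields, which would suggest the snapshot was taken in interrupt context, while the starved task's value carries only the PREEMPT_ACTIVE flag.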

Any help to debug the problem is welcome.

I'm having difficulty understanding. Are you saying that the "starved"
tasks are runnable, but receiving _zero_ cpu? That's impossible with
only one other SCHED_NORMAL task afaik, which makes me think you may
mean they're not receiving cpu frequently enough to keep clients from
timing out? One task which has slept enough to acquire interactive
status (as above) can hold others off the cpu for quite a while if it
starts a burst of heavy cpu burning. If your netperf clients are
choking on this latency, running the servers at nice 19 should prevent
the problem.
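
One way to try the nice 19 suggestion without changing how the servers are launched is to have each server lower its own priority at startup. A minimal sketch, using Python's `os.setpriority` wrapper around the setpriority(2) call; the helper name `lower_priority` is hypothetical, and netserver itself is C code, where the equivalent would be a direct setpriority() call.

```python
import os


def lower_priority(nice_value=19, pid=0):
    """Set the nice value of `pid` (0 means the calling process) and return the new value."""
    os.setpriority(os.PRIO_PROCESS, pid, nice_value)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

Raising the nice value needs no special privilege; lowering it again afterwards generally does.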



Is the processor getting consumed by network traffic in softirq?
If you are using a non-NAPI device driver, it is easy for softirq
processing to be overwhelmed with packets.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


