Re: [PATCH 1/3] accounting: task counters for disk/network



On Tue, 8 Apr 2008 07:48:37 +0200 Gerlof Langeveld <gerlof@xxxxxxxxxxxxxx> wrote:

--- linux-2.6.24.4-vanilla/block/ll_rw_blk.c 2008-03-24 19:49:18.000000000 +0100
+++ linux-2.6.24.4-modified/block/ll_rw_blk.c 2008-03-25 13:52:14.000000000 +0100
@@ -2739,6 +2739,19 @@ static void drive_stat_acct(struct reque
 		disk_round_stats(rq->rq_disk);
 		rq->rq_disk->in_flight++;
 	}
+
+#ifdef CONFIG_TASK_IO_ACCOUNTING
+	switch (rw) {
+	case READ:
+		current->group_leader->ioac.dsk_rio += new_io;
+		current->group_leader->ioac.dsk_rsz += rq->nr_sectors;
+		break;
+	case WRITE:
+		current->group_leader->ioac.dsk_wio += new_io;
+		current->group_leader->ioac.dsk_wsz += rq->nr_sectors;
+		break;
+	}
+#endif

For many workloads, this will cause almost all writeout to be accounted to
pdflush and perhaps kswapd. This makes the per-task write accounting
largely useless.

There are several situations in which writeouts are accounted to the user
process itself, e.g. when issuing direct writes (open mode O_DIRECT) or
synchronous writes (open mode O_SYNC, the sync/fsync syscalls, the synchronous
file attribute, or a synchronously mounted filesystem).
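
To make the synchronous case concrete, here is a minimal userspace sketch of
a write that should not return before the data has been pushed to the
device, so the request is issued on behalf of the caller rather than deferred
to pdflush. The helper name write_sync() is made up for illustration, and
whether the block request really runs in the caller's context still depends
on the filesystem.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path using O_SYNC, so each
 * write() waits for the data to reach the device before returning.
 * Returns 0 on success, -1 on error.
 */
int write_sync(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);

	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n <= 0) {
			close(fd);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return close(fd);
}
```

Opening without O_SYNC and calling fsync(fd) before close() would be the
other common way to get the same effect.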

yup.

Apart from that, swapping out of process pages by kswapd is currently not
accounted at all as shown by the following snapshot of 'atop' on a heavily
swapping system:

Under heavy load, callers into alloc_pages() will themselves perform disk
writeout. So under the proposed scheme, process A will be accounted for
writeout which was in fact caused by process B.

So the extra counters can be considered as a useful addition to the I/O
counters that are currently maintained.
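
For reference, the counters that already exist under
CONFIG_TASK_IO_ACCOUNTING are exported as "key: value" lines in
/proc/<pid>/io (rchar, wchar, read_bytes, write_bytes and so on). A rough
sketch of picking one of them out of that text follows. parse_io_counter()
is a hypothetical helper, not something from the patch, and it takes the file
contents as a string so it does not depend on /proc being available.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up one "key: value" counter in text laid
 * out like /proc/<pid>/io. Returns 0 and stores the value in *out on
 * success, -1 if the key is not present.
 */
int parse_io_counter(const char *text, const char *key,
		     unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/*
		 * Match only at the start of a line, so that "write_bytes"
		 * does not hit "cancelled_write_bytes".
		 */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			*out = strtoull(line + klen + 1, NULL, 10);
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A monitoring tool would read /proc/<pid>/io into a buffer and call this once
per counter it cares about.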

mmm, maybe. But if we implement a partial solution like this we really
should have a plan to finish it off.

There have been numerous attempts at this, which tend to involve adding
backpointers to the pageframe structure and such.

This sort of accounting will presumably be needed by a disk bandwidth
cgroup controller. Perhaps the containers/cgroup people have plans or code
already?



