Re: kernel BUG at fs/direct-io.c:916!



On Mon, Mar 27, 2006 at 01:03:42PM +0200, Ralf Hildebrandt wrote:
* Nathan Scott <nathans@xxxxxxx>:
On Mon, Mar 27, 2006 at 01:03:59AM +0200, Ralf Hildebrandt wrote:
* Nathan Scott <nathans@xxxxxxx>:

Hmm, there were XFS patches in -mm last week, but they also got
merged to mainline last week, not clear whether your git kernel
had those changes or not. I think there's probably some direct
I/O (generic) changes in -mm too based on list traffic from the
last couple of weeks (I'm an -mm lamer, sorry, couldn't easily
tell you exactly what patches those might be) - could you retry
with today's git snapshot and see if mainline is affected? Else
we'll need to find and analyse any -mm fs/direct-io.c patches.

2.6.16-git12 also fails utterly:

Could you also try reverting this patch:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1d8fa7a2b9a39d18727acc5c468e870df606c852

and let me know if the problem still happens?

Reverting this particular patch does ELIMINATE the problem.
Excellent!

OK, I think I see what's gone wrong here now. Ralf, could you try
the patch below and check that it fixes your test case?

Badari, it looks like a regression from the "remove ->get_blocks()
support" patch - can you look over the fix below and confirm/deny
please?

I'm definitely seeing block mapping requests that are smaller than
the filesystem block size coming into XFS from direct-io.c - and it
looks like that eventually blows up in do_direct_IO if dio_remainder
becomes set and we could only map one block (if dio->blocks_available
was 1 after get_more_blocks). We'll reduce that to zero right at the
end of the branch that calls get_more_blocks in do_direct_IO... and
mayhem ensues further on.

I have a couple of other .17 changes pending, if you could ACK this
I'll get it merged in for ya.

cheers.

--
Nathan


Index: xfs-linux-2.6/fs/direct-io.c
===================================================================
--- xfs-linux-2.6.orig/fs/direct-io.c
+++ xfs-linux-2.6/fs/direct-io.c
@@ -524,8 +524,6 @@ static int get_more_blocks(struct dio *d
 	 */
 	ret = dio->page_errors;
 	if (ret == 0) {
-		map_bh->b_state = 0;
-		map_bh->b_size = 0;
 		BUG_ON(dio->block_in_file >= dio->final_block_in_request);
 		fs_startblk = dio->block_in_file >> dio->blkfactor;
 		dio_count = dio->final_block_in_request - dio->block_in_file;
@@ -534,6 +532,9 @@ static int get_more_blocks(struct dio *d
 		if (dio_count & blkmask)
 			fs_count++;
 
+		map_bh->b_state = 0;
+		map_bh->b_size = fs_count << dio->inode->i_blkbits;
+
 		create = dio->rw == WRITE;
 		if (dio->lock_type == DIO_LOCKING) {
 			if (dio->block_in_file < (i_size_read(dio->inode) >>
@@ -542,13 +543,13 @@ static int get_more_blocks(struct dio *d
 		} else if (dio->lock_type == DIO_NO_LOCKING) {
 			create = 0;
 		}
+
 		/*
 		 * For writes inside i_size we forbid block creations: only
 		 * overwrites are permitted. We fall back to buffered writes
 		 * at a higher level for inside-i_size block-instantiating
 		 * writes.
 		 */
-		map_bh->b_size = fs_count << dio->blkbits;
 		ret = (*dio->get_block)(dio->inode, fs_startblk,
 						map_bh, create);
 	}