Re: xfs problems (possibly after upgrading from linux kernel 2.6.27.10 to .14)



Hi Dave,

Dave Chinner wrote:
On Tue, Feb 17, 2009 at 03:49:16PM +0100, Carsten Aulbert wrote:
Hi all,

within the past few days we hit many XFS internal errors like these. Are these
errors known (and possibly already fixed)? I checked the commits up to
2.6.27.17 and there does not seem to be anything related to this.

.....

Feb 16 20:34:49 n0035 kernel: [275873.335916] Filesystem "sda6": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_

A transaction shutdown on create. That implies some kind of ENOSPC
issue.

Do you need more information or can I send these nodes into a re-install?

More information. Can you get a machine into a state where you can
trigger this condition reproducibly by doing:

mount filesystem
touch /mnt/filesystem/some_new_file

If you can get it to that state, and you can provide an xfs_metadump
image of the filesystem when in that state, I can track down the
problem and fix it.

I can try doing that on a few machines. Would a metadump help on a
machine where this corruption occurred some time ago and which is still
in this state?


Feb 16 22:01:28 n0260 kernel: [1129250.851451] Filesystem "sda6": xfs_iflush: Bad inode 1176564060 magic number 0x36b5, ptr 0xffff8801a7c06c00

However, this implies some kind of memory corruption is occurring.
That is reading the inode out of the buffer before flushing the
in-memory state to disk. This implies someone has scribbled over
page cache pages.


Feb 17 05:57:44 n0463 kernel: [1156816.912129] Filesystem "sda6": XFS internal error xfs_btree_check_sblock at line 307 of file fs/xfs/xfs_btree.c. Caller 0xffffffff802dd15b

And that is another buffer that has been scribbled over.
Something is corrupting the page cache, I think. Whether the
original shutdown is caused by the same corruption, I don't
know.


On at least two nodes we ran memtest86+ overnight, and so far it has found no errors.

plus a few more nodes showing the same characteristics

Hmmmm. Did this show up in 2.6.27.10? Or did it start occurring only
after you upgraded from .10 to .14?

As far as I can see this only happened after the upgrade about 14 days
ago. What strikes me as odd is that we only had this occurring massively
on Monday and Tuesday this week.

I don't know if a certain access pattern could trigger this somehow.
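
To compare the affected nodes side by side, a minimal Python sketch for
pulling the host name, the kernel timestamp and the reporting XFS function
out of syslog lines like the ones quoted above could look like this. The
names LINE_RE, parse_kernel_line and error_site are made up for
illustration, and the regular expressions only assume the line layout seen
in this mail. The number in brackets is the kernel's printk timestamp in
seconds since boot, so it is converted to days of uptime here.

```python
import re

# Assumed layout, taken from the log lines quoted in this mail:
#   Feb 16 22:01:28 n0260 kernel: [1129250.851451] Filesystem "sda6": ...
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"  # syslog wall-clock time
    r"(?P<host>\S+)\s+kernel:\s+"                      # node name
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+"                  # seconds since boot
    r"(?P<msg>.*)$"                                    # rest of the message
)


def parse_kernel_line(line):
    """Return a dict for one kernel syslog line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "stamp": m.group("stamp"),
        "host": m.group("host"),
        "uptime_days": float(m.group("uptime")) / 86400.0,
        "msg": m.group("msg"),
    }


def error_site(msg):
    """Return the XFS function named in the message, or None if there is none."""
    m = re.search(r"XFS internal error (\w+)", msg)
    if m is None:
        # Fallback for messages such as 'xfs_iflush: Bad inode ...'
        m = re.search(r"\b(xfs_\w+):", msg)
    return m.group(1) if m else None
```

Grouping the parsed records by host and by error_site() would show whether
the three kinds of report cluster on particular nodes or at a particular
uptime.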

Cheers

Carsten


