page_alloc.c bug and heavy I/O

From: Xiaogang Wang (xiaogang.wang_at_umontreal.ca)
Date: 08/08/03

  • Next message: David Woodhouse: "Re: [uClinux-dev] Kernel 2.6 size increase"
    Date:	Fri, 8 Aug 2003 10:39:45 -0400
    To: linux-kernel@vger.kernel.org
    
    

    Hi,

    My hardware and softare:

      Asus P4P800, 2GB memory, 2.8GHZ P4 with HT enabled.
      On-board 3com Giga bit network card
      1 parallel ata 160G maxtor disk
      Nvidia Gefore4 MX440-8x graphics card (Asus V9180)

      Redhat 7.3, original kernel 2.4.18-3
      Intel Fortran Compiler 7.1
      Intel Math Kernel Library 6.0

    My problem is that one of my fortran code always crashes after 10-24 hours.
    This code has a heavy IO. It writes out a 5MB binary file every 1 minute.

    The error message in /var/log/message is: (coulson is the name of the computer)

    Aug 7 21:11:29 coulson kernel: kernel BUG at page_alloc.c:226!
    Aug 7 21:11:29 coulson kernel: invalid operand: 0000
    Aug 7 21:11:29 coulson kernel: nfsd lockd sunrpc binfmt_misc sr_mod soundcore
    parport_pc lp parport autofs 3c
    ....

    the line number with page_alloc.c varies for different crases (not always 226).

    This code also had a heavy IO. Specifically, it writes out a 5MB file every
    1 minute.

    I have done a couple of tests to try to find the cause, but without success so
    far.

    1) I have rmmod the 3com2000.0 network driver. The driver source is downloaded
    from asus website, and compiled by me. The crash still occurs.

    2) I heard of Nvidia binary driver can cause page_alloc.c kernel bug.
    I do have a Nvidia Gefore4 MX440-8x graphics card (Asus V9180), but I did
    not use Nvidia binary driver. Instead I used the vesa driver coming with redhat
    7.3. Nevertheless, I changed to an old pci ATI rage graphics card. But the
    crash still occurs.

    3) when the crash occurs, my local X is up and running. I have not tried
    the case with local X shutdown.

    I also got the same crash when I run the code on a second computer with the
    same hw and sw. This makes the hw defect less likely to be the cause.

    I am thinking to recompile a new kernel. But now I focus on if it is caused
    by some uncompatible module drivers.

    I would appreciate your inputs on this. Please cc your answer to me. I am not on
    the list.

    Xiaogang

    ------------------------------------------------
    Dr Xiaogang Wang
    Departement de chimie
    Universite de Montreal
    C.P. 6128, succursale Centre-ville
    Montreal (Quebec) H3C 3J7

    Tel. (514) 3436111 ext 3947 (office)
    FAX (514) 3437586 (office)
    e-mail: xiaogang.wang@umontreal.ca
    homepage: http://www.esi.umontreal.ca/~wangx
    ------------------------------------------------

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: David Woodhouse: "Re: [uClinux-dev] Kernel 2.6 size increase"

    Relevant Pages

    • Re: 2.6.30-rc4 kernel
      ... I think there may be a problem with the 2.6.30 kernel that is ... # Generic Driver Options ... # PCI IDE chipsets support ... # Other IDE chipsets support ...
      (Linux-Kernel)
    • 2.6.30-rc4 kernel
      ... kernel panic - not syncing: ... # Generic Driver Options ... # PCI IDE chipsets support ... # Other IDE chipsets support ...
      (Linux-Kernel)
    • [PATCH 18-rc2] Fix typos in /Documentation : N-P
      ... Again, if you're not gonna do synchronization with disk drives (dang, ... -the kernel. ... There are two options specific to PSX driver portion. ... The driver uses the settings from the EEPROM set in the SCSI BIOS ...
      (Linux-Kernel)
    • Re: Help to evaluate the system crash resistance method
      ... Neither of these is a small task such as tweaking an IDT ... > Windows 2k/XP/2k3 Filesystem and Driver Consulting ... >> I just thought a approach to hold system crash caused by those tiny SW ... >> for a while, then kernel will re-schedule to other threads.In this case, ...
      (microsoft.public.development.device.drivers)
    • Re: Help to evaluate the system crash resistance method
      ... Neither of these is a small task such as tweaking an IDT ... > Windows 2k/XP/2k3 Filesystem and Driver Consulting ... >> I just thought a approach to hold system crash caused by those tiny SW ... >> while, then kernel will re-schedule to other threads.In this case, system ...
      (microsoft.public.windowsxp.device_driver.dev)