RH 7.3 system crashing

cfiene_at_nospam.iqueset.net
Date: 09/14/03


Date: Sat, 13 Sep 2003 23:32:39 -0500

I have a RH7.3 system that has been crashing consistently. The system is used as a Samba server, caching name server, internal web server and NFS server. It has a combination of SCSI and EIDE drives, with the SCSI drives set up as RAID5.

Kernel 2.4.20-20.7smp, Samba 2.2.7-3.7.3

Here's the history.

Monday: The system was not responding, so I rebooted it and studied the logs. The only troubling errors were:

    kernel: Unable to handle kernel paging request at virtual address XXXXXX

    After looking this up on Deja, I suspected memory problems, since we had just moved the system across town. I reseated the memory that evening and restarted the weekend backups.

Tuesday: The system had not crashed, but nmbd had died and web pages would not display. I restarted Samba, Apache, and the backup script.

    The system crashed about two hours later. I then began setting up a new system with a new motherboard, CPUs, and RAM. I burned in the new system with a spare drive until that evening, then moved the drives and cards over to the new motherboard.

Wednesday: The system was not responding, and I was still getting 'Unable to handle paging request' in the log. Suspecting the swap area on the drive might be bad, I took swap offline, reinitialized it, and brought it back online. I then loaded the system heavily until all physical and virtual memory was in use. There were no problems during this time, so I went home.

Thursday: The system was not responding, with 'Unable to handle paging request' in the log again. I rebooted.

    I set up a spare SCSI drive to replace the swap partition. That evening I added the new swap drive, enabled it, and disabled the old swap partition. I reran the heavy load test, and everything seemed OK.

Friday: The system was not responding again. I rebooted, re-enabled the new swap drive, and disabled the old one. I then ran 'badblocks' on the old swap partition many times with no problems found. Back to Deja for more ideas.

    The one consistent thing throughout all of this is that the major failures occurred while the backup script was running. A few of the tar files are larger than 2G. The tape backup system pulls these files via NFS for archival, and NFSv2, which is the default, has a 2G limit.

    I first tried switching to an SMB mount. There was no crash, but the files were not backed up successfully. There were some limit errors in the log that led me to conclude SMB has a 2G limit as well.

    Next I tried forcing NFSv3 with the nfsvers=3 mount option. The system did not crash during the backup, but I still got 'Unable to handle paging request' in the log. I don't know yet whether the backup was successful; based on the number of tapes used, I doubt it.
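
Since the failures lined up with the backup pulling tar files over 2G, one way to see which files will cross that boundary before the backup runs is a quick size scan. This is a Python sketch that assumes the 2G limit means 2^31 - 1 bytes (the largest signed 32-bit offset); the `oversized_files` helper and the `/backup` default directory are hypothetical, not part of the original setup.

```python
import os
import stat
import sys

# Assumed limit: largest value a signed 32-bit file offset can hold (2^31 - 1).
LIMIT_2G = 2**31 - 1

def oversized_files(root, limit=LIMIT_2G):
    """Yield (path, size) for regular files under root larger than limit bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size > limit:
                yield path, st.st_size

if __name__ == "__main__":
    # Hypothetical default directory; pass the real backup directory as an argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/backup"
    for path, size in oversized_files(root):
        print(f"{path}: {size} bytes exceeds 2^31 - 1")
```

Pass the directory the backup script writes to as the first argument; each line printed is a file that would run into the 2G boundary.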

I am sure I can come up with something that will work, but I figured someone might find this info useful. I understand why there has been a 2 Gig limit (2^31) and why many utilities may still have it. What I don't understand is how it can cause stability problems for the whole system. This seems like a bug at the VM level to me.
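
The 2^31 figure comes from storing a file offset in a signed 32-bit integer. A small Python sketch (Python is chosen here only for illustration) shows where that boundary sits and how a value one past it wraps to a negative number when forced into a two's-complement 32-bit field. It illustrates the arithmetic only and says nothing about what is causing the crashes.

```python
import struct

# Largest value a signed 32-bit offset can hold: 2^31 - 1 = 2147483647,
# one byte short of 2 GiB.
MAX_OFF32 = 2**31 - 1

def fits_in_signed_32(offset):
    """Return True if offset can be packed into a signed 32-bit integer."""
    try:
        struct.pack("<i", offset)
    except struct.error:
        return False
    return True

def as_signed_32(offset):
    """Keep the low 32 bits of offset and reinterpret them as a signed value,
    mimicking how a two's-complement 32-bit field wraps."""
    return struct.unpack("<i", struct.pack("<I", offset & 0xFFFFFFFF))[0]

for off in (MAX_OFF32, MAX_OFF32 + 1):
    print(off, fits_in_signed_32(off), as_signed_32(off))
# 2147483647 True 2147483647
# 2147483648 False -2147483648
```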

Throughout the above, I occasionally had a task die for no apparent reason, except that an 'Unable to handle paging request' error appeared in the log at about the same time. Specifically, I had an ssh session die with a SIGSEGV while I was testing the backup transfer.

Any thoughts?


