RH 7.3 system crashing
cfiene_at_nospam.iqueset.net
Date: 09/14/03
- Next message: Max Burke: "Re: M$ attack on Common Sense"
- Previous message: Sinister Midget: "Re: M$ attack on Common Sense"
- Next in thread: Michael Heiming: "Re: RH 7.3 system crashing"
- Reply: Michael Heiming: "Re: RH 7.3 system crashing"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sat, 13 Sep 2003 23:32:39 -0500
I have a RH 7.3 system that has been crashing consistently. The system
is used as a Samba server, caching name server, internal web server
and NFS server. It has a combination of SCSI and EIDE drives, with the
SCSI drives set up as RAID5.
Kernel 2.4.20-20.7smp, Samba 2.2.7-3.7.3.
Here's the history.
Monday: System was not responding. Rebooted and studied the logs. The
only troubling errors were:
kernel: Unable to handle kernel paging request at virtual address XXXXXX
After looking this up on Deja, I suspected memory problems, since we had
just moved the system across town. Reseated the memory that evening and
restarted the weekend backups.
Tuesday: System had not crashed, but nmbd had died and web pages would
not display. Restarted Samba, Apache and the backup script. The system
crashed about 2 hours later. I then began to set up a new system with a
new motherboard, CPUs and RAM. Burned in the new system with a spare
drive until that evening, then moved the drives and cards to the new
motherboard.
Wednesday: System was not responding. Still getting 'Unable to handle
paging request' in the log. Suspected the swap area on the drive might
be bad. Took swap offline, reinitialized it, then brought it back
online. Loaded the system heavily until all memory and all virtual
memory were in use. No problems during this time. Went home.
Thursday: System was not responding. 'Unable to handle paging request'
in the log again. Rebooted. Set up a spare SCSI drive to replace the
swap partition. That evening, added the new swap drive, enabled it, and
disabled the old swap partition. Reran the heavy load test. Seemed OK.
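After a swapoff/swapon shuffle like this, one way to confirm which swap areas are actually active is to read /proc/swaps. The Python sketch below is only an illustration. It assumes the usual /proc/swaps layout: a header line, then five columns (Filename, Type, Size, Used, Priority) with sizes in KB. The helper name active_swaps is hypothetical.

```python
def active_swaps(text):
    """Parse /proc/swaps-formatted text (assumed layout, see above).

    Returns a list of (filename, type, size_kb, used_kb, priority) tuples,
    one per active swap area. The first line is assumed to be the header.
    """
    entries = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        name, kind, size, used, prio = fields[:5]
        entries.append((name, kind, int(size), int(used), int(prio)))
    return entries
```

For example, active_swaps(open("/proc/swaps").read()) should list only the new swap drive once the old partition has been disabled.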
Friday: System was not responding again. Rebooted, re-enabled the new
swap drive, disabled the old one, and ran 'badblocks' on the old swap
partition (many times) with no problems. Back to Deja for more ideas.
The one thing that was consistent throughout this is that the major
failures occurred when the backup script ran. A few of the tar files
are larger than 2 GB. The tape backup system pulls these files via NFS
for archival, and NFSv2, which is the default, has a 2 GB limit.
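If the 2 GB limit is the trigger, it helps to know exactly which backup files cross it. This is a minimal Python sketch. It assumes the relevant cutoff is 2^31 - 1 bytes, the largest offset a signed 32-bit integer can hold. The function files_over_limit is hypothetical, and so is the directory you point it at, since the location of the tar files isn't given here.

```python
import os
import stat

# Assumed cutoff: largest byte offset a signed 32-bit integer can represent.
LIMIT_2G = 2**31 - 1


def files_over_limit(root, limit=LIMIT_2G):
    """Return (path, size) for each regular file under `root` whose size
    exceeds `limit` bytes, largest first."""
    big = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size > limit:
                big.append((path, st.st_size))
    big.sort(key=lambda item: item[1], reverse=True)
    return big
```

Running files_over_limit() on the directory the backup script writes to would list the files that an NFSv2 or SMB transfer might fail on.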
I first tried just switching to an SMB mount, and while there was no
crash, the files were not backed up successfully. There were some limit
errors in the log that led me to conclude it has a 2 GB limit as well.
Next I tried to force the use of NFSv3 with the nfsvers=3 mount option.
The system did not crash during the backup, but I still got 'Unable to
handle paging request' in the log. I don't know yet whether the backup
was successful; based on the number of tapes used, I doubt it.
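To double-check that nfsvers=3 actually took effect, you can read the mount options the kernel reports back from /proc/mounts. This Python sketch rests on an assumption: the exact spelling of the version option in /proc/mounts differs between kernel versions. It therefore checks vers=N, nfsvers=N and a bare vN flag, and returns None when none of them appears. A None result does not by itself prove the mount is NFSv2. The helper name nfs_versions is hypothetical.

```python
def nfs_versions(mounts_text):
    """Map each NFS mount point in /proc/mounts-formatted text to the
    NFS version named in its options, or None if no version is listed."""
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mountpoint = fields[1]
        found = None
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            if key in ("vers", "nfsvers") and value:
                found = value  # e.g. "vers=3"
            elif len(opt) == 2 and opt[0] == "v" and opt[1].isdigit():
                found = opt[1]  # assumed bare-flag form, e.g. "v3"
        versions[mountpoint] = found
    return versions
```

Calling nfs_versions(open("/proc/mounts").read()) on the machine that performs the NFS mount shows what the kernel thinks each mount is using.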
I am sure I can come up with something that will work, but I figured
someone might find this info useful. I understand why there has been a
2 GB limit (2^31 bytes) and why many utilities may still have this
limitation. What I don't understand is how it can cause stability
problems for the whole system. This seems like a bug at the VM level to
me.
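The 2^31 figure comes from signed 32-bit arithmetic. The largest byte offset a signed 32-bit value can hold is 2^31 - 1 = 2,147,483,647, and on two's-complement machines one byte past that wraps to a negative number. The small Python sketch below reproduces that wraparound with the struct module. The helper as_int32 is hypothetical, and the sketch only illustrates where the limit comes from, not how it could lead to a kernel fault.

```python
import struct


def as_int32(n):
    """Reinterpret the low 32 bits of a non-negative integer as a signed
    32-bit value, the way a 32-bit two's-complement offset would hold it."""
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]


LIMIT = 2**31 - 1  # 2,147,483,647 bytes: the last offset that fits

for size in (LIMIT, LIMIT + 1, 3 * 2**30):
    print(f"{size:>13,} bytes -> stored as {as_int32(size):,}")
```

The first size survives unchanged. The 2 GB and 3 GB sizes come out negative (-2,147,483,648 and -1,073,741,824), which is the kind of value a program without large-file support would see.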
Throughout the above, I occasionally had a task die for no apparent
reason, except that there was an 'Unable to handle paging request' error
in the log at about the same time. Specifically, I had an ssh session
die with a SIGSEGV while I was testing the backup transfer.
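Correlating the oops messages with task deaths and backup runs is easier with a list of timestamps. This is a minimal Python sketch. It assumes the standard syslog line format (as in /var/log/messages, the usual Red Hat default) and the message text quoted in the Monday entry. The regex and the oops_times helper are illustrative, not part of any existing tool.

```python
import re

# Assumed syslog layout: "Mon DD HH:MM:SS host tag: message".
OOPS_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"Unable to handle kernel paging request"
)


def oops_times(lines):
    """Return the syslog timestamp of every kernel paging-request oops."""
    stamps = []
    for line in lines:
        match = OOPS_RE.match(line)
        if match:
            stamps.append(match.group("stamp"))
    return stamps
```

For example, oops_times(open("/var/log/messages")) returns the times of each oops, which can then be lined up against the backup script's schedule and any segfault reports.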
Any thoughts?