[SLE] SuSE 9.0 pro crashing
From: Damon Jebb (list_at_damonjebb.net)
Date: 03/15/04
- Previous message: Carl William Spitzer IV: "Re: [SLE] NIC Speed"
- Next in thread: Martin Mielke: "Re: [SLE] SuSE 9.0 pro crashing"
- Reply: Martin Mielke: "Re: [SLE] SuSE 9.0 pro crashing"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
To: <suse-linux-e@suse.com>
Date: Mon, 15 Mar 2004 18:11:18 -0000
I am struggling with occasional crashes of my SuSE 9.0 pro server that
seem to be getting more frequent. I have been trying various Google
searches for a while, but so far haven't come up with many clues. The
setup is an Athlon XP 1800+ running a clean install of SuSE 9.0. I have
LDAP, Cyrus, Postfix, Apache2 and Samba running as I use this mainly for
email and file serving to my XP workstation. All these are the standard
versions supplied with the DVD version, except that the kernel has been
updated using YOU. The disk is partitioned using the YaST partitioner
and all partitions are Ext3 and I'm using ACLs. The machine has a
single 160GB ultra-ATA hard disk with a number of different partitions
including 512MB of swap space (physical RAM is also 512MB).
Most often I come down in the morning to find the machine completely
locked solid: I can't get to a console using Ctrl/Alt/Fx and the network
is dead (doesn't respond to ping). After rebooting I have found entries
indicating that there is activity across the Samba network connection,
which I have traced to a nightly backup that I run on the XP workstation
(I'm currently migrating this to the server). I stopped the backup,
which helped to keep the server up overnight, but when I am doing
things that make heavy use of the server it still tends to fail. When I
am working with it I can sometimes catch it before it fails completely
and recover use of the system simply by restarting the network (~#
rcnetwork restart).
I have tried to get a better idea of what is happening by increasing the
logging level on Samba (level 1) and LDAP (level 384) as these seemed
most active about the times of failure. When I did this logging I ended
up with >45M of log data in a few hours, the vast majority of which is
just connect information from the LDAP server. I have turned off the
LDAP log and left the Samba log running at level 1. After the last
crash I found the following entries in the log.
Mar 13 00:59:59 voyager smbd[6510]: [2004/03/13 00:59:59, 0]
rpc_server/srv_pipe.c:api_pipe_netsec_process(1299)
Mar 13 00:59:59 voyager smbd[6510]: failed to decode PDU
Mar 13 00:59:59 voyager smbd[6510]: [2004/03/13 00:59:59, 0]
rpc_server/srv_pipe_hnd.c:process_request_pdu(504)
Mar 13 00:59:59 voyager smbd[6510]: process_request_pdu: failed to do
schannel processing.
Mar 13 01:04:20 voyager smbd[6510]: [2004/03/13 01:04:20, 0]
smbd/oplock.c:oplock_break(797)
Mar 13 01:04:20 voyager smbd[6510]: oplock_break: receive_smb timed
out after 30 seconds.
Mar 13 01:04:20 voyager smbd[6510]: oplock_break failed for file
Damon/damon.pst (dev = 306, inode = 97221, file_id = 1049).
Mar 13 01:04:20 voyager smbd[6510]: [2004/03/13 01:04:20, 0]
smbd/oplock.c:oplock_break(869)
Mar 13 01:04:20 voyager smbd[6510]: oplock_break: client failure in
oplock break in file Damon/damon.pst
Mar 13 01:05:45 voyager smbd[6510]: [2004/03/13 01:05:45, 0]
smbd/oplock.c:oplock_break(797)
Mar 13 01:05:45 voyager smbd[6510]: oplock_break: receive_smb timed
out after 30 seconds.
Mar 13 01:05:45 voyager smbd[6510]: oplock_break failed for file
Damon/Damon 00-01aSubmissionResponse.xml (dev = 306, inode = 97217,
file_id = 1050).
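With a large log, entries like these are easier to judge when they are counted per file. The following is only a rough sketch, assuming the log keeps the layout shown above; the function name count_oplock_failures is made up for illustration and is not part of Samba.

```python
import re
from collections import Counter

# Matches "oplock_break failed for file ..." entries in the layout shown above.
FAILED = re.compile(
    r"oplock_break failed for file (.+?) "
    r"\(dev = (\d+), inode = (\d+), file_id = (\d+)\)"
)


def count_oplock_failures(log_text):
    """Return a Counter mapping file name -> number of failed oplock breaks."""
    # Collapse line wrapping so entries split across lines still match.
    flat = " ".join(log_text.split())
    return Counter(m.group(1) for m in FAILED.finditer(flat))
```

Calling count_oplock_failures(open("/var/log/messages").read()).most_common() would list the files that fail most often first; the log path here is an assumption and may differ on a given system.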
I found a reference to this error message that indicated that the
entries
oplocks = no
level2 oplocks = no
kernel oplocks = no
would help, though it was quite old and referred to an earlier version
of Samba. I tried the settings anyway, but with little change. I am
now seeing errors like this.
Mar 15 14:14:59 voyager imapd[2093]: Connection reset by peer, closing
connection
and
Mar 15 14:24:39 voyager postfix/smtp[2129]: 378E0134FBE:
to=<root@localhost.damonjebb.net>, orig_to=<root@localhost>, relay=none,
delay=0, status=deferred (connect to localhost[127.0.0.1]: Connection
refused)
The postfix/smtp error is not normal; most of the time the system works
and connects cleanly to send/receive email. I am using Outlook XP,
which from some posts I've seen could be responsible for the IMAP issue,
but I'm not sure and don't know how to diagnose further.
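One simple check for a "Connection refused" like the one above is to test whether the local mail ports accept TCP connections at the moment the error appears. This is a minimal sketch, not a full diagnosis: port_accepts is a hypothetical helper name, and ports 25 (SMTP) and 143 (IMAP) are the standard defaults, assumed rather than taken from this server's configuration.

```python
import socket


def port_accepts(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False
```

For example, port_accepts("127.0.0.1", 25) returning False while postfix is supposed to be running would suggest the listener itself has stopped or is too busy to accept, rather than a client-side problem.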
The issue seems to be related to load on the server. I am currently
trying to set up 'bacula' (www.bacula.org) to run backups on the server
and when this is running a large backup or restore things seem likely to
fail if I am trying to do something else at the same time (normally I
wouldn't be backing up and working, but I'm trying to set it up and test
it, hence the activity). Bacula connects to its components using
TCP/IP and to a MySQL backend DB using TCP/IP. Running the backup
results in thousands of lines being written to the database and also a
log file (was 50MB after a few days' testing).
Has anyone seen similar behaviour before? If not, does anyone have any
suggestions for how I can get a better understanding of why this is
happening? I really don't know where to start, and simply turning on
the logging seems likely to contribute to the problem if load is an
issue.
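One low-overhead way to gather evidence without verbose service logging might be to record a single short line of load and memory figures every minute and force it to disk, so the last sample survives a hard lock-up. The sketch below assumes a Linux /proc filesystem and a Python 3 interpreter; the function names, the log path and the 60-second interval are illustrative choices, not anything taken from the setup described here.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def format_sample(loadavg_text, meminfo_text, now=None):
    """Build one compact log line from the two /proc snapshots."""
    load1 = loadavg_text.split()[0]  # 1-minute load average
    mem = parse_meminfo(meminfo_text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load1=%s MemFree=%skB SwapFree=%skB" % (
        stamp, load1, mem.get("MemFree", "?"), mem.get("SwapFree", "?"))


def sample_forever(path="/var/log/loadsample.log", interval=60):
    """Append one line per interval and force it to disk each time."""
    with open(path, "a") as out:
        while True:
            with open("/proc/loadavg") as f:
                load = f.read()
            with open("/proc/meminfo") as f:
                mem = f.read()
            out.write(format_sample(load, mem) + "\n")
            out.flush()
            os.fsync(out.fileno())  # keep the last sample if the box locks up
            time.sleep(interval)
```

After a crash, the final lines of the file would show whether load was climbing or free memory and swap were running out just before the machine stopped responding.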
Thanks for taking the time to read this,
Damon