[SLE] SuSE 9.0 pro crashing

From: Damon Jebb (list_at_damonjebb.net)
Date: 03/15/04

    To: <suse-linux-e@suse.com>
    Date: Mon, 15 Mar 2004 18:11:18 -0000
    
    

    I am struggling with occasional crashes of my SuSE 9.0 Pro server that
    seem to be getting more frequent. I have been trying various Google
    searches for a while, but so far haven't come up with many clues. The
    setup is an Athlon XP 1800+ running a clean install of SuSE 9.0. I have
    LDAP, Cyrus, Postfix, Apache2 and Samba running, as I use this mainly for
    email and file serving to my XP workstation. All of these are the standard
    versions supplied on the DVD, except that the kernel has been updated
    using YOU (YaST Online Update). The disk is partitioned using the YaST
    partitioner, all partitions are Ext3, and I'm using ACLs. The machine has
    a single 160GB Ultra ATA hard disk with a number of different partitions,
    including 512MB of swap space (physical RAM is also 512MB).

    Most often I come down in the morning to find the machine completely
    locked solid: I can't get to a console using Ctrl+Alt+Fx and the network
    is dead (it doesn't respond to ping). After rebooting I have found log
    entries showing activity across the Samba network connection, which I
    have traced to a nightly backup that I run on the XP workstation (I'm
    currently migrating this to the server). Stopping the backup helped to
    keep the server up overnight, but it still tends to fail when I am doing
    things that make heavy use of it. When I am working with it I can
    sometimes catch it before it fails completely and recover use of the
    system simply by restarting the network (~# rcnetwork restart).

    I have tried to get a better idea of what is happening by increasing the
    logging level on Samba (level 1) and LDAP (level 384), as these seemed
    most active around the times of failure. With this logging enabled I
    ended up with more than 45MB of log data in a few hours, the vast majority
    of which was just connection information from the LDAP server. I have
    turned off the LDAP log and left the Samba log running at level 1. After
    the last crash I found the following entries in the log.

    Mar 13 00:59:59 voyager smbd[6510]: [2004/03/13 00:59:59, 0] rpc_server/srv_pipe.c:api_pipe_netsec_process(1299)
    Mar 13 00:59:59 voyager smbd[6510]: failed to decode PDU
    Mar 13 00:59:59 voyager smbd[6510]: [2004/03/13 00:59:59, 0] rpc_server/srv_pipe_hnd.c:process_request_pdu(504)
    Mar 13 00:59:59 voyager smbd[6510]: process_request_pdu: failed to do schannel processing.
    Mar 13 01:04:20 voyager smbd[6510]: [2004/03/13 01:04:20, 0] smbd/oplock.c:oplock_break(797)
    Mar 13 01:04:20 voyager smbd[6510]: oplock_break: receive_smb timed out after 30 seconds.
    Mar 13 01:04:20 voyager smbd[6510]: oplock_break failed for file Damon/damon.pst (dev = 306, inode = 97221, file_id = 1049).
    Mar 13 01:04:20 voyager smbd[6510]: [2004/03/13 01:04:20, 0] smbd/oplock.c:oplock_break(869)
    Mar 13 01:04:20 voyager smbd[6510]: oplock_break: client failure in oplock break in file Damon/damon.pst
    Mar 13 01:05:45 voyager smbd[6510]: [2004/03/13 01:05:45, 0] smbd/oplock.c:oplock_break(797)
    Mar 13 01:05:45 voyager smbd[6510]: oplock_break: receive_smb timed out after 30 seconds.
    Mar 13 01:05:45 voyager smbd[6510]: oplock_break failed for file Damon/Damon 00-01aSubmissionResponse.xml (dev = 306, inode = 97217, file_id = 1050).
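
    If oplock failures like these keep recurring, it helps to know whether
    they always involve the same few files (here, the Outlook .pst file and
    an .xml file in the same share). The following is a minimal sketch in
    Python 3, not something from the original setup, that counts
    "oplock_break failed for file" messages per path. It assumes each syslog
    entry sits on a single line, as it does in the real log file, and the
    function name count_oplock_failures is hypothetical.

```python
import re
from collections import Counter

# Matches the smbd message quoted above, e.g.
# "oplock_break failed for file Damon/damon.pst (dev = 306, inode = 97221, file_id = 1049)."
# The non-greedy path group allows spaces in file names.
OPLOCK_FAIL = re.compile(
    r"oplock_break failed for file (?P<path>.+?) \(dev = (?P<dev>\d+), inode = (?P<inode>\d+)"
)


def count_oplock_failures(lines):
    """Count failed oplock breaks per file path in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = OPLOCK_FAIL.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts
```

    For example, with open("/var/log/messages", errors="replace") as f:
    print(count_oplock_failures(f).most_common(5)) would list the worst
    offenders; the /var/log/messages path is an assumption about where
    syslog writes on this system.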

    I found a reference to this error message suggesting that adding the
    following entries to smb.conf would help:

    oplocks = no
    level2 oplocks = no
    kernel oplocks = no

    The reference was quite old, though, and referred to an earlier version
    of Samba. I tried the settings anyway, but saw little change. I am now
    seeing errors like these:

    Mar 15 14:14:59 voyager imapd[2093]: Connection reset by peer, closing connection

    and

    Mar 15 14:24:39 voyager postfix/smtp[2129]: 378E0134FBE: to=<root@localhost.damonjebb.net>, orig_to=<root@localhost>, relay=none, delay=0, status=deferred (connect to localhost[127.0.0.1]: Connection refused)

    The postfix/smtp error is not normal; most of the time the system works
    and connects cleanly to send and receive email. I am using Outlook XP,
    which according to some posts I've seen could be responsible for the IMAP
    issue, but I'm not sure and don't know how to diagnose it further.
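
    The Postfix entry says the connection to localhost[127.0.0.1] was
    refused. That normally means no process was listening on the target port
    at that moment (or a firewall rejected it), whereas an overloaded but
    still-listening service tends to show up as a timeout instead. The log
    line does not show which port Postfix was connecting to, so the default
    of 25 below is an assumption, as is the helper name check_tcp. A minimal
    Python 3 sketch that tells the cases apart:

```python
import socket


def check_tcp(host="127.0.0.1", port=25, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or an error.

    Port 25 is only an assumed default; pass whatever port the failing
    service is expected to listen on.
    """
    try:
        # create_connection performs the full TCP handshake; close it at once.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # RST received: nothing listening (or actively rejected).
        return "refused"
    except socket.timeout:
        # No answer within the timeout: host or listener not responding.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

    Running something like this from cron every few minutes during a Bacula
    test, and logging the result, would show whether the SMTP listener
    disappears at the same times as the other failures.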

    The issue seems to be related to load on the server. I am currently
    trying to set up Bacula (www.bacula.org) to run backups on the server,
    and when it is running a large backup or restore, things seem likely to
    fail if I am trying to do something else at the same time (normally I
    wouldn't be backing up and working at once, but I'm trying to set it up
    and test it, hence the activity). Bacula's components connect to each
    other over TCP/IP, and it also connects to a MySQL backend database over
    TCP/IP. Running a backup writes thousands of rows to the database and
    also produces a log file (50MB after a few days of testing).

    Has anyone seen similar behaviour before? If not, does anyone have any
    suggestions for how I can get a better understanding of why this is
    happening? I really don't know where to start, and simply turning on
    more logging seems likely to contribute to the problem if load is an
    issue.
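
    One low-overhead way to see whether load or memory pressure precedes a
    lock-up, without raising the Samba or LDAP log levels, is to record one
    short line of load average and free memory/swap at a fixed interval and
    force each line to disk. The sketch below is an illustration in a current
    Python 3 (far newer than anything shipped with SuSE 9.0), assuming a Linux
    /proc/meminfo; the function names and output format are hypothetical.
    The sysstat package's sar collects similar data.

```python
import os
import time


def read_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value} (values usually in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def sample_line():
    """One compact line: timestamp, 1/5/15-minute load, free memory and swap (kB)."""
    load1, load5, load15 = os.getloadavg()
    with open("/proc/meminfo") as f:
        mem = read_meminfo(f.read())
    return "%s load=%.2f/%.2f/%.2f memfree=%d swapfree=%d" % (
        time.strftime("%Y-%m-%d %H:%M:%S"),
        load1, load5, load15,
        mem.get("MemFree", -1), mem.get("SwapFree", -1),
    )


def monitor(path, interval=60):
    """Append one sample per interval, forever.

    flush() plus fsync() pushes each line to disk immediately, which makes it
    more likely (not certain) that the last samples survive a hard lock-up.
    """
    with open(path, "a") as out:
        while True:
            out.write(sample_line() + "\n")
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval)
```

    For example, monitor("/var/log/loadsample.log", interval=60) could be
    started in the background before a large Bacula run (the path is
    hypothetical); after a crash, the last few lines show the state just
    before it.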

    Thanks for taking the time to read this,

    Damon

    
