fork failed cannot allocate memory

From: Ben Russo (Ben_at_muppethouse.com)
Date: 02/23/04

    To: redhat-list@redhat.com
    Date: Mon, 23 Feb 2004 14:43:53 -0500
    
    

    I have a Dell 2650, dual Xeon box, 2GB RAM, 6GB swap with PERC Hardware
    RAID card. It was running RedHat AS2.1 with the 2.4.9-e.27smp kernel.
    Other than the kernel version, the box was fully up2date with patches.

    The box doesn't have too much running, apache (idle 99.999% of the time)
    ntp, snmp, sshd, netcool object server and some custom perl scripts
    (running with NON-ROOT ownership and some persistent "tail" commands
    feeding the perl processes).

    Every week or two the box will stop allowing new netcool client
    connections, and ssh connection attempts result in
            ssh fork failed: Cannot allocate memory
    If I keep trying every few seconds I can eventually get in, but the
    shell is unable to execute the environment (bash_profile) config files.
    It gives a bunch of "fork failed: Cannot allocate memory" errors from
    bash and then dumps me at a bash prompt. I can then run shell built-ins
    like cd & pwd, and tab-completion works, but I can't run any processes.

    I was able to cd /proc, and then use "ls <tab><tab>" to see what pids
    were listed in proc. I then just crossed my fingers and tried to kill
    one of them... It kept giving me "fork" errors about allocating memory,
    but I kept trying again and again until it succeeded in killing something.
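
Since built-ins still worked while fork did not, the PID listing described above could also be done with a small bash function that never starts a new process. This is only a sketch: the function name list_procs and its optional directory argument are made up for illustration, and it assumes /proc/<pid>/stat begins with "pid (comm) state". A command name containing spaces would be cut at the first space.

```shell
# List PIDs and command names using only bash built-ins (no fork needed).
# list_procs is a hypothetical helper name; the optional argument lets it
# be pointed at a directory other than /proc.
list_procs() {
    local proc_root=${1:-/proc} dir pid name
    for dir in "$proc_root"/[0-9]*; do
        # Skip entries that vanished or are not readable.
        [ -r "$dir/stat" ] || continue
        # First two fields of the stat file: pid and (comm).
        read -r pid name _ < "$dir/stat" || continue
        echo "$pid $name"
    done
}
```

Typing the function at the prompt and then running list_procs should print one "pid (name)" pair per line, which makes it less of a gamble to choose what to kill with the kill built-in.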

    Immediately after I killed some process I was able to run top and get
    another login! (hurray!) but then the same problem started again.
    At least this time I had top running. I could see that the load on the
    box was minimal (less than 0.3). The CPUs were 90%+ idle. "top"
    showed that the swap space (three 2GB swap partitions for a total of
    6GB) was almost completely unused: only about 7250KB of swap was used.
    The box has 2GB of RAM, and the free command showed that only about 256MB
    was being used for processes and process data. The rest was being used
    for cache/buffer, and about 9MB was "free". There were only about 50
    processes running on the box. netstat -nap only showed a few hundred
    open socket/ports.
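
The memory figures above came from top and free, which both need a successful fork. As a rough sketch, the same numbers could be read straight from /proc/meminfo with built-ins only. The function name meminfo_kb and its optional file argument are made up for illustration, and which fields exist depends on the kernel, so the function returns non-zero when a field is missing.

```shell
# Print the value (in kB) of one "Name:  value kB" line from /proc/meminfo,
# using only bash built-ins. meminfo_kb is a hypothetical helper name.
meminfo_kb() {
    local want=$1 file=${2:-/proc/meminfo} key value unit
    while read -r key value unit; do
        if [ "$key" = "$want:" ]; then
            echo "$value"
            return 0
        fi
    done < "$file"
    # Field not present in this kernel's meminfo.
    return 1
}
```

For example, meminfo_kb MemFree, meminfo_kb Cached and meminfo_kb SwapFree would give the free, cache and unused-swap figures without running any external command.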

    lsof | wc -l showed only 1703 files open, vmstat showed no unusual
    numbers, the box was basically idle.
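
The counts above (processes, sockets, open files) could also be set against a kernel limit. The sketch below is a guess at something worth checking, not a diagnosis. It assumes /proc/loadavg has the form "1min 5min 15min running/total lastpid" and that /proc/sys/kernel/threads-max exists on this kernel. The function name task_headroom and its two optional file arguments are made up for illustration.

```shell
# Show the total number of scheduling entities (from /proc/loadavg) next to
# the kernel's threads-max limit, using only bash built-ins.
# task_headroom is a hypothetical helper name.
task_headroom() {
    local loadavg=${1:-/proc/loadavg}
    local limit_file=${2:-/proc/sys/kernel/threads-max}
    local l1 l5 l15 tasks lastpid limit
    read -r l1 l5 l15 tasks lastpid < "$loadavg" || return 1
    read -r limit < "$limit_file" || return 1
    # tasks looks like "running/total"; keep the part after the slash.
    echo "tasks=${tasks#*/} threads-max=$limit"
}
```

If the task count turned out to be far above the roughly 50 processes that top reported, that could point at threads that top does not list individually.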

    top, sorted by RAM and by time, showed that the Netcool database and a
    perl process had both been running for several weeks, but neither was a
    CPU hog, and each was only using about 70MB of RAM.

    After I killed the httpd process and the ntpd process and then restarted
    them, the problems appear to have gone away. (I doubt they had anything
    to do with the problem; I was just restarting processes to get the
    problem to go away.)

    I don't see this problem on any of the dozens of other boxes I have that
    run the same OS on the same hardware. I installed the latest
    kernel-2.4.9-e.38 (or 39?) and rebooted last night. Is there some other
    metric or system parameter I should be looking at? What else would cause
    this problem? This is really bad.

    The only things different on this machine are the netcool procs, the
    perl processes and scripts, and the tail processes that run continuously,
    but all of these are running with non-root privileges. How could a
    non-root process cause this?

    
