Re: Server stops responding



Hi. I know this isn't a very innovative idea, but how about you just reset
it every half month? (Just kidding, sort of)

I'm not a super expert with Ubuntu, but - just an idea - do you think the
problem has something to do with cron? I don't know much about cron, so I
can't say. It's probably not that, because no one else has suggested
it yet.

David

On Sun, Jul 26, 2009 at 9:38 AM, Hal Burgiss <hal@xxxxxxxxxxx> wrote:

I have an issue with an 8.04 server that, about once a month, stops
responding. It doesn't "crash", really, it just stops responding.

Testing open ports:

$ nmap example.com

Starting nmap 3.70 ( http://www.insecure.org/nmap/ ) at 2009-07-26 08:33 EDT
Interesting ports on example.com:
(The 1655 ports scanned but not shown below are in state: closed)
PORT     STATE SERVICE
22/tcp   open  ssh
25/tcp   open  smtp
80/tcp   open  http
443/tcp  open  https
3306/tcp open  mysql

Looks good. The problem is that none of those will fully establish a
connection. An attempt to connect via ssh:

$ tcpdump -v host example.com
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes

08:35:07.529666 IP (tos 0x0, ttl 64, id 63108, offset 0, flags [DF], proto 6, length: 60) example2.com.48625 > example.com.ssh: S [tcp sum ok] 365499356:365499356(0) win 5840 <mss 1460,sackOK,timestamp 3810846040 0,nop,wscale 2>

08:35:07.530225 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto 6, length: 60) example.com.ssh > example2.com.48625: S [tcp sum ok] 2913998847:2913998847(0) ack 365499357 win 5792 <mss 1460,sackOK,timestamp 143947824 3810846040,nop,wscale 6>

08:35:07.530281 IP (tos 0x0, ttl 64, id 63110, offset 0, flags [DF], proto 6, length: 52) example2.com.48625 > example.com.ssh: . [tcp sum ok] ack 1 win 1460 <nop,nop,timestamp 3810846041 143947824>

But it dies right there. No further response at all. Consistently. Ever.
Until the reset button is hit. Then it runs flawlessly for a month or so.
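
The capture above shows the three-way handshake completing (SYN, SYN/ACK,
ACK) and then nothing. The kernel completes handshakes for a listening
socket on its own, so this is the pattern you would expect if the daemon
never gets around to accepting the connection. That reading is an
assumption, not something the capture proves. A minimal Python sketch for
telling the cases apart from another machine follows; the `probe` name and
its return labels are made up for illustration, and it only makes sense for
services that speak first, such as ssh or smtp.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a TCP service by how far a connection attempt gets.

    Returns one of: "refused", "no-handshake", "silent", "reset",
    "closed", "banner". The labels are illustrative, not standard terms.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"          # nothing is listening on the port
    except OSError:
        return "no-handshake"     # timeout, unreachable host, DNS failure
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(64)  # wait for the server's greeting
        except socket.timeout:
            return "silent"       # handshake done, but no greeting arrived
        except OSError:
            return "reset"
        return "banner" if data else "closed"
```

A result of "silent" on port 22 or 25 would match the symptom described
here: the port looks open to nmap, but the service never answers.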

Typically what I find if I dig through log files is that the system clock
seems to get weird. Example just prior to the system going belly up:


65.55.110.76 - - [26/Jul/2009:06:51:08 -0400] "GET /academic-programs/teacher-education/ba-elementary-p-5 HTTP/1.1" 200

65.55.110.76 - - [26/Jul/2009:06:51:08 -0400] "GET /academic-programs/teacher-education/ba-elementary-p-5 HTTP/1.1" 200

123.149.115.33 - - [26/Jul/2009:06:41:34 -0400] "GET /academic-programs/teacher-education/ HTTP/1.1" 404 -

123.149.115.33 - - [26/Jul/2009:06:41:34 -0400] "GET /academic-programs/teacher-education/ HTTP/1.1" 404 - "-" "-"

74.6.22.182 - - [26/Jul/2009:07:45:07 -0400] "GET /alumni_development/endowingCampaign.html HTTP/1.0" 404 20

74.6.22.182 - - [26/Jul/2009:07:45:07 -0400] "GET /alumni_development/endowingCampaign.html HTTP/1.0" 404 20 "-" "Mozil

65.55.210.87 - - [26/Jul/2009:06:58:03 -0400] "GET /future-students/grad/why-mc HTTP/1.1" 200 20

65.55.210.87 - - [26/Jul/2009:06:58:03 -0400] "GET /future-students/grad/why-mc HTTP/1.1" 200 20 "-" "msnbot/1.1 (+http

74.6.22.182 - - [26/Jul/2009:07:45:08 -0400] "GET /calendar/athletics/2009-07-02 HTTP/1.0" 404 20

74.6.22.182 - - [26/Jul/2009:07:45:08 -0400] "GET /calendar/athletics/2009-07-02 HTTP/1.0" 404 20 "-" "Mozilla/5.0 (com

123.149.115.33 - - [26/Jul/2009:06:41:32 -0400] "GET /academic-programs/academic-calendar/ HTTP/1.1" 404 -

123.149.115.33 - - [26/Jul/2009:06:41:32 -0400] "GET /academic-programs/academic-calendar/ HTTP/1.1" 404 - "-" "-"

This is a pretty active site. The correct time was 6:41.
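
To find where the timestamps start jumping around without reading the
whole log by eye, something like the following Python sketch could scan an
access log for entries whose timestamp is earlier than the previous
entry's. The `backward_jumps` name and the `tolerance` parameter are
made up for illustration. Apache writes an entry when the request
finishes, so small reorderings are normal on a busy site and a tolerance of
a few minutes is probably sensible.

```python
import re
from datetime import datetime, timedelta

# Matches the bracketed timestamp, e.g. [26/Jul/2009:06:51:08 -0400]
STAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def backward_jumps(lines, tolerance=timedelta(0)):
    """Yield (previous, current) datetimes where the log steps back in time.

    A pair is reported only when the step backwards exceeds `tolerance`.
    Lines without a recognizable timestamp are skipped.
    """
    previous = None
    for line in lines:
        match = STAMP.search(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if previous is not None and previous - current > tolerance:
            yield previous, current
        previous = current
```

Called as `backward_jumps(open(path), tolerance=timedelta(minutes=5))`, it
would list each point where consecutive entries run backwards by more than
five minutes, which narrows down when the clock trouble begins.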

Typically there is not anything interesting in syslog, but this time there
was a bunch of oom-killer actions against apache processes at 7:45. The time
is wrong and after the weirdness started, so I don't know whether to trust
this. Or whether it's an effect or a cause of another problem.
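
If memory exhaustion is involved (an assumption; the oom-killer entries
only hint at it), one way to get evidence that survives the hang is to
record memory figures to disk at regular intervals and read them after the
reset. A minimal Python sketch, assuming the usual `/proc/meminfo` layout
of `Name:   value kB`; the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Return {field: int} from /proc/meminfo-style text.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain
    counts and are returned as given.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def snapshot(path="/proc/meminfo"):
    """Read free memory and free swap (kB) from the running system."""
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    return {key: info.get(key) for key in ("MemFree", "SwapFree")}
```

Run once a minute and appended to a file along with the current time, the
output of `snapshot()` would show whether free memory and swap were draining
in the run-up to the hang or whether the oom-killer activity came out of
nowhere.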

This server is headless in a datacenter, so I am limited in what I can do
remotely (especially if I can't connect).

Any ideas how to hunt this down?

--
Hal

--
ubuntu-users mailing list
ubuntu-users@xxxxxxxxxxxxxxxx
Modify settings or unsubscribe at:
https://lists.ubuntu.com/mailman/listinfo/ubuntu-users




--
David McNally
david3333333@xxxxxxxxx
apt-get moo

