Re: Mysterious delay establishing any TCP/IP connection

From: Carlos Moreno (moreno_at_mochima_dot_com_at_x.xxx)
Date: 01/07/04


Date: Tue, 06 Jan 2004 19:02:16 -0500


> OK, your tcpdump command is a little off (filtering on port 53 prevents
> you from seeing lots of useful traffic), but that's probably OK, since
> what you did capture probably points to the problem. If it doesn't, we'll
> revisit your tcpdump options, but for now, it looks like the problem is
> in your name resolution setup. You should never have to query a DNS server
> for a localhost name. Go into your /etc directory and do the following:
>
> 1) edit the file hosts. Make sure this line appears in it:
> 127.0.0.1 localhost.localdomain localhost
>
> 2) edit the file resolv.conf. Ensure that the directive
> search blahblahblah.com
> is in the file, where blahblahblah is your domain name.
> Also ensure it contains the correct IP addresses of your domain name
> servers.
>
> 3) edit the nsswitch.conf file. Ensure that on the hosts: line, the
> files directive appears before the dns directive.
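Checks 1 and 3 above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the contents of /etc/hosts and /etc/nsswitch.conf have already been read into strings; the function names `hosts_maps_localhost` and `files_before_dns` are hypothetical, and the parsing is deliberately simplified rather than a full implementation of either file's syntax.

```python
def hosts_maps_localhost(hosts_text: str) -> bool:
    """Return True if some non-comment line maps 127.0.0.1 to 'localhost'."""
    for raw in hosts_text.splitlines():
        # Drop anything after '#', then split into address and names.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "127.0.0.1" and "localhost" in fields[1:]:
            return True
    return False


def files_before_dns(nsswitch_text: str) -> bool:
    """Return True if the 'hosts:' line consults 'files' before 'dns'."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("hosts:"):
            continue
        # Keep source names only; skip action items such as [NOTFOUND=return]
        # (assumed here to be written without internal spaces).
        sources = [w for w in line[len("hosts:"):].split() if not w.startswith("[")]
        if "files" not in sources:
            return False
        if "dns" not in sources:
            return True
        return sources.index("files") < sources.index("dns")
    return False
```

For example, `files_before_dns("hosts: files nisplus dns")` returns True, while `"hosts: dns files"` would return False.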

I don't see your message through my newsreader, but I just saw it
on groups.google.com, so I'm replying here...

The *very strange* thing is that everything you mention here checks
fine. The file /etc/hosts has always contained the line

127.0.0.1 localhost.localdomain localhost

I had added extra entries (extra aliases for 127.0.0.1), but I
removed them yesterday, thinking they might be the cause of the
problem. Nothing changed after removing the additional entries.

The file /etc/resolv.conf, however, does not contain the search
directive. It contains only two lines, each starting with the
keyword nameserver followed by the IP address of one of the DNS
servers of our hoster (our provider). But to the best of my
knowledge it has always been like that, and things were running
fine in the past (including the last few days).
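To see what a resolv.conf like the one described here actually hands to the resolver, a minimal Python sketch can collect its nameserver, search and domain entries. The function name `parse_resolv_conf` is hypothetical, and the sketch handles only these three keywords (ignoring options, sortlist, etc.); it treats lines starting with ';' or '#' as comments, as resolv.conf does.

```python
def parse_resolv_conf(text: str) -> dict:
    """Collect nameserver, search and domain entries from resolv.conf text."""
    result = {"nameserver": [], "search": [], "domain": None}
    for raw in text.splitlines():
        # Skip blank lines and comment lines (';' or '#' in the first column).
        if not raw.strip() or raw[0] in ";#":
            continue
        keyword, *args = raw.split()
        if keyword == "nameserver" and args:
            result["nameserver"].append(args[0])
        elif keyword == "search":
            # Keep only the most recent search line.
            result["search"] = args
        elif keyword == "domain" and args:
            result["domain"] = args[0]
    return result
```

With a file containing only two nameserver lines, `search` stays empty and `domain` stays None. In that case the resolver derives the local domain from the machine's hostname instead.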

And yes, the file /etc/nsswitch.conf contains the following line:

hosts: files nisplus dns

(I didn't know about this file -- on RedHat 9, I thought that
was controlled by /etc/host.conf, which in our case has
always contained one line: order hosts, bind)

So, given all this, it totally beats me how on earth our server
was taking 5 seconds trying to resolve localhost!! And I say
*was* because this morning, the problem had automagically
disappeared -- this kind of supports the theory that it was
a DNS misconfiguration or temporary malfunction (our hosting
provider may have fixed it or rebooted their servers... Though
they rent dedicated Linux servers, I wouldn't be surprised if
they were incompetent enough to use Windows machines as their
DNS servers *sigh*)

But even though the problem is solved, I'm curious!!
I have no explanation, or even a speculative idea, as to why
or how a machine with the right setup *could* spend 5 seconds
on a hopeless attempt to resolve localhost via a DNS server.
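One way to catch the symptom if it comes back is to time a lookup through the system resolver, the same path other programs on the machine use. A minimal Python sketch, assuming Python 3 on the affected host; `time_lookup` is a hypothetical helper name. For what it's worth, 5 seconds is also the glibc resolver's default per-query timeout (the `timeout` option in resolv.conf), so a delay of almost exactly that length would be consistent with one DNS query going unanswered, though that is only a guess here.

```python
import socket
import time
from typing import List, Tuple


def time_lookup(name: str = "localhost") -> Tuple[float, List[str]]:
    """Resolve `name` with the system resolver; return (seconds, addresses)."""
    start = time.monotonic()
    # getaddrinfo goes through libc, so on glibc the NSS order in
    # /etc/nsswitch.conf (files before dns) applies to this lookup.
    infos = socket.getaddrinfo(name, None, type=socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each entry's sockaddr tuple starts with the address string.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses
```

Calling `time_lookup("localhost")` on a healthy setup should return almost instantly with 127.0.0.1 among the addresses; an elapsed time close to 5 seconds would reproduce the problem.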

The only thing I could add is that this happened the same
day that I upgraded several RPMs (notably the kernel, to
RedHat's 2.4.20-27 update, and glibc, to 2.2.5-44). The
problem did not necessarily appear
right after the upgrades -- we noticed the problem about
12 hours later, and have no way to know if it had been
occurring before (even before our upgrades, maybe?)

Thanks!

Carlos

--