84105 – excessive IPv6 lookups done even when /etc/hosts resolves to IPv4

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	glibc
Sub Component:
Version:	8.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-02-12 07:43 UTC by Landon Curt Noll
Modified:	2016-11-24 14:58 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-04-26 06:10:22 UTC
Embargoed:

Attachments
tcpdumps of excessive IPv6 and some extra IPv4 DNS lookups (30.84 KB, text/plain) 2003-02-12 07:50 UTC, Landon Curt Noll

Description Landon Curt Noll 2003-02-12 07:43:02 UTC

From Bugzilla Helper: 
User-Agent: Mozilla/5.0 (compatible; Konqueror/3; Linux; X11; , en) 
 
Description of problem: 
 
Glibc seems to be ignoring /etc/nsswitch.conf.  Even when you place into 
/etc/nsswitch.conf:  
  
   hosts:      files [SUCCESS=return] dns  
  
and you try the extra trailing "." hack as described in:
 
   https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77538#c13 
 
things are not right.  One has a lot of IPv6 DNS lookups going on. 
Observe: 
 
# cat /etc/resolv.conf 
search prv.test1.org 
nameserver 75.25.79.9 
nameserver 200.150.41.9 
 
# egrep '^hosts' /etc/nsswitch.conf 
hosts:      files dns 
 
# egrep 'emma|lucas|gauss|localhost' /etc/hosts 
127.0.0.1      localhost.prv.test1.org localhost 
127.0.0.1      localhost.prv.test1.org. localhost. 
10.1.1.2       emma.prv.test1.org emma.test1.org emma 
10.1.1.2       emma.prv.test1.org. emma.test1.org. emma. 
10.2.2.2       lucas.dmz.test1.org lucas.test1.org lucas 
10.2.2.2       lucas.dmz.test1.org. lucas.test1.org. lucas. 
10.2.2.4       gauss.dmz.test2.net gauss.test2.net gauss 
10.2.2.4       gauss.dmz.test2.net. gauss.test2.net. gauss. 
 
Now when I pull the machine off the network so that it cannot contact
its name servers and run:

	telnet localhost

tcpdump shows that RH8.0 with glibc-2.2.93-5 attempts 8 IPv6
lookups.  Since the name servers are unreachable,
a full 5*8 = 40 seconds of delay is introduced!
 
It does find 127.0.0.1 (due to the .-hack-a-round for RH8.0), 
but why all of those IPv6 (AAAA?) lookups?!  When your name 
servers are down, those DNS timeouts can be very painful! 
 
Q: How can one prevent these IPv6 DNS lookups? 
 
The same thing happens on RH7.3 with glibc-2.2.5-42 systems. 
 
The command: 
 
	telnet lucas.dmz.test1.org. 
 
will result in 4 IPv6 lookups, followed by 4 IPv4 lookups, followed by
4 more IPv6 lookups!  A whopping 12*5 seconds = 1 minute delay!
 
The resolver(5) man page suggests there is an inet6 option 
(query AAAA before A).  I do not have this option set. One 
would think that without it, A would occur before AAAA and 
given success in A lookups, the AAAA lookup would be skipped.  
But no ... 
 
 
 
Version-Release number of selected component (if applicable): 
 
 
How reproducible: 
Always 
 
Steps to Reproduce: 
1. Remove the host from the network
   (or block access to the name server(s))

2. In another window, monitor DNS traffic:
	tcpdump -n -i eth0 udp port domain 
 
3. telnet localhost 
 
or: telnet some.hostname.in.etc.hosts.com 
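
To put a number on the delay without watching tcpdump, a lookup can be timed directly. The following is a minimal sketch, not part of the original report: the helper name `time_lookup` and its injectable `resolver` parameter are assumptions made for illustration, and the host names in the demo come from the /etc/hosts excerpt above.

```python
import socket
import time


def time_lookup(host, port=23, resolver=socket.getaddrinfo):
    """Resolve host once; return (elapsed_seconds, sorted unique addresses)."""
    start = time.monotonic()
    try:
        # AF_UNSPEC asks for both IPv4 and IPv6 results in one call.
        results = resolver(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    except socket.gaierror:
        results = []  # resolution failed, possibly only after DNS timeouts
    elapsed = time.monotonic() - start
    addresses = sorted({info[4][0] for info in results})
    return elapsed, addresses


if __name__ == "__main__":
    # Host names taken from the /etc/hosts excerpt in this report.
    for name in ("localhost", "emma"):
        elapsed, addresses = time_lookup(name)
        print(f"{name}: {elapsed:.3f}s -> {addresses}")
```

Running it once with the name servers reachable and once with them blocked should show whether a name that is present in /etc/hosts still waits on DNS.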
     
 
Actual Results:  8 IPv6 DNS lookup attempts will be performed. 
 
When the DNS server(s) or the network to the DNS server(s)
is down, this 5*8 second delay from the IPv6 lookups can be
very painful.
 
Expected Results:  The hostname (or the "hostname." entry in the
case of the RH8.0 hack-a-round; see Bug #77538 comment #c13)
should match.  Once the IPv4 match occurs, no IPv6 lookups
should be performed.
 
 
Additional info: 
 
This excessive IPv6 problem has been found on 
RH8.0 with glibc-2.2.93-5 as well as this 
RH7.3 with glibc-2.2.5-42. 
 
This appears to be part of a larger pattern of not processing the
/etc/nsswitch.conf file and /etc/hosts correctly.
 
I will attach a detailed tcpdump of various combinations. 
 
There are a number of related bugs: 
 
Bug 61391	telnet delay connecting to site not in DNS 
Bug 58568	glibc does not exactly follow nsswitch.conf settings 
Bug 66682	unexpected nsswitch behavior 
Bug 71546	ldap for user files always used, 
		regardless of nsswitch.conf 
Bug 58568	nis for host files always used, 
		regardless of nsswitch.conf 
Bug 76543	name to IP resolution issues 
Bug 77538	Konqueror will not resolve domain names 
		entered in /etc/hosts file

Comment 1 Landon Curt Noll 2003-02-12 07:50:38 UTC

Created attachment 90022
tcpdumps of excessive IPv6 and some extra IPv4 DNS lookups

tcpdumps of excessive IPv6 and some extra IPv4 DNS lookups being
performed under various conditions from both ssh and telnet
on both RH8.0 and RH7.3 systems.  Access to their name servers
was blocked by disconnecting the external gateway/router to
their networks.

The actual hostnames and IP addresses were changed.  However
their relationships (network and domain-wise) are real.

Comment 2 Jakub Jelinek 2003-02-17 14:14:08 UTC

Please try rawhide glibc.

Comment 3 Landon Curt Noll 2003-02-17 21:45:40 UTC

Do you mean try glibc-2.3.1-6.i686.rpm on RH8.0? 
If not, which RPM (and location) would you suggest that I test? 
 
Which RPM would you suggest that I test under RH7.3, 
the same?

Comment 4 Jakub Jelinek 2003-02-18 10:33:45 UTC

No, I mean try glibc-2.3.1-46 (or -48) from ftp.redhat.com/pub/redhat/linux/rawhide/
You can try it on 7.3 too (though I'd try it first on some testbox in case it is
a production 7.3 box).

Comment 5 Landon Curt Noll 2003-02-20 18:04:18 UTC

Just to let you know, we have not forgotten your request.
This weekend we plan to load the rawhide glibc* set onto a
RH8.0 machine and rerun the tests as found in the
attachment.

Comment 6 Landon Curt Noll 2003-02-22 15:15:17 UTC

On an RH8.0 system with current/up2date RPMs I installed:

    binutils-2.13.90.0.18-6
    glibc-2.3.1-51
    glibc-common-2.3.1-51
    glibc-devel-2.3.1-51
    glibc-kernheaders-2.4-8.10
    glibc-profile-2.3.1-51
    glibc-utils-2.3.1-51
    memprof-0.5.1-3

from rawhide.  After rebooting, many of the previously reported
DNS problems were resolved.  In particular:

Bug 61391	telnet delay connecting to site not in DNS  Fixed!
Bug 58568	glibc does not exactly follow nsswitch.conf settings  Fixed!
Bug 66682	unexpected nsswitch behavior Fixed!
Bug 76543	name to IP resolution issues  Fixed!
Bug 77538	Konqueror will not resolve domain names 
		  entered in /etc/hosts file  Fixed!

I no longer need the "dots after duplicate hostname lines"
hack-a-round as described in:

  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77538#c13

to resolve entries in /etc/hosts.

Now the /etc/nsswitch.conf line:

    hosts:  files dns

causes /etc/hosts to take priority over DNS.

=-=-=

All the above is good news.  However, the excessive IPv6 issue still remains.

When the host is disconnected from the network and there is an attempt
to resolve a hostname that is not otherwise found in /etc/hosts, a number
of IPv6 lookups are performed ahead of the IPv4 lookups.  A site with 2
DNS servers listed in /etc/resolv.conf will incur 16 DNS timeouts
(8 IPv6 followed by 8 IPv4) of 5 seconds each, for a whopping 80 seconds
of delay before the DNS resolution gives up.

For sites that do not carry IPv6 traffic, it would be very helpful
if they could disable the IPv6 DNS lookups.  Doing so would cut out
1/2 the timeout period when disconnected from the network.
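
The timeout arithmetic above can be checked with a few lines of code. This is only a worked example under the assumption stated in the comment (every unanswered query costs a flat 5 seconds); the function name `worst_case_delay` is made up for illustration.

```python
def worst_case_delay(ipv6_queries, ipv4_queries, timeout_s=5):
    """Total seconds spent waiting when every DNS query times out."""
    return (ipv6_queries + ipv4_queries) * timeout_s


if __name__ == "__main__":
    # 8 IPv6 + 8 IPv4 timeouts, as observed with 2 name servers.
    print(worst_case_delay(8, 8))   # 80
    # The same site with the IPv6 lookups disabled: half the delay.
    print(worst_case_delay(0, 8))   # 40
```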

Even when a host is connected to the network and the 1st DNS server
responds, an external DNS resolution must perform 2 IPv6 lookups before
the successful IPv4 lookup is performed.

I saw a typical connection delay (again, when everything was working)
on successful DNS lookups jump from 9.4 msec (for IPv4 only) to 139.0 msec
because of the two extra IPv6 lookups.  Those unneeded IPv6 lookups
increased the DNS connection startup delay by a factor of about 14.8!

Your mileage may vary.  But even if those 2 extra IPv6 failed lookups
return results as fast as the successful IPv4 lookup, you will
still be talking about a 3x increase in DNS induced startup delay.

For sites that are doing IPv4 traffic only, I highly recommend some
configuration parameter that allows them to say "do not even bother
doing IPv6 lookups".

Maybe two new keywords could be added to the /etc/nsswitch.conf syntax:

    ipv4    perform a DNS IPv4 based lookup
    ipv6    perform a DNS IPv6 based lookup

The 'dns' keyword could still mean 'IPv6 then IPv4'.  However, sites
that only carry IPv4 traffic could do something like:

    hosts:  files ipv4

and see a connection establishment performance increase over the
current case.  Or sites could perform both IPv4 and IPv6, but in
a different order:

    hosts:  files ipv4 ipv6

Successful IPv4 DNS lookups would not have any IPv6 induced penalty
and IPv6 lookups would still occur.

Comment 7 Stephen Walton 2003-02-23 20:05:56 UTC

Mr. Noll's suggestion is good, but would it not be simpler to have glibc not
even do an IPv6 lookup if the local host's IP address is v4?

Comment 8 Landon Curt Noll 2003-02-25 19:25:02 UTC

A clarification on: 
 
    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=84105#c6 
 
You should NOT install those RPMs on a production system.  Rawhide 
is raw bits.  Those RPMs were installed only to test various DNS 
issues.  Those RPMs have a number of non-DNS related problems. 
For example, they cause the rpm command to dump core.  They did 
resolve the DNS issues, with the possible exception of excessive 
IPv6 lookups.

Comment 9 Landon Curt Noll 2003-04-03 18:33:11 UTC

See bug #86564 for comments related to IPv6 under RH9.0.

Comment 10 Valeriy Kondrashov 2003-04-04 20:16:04 UTC

The problem is that ftp and telnet are compiled with an IPv6 patch,
which links the programs with libinet6.a,
and that library turns on the resolver's "options inet6" at startup.

Comment 11 Ulrich Drepper 2003-04-26 06:10:22 UTC

I'm interpreting this bug now solely in the glibc sense.  There are other problems
involved (my guess is bugs in programs using name resolving and bugs in PAM).  In
these cases specific bug reports should be filed for those programs.

As for glibc, the code in RHL9 should already solve most problems.  The
getaddrinfo() function does search the services from nsswitch.conf in the
specified order.  If files is listed first and the host info is contained in
/etc/hosts the search stops.  The original RHL8 code (and previous releases) did
not have this.

Problems occur if IPv6-enabled programs do not use getaddrinfo().  Some of them
look up names like this:

   struct hostent *he = gethostbyname2 ("somehost", AF_INET6);
   if (he == NULL)  /* no IPv6 address found */
     he = gethostbyname2 ("somehost", AF_INET);

In this case the DNS server is contacted if /etc/hosts does not contain an IPv6
address for "somehost".  An IPv4 address alone is not sufficient.  This is the
kind of problem I suspect PAM and various programs have.  Those programs
have to change; glibc is just fine.
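
For comparison, here is a minimal sketch of the single-call approach described above. It uses Python's `socket.getaddrinfo()`, which wraps the C library function; the helper name `lookup_both_families` and the injectable `resolver` parameter are assumptions made for illustration, not code from any of the programs mentioned.

```python
import socket


def lookup_both_families(host, port=23, resolver=socket.getaddrinfo):
    """Ask for IPv4 and IPv6 addresses in one getaddrinfo() call.

    AF_UNSPEC requests every address family at once, so the caller does
    not issue a separate AF_INET6 query followed by an AF_INET query.
    """
    results = resolver(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    found = {socket.AF_INET: [], socket.AF_INET6: []}
    for family, _socktype, _proto, _canonname, sockaddr in results:
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        if family in found and sockaddr[0] not in found[family]:
            found[family].append(sockaddr[0])
    return found


if __name__ == "__main__":
    addresses = lookup_both_families("localhost")
    print("IPv4:", addresses[socket.AF_INET])
    print("IPv6:", addresses[socket.AF_INET6])
```

With a resolver that behaves as described for RHL9, a name that /etc/hosts maps only to an IPv4 address should be answered from the file without a DNS query.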


Now, there is one more case where getaddrinfo() does too much: if the system has
no IPv6 interfaces, according to POSIX the programmer can use the AI_ADDRCONFIG
option to prevent looking up IPv6 addresses.  The same is true for IPv4
addresses if no IPv4 interfaces are present.  This flag for getaddrinfo()
is not implemented in RHL9.  But it is now in the official glibc CVS archive.
The next release (maybe the next glibc binary) will have the necessary changes.
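
As a rough illustration of the AI_ADDRCONFIG flag, the sketch below passes it through Python's `socket.getaddrinfo()`, which hands the flag to the C library. The helper name `lookup_configured_only` and the injectable `resolver` parameter are assumptions made for illustration; whether AAAA queries are actually skipped depends on the glibc version and on which interfaces the system has configured.

```python
import socket


def lookup_configured_only(host, port=23, resolver=socket.getaddrinfo):
    """Resolve host, asking only for families the system has configured."""
    # AI_ADDRCONFIG: return IPv6 results only if the system has an IPv6
    # address configured, and likewise for IPv4.
    flags = socket.AI_ADDRCONFIG
    return resolver(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, flags)


if __name__ == "__main__":
    try:
        for family, _type, _proto, _canon, sockaddr in lookup_configured_only("localhost"):
            print(family, sockaddr[0])
    except socket.gaierror as err:
        # Can happen, for example, when no non-loopback address is configured.
        print("lookup failed:", err)
```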

Therefore I'm closing the bug with UPSTREAM.
