Description of problem:
glibc resolves hosts incorrectly when resolv.conf contains only IPv6 nameserver(s) and "options rotate" is set. It always appends a search domain, even for FQDNs, which makes every name resolution fail.
Test program - getaddrinfo.c:
#include <stdio.h>
#include <netdb.h>
int main(void)
{
    struct addrinfo *result;
    /* resolve the domain name into a list of addresses */
    int error = getaddrinfo("www.example.com", NULL, NULL, &result);
    if (error != 0)
        fprintf(stderr, "error in getaddrinfo: %s\n", gai_strerror(error));
    return error != 0;
}
Test program result:
# gcc getaddrinfo.c -o getaddrinfo
[root@vm-069 ~]# ./getaddrinfo
error in getaddrinfo: Name or service not known
Actual DNS queries:
# tcpdump -ni eth0 udp port 53
09:20:15.327171 IP6 fed0:babe:baab:0:216:3eff:fe56:436c.37002 > fed0:babe:baab:0:216:3eff:fe00:7caa.domain: 31256+ A? www.example.com.idm.lab.bos.redhat.com. (56)
09:20:15.327277 IP6 fed0:babe:baab:0:216:3eff:fe56:436c.37002 > fed0:babe:baab:0:216:3eff:fe00:7caa.domain: 6949+ AAAA? www.example.com.idm.lab.bos.redhat.com. (56)
09:20:15.329706 IP6 fed0:babe:baab:0:216:3eff:fe00:7caa.domain > fed0:babe:baab:0:216:3eff:fe56:436c.37002: 31256 NXDomain* 0/1/0 (104)
09:20:15.330053 IP6 fed0:babe:baab:0:216:3eff:fe00:7caa.domain > fed0:babe:baab:0:216:3eff:fe56:436c.37002: 6949 NXDomain* 0/1/0 (104)
; generated by /sbin/dhclient-script
Without "options rotate" in resolv.conf it worked:
# tcpdump -ni eth0 udp port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
09:22:24.348149 IP6 fed0:babe:baab:0:216:3eff:fe56:436c.38344 > fed0:babe:baab:0:216:3eff:fe00:7caa.domain: 54805+ A? www.example.com. (33)
09:22:24.348563 IP6 fed0:babe:baab:0:216:3eff:fe56:436c.38344 > fed0:babe:baab:0:216:3eff:fe00:7caa.domain: 37744+ AAAA? www.example.com. (33)
09:22:24.348643 IP6 fed0:babe:baab:0:216:3eff:fe00:7caa.domain > fed0:babe:baab:0:216:3eff:fe56:436c.38344: 54805 1/2/0 A 184.108.40.206 (97)
09:22:24.348743 IP6 fed0:babe:baab:0:216:3eff:fe00:7caa.domain > fed0:babe:baab:0:216:3eff:fe56:436c.38344: 37744 1/2/0 AAAA 2001:500:88:200::10 (109)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Configure resolv.conf with "options rotate" and an IPv6 nameserver address
2. Run getaddrinfo for some FQDN
Actual results:
Domain is appended to the query -> it fails
Expected results:
Domain is not appended to the query -> query succeeds
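For anyone checking whether a machine has the triggering configuration from the steps above, here is a minimal sketch in C. The function name has_trigger_config is hypothetical and is not part of glibc; it only scans resolv.conf-style text for "options rotate" together with nameserver lines that are all IPv6 literals, and it makes no attempt to reproduce glibc's own parser.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of glibc: returns true when the given
 * resolv.conf text lists at least one nameserver, every nameserver is an
 * IPv6 literal (contains ':'), and an "options" line includes "rotate". */
bool has_trigger_config(const char *conf)
{
    int ipv4 = 0, ipv6 = 0;
    bool rotate = false;
    char line[256];

    while (*conf != '\0') {
        /* Copy one line (without the newline) into a scratch buffer. */
        size_t len = strcspn(conf, "\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, conf, n);
        line[n] = '\0';
        conf += len;
        if (*conf == '\n')
            conf++;

        char key[32], value[200];
        if (sscanf(line, "%31s %199[^\n]", key, value) != 2)
            continue;   /* blank line or keyword without a value */

        if (strcmp(key, "nameserver") == 0) {
            if (strchr(value, ':') != NULL)
                ipv6++;
            else
                ipv4++;
        } else if (strcmp(key, "options") == 0) {
            /* Match "rotate" as a whole whitespace-separated word. */
            for (char *tok = strtok(value, " \t"); tok != NULL;
                 tok = strtok(NULL, " \t"))
                if (strcmp(tok, "rotate") == 0)
                    rotate = true;
        }
    }
    return rotate && ipv6 > 0 && ipv4 == 0;
}
```

Passing it the contents of /etc/resolv.conf should return true for the setup described in this report (search domain idm.lab.bos.redhat.com, nameserver fed0:babe:baab:0:216:3eff:fe00:7caa, options rotate, as seen in the capture) and false once "options rotate" is removed.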
I was looking at it last night and making some progress, but won't be able to look at it again until Monday.
Still working on it. This code is a bloody mess and the last attempt to fix this problem (from the Debian folks) got it wrong and was pulled just a couple hours after being installed.
I had Brock put this on the list of 6.3 known issues while I work to get it resolved.
Funny thing is it'll probably be a trivial looking one-liner once I settle on a change.
Notes for QE: rather than watching data over the wire, I've found it easier to just put a breakpoint in __libc_send and look at the "n" parameter (or examine the buffer itself).
So a good run looks like this on my box:
GNU gdb (GDB) Fedora (7.3.50.20110722-10.fc16)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /home/law/a.out...done.
(gdb) b main
Breakpoint 1 at 0x40063c: file foo.c, line 14.
Starting program: /home/law/a.out
Breakpoint 1, main () at foo.c:14
14 error = getaddrinfo("www.example.com", NULL, NULL, &result);
(gdb) b __libc_send
Breakpoint 2 at 0x7ffff7b18660: file ../sysdeps/unix/sysv/linux/x86_64/send.c, line 26.
Breakpoint 2, __libc_send (fd=7, buf=0x7fffffffc540, n=33, flags=16384)
A bad run will instead show a value like n=56, because the domain name has been bogusly tacked onto the end of the query.
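The two values of n can be checked by hand. A plain DNS query with one question is a 12-byte header, the name encoded as length-prefixed labels plus a terminating zero byte, and 2 bytes each for QTYPE and QCLASS. The sketch below does that arithmetic; dns_query_size is a hypothetical helper written for this note, not a glibc function, and it assumes no EDNS or other additional records in the query.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: size in bytes of a DNS query carrying a single
 * question for `name`, assuming no additional records.
 * Returns 0 if a label is empty or longer than 63 bytes.
 * The 255-byte limit on the whole encoded name is not checked. */
size_t dns_query_size(const char *name)
{
    size_t qname = 0;

    if (strcmp(name, ".") == 0)
        name++;                /* root name: only the terminating zero byte */

    while (*name != '\0') {
        size_t label = strcspn(name, ".");
        if (label == 0 || label > 63)
            return 0;
        qname += 1 + label;    /* length byte + label bytes */
        name += label;
        if (*name == '.')
            name++;            /* skip separator; a trailing dot ends the loop */
    }
    qname += 1;                /* zero-length root label */

    return 12 + qname + 2 + 2; /* header + QNAME + QTYPE + QCLASS */
}
```

For "www.example.com" this gives 33, and for "www.example.com.idm.lab.bos.redhat.com" it gives 56, matching both the n values seen in __libc_send and the packet sizes in the tcpdump output.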
OK. I think I've got this sorted out. The Debian guy was pretty close with his change. He never responded to my query about the case that wasn't working with his change; however, after a lot of pondering I'm pretty sure I found his mistake.
Note that this just fixes the problem with a *single* IPv6 nameserver defined and options rotate; with more than one IPv6 server defined and options rotate, there's a separate problem which we have decided not to fix for 6.3 (see 771204). More generally, I just don't have a high degree of confidence in the correctness of many of the IPv6 code paths.
dev_ack'd. Once QE acks, I'll commit and spin a new build.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
The fix added for RHEL 6.3 (glibc-2.12-1.80.el6) causes some software to segfault in libresolv if there are IPv6 addresses listed in resolv.conf.
postfix-2.9.3 with IPv6 support enabled segfaults in the smtp client.
freshclam (from clamav in EPEL 6) segfaults in libresolv.
Rebuilding glibc without glibc-rh804630.patch fixes these issues.
The postfix error occurs only when it is configured to listen on interfaces other than lo:
inet_interfaces = all
#inet_interfaces = localhost
When you change this directive to localhost:
#inet_interfaces = all
inet_interfaces = localhost
... everything is OK and the segfault does not occur.
sendmail-8.14.4-8: the same problem
*** Bug 836016 has been marked as a duplicate of this bug. ***