Bug 804630

Summary: Bad resolution with IPv6 and rotate option in resolv.conf
Product: Red Hat Enterprise Linux 6
Component: glibc
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Martin Kosek <mkosek>
Assignee: Jeff Law <law>
QA Contact: qe-baseos-tools-bugs
CC: azelinka, dpal, fweimer, mfranc, mishu, radoslaw.podedworny, tis
Target Milestone: rc
Target Release: ---
Doc Type: Bug Fix
Duplicates: 836016
Last Closed: 2012-06-20 12:10:01 UTC
Bug Blocks: 766162, 805204

Description Martin Kosek 2012-03-19 13:25:46 UTC
Description of problem:
glibc incorrectly resolves hosts when resolv.conf contains only IPv6 nameserver(s) and "options rotate" is set. It always appends a search domain, even for FQDNs, which makes every name resolution fail.

Test program - getaddrinfo.c:
#include <stdio.h>
#include <stdlib.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo *result;
    struct addrinfo *res;
    int error;

    /* resolve the domain name into a list of addresses */
    error = getaddrinfo("www.example.com", NULL, NULL, &result);
    if (error != 0)
    {
        fprintf(stderr, "error in getaddrinfo: %s\n", gai_strerror(error));
        return EXIT_FAILURE;
    }
    else
    {
        printf("Test OK\n");
    }

    freeaddrinfo(result);
    return EXIT_SUCCESS;
}


Test program result:
# gcc getaddrinfo.c -o getaddrinfo
[root@vm-069 ~]# ./getaddrinfo 
error in getaddrinfo: Name or service not known


Actual DNS queries:
# tcpdump -ni eth0 udp port 53
09:20:15.327171 IP6 fed0:babe:baab:0:216:3eff:fe56:436c.37002 > fed0:babe:baab:0:216:3eff:fe00:7caa.domain: 31256+ A? www.example.com.idm.lab.bos.redhat.com. (56)
09:20:15.327277 IP6 fed0:babe:baab:0:216:3eff:fe56:436c.37002 > fed0:babe:baab:0:216:3eff:fe00:7caa.domain: 6949+ AAAA? www.example.com.idm.lab.bos.redhat.com. (56)
09:20:15.329706 IP6 fed0:babe:baab:0:216:3eff:fe00:7caa.domain > fed0:babe:baab:0:216:3eff:fe56:436c.37002: 31256 NXDomain* 0/1/0 (104)
09:20:15.330053 IP6 fed0:babe:baab:0:216:3eff:fe00:7caa.domain > fed0:babe:baab:0:216:3eff:fe56:436c.37002: 6949 NXDomain* 0/1/0 (104)


/etc/resolv.conf:
; generated by /sbin/dhclient-script
search idm.lab.bos.redhat.com
options rotate
nameserver fed0:babe:baab:0:216:3eff:fe00:7caa
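
To check whether another machine has the same combination (rotate enabled, IPv6 nameservers only), something like the following could be run over each line of its resolv.conf. This is only a rough sketch: the struct and function names are made up for illustration, the "rotate" match is a plain substring test, and it does not reflect how glibc itself parses the file.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of the resolv.conf settings relevant to this bug. */
struct resolv_summary {
    int rotate;      /* an "options" line mentions rotate */
    int v4_servers;  /* number of IPv4 nameserver lines */
    int v6_servers;  /* number of IPv6 nameserver lines */
};

/* Update the summary from one resolv.conf line; other lines are ignored. */
static void scan_resolv_line(const char *line, struct resolv_summary *s)
{
    char keyword[32], value[128];
    unsigned char addr[sizeof(struct in6_addr)];

    if (sscanf(line, "%31s %127s", keyword, value) != 2)
        return;
    if (strcmp(keyword, "nameserver") == 0) {
        if (inet_pton(AF_INET6, value, addr) == 1)
            s->v6_servers++;
        else if (inet_pton(AF_INET, value, addr) == 1)
            s->v4_servers++;
    } else if (strcmp(keyword, "options") == 0) {
        /* Naive substring match, good enough for a quick check. */
        if (strstr(line, "rotate") != NULL)
            s->rotate = 1;
    }
}

/* Returns 1 when the summary matches the setup described in this report. */
static int matches_reported_setup(const struct resolv_summary *s)
{
    return s->rotate && s->v6_servers > 0 && s->v4_servers == 0;
}
```

Feeding the four lines of the resolv.conf above through scan_resolv_line leaves rotate set, one IPv6 server and no IPv4 servers, so matches_reported_setup returns 1.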


Without "options rotate" in resolv.conf it worked:
# ./getaddrinfo 
Test OK

# tcpdump -ni eth0 udp port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
09:22:24.348149 IP6 fed0:babe:baab:0:216:3eff:fe56:436c.38344 > fed0:babe:baab:0:216:3eff:fe00:7caa.domain: 54805+ A? www.example.com. (33)
09:22:24.348563 IP6 fed0:babe:baab:0:216:3eff:fe56:436c.38344 > fed0:babe:baab:0:216:3eff:fe00:7caa.domain: 37744+ AAAA? www.example.com. (33)
09:22:24.348643 IP6 fed0:babe:baab:0:216:3eff:fe00:7caa.domain > fed0:babe:baab:0:216:3eff:fe56:436c.38344: 54805 1/2/0 A 192.0.43.10 (97)
09:22:24.348743 IP6 fed0:babe:baab:0:216:3eff:fe00:7caa.domain > fed0:babe:baab:0:216:3eff:fe56:436c.38344: 37744 1/2/0 AAAA 2001:500:88:200::10 (109)


Version-Release number of selected component (if applicable):
glibc-2.12-1.47.el6.x86_64


How reproducible:


Steps to Reproduce:
1. Configure resolv.conf with "options rotate" and an IPv6 nameserver address
2. Run getaddrinfo for some FQDN
  
Actual results:
The search domain is appended to the query, so resolution fails

Expected results:
The search domain is not appended to the query, so the query succeeds
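
For context on the expected behaviour: resolv.conf(5) documents an "ndots" threshold (default 1), and a name with at least that many dots is tried as an absolute name before any search list entry is appended. The snippet below is a simplified illustration of that documented rule, not glibc's actual code; the function names are made up.

```c
#include <stddef.h>

/* Count the dots in a host name. */
static int count_dots(const char *name)
{
    int dots = 0;
    const char *p;

    for (p = name; *p != '\0'; p++)
        if (*p == '.')
            dots++;
    return dots;
}

/* Returns 1 if, per the documented ndots rule, the name should be
   queried as-is before any search domain is appended. */
static int try_as_is_first(const char *name, int ndots)
{
    return count_dots(name) >= ndots;
}
```

With the default ndots of 1, "www.example.com" has two dots and so should first be queried unchanged, which is what the capture without "options rotate" shows.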

Comment 2 Jeff Law 2012-03-22 22:46:04 UTC
I was looking at it last night and making some progress, but I won't be able to look at it again until Monday.

Comment 4 Jeff Law 2012-03-30 17:13:05 UTC
Still working on it.  This code is a bloody mess and the last attempt to fix this problem (from the Debian folks) got it wrong and was pulled just a couple hours after being installed.

I had Brock put this on the list of 6.3 known issues while I work to get it resolved.

Funny thing is it'll probably be a trivial-looking one-liner once I settle on a change.

Comment 5 Jeff Law 2012-03-30 19:02:15 UTC
Notes for QE: rather than watching data over the wire, I've found it easier to just put a breakpoint in __libc_send and look at the "n" parameter (or examine the buffer itself).

So a good run looks like this on my box:

gdb ./a.out
GNU gdb (GDB) Fedora (7.3.50.20110722-10.fc16)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/law/a.out...done.
(gdb) b main
Breakpoint 1 at 0x40063c: file foo.c, line 14.
(gdb) r
Starting program: /home/law/a.out 

Breakpoint 1, main () at foo.c:14
14          error = getaddrinfo("www.example.com", NULL, NULL, &result);
(gdb) b __libc_send
Breakpoint 2 at 0x7ffff7b18660: file ../sysdeps/unix/sysv/linux/x86_64/send.c, line 26.
(gdb) c
Continuing.

Breakpoint 2, __libc_send (fd=7, buf=0x7fffffffc540, n=33, flags=16384)
    at ../sysdeps/unix/sysv/linux/x86_64/send.c:26
26      {

A bad run, by contrast, will show a value like n=56 because the domain name has been bogusly tacked onto the end of the query.
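
As a sanity check on those numbers, the two values of n follow from the standard DNS message layout (RFC 1035): a 12-byte header, the encoded name, then 2 bytes each for the query type and class. A minimal sketch of the arithmetic, assuming a non-empty name with no trailing dot; dns_query_size is a hypothetical helper, not a glibc function.

```c
#include <stddef.h>
#include <string.h>

/* Size in bytes of a single-question DNS query for `name`.
   The encoded name replaces each dot with a length byte, adds one
   length byte for the first label and a terminating zero byte, so it
   occupies strlen(name) + 2 bytes. */
static size_t dns_query_size(const char *name)
{
    size_t header = 12;               /* fixed DNS header */
    size_t qname = strlen(name) + 2;  /* length-prefixed labels + root */
    size_t qtype_qclass = 2 + 2;      /* QTYPE and QCLASS */

    return header + qname + qtype_qclass;
}
```

dns_query_size("www.example.com") gives 33 and dns_query_size("www.example.com.idm.lab.bos.redhat.com") gives 56, matching both the n values above and the packet lengths in the tcpdump output from the description.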

Comment 6 Jeff Law 2012-03-30 19:09:41 UTC
OK.  I think I've got this sorted out.  The Debian guy was pretty close with his change.  He never responded to my query about the case that wasn't working with his change; however, after a lot of pondering I'm pretty sure I found his mistake.

Note that this fixes only the case of a *single* IPv6 nameserver with options rotate; with more than one IPv6 server defined and options rotate, there's a separate problem which we have decided not to fix for 6.3 (see 771204).  More generally, I just don't have a high degree of confidence in the correctness of many of the IPv6 code paths.

dev_ack'd.  Once QE acks, I'll commit and spin a new build.

Comment 9 errata-xmlrpc 2012-06-20 12:10:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0763.html

Comment 10 Tuomo Soini 2012-06-27 20:26:10 UTC
The fix added for RHEL 6.3 (glibc-2.12-1.80.el6) causes some software to segfault in libresolv if there are IPv6 addresses listed in resolv.conf.

postfix-2.9.3 with IPv6 support enabled segfaults in the SMTP client.

freshclam (from clamav in EPEL 6) segfaults in libresolv.

Rebuilding glibc without glibc-rh804630.patch fixes these issues.

Comment 11 Radkowski 2012-06-28 12:20:22 UTC
The postfix error occurs only when postfix is configured to listen on interfaces other than lo:

inet_interfaces = all
#inet_interfaces = localhost

When you change this directive to localhost:

#inet_interfaces = all
inet_interfaces = localhost

... everything is OK and the segfault does not occur.

Comment 12 Radkowski 2012-06-28 12:24:42 UTC
sendmail-8.14.4-8: the same problem

Comment 13 Jeff Law 2012-06-28 16:31:25 UTC
*** Bug 836016 has been marked as a duplicate of this bug. ***