Bug 471450
Summary: Occasional failure on lookup when empty AAAA response comes before good A response
Product: [Fedora] Fedora
Component: glibc
Version: 10
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Reporter: Mads Kiilerich <mads>
Assignee: Jakub Jelinek <jakub>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: drepper, jakub, k.georgiou, sacredfox, tim, vanmeeuwen+fedora
Doc Type: Bug Fix
Last Closed: 2008-12-08 22:23:28 UTC

Description
Mads Kiilerich
2008-11-13 19:07:49 UTC
OpenSSH does

memset(&hints, 0, sizeof(hints));
hints.ai_family = family;
hints.ai_socktype = SOCK_STREAM;
snprintf(strport, sizeof strport, "%u", port);
if ((gaierr = getaddrinfo(host, strport, &hints, &aitop)) != 0)
    fatal("%s: Could not resolve hostname %.100s: %s", __progname, host, ssh_gai_strerror(gaierr));

and is thus not following Ulrich's advice in Bug 459756#24. Can that explain this? Perhaps a bug should be filed against openssh?

But for now, for F10: openssh apparently worked before, and it would probably be easier to put a workaround in glibc than to fix all the applications that have another (possibly wrong) opinion on how to do name resolution...

ssh certainly should use AI_ADDRCONFIG. It should still work. You said you captured the traffic. Try to compile a little program with essentially the code in comment #1, run it under strace, and capture the DNS traffic using wireshark.

Created attachment 324345
dns lookup test program
That was hard to reproduce. "My" DNS sometimes answers AAAA before A, but I can't reproduce it on demand.
As a workaround I use an ugly hack: I use iptables to drop some of the UDP responses, matching on packet length, so that the A answer is dropped completely while the AAAA responses come through:
-A INPUT -p udp -m length --length 77 -j LOG --log-prefix "dropping "
-A INPUT -p udp -m length --length 77 -j DROP
-A INPUT -p udp -j LOG --log-prefix "not dropping "
With that I can reproduce the ssh behaviour I mentioned - and do the following.
I run the attached test program with hg as parameter. resolv.conf has "domain dadomain.com" and "search dadomain.com" and "nameserver 192.168.45.13".
When using ai_family = AF_INET, it sends a request for A twice, waits in poll for 5 s each time, and then fails with -2 (EAI_NONAME?).
With ai_family = AF_UNSPEC, it sends a request for A and a request for AAAA, gets the AAAA response (with the SOA of the nameserver as authoritative NS for the domain), waits in poll for 5 s, and then fails the same way:
getaddrinfo("hg", "(null)", {ai_family=0, ai_socktype=1}, {}) = -2
(I am not familiar with NS api, but in either case I would expect getaddrinfo to return something more like EAI_AGAIN (-3) instead of being conclusive after just 1-2 lossy udp attempts.)
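The attached test program is not reproduced in this report. As a rough stand-in, the following sketch performs the same kind of lookup and prints the getaddrinfo() return code together with its gai_strerror() text, which makes it easy to tell EAI_NONAME from EAI_AGAIN. The helper name lookup() and the default host "localhost" are made up for illustration, and the third call only shows where the AI_ADDRCONFIG flag suggested above would go.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve host with the given address family and hint
 * flags, print the getaddrinfo() result code and any addresses returned,
 * and hand the result code back to the caller (0 means success). */
static int lookup(const char *host, int family, int flags)
{
    struct addrinfo hints, *res = NULL, *ai;
    char buf[64];
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;

    rc = getaddrinfo(host, NULL, &hints, &res);
    printf("getaddrinfo(\"%s\", family=%d, flags=%d) = %d", host, family, flags, rc);
    if (rc != 0) {
        /* e.g. EAI_NONAME or EAI_AGAIN, with the library's description */
        printf(" (%s)\n", gai_strerror(rc));
        return rc;
    }
    printf("\n");
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, buf, sizeof(buf),
                        NULL, 0, NI_NUMERICHOST) == 0)
            printf("  %s\n", buf);
    }
    freeaddrinfo(res);
    return rc;
}

int main(int argc, char **argv)
{
    const char *host = argc > 1 ? argv[1] : "localhost";

    lookup(host, AF_INET, 0);               /* A only */
    lookup(host, AF_UNSPEC, 0);             /* A and AAAA, as in the failing case */
    lookup(host, AF_UNSPEC, AI_ADDRCONFIG); /* only families configured locally */
    return 0;
}
```

Running it under strace -v -s200 with the host name as argument should give output comparable to the trace below.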
The relevant(?) output from strace -v -s200:
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.45.13")}, 28) = 0
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1227298167, 857199}, NULL) = 0
poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
send(3, "\232\314\1\0\0\1\0\0\0\0\0\0\2hg\10dadomain\3com\0\0\1\0\1"..., 33, MSG_NOSIGNAL) = 33
poll([{fd=3, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=3, revents=POLLOUT}])
send(3, "\205\302\1\0\0\1\0\0\0\0\0\0\2hg\10dadomain\3com\0\0\34\0\1"..., 33, MSG_NOSIGNAL) = 33
gettimeofday({1227298167, 858053}, NULL) = 0
poll([{fd=3, events=POLLIN}], 1, 4999) = 1 ([{fd=3, revents=POLLIN}])
ioctl(3, FIONREAD, [84]) = 0
recvfrom(3, "\205\302\205\200\0\1\0\0\0\1\0\0\2hg\10dadomain\3com\0\0\34\0\1\300\17\0\6\0\1\0\0\16\20\0'\4srvx\300\17\nhostmaster\0\0\0\27E\0\0\3\204\0\0\2X\0\1Q\200\0\0\16\20"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.45.13")}, [16]) = 84
gettimeofday({1227298167, 858786}, NULL) = 0
poll([{fd=3, events=POLLIN}], 1, 4998) = 0 (Timeout)
close(3) = 0
Using glibc-2.9-2.i686.
Domain name carefully modified to protect the innocent.
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'. More information and the reason for this action are here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

I faced the same problem with glibc/kernel in the F-10 release.

$ rpm -q glibc kernel
glibc-2.9-2.i686
kernel-2.6.27.5-117.fc10.i686

To avoid this, I disabled ipv6 using modprobe.conf.

$ cat /etc/modprobe.conf
install ipv6 :

This workaround does not solve the problem at all, but it would be helpful to users who do not use IPv6.

glibc-2.9-3, mentioned on bug 459756, seems to work around this problem too.

BTW: I notice that requests are now sent at 5 s intervals, but while A requests are retried 4 times, AAAA requests are only retried 2 times. I don't know if that is intended, but it makes it more OK that it fails with EAI_NONAME in case of failure. At least as long as I don't use IPv6 ...

Thanks, Jakub! I will keep testing it to make sure it really works.

Let's dupe this bug. No reason to keep it open as well, it's the same issue.

*** This bug has been marked as a duplicate of bug 459756 ***