Bug 454206 - Intermittant DNS failures with curl, firefox, yum, ...; works fine with nslookup
Summary: Intermittant DNS failures with curl, firefox, yum, ...; works fine with nslookup
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: All
OS: Linux
low
low
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-07-06 16:14 UTC by Tom London
Modified: 2008-07-29 05:56 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-29 05:56:13 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
output of 10 conscutive runs of "curl; getent; nslookup" on mirrors.fedoraproject.org (7.20 KB, text/plain)
2008-07-08 01:13 UTC, Tom London
no flags Details
Output of "strace -tt curl http://mirrors.fedoraproject.org" (29.44 KB, text/plain)
2008-07-08 13:39 UTC, Tom London
no flags Details
output of 'strace -tt -T curl ....' (67.66 KB, text/plain)
2008-07-08 15:50 UTC, Tom London
no flags Details
Output of 'strace -tt -T curl ...' when capturing with wireshark (33.69 KB, text/plain)
2008-07-08 16:14 UTC, Tom London
no flags Details
wireshark capture of packets (including DNS) during "strace -tt -T curl ..." (5.99 KB, application/octet-stream)
2008-07-08 16:15 UTC, Tom London
no flags Details
output of 'strace -tt -T curl http://mirrors.fedoraproject.org' (65.37 KB, text/plain)
2008-07-09 13:26 UTC, Tom London
no flags Details
Wireshark capture of packets during 'strace -tt -T http://mirrors.fedoraproject.org' (171.83 KB, application/octet-stream)
2008-07-09 13:28 UTC, Tom London
no flags Details
wireshark capture showing "successful" 'getent ahosts mirrors.fedoraproject.org' (19.13 KB, text/plain)
2008-07-10 14:54 UTC, Tom London
no flags Details
wireshark capture showing "5 second" 'getent ahosts mirrors.fedoraproject.org' (15.99 KB, text/plain)
2008-07-10 14:55 UTC, Tom London
no flags Details
Console output from F9 system running "cat /etc/resolv.conf; time strace getent ahosts mirrors.fedoraproject.org" (9.89 KB, text/plain)
2008-07-12 20:33 UTC, Tom London
no flags Details
wireshark capture of "time getent ahosts mirrors.fedoraproject.org" (20.88 KB, text/plain)
2008-07-23 14:43 UTC, Tom London
no flags Details

Description Tom London 2008-07-06 16:14:06 UTC
Description of problem:
[Could this be a glibc issue?]

I seem to be suffering from (seemingly random) DNS failures on my system.  Below
shows three consecutive runs of "curl 'http://mirrors.fedoraproject.org/....'",
the first and last of which succeed, but the second one fails.  [I ran these
with no more than 2-3 seconds pause between them.]

Of course, this is causing challenges for other applications (like yum, firefox,
....).

This appears to happen both with and without nscd running.

Also, any idea why these all seem to be taking a minimum of 5 seconds?

If I run "nslookup", I get no such failures, and the time is much less than 5
seconds:

[root@localhost ~]# time curl
'http://mirrors.fedoraproject.org/mirrorlist?repo=rawhide&arch=i386'
# repo = rawhide arch = i386 country = US country = CA 
http://fedora.mirrors.tds.net/pub/fedora/development/i386/os
http://www.gtlib.gatech.edu/pub/fedora.redhat/linux/development/i386/os
http://mirrors.cat.pdx.edu/fedora/linux/development/i386/os
ftp://fedora.bu.edu/development/i386/os
http://mirrors.kernel.org/fedora/development/i386/os
http://mirror.cc.vt.edu/pub/fedora/linux/development/i386/os
http://ftp.ndlug.nd.edu/pub/fedora/linux/development/i386/os
http://archive.linux.duke.edu/pub/fedora/linux/development/i386/os
http://mirror.its.uidaho.edu/pub/fedora/linux/development/i386/os
http://mirror.hiwaay.net/pub/fedora/linux/development/i386/os
http://gulus.usherbrooke.ca/pub/distro/fedora/linux/development/i386/os

real	0m5.561s
user	0m0.000s
sys	0m0.019s
[root@localhost ~]# time curl
'http://mirrors.fedoraproject.org/mirrorlist?repo=rawhide&arch=i386'
curl: (6) Couldn't resolve host 'mirrors.fedoraproject.org'

real	0m5.018s
user	0m0.002s
sys	0m0.016s
[root@localhost ~]# time curl
'http://mirrors.fedoraproject.org/mirrorlist?repo=rawhide&arch=i386'
# repo = rawhide arch = i386 country = US country = CA 
http://mirrors.kernel.org/fedora/development/i386/os
http://www.gtlib.gatech.edu/pub/fedora.redhat/linux/development/i386/os
ftp://fedora.bu.edu/development/i386/os
http://fedora.mirrors.tds.net/pub/fedora/development/i386/os
http://mirrors.cat.pdx.edu/fedora/linux/development/i386/os
http://mirror.hiwaay.net/pub/fedora/linux/development/i386/os
http://mirror.its.uidaho.edu/pub/fedora/linux/development/i386/os
http://mirror.cc.vt.edu/pub/fedora/linux/development/i386/os
http://ftp.ndlug.nd.edu/pub/fedora/linux/development/i386/os
http://archive.linux.duke.edu/pub/fedora/linux/development/i386/os
http://gulus.usherbrooke.ca/pub/distro/fedora/linux/development/i386/os

real	0m5.191s
user	0m0.002s
sys	0m0.016s
[root@localhost ~]# 
[root@localhost ~]# while :; do time nslookup mirrors.fedoraproject.org; sleep
2; done
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 209.132.176.122
Name:	fedoraproject.org
Address: 66.35.62.162


real	0m0.030s
user	0m0.006s
sys	0m0.013s
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 209.132.176.122
Name:	fedoraproject.org
Address: 66.35.62.162


real	0m0.101s
user	0m0.005s
sys	0m0.014s
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 209.132.176.122
Name:	fedoraproject.org
Address: 66.35.62.162


real	0m0.031s
user	0m0.003s
sys	0m0.016s
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 66.35.62.162
Name:	fedoraproject.org
Address: 209.132.176.122


real	0m0.034s
user	0m0.005s
sys	0m0.011s


Version-Release number of selected component (if applicable):
glibc-2.8.90-8.i686
bind-utils-9.5.0-37.fc10.i386
glibc-common-2.8.90-8.i386
curl-7.18.2-3.fc10.i386
glibc-headers-2.8.90-8.i386
nscd-2.8.90-8.i386
glibc-debuginfo-common-2.8.90-8.i386
curl-debuginfo-7.18.2-3.fc10.i386
glibc-devel-2.8.90-8.i386
glibc-debuginfo-2.8.90-8.i686
glibc-utils-2.8.90-8.i386


How reproducible:
Very, very often

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Tom London 2008-07-06 17:25:49 UTC
Here are consecutive runs of "yum -x gnome-sharp\* update", with the first 2
failing and the third succeeding:

[root@localhost ~]# yum -x gnome-sharp\* update
Loaded plugins: fastestmirror, presto, refresh-packagekit, versionlock
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist
http://mirrors.fedoraproject.org/mirrorlist?repo=rawhide&arch=i386 error was
[Errno 4] IOError: <urlopen error (-2, 'Name or service not known')>
Error: Cannot find a valid baseurl for repo: rawhide
[root@localhost ~]# 
[root@localhost ~]# yum -x gnome-sharp\* update
Loaded plugins: fastestmirror, presto, refresh-packagekit, versionlock
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist
http://mirrors.fedoraproject.org/mirrorlist?repo=rawhide&arch=i386 error was
[Errno 4] IOError: <urlopen error (-2, 'Name or service not known')>
Error: Cannot find a valid baseurl for repo: rawhide
[root@localhost ~]# yum -x gnome-sharp\* update
Loaded plugins: fastestmirror, presto, refresh-packagekit, versionlock
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist
http://mirrors.fedoraproject.org/mirrorlist?repo=rawhide&arch=i386 error was
[Errno 4] IOError: <urlopen error (-2, 'Name or service not known')>
Error: Cannot find a valid baseurl for repo: rawhide
[root@localhost ~]# yum -x gnome-sharp\* update
Loaded plugins: fastestmirror, presto, refresh-packagekit, versionlock
Loading mirror speeds from cached hostfile
 * rawhide: mirrors.cat.pdx.edu
 * rawhide-debuginfo: mirrors.cat.pdx.edu
Setting up and reading Presto delta metadata
rawhide                                                  | 2.4 kB     00:00     
primary.sqlite.bz2                                       | 6.5 MB     00:04     
No Presto metadata available for rawhide
rawhide-debuginfo                                        | 1.9 kB     00:00     
primary.sqlite.bz2                                       | 1.1 MB     00:00     
No Presto metadata available for rawhide-debuginfo
Excluding Packages in global exclude list
Finished
Reading version lock configuration
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package openoffice.org-xsltfilter.i386 1:3.0.0-0.0.22.1.fc10 set to be updated
<<<<SNIP>>>>

Comment 2 Tom London 2008-07-06 20:58:01 UTC
Some more data, this time with www.nytimes.com.

I ran 2 shell loops: the first does 5 "curl http://www.nytimes.com", the second
does 5 "nslookup www.nytimes.com".

All of the curl's took a bit more than 5 seconds but the fourth: it took about
.75 seconds.  All the nslookup's were less than .1 seconds.

[root@localhost ~]# for i in 1 1 1 1 1; do time curl http://www.nytimes.com
>/dev/null; done
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103k    0  103k    0     0  18777      0 --:--:--  0:00:05 --:--:--  240k

real	0m5.667s
user	0m0.005s
sys	0m0.032s
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  104k    0  104k    0     0  18781      0 --:--:--  0:00:05 --:--:--  233k

real	0m5.687s
user	0m0.002s
sys	0m0.025s
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  104k    0  104k    0     0  18870      0 --:--:--  0:00:05 --:--:--  232k

real	0m5.685s
user	0m0.003s
sys	0m0.022s
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  103k    0  103k    0     0   145k      0 --:--:-- --:--:-- --:--:--  229k

real	0m0.751s
user	0m0.002s
sys	0m0.030s
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  104k    0  104k    0     0  18855      0 --:--:--  0:00:05 --:--:--  230k

real	0m5.696s
user	0m0.002s
sys	0m0.024s
[root@localhost ~]# for i in 1 1 1 1 1; do time nslookup www.nytimes.com
>/dev/null; done

real	0m0.054s
user	0m0.003s
sys	0m0.016s

real	0m0.032s
user	0m0.000s
sys	0m0.012s

real	0m0.024s
user	0m0.002s
sys	0m0.010s

real	0m0.026s
user	0m0.003s
sys	0m0.009s

real	0m0.026s
user	0m0.001s
sys	0m0.011s
[root@localhost ~]# 


Comment 3 Jakub Jelinek 2008-07-07 16:23:53 UTC
timing nslookup vs. curl isn't exactly fair nor interesting, as curl does
a lot of stuff nslookup doesn't.  You can use
getent hosts www.nytimes.com
and/or
getent ahosts www.nytimes.com
to do a more useful comparisons.

Comment 4 Tom London 2008-07-07 17:14:20 UTC
Sorry for the miscue with 'nslookup'.

Trying the above plus 'getent hosts' at work (different ISP, different DNS)
fails to reproduce the issue:

[root@localhost ~]# time getent hosts www.nytimes.com
199.239.137.200 www.nytimes.com
199.239.137.245 www.nytimes.com
199.239.136.200 www.nytimes.com
199.239.136.245 www.nytimes.com

real	0m0.015s
user	0m0.001s
sys	0m0.007s
[root@localhost ~]# time getent hosts mirrors.fedoraproject.org
209.132.176.122 fedoraproject.org mirrors.fedoraproject.org
66.35.62.162    fedoraproject.org mirrors.fedoraproject.org

real	0m0.401s
user	0m0.002s
sys	0m0.006s
[root@localhost ~]# time curl http://mirrors.fedoraproject.org >/dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  155k  100  155k    0     0   167k      0 --:--:-- --:--:-- --:--:--  214k

real	0m0.945s
user	0m0.000s
sys	0m0.025s
[root@localhost ~]# 

Home ISP is Comcast.

I'll retest tonight when I get home.....

[If there is some ISP related issue, what should I be looking for.... ?]

Comment 5 Tom London 2008-07-08 01:11:45 UTC
Back at home (Comcast), and there is a difference:  curl takes about 5 seconds
about 4 out of 5 times, getent and nslookup take a few hundreths of a second
each.  I'll attach the complete output as an attachment.

Basically, there are about half a dozen consecutive runs with the curl taking
more than five seconds, followed by 2 runs with the curl taking less than half a
second, followed by the remaining runs back at more than 5 seconds.

There seems to be something about my Comcast connection.

This affects more than just curl, I also get about 5 second delays in firefox,
yum, etc.

Any suggestions on how to "track this down" will be greatly appreciated!


[tbl@localhost ~]$ for i in 1 1 1 1 1 1 1 1 1 1; do time curl
http://mirrors.fedoraproject.org >/dev/null; time getent hosts
mirrors.fedoraproject.org; time nslookup mirrors.fedoraproject.org; done
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  155k  100  155k    0     0  29682      0  0:00:05  0:00:05 --:--:--  579k

real	0m5.390s
user	0m0.003s
sys	0m0.048s
209.132.176.122 fedoraproject.org mirrors.fedoraproject.org
66.35.62.162    fedoraproject.org mirrors.fedoraproject.org

real	0m0.031s
user	0m0.001s
sys	0m0.007s
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 66.35.62.162
Name:	fedoraproject.org
Address: 209.132.176.122


real	0m0.101s
user	0m0.001s
sys	0m0.018s

<<<<SNIP ..... more of the above>>>>>
<<<<Now 2 in a row at under half a second>>>>

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  155k  100  155k    0     0   449k      0 --:--:-- --:--:-- --:--:--  637k

real	0m0.361s
user	0m0.005s
sys	0m0.032s
209.132.176.122 fedoraproject.org mirrors.fedoraproject.org
66.35.62.162    fedoraproject.org mirrors.fedoraproject.org

real	0m0.032s
user	0m0.002s
sys	0m0.006s
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 66.35.62.162
Name:	fedoraproject.org
Address: 209.132.176.122


real	0m0.039s
user	0m0.001s
sys	0m0.016s
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  155k  100  155k    0     0   362k      0 --:--:-- --:--:-- --:--:--  515k

real	0m0.445s
user	0m0.005s
sys	0m0.024s
209.132.176.122 fedoraproject.org mirrors.fedoraproject.org
66.35.62.162    fedoraproject.org mirrors.fedoraproject.org

real	0m0.032s
user	0m0.001s
sys	0m0.007s
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 209.132.176.122
Name:	fedoraproject.org
Address: 66.35.62.162


real	0m0.028s
user	0m0.003s
sys	0m0.013s

<<<<< Now back to 5.x seconds for curl >>>>>

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  155k  100  155k    0     0  29780      0  0:00:05  0:00:05 --:--:--  585k

real	0m5.359s
user	0m0.003s
sys	0m0.020s
209.132.176.122 fedoraproject.org mirrors.fedoraproject.org
66.35.62.162    fedoraproject.org mirrors.fedoraproject.org

real	0m0.033s
user	0m0.000s
sys	0m0.004s
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 66.35.62.162
Name:	fedoraproject.org
Address: 209.132.176.122


real	0m0.044s
user	0m0.002s
sys	0m0.011s

<<<<< SNIP >>>>>

[tbl@localhost ~]$ 







Comment 6 Tom London 2008-07-08 01:13:48 UTC
Created attachment 311215 [details]
output of 10 conscutive runs of "curl; getent; nslookup" on mirrors.fedoraproject.org

Comment 7 Ulrich Drepper 2008-07-08 02:00:28 UTC
(In reply to comment #5)
> Back at home (Comcast), and there is a difference:  curl takes about 5 seconds
> about 4 out of 5 times, getent and nslookup take a few hundreths of a second
> each.  I'll attach the complete output as an attachment.

Since getent and nslookup behave (approximately) the same this has nothing to do
with name resolution.   It's either a problem with the networking code or just a
problem with your connection.  curl actually downloads something and this means
the server and network play into it.  This might very well explain the
difference in timings.  Use

  strace -tt -T <program>

to show where the time is spent.

Anyway, you haven't shown any DNS problems, just slow network traffic.  If there
are problems with your network connection this of course can also explain DNS
problems.

Unless you say that you still also see DNS issues you should close this bug.  If
and when you can see problems with the networking code you can file a bug for
the kernel.  I very much doubt there is anything like this, though.

Comment 8 Tom London 2008-07-08 13:39:54 UTC
Created attachment 311265 [details]
Output of "strace -tt curl http://mirrors.fedoraproject.org"

Thanks again for helping here, as this has gotten me flustered.....

I attach the output of "strace -tt curl http://mirrors.fedoraproject.org".

Here is a snippet showing the 5 second "pause".

My limited understanding of this is that the process sits and waits 5 seconds
on a poll([{fd=3, events=POLLIN}]) after receiving (partial?) response from
DNS.

This appears to be in a call after loading /lib/libnss_dns.so.2.

Sorry if I am "crying wolf" here.... but this is quite reproducible.....

06:24:21.019907 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
06:24:21.020347 connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("68.87.76.178")}, 28) = 0
06:24:21.020591 fcntl64(3, F_GETFL)	= 0x2 (flags O_RDWR)
06:24:21.020749 fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
06:24:21.020893 gettimeofday({1215523461, 20976}, NULL) = 0
06:24:21.021080 poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
06:24:21.021277 send(3,
"\250\345\1\0\0\1\0\0\0\0\0\0\7mirrors\rfedoraproje"..., 43, MSG_NOSIGNAL) = 43

06:24:21.021674 poll([{fd=3, events=POLLIN|POLLOUT, revents=POLLOUT}], 1, 5000)
= 1
06:24:21.022121 send(3, "W\375\1\0\0\1\0\0\0\0\0\0\7mirrors\rfedoraproje"...,
43, MSG_NOSIGNAL) = 43
06:24:21.022566 gettimeofday({1215523461, 22725}, NULL) = 0
06:24:21.022824 poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 4998) = 1
06:24:21.033435 ioctl(3, FIONREAD, [104]) = 0
06:24:21.033781 recvfrom(3,
"W\375\201\200\0\1\0\1\0\1\0\0\7mirrors\rfedoraproje"..., 2048, 0,
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("68.87.76.178")},
[16]) = 104
06:24:21.034510 gettimeofday({1215523461, 34602}, NULL) = 0
06:24:21.034705 poll([{fd=3, events=POLLIN}], 1, 4986) = 0
06:24:26.021112 close(3)		= 0
06:24:26.022035 time(NULL)		= 1215523466
06:24:26.022219 write(2, "c", 1c)	 = 1
06:24:26.022398 write(2, "u", 1u)	 = 1

Comment 9 Tom London 2008-07-08 13:40:55 UTC
Manually clearing NEEDINFO.

Comment 10 Ulrich Drepper 2008-07-08 15:14:14 UTC
(In reply to comment #8)
> My limited understanding of this is that the process sits and waits 5 seconds
> on a poll([{fd=3, events=POLLIN}]) after receiving (partial?) response from
> DNS.

You need to collect the DNS trafic using wireshark as well.  The strace output
alone is not sufficient.

I.e., collect strace and wireshark output at the same time.

Comment 11 Jakub Jelinek 2008-07-08 15:24:37 UTC
Also, if curl uses getaddrinfo rather than gethostbyname, you should try
getent ahosts rather than getent hosts.

Comment 12 Ulrich Drepper 2008-07-08 15:36:15 UTC
Furthermore as I explained before, use -tt -T for strace, not just -tt.  Without
we don't know for sure how long the syscall takes.

Comment 13 Tom London 2008-07-08 15:42:03 UTC
Sorry, I missed the '-T' above; my oversight.

I'll rerun as you describe.

Debugging my script (converting to ahosts), I get the following "puzzling"
result.  The first run of "curl" fails, but the 'getent ahosts' seem to be
taking about 5 seconds now, just like curl.

I'm working on the script to generate the strace output, and I'll figure out how
to run/capture wireshark.

Thanks again for your help.

[tbl@localhost ~]$ for i in 1 1 1 1 1 1 1 1 1 1; do echo curl; time curl
http://mirrors.fedoraproject.org >/dev/null; echo getent ahosts; time getent
ahosts mirrors.fedoraproject.org; echo nslookup; time nslookup
mirrors.fedoraproject.org; done
curl

curl: (6) Couldn't resolve host 'mirrors.fedoraproject.org'

real	0m5.053s
user	0m0.004s
sys	0m0.020s
getent ahosts
209.132.176.122 STREAM fedoraproject.org
209.132.176.122 DGRAM  
209.132.176.122 RAW    
66.35.62.162    STREAM 
66.35.62.162    DGRAM  
66.35.62.162    RAW    

real	0m5.036s
user	0m0.000s
sys	0m0.010s
nslookup
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 66.35.62.162
Name:	fedoraproject.org
Address: 209.132.176.122


real	0m0.339s
user	0m0.006s
sys	0m0.021s
curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  155k  100  155k    0     0  29413      0  0:00:05  0:00:05 --:--:--  491k

real	0m5.428s
user	0m0.001s
sys	0m0.024s
getent ahosts
66.35.62.162    STREAM fedoraproject.org
66.35.62.162    DGRAM  
66.35.62.162    RAW    
209.132.176.122 STREAM 
209.132.176.122 DGRAM  
209.132.176.122 RAW    

real	0m5.008s
user	0m0.000s
sys	0m0.008s
nslookup
Server:		68.87.76.178
Address:	68.87.76.178#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = fedoraproject.org.
Name:	fedoraproject.org
Address: 66.35.62.162
Name:	fedoraproject.org
Address: 209.132.176.122


real	0m0.041s
user	0m0.002s
sys	0m0.017s




Comment 14 Tom London 2008-07-08 15:50:16 UTC
Created attachment 311282 [details]
output of 'strace -tt -T curl ....'

Here is a '5 second' curl output.

I'm still working on the wireshark, and I'm looking at the strace output for
the calls to curl that fail with "Couldn't resolve ...."

Comment 15 Tom London 2008-07-08 16:12:37 UTC
OK. I ran 

[tbl@localhost ~]$ time strace -o curl-strace.wire -tt -T curl
http://mirrors.fedoraproject.org >/dev/null

curl: (6) Couldn't resolve host 'mirrors.fedoraproject.org'

real	0m5.157s
user	0m0.012s
sys	0m0.140s
[tbl@localhost ~]$ 

5 seconds, and "Couldn't resolve".

I attach below first the "strace" output, and then the wireshark capture (sorry
for a few extra packets).

Hope this is what you asked for.  If not I can run again....

Comment 16 Tom London 2008-07-08 16:14:00 UTC
Created attachment 311290 [details]
Output of 'strace -tt -T curl ...' when capturing with wireshark

This strace is of "Couldn't resolve" run

Comment 17 Tom London 2008-07-08 16:15:17 UTC
Created attachment 311291 [details]
wireshark capture of packets (including DNS) during "strace -tt -T curl ..."

Wireshark capture.....

Comment 18 Tom London 2008-07-09 13:25:46 UTC
Here's another run:

[tbl@localhost ~]$ time strace -o curl-strace.txt -tt -T curl
http://mirrors.fedoraproject.org >/dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  155k  100  155k    0     0  29458      0  0:00:05  0:00:05 --:--:--  568k

real	0m5.558s
user	0m0.032s
sys	0m0.296s
[tbl@localhost ~]$ 

The wireshark trace shows the DNS traffic and what appears to be a 5 second
pause with no DNS retries. 

Strace shows:
06:18:55.465098 ioctl(3, FIONREAD, [89]) = 0 <0.000029>
06:18:55.465385 recvfrom(3,
"\201\331\201\200\0\1\0\3\0\0\0\0\7mirrors\rfedoraproje"..., 2048, 0,
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("68.87.76.178")},
[16]) = 89 <0.000034>
06:18:55.465960 gettimeofday({1215609535, 466055}, NULL) = 0 <0.000027>
06:18:55.466192 poll([{fd=3, events=POLLIN}], 1, 4985) = 0 <4.985732>
06:19:00.452707 close(3)    

This seems to happen to me from home (Comcast), not at work (NTT/ATT) (or not as
frequently).

I attach below the output of strace and the capture file.

Comment 19 Tom London 2008-07-09 13:26:50 UTC
Created attachment 311373 [details]
output of 'strace -tt -T curl http://mirrors.fedoraproject.org'

Comment 20 Tom London 2008-07-09 13:28:33 UTC
Created attachment 311374 [details]
Wireshark capture of packets during 'strace -tt -T http://mirrors.fedoraproject.org'

Shows 5 second pause after DNS ....

Comment 21 Tom London 2008-07-09 13:59:25 UTC
For completeness, here is output of "strace -tt -T getent ahosts
mirrors.fedoraproject.org" with the same 5 second pause.

getent appears to load in 'lib/resolve.so before it 'hangs'.


[tbl@localhost ~]$ strace -tt -T getent ahosts mirrors.fedoraproject.org
06:55:06.661389 execve("/usr/bin/getent", ["getent", "ahosts",
"mirrors.fedoraproject.org"], [/* 40 vars */]) = 0 <0.000716>
06:55:06.663825 brk(0)                  = 0x81e5000 <0.000058>
06:55:06.664229 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
directory) <0.000116>
06:55:06.664754 open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000128>
06:55:06.665397 fstat64(3, {st_mode=S_IFREG|0644, st_size=104170, ...}) = 0
<0.000061>
06:55:06.666061 mmap2(NULL, 104170, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb80e4000
<0.000071>
06:55:06.666373 close(3)                = 0 <0.000058>
06:55:06.666702 open("/lib/libc.so.6", O_RDONLY) = 3 <0.000136>
06:55:06.667107 read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\267`\0004\0\0\0"..., 512) =
512 <0.000069>
06:55:06.667563 fstat64(3, {st_mode=S_IFREG|0755, st_size=1511052, ...}) = 0
<0.000060>
06:55:06.668224 mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80e3000 <0.000066>
06:55:06.668549 mmap2(0x5f5000, 1517136, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x5f5000 <0.000071>
06:55:06.668848 mmap2(0x762000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16c) = 0x762000 <0.000091>
06:55:06.669211 mmap2(0x765000, 9808, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x765000 <0.000073>
06:55:06.669495 close(3)                = 0 <0.000058>
06:55:06.669816 mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80e2000 <0.000064>
06:55:06.670090 set_thread_area({entry_number:-1 -> 6, base_addr:0xb80e26c0,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1,
seg_not_present:0, useable:1}) = 0 <0.000059>
06:55:06.670524 mprotect(0x804d000, 4096, PROT_READ) = 0 <0.000073>
06:55:06.670803 mprotect(0x762000, 8192, PROT_READ) = 0 <0.000070>
06:55:06.671105 mprotect(0x5f1000, 4096, PROT_READ) = 0 <0.000070>
06:55:06.671408 munmap(0xb80e4000, 104170) = 0 <0.000093>
06:55:06.672053 brk(0)                  = 0x81e5000 <0.000059>
06:55:06.672335 brk(0x8206000)          = 0x8206000 <0.000063>
06:55:06.672649 open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
<0.000138>
06:55:06.673140 fstat64(3, {st_mode=S_IFREG|0644, st_size=80040704, ...}) = 0
<0.000060>
06:55:06.673788 mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ee2000
<0.000181>
06:55:06.674244 close(3)                = 0 <0.000033>
06:55:06.674546 socket(PF_NETLINK, SOCK_RAW, 0) = 3 <0.000066>
06:55:06.674752 bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
<0.000059>
06:55:06.674981 getsockname(3, {sa_family=AF_NETLINK, pid=5515,
groups=00000000}, [12]) = 0 <0.000034>
06:55:06.675552 time(NULL)              = 1215611706 <0.000033>
06:55:06.675762 sendto(3, "\24\0\0\0\26\0\1\3:\303tH\0\0\0\0\0\0\0\0", 20, 0,
{sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20 <0.000080>
06:55:06.676073 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000},
msg_iov(1)=[{"0\0\0\0\24\0\2\0:\303tH\213\25\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"...,
4096}], msg_controllen=0, msg_flags=0}, 0) = 168 <0.000080>
06:55:06.676512 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000},
msg_iov(1)=[{"@\0\0\0\24\0\2\0:\303tH\213\25\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"...,
4096}], msg_controllen=0, msg_flags=0}, 0) = 192 <0.000073>
06:55:06.676923 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000},
msg_iov(1)=[{"\24\0\0\0\3\0\2\0:\303tH\213\25\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"...,
4096}], msg_controllen=0, msg_flags=0}, 0) = 20 <0.000045>
06:55:06.677312 close(3)                = 0 <0.000087>
06:55:06.677589 socket(PF_FILE, SOCK_STREAM, 0) = 3 <0.000070>
06:55:06.677786 fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000035>
06:55:06.678013 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"},
110) = -1 ENOENT (No such file or directory) <0.000083>
06:55:06.678529 close(3)                = 0 <0.000063>
06:55:06.678741 socket(PF_FILE, SOCK_STREAM, 0) = 3 <0.000060>
06:55:06.678928 fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000039>
06:55:06.679129 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"},
110) = -1 ENOENT (No such file or directory) <0.000073>
06:55:06.679546 close(3)                = 0 <0.000058>
06:55:06.679759 open("/etc/nsswitch.conf", O_RDONLY) = 3 <0.000083>
06:55:06.680030 fstat64(3, {st_mode=S_IFREG|0644, st_size=1678, ...}) = 0 <0.000057>
06:55:06.680416 mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80fd000 <0.000045>
06:55:06.680591 read(3, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1678
<0.000100>
06:55:06.680918 read(3, "", 4096)       = 0 <0.000042>
06:55:06.681082 close(3)                = 0 <0.000046>
06:55:06.681265 munmap(0xb80fd000, 4096) = 0 <0.000055>
06:55:06.681453 open("/etc/host.conf", O_RDONLY) = 3 <0.000078>
06:55:06.681692 fstat64(3, {st_mode=S_IFREG|0644, st_size=17, ...}) = 0 <0.000040>
06:55:06.682058 mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80fd000 <0.000059>
06:55:06.682245 read(3, "order hosts,bind\n", 4096) = 17 <0.000079>
06:55:06.682498 read(3, "", 4096)       = 0 <0.000041>
06:55:06.682659 close(3)                = 0 <0.000046>
06:55:06.682824 munmap(0xb80fd000, 4096) = 0 <0.000051>
06:55:06.683005 getpid()                = 5515 <0.000038>
06:55:06.683185 open("/etc/resolv.conf", O_RDONLY) = 3 <0.000077>
06:55:06.683426 fstat64(3, {st_mode=S_IFREG|0644, st_size=152, ...}) = 0 <0.000039>
06:55:06.683790 mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80fd000 <0.000043>
06:55:06.683961 read(3, "# generated by NetworkManager, d"..., 4096) = 152
<0.000078>
06:55:06.684273 read(3, "", 4096)       = 0 <0.000041>
06:55:06.684434 close(3)                = 0 <0.000045>
06:55:06.684600 munmap(0xb80fd000, 4096) = 0 <0.000048>
06:55:06.684796 open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000078>
06:55:06.685038 fstat64(3, {st_mode=S_IFREG|0644, st_size=104170, ...}) = 0
<0.000304>
06:55:06.685671 mmap2(NULL, 104170, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb80e4000
<0.000046>
06:55:06.685840 close(3)                = 0 <0.000041>
06:55:06.686035 open("/lib/libnss_files.so.2", O_RDONLY) = 3 <0.000112>
06:55:06.686347 read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\31\0\0004\0\0\0"..., 512) =
512 <0.000070>
06:55:06.686624 fstat64(3, {st_mode=S_IFREG|0755, st_size=46728, ...}) = 0
<0.000040>
06:55:06.686999 mmap2(NULL, 49812, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x111000 <0.000048>
06:55:06.687200 mmap2(0x11c000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa) = 0x11c000 <0.000053>
06:55:06.687399 close(3)                = 0 <0.000039>
06:55:06.687598 mprotect(0x11c000, 4096, PROT_READ) = 0 <0.000046>
06:55:06.687789 munmap(0xb80e4000, 104170) = 0 <0.000056>
06:55:06.687986 open("/etc/hosts", O_RDONLY|0x80000 /* O_??? */) = 3 <0.000094>
06:55:06.688242 fcntl64(3, F_GETFD)     = 0x1 (flags FD_CLOEXEC) <0.000039>
06:55:06.688415 fstat64(3, {st_mode=S_IFREG|0644, st_size=140, ...}) = 0 <0.000039>
06:55:06.688780 mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80fd000 <0.000043>
06:55:06.688953 read(3, "# Do not remove the following li"..., 4096) = 140
<0.000079>
06:55:06.689265 read(3, "", 4096)       = 0 <0.000041>
06:55:06.689428 close(3)                = 0 <0.000045>
06:55:06.689594 munmap(0xb80fd000, 4096) = 0 <0.000047>
06:55:06.689771 open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000077>
06:55:06.690012 fstat64(3, {st_mode=S_IFREG|0644, st_size=104170, ...}) = 0
<0.000040>
06:55:06.690394 mmap2(NULL, 104170, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb80e4000
<0.000045>
06:55:06.690559 close(3)                = 0 <0.000038>
06:55:06.690747 open("/lib/libnss_dns.so.2", O_RDONLY) = 3 <0.000108>
06:55:06.691029 read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\f\0\0004\0\0\0"..., 512) =
512 <0.000081>
06:55:06.695474 fstat64(3, {st_mode=S_IFREG|0755, st_size=22100, ...}) = 0
<0.000042>
06:55:06.695878 mmap2(NULL, 24708, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x11e000 <0.000053>
06:55:06.696068 mmap2(0x123000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4) = 0x123000 <0.000055>
06:55:06.696288 close(3)                = 0 <0.000040>
06:55:06.696473 open("/lib/libresolv.so.2", O_RDONLY) = 3 <0.000127>
06:55:06.696773 read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\220\221m\0024\0\0\0"..., 512) =
512 <0.000068>
06:55:06.697045 fstat64(3, {st_mode=S_IFREG|0755, st_size=77496, ...}) = 0
<0.000055>
06:55:06.697425 mmap2(0x26d7000, 84164, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x26d7000 <0.000445>
06:55:06.698008 mmap2(0x26e8000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11) = 0x26e8000 <0.000073>
06:55:06.698241 mmap2(0x26ea000, 6340, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x26ea000 <0.000048>
06:55:06.698414 close(3)                = 0 <0.000040>
06:55:06.698617 mprotect(0x26e8000, 4096, PROT_READ) = 0 <0.000048>
06:55:06.698793 mprotect(0x123000, 4096, PROT_READ) = 0 <0.000047>
06:55:06.698970 munmap(0xb80e4000, 104170) = 0 <0.000061>
06:55:06.699252 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3 <0.000082>
06:55:06.699480 connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("68.87.76.178")}, 28) = 0 <0.000057>
06:55:06.699734 fcntl64(3, F_GETFL)     = 0x2 (flags O_RDWR) <0.000040>
06:55:06.699896 fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000039>
06:55:06.700055 gettimeofday({1215611706, 700138}, NULL) = 0 <0.000056>
06:55:06.700255 poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1 <0.000040>
06:55:06.700467 send(3, "\253\10\1\0\0\1\0\0\0\0\0\0\7mirrors\rfedoraproje"...,
43, MSG_NOSIGNAL) = 43 <0.000106>
06:55:06.700773 poll([{fd=3, events=POLLIN|POLLOUT, revents=POLLOUT}], 1, 5000)
= 1 <0.000040>
06:55:06.700983 send(3, "\357G\1\0\0\1\0\0\0\0\0\0\7mirrors\rfedoraproje"...,
43, MSG_NOSIGNAL) = 43 <0.000066>
06:55:06.701296 gettimeofday({1215611706, 701366}, NULL) = 0 <0.000040>
06:55:06.701474 poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 4998) = 1
<0.082569>
06:55:06.784246 ioctl(3, FIONREAD, [104]) = 0 <0.000043>
06:55:06.784543 recvfrom(3,
"\357G\201\200\0\1\0\1\0\1\0\0\7mirrors\rfedoraproje"..., 2048, 0,
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("68.87.76.178")},
[16]) = 104 <0.000051>
06:55:06.785102 gettimeofday({1215611706, 785191}, NULL) = 0 <0.000040>
06:55:06.785302 poll([{fd=3, events=POLLIN}], 1, 4914) = 0 <4.914651>
06:55:11.700232 close(3)                = 0 <0.000117>
06:55:11.700662 exit_group(2)           = ?
[tbl@localhost ~]$ 


Comment 22 Tom London 2008-07-09 14:23:53 UTC
Narrowing this a bit.....

I booted up my wife's Fedora 9 system @home (on same LAN, using same router,
same ISP (Comcast)), and I do not see this problem.

'time curl http://mirrors.fedoraproject.org' always succeeds, and always takes
about 1.5 seconds or less.

Is there some additional debugging/tracing I can do?

Comment 23 Tom London 2008-07-09 14:52:51 UTC
More data:

I changed the nameserver entries in my /etc/resolv.conf to point at the
nameservers from my "@work" ISP.

All worked perfectly and fast (< 1 sec).

So something different between Comcast and other DNS?

Here are the nameservers that "work":
nameserver 208.67.222.222
nameserver 208.67.220.220

Here are the Comcast nameservers that "don't work" with Rawhide (but do with F9):
nameserver 68.87.76.178
nameserver 68.87.78.130


Here is snippet from strace of curl:

07:44:30.259107 send(3, "3\376\1\0\0\1\0\0\0\0\0\0\7mirrors\rfedoraproje"...,
43, MSG_NOSIGNAL) = 43 <0.000052>
07:44:30.259327 gettimeofday({1215614670, 259384}, NULL) = 0 <0.000027>
07:44:30.259466 poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 4999) = 1
<0.015200>
07:44:30.274875 ioctl(3, FIONREAD, [57]) = 0 <0.000058>
07:44:30.275178 recvfrom(3,
"3\376\201\200\0\1\0\1\0\0\0\0\7mirrors\rfedoraproje"..., 2048, 0,
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("208.67.222.222")},
[16]) = 57 <0.000044>
07:44:30.275590 gettimeofday({1215614670, 275651}, NULL) = 0 <0.000031>
07:44:30.275738 poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 4982) = 1
<0.087030>
07:44:30.362993 ioctl(3, FIONREAD, [89]) = 0 <0.000090>
07:44:30.363370 recvfrom(3,
"cm\201\200\0\1\0\3\0\0\0\0\7mirrors\rfedoraproje"..., 1991, 0,
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("208.67.222.222")},
[16]) = 89 <0.000082>
07:44:30.364083 close(3)                = 0 <0.000109>
07:44:30.364494 open("/etc/gai.conf", O_RDONLY) = -1 ENOENT (No such file or
directory) <0.000124>




Comment 24 Tom London 2008-07-10 13:47:23 UTC
OK.  This is a bit spookey......

This morning I decided to edit out the first nameserver from /etc/resolv.conf:

[root@localhost ~]# cat /etc/resolv.conf
# generated by NetworkManager, do not edit!

domain hsd1.ca.comcast.net.

search hsd1.ca.comcast.net.

# nameserver 68.87.76.178
nameserver 68.87.78.130

[root@localhost ~]# 

The result is that the issue vanishes:

[root@localhost ~]# time getent ahosts mirrors.fedoraproject.org
209.132.176.122 STREAM fedoraproject.org
209.132.176.122 DGRAM  
209.132.176.122 RAW    
66.35.62.162    STREAM 
66.35.62.162    DGRAM  
66.35.62.162    RAW    

real	0m0.118s
user	0m0.000s
sys	0m0.007s
[root@localhost ~]# 


Uncommenting the nameserver line makes the 5 second pause come back:

[tbl@localhost ~]$ time getent ahosts mirrors.fedoraproject.org

real	0m5.008s
user	0m0.001s
sys	0m0.008s
[tbl@localhost ~]$ 

So this appears to be an issue with 'nameserver 68.87.76.178' (but not with
'nameserver 68.87.78.130').

Sigh.

I'll do some 'differential comparisons' of wireshark captures.

Double sigh....


Comment 25 Tom London 2008-07-10 14:52:35 UTC
I've captured two wireshark runs with "getent ahosts mirrors.fedoraproject.org".

The first (using the "good" nameserver) shows 4 DNS packets: 2 outgoing request
packets to the nameserver, and 2 response packets from the nameserver, the
second with the "A 66.35.62.162" answer.

The second (using the "bad" nameserver) shows 3 DNS packets: the same 2 outgoing
request packets to the nameserver, and 1 response packet from the nameserver
with the "A 66.35.62.162".  The resolver then hangs for 5 seconds in a poll().

Could the resolver code still be wanting the other packet even though it has
gotten the response?

I'll attach text dumps of the 2 capture files below.



Comment 26 Tom London 2008-07-10 14:54:13 UTC
Created attachment 311485 [details]
wireshark capture showing "successful" 'getent ahosts mirrors.fedoraproject.org'

This one has 2 DNS response packets.  getent completes in a few tenths of a
second.

Comment 27 Tom London 2008-07-10 14:55:21 UTC
Created attachment 311486 [details]
wireshark capture showing "5 second" 'getent ahosts mirrors.fedoraproject.org'

This one has only one DNS response packet.  getent takes 5 seconds and fails.

Comment 28 Ulrich Drepper 2008-07-10 15:43:39 UTC
(In reply to comment #25)
> 
> The first (using the "good" nameserver) shows 4 DNS packets: 2 outgoing request
> packets to the nameserver, and 2 response packets from the nameserver, the
> second with the "A 66.35.62.162" answer.
> 
> The second (using the "bad" nameserver) shows 3 DNS packets: the same 2 outgoing
> request packets to the nameserver, and 1 response packet from the nameserver
> with the "A 66.35.62.162".  The resolver then hangs for 5 seconds in a poll().


OK, this is expected.  This would also happen for lookups with older releases. 
The only time you wouldn't see the delay if the program explicitly requests only
IPv4 addresses (i.e., it is not prepared for IPv6).

The nameserver simply doesn't react to IPv6 requests.  This is a server
configuration problem and you should contact your ISV.  The second server in
your file does it right, it replies with a CNAME but nothing else.  Since the
lookup happens using UDP the only way for the resolver to notice that the server
is not replying is to wait for the timeout (which is 5 secs).

To confirm you can try an F9 installation (or older).  Use

   getent ahosts mirrors.fedoraproject.org

(note: ahosts, not hosts).  With this you'll see that the program will issue two
requests and also only receive one reply.  With this code the replies are issued
one after the other instead of concurrently (that's an improvement made in newer
code) so the order might look different.

If you see a different behaviour using this code I'd be surprised.  This would
likely mean the requests we issue changed.  I don't think it's the case.

So, please report on the behaviour of old code.  If you don't have an older
installation you can extract the file  files needed from some old RPMs.  Start
getent in this case like

    /path/to/ld-linux-x86-64.so.2 --library-path /path/to/libs /usr/bin/getent
ahosts mirrors.fedoraproject.org

where /path/to[...] points to the extracted files.

Comment 29 Tom London 2008-07-10 17:02:34 UTC
I think I tested this before on my wife's FC9 system, and I did not observe this
issue there.

I can test this again when I get home tonight.

To be clear, you are expecting this to fail in the same way on F9 as on Rawhide,
right?

If I need to wrestle with Comcast, I'd like to have all the data in place....

Thanks again for your help here.

Comment 30 Ulrich Drepper 2008-07-10 17:27:17 UTC
(In reply to comment #29)
> I think I tested this before on my wife's FC9 system, and I did not observe this
> issue there.

This really would surprise me.  Don't use curl or so, use getent ahosts.  Who
knows whether curl was changed.



> To be clear, you are expecting this to fail in the same way on F9 as on Rawhide,
> right?

WRT to getent ahosts, yes.  If this does show a difference then the server
reacts differently.  This must mean the requests somehow differ.  This is easy
to verify using tcpdump.

Whether applications behave the same is a completely different issues.

Oh, and of course getent ahosts will not try to query IPv6 addresses if no IPv6
interface exists.  So, that playing field has to be levelled, too.


Comment 31 Tom London 2008-07-12 20:32:04 UTC
Sorry for the delay, but I discovered that my wife's system was an older F8, not F9.

So I spent this morning doing a clean install of F9 on that box (along with
released updates), afterwhich I ran "time getent ahosts mirrors.fedoraproject.org".

The bottom line is that the stock F9 system does not display this condition.
"time getent ahosts mirrors.fedoraproject.org" with the unmodified
/etc/resolv.conf completes in less than a tenth of a second.

At the same time, my Rawhide system, connected to the same Linksys router takes
5 seconds  (and fails):
[root@localhost ~]# time getent ahosts mirrors.fedoraproject.org

real	0m5.008s
user	0m0.002s
sys	0m0.006s
[root@localhost ~]# 

So this F9 system is not failing the same way as Rawhide against the same DNS
server.

I attach below an strace output of running "time getent ahosts
mirrors.fedoraproject.org" (sorry, I forgot the "-tt -T", but the output from
time shows this taking 0.09 seconds).

I make my Rawhide system "friendly" by commenting out the first nameserver in
/etc/resolv.conf.

Would wireshark outputs from F9 be of use?

Comment 32 Tom London 2008-07-12 20:33:54 UTC
Created attachment 311652 [details]
Console output from F9 system running "cat /etc/resolv.conf; time strace getent ahosts mirrors.fedoraproject.org"

Console output from fresh F9 system running "cat /etc/resolv.conf; time strace
getent ahosts mirrors.fedoraproject.org"

Comment 33 Tom London 2008-07-23 14:41:42 UTC
Sorry for the delay.

I've now installed wireshark on a basically stock F9 system and traced "time
getent ahosts mirrors.fedoraproject.org".

I attach the text dump from wireshark below.  Compare with the wireshark dump in
https://bugzilla.redhat.com/show_bug.cgi?id=454206#c27 obtained on a rawhide system.

The bottom line: F9 system completes in fractions of a second, Rawhide system
takes 5 seconds and ususally prints nothing.

I'm sorry if I am being obtuse here..... I just want to make sure that before I
approach Comcast (ugh....) that all the bases are covered.

Currently, I manually edit /etc/resolv.conf on my Rawhide system on each boot
(when I am at home) to comment out the first nameserver entry.

[tbl@localhost ~]$ cat /etc/resolv.conf
# generated by NetworkManager, do not edit!

domain hsd1.ca.comcast.net.

search hsd1.ca.comcast.net.

#nameserver 68.87.76.178
nameserver 68.87.78.130

[tbl@localhost ~]$ 

Here is the (identical but for the comment) resolv.conf from F9:
[tbl@localhost ~]$ cat resolv.conf.f9
# generated by NetworkManager, do not edit!

domain hsd1.ca.comcast.net.

search hsd1.ca.comcast.net.

nameserver 68.87.76.178
nameserver 68.87.78.130

[tbl@localhost ~]$ 

Wireshark dump below....


Comment 34 Tom London 2008-07-23 14:43:11 UTC
Created attachment 312482 [details]
wireshark capture of "time getent ahosts mirrors.fedoraproject.org"

Wireshark capture from stock F9 system on same LAN as mis-behaving rawhide
system.

Comment 35 Tom London 2008-07-24 13:22:00 UTC
Additional data..... :

Comcast added a new name server to my resolv.conf this morning.  There are now 3
name servers listed, the same two as before (68.87.76.178, 68.87.78.130), plus a
new one (68.87.69.146).

Commenting out the 2 "old ones" so the system only uses the "new one" reproduces
the "takes 5 seconds" and fails sometimes behavior:
[tbl@localhost ~]$ cat /etc/resolv.conf
# generated by NetworkManager, do not edit!

domain hsd1.ca.comcast.net.

search hsd1.ca.comcast.net.

#nameserver 68.87.76.178
#nameserver 68.87.78.130
nameserver 68.87.69.146

[tbl@localhost ~]$ 


[root@localhost ~]# time getent ahosts mirrors.fedoraproject.org
66.35.62.162    STREAM fedoraproject.org
66.35.62.162    DGRAM  
66.35.62.162    RAW    
209.132.176.122 STREAM 
209.132.176.122 DGRAM  
209.132.176.122 RAW    

real	0m5.008s
user	0m0.002s
sys	0m0.007s
[root@localhost ~]# time getent ahosts mirrors.fedoraproject.org

real	0m5.008s
user	0m0.000s
sys	0m0.009s
[root@localhost ~]# 

Obivously Comcast is unaware that they are causing problems....


Comment 36 Tom London 2008-07-28 14:57:06 UTC
OK.  I read your responses a bit more carefully (esp. #28).

After turning off IPV6 on my rawhide system by creating a file in
/etc/modprobe.d with:

[root@localhost modprobe.d]# cat kill-ipv6
install ipv6 /bin/true
[root@localhost modprobe.d]# 

and turning off ip6tables with chkconfig, the system runs fine with the "faulty
(at least for IPv6) DNS server".

I believe that fact that IPv6 DNS seems to work "everywhere" but this server
shows (per #28) that the server is not properly configured to handle IPv6 DNS
requests.

I will attempt to work this with Comcast.

Thanks for all your help here.....

Comment 37 Tom London 2008-07-28 22:50:24 UTC
For completeness, I post here a replay of my Chat with Comcast.  Result is not
too positive..... :-(

--------------------------------------
user Tom_ has entered room
Tom(Mon Jul 28 2008 11:03:38 GMT-0700 (PDT))> Comcast DNS servers are all not
configured to handle IPv6 DNS requests.  How to fix?
analyst Robin has entered room
Robin(Mon Jul 28 2008 11:03:41 GMT-0700 (PDT))> Hello Tom_, Thank you for
contacting Comcast Live Chat Support. My name is Robin. Please give me one
moment to review your information.
Robin(Mon Jul 28 2008 11:03:56 GMT-0700 (PDT))> Good Morning!
Tom_(Mon Jul 28 2008 08:04:00 GMT-0700 (PDT))> hi
Robin(Mon Jul 28 2008 11:04:34 GMT-0700 (PDT))> I am not quite sure what you are
asking for. Are you able to explain in a little bit more detail please?
Tom_(Mon Jul 28 2008 08:05:12 GMT-0700 (PDT))> Sure.  Comcast DHCP provides my
system with 3 "nameserver" DNS servers:
Tom_(Mon Jul 28 2008 08:05:19 GMT-0700 (PDT))> nameserver 68.87.76.178
nameserver 68.87.78.130 nameserver 68.87.69.146
Tom_(Mon Jul 28 2008 08:06:00 GMT-0700 (PDT))> Only the second of these
(68.86.79.130) properly responds to my systems IPV6 DNS requests.  The other 2
provide buggy responses.
Tom_(Mon Jul 28 2008 08:06:35 GMT-0700 (PDT))> Can you have Tier 3/4 tech
support review the configurations for these servers?
Robin(Mon Jul 28 2008 11:06:32 GMT-0700 (PDT))> Okay, Before we get started, For
your protection, and the security of your account; Can you please provide *one*
of the following items?   The last four-digits of the Social Security Number on
the account,  - OR -  the last payment amount to post to the account,  - OR -  
the account number - OR - the customer number.
Tom_(Mon Jul 28 2008 08:07:06 GMT-0700 (PDT))> Uhhhh
Robin(Mon Jul 28 2008 11:08:49 GMT-0700 (PDT))> Thank you, At this time I’ll
need to transfer you to our Advanced Technical Support team.  One moment please.
Robin(Mon Jul 28 2008 11:09:12 GMT-0700 (PDT))> Please wait, while the problem
is escalated to another analyst
analyst Ron has entered room
Ron(Mon Jul 28 2008 11:09:22 GMT-0700 (PDT))> Thank you for waiting while the
chat was transferred to me.  My name is Ron. One moment please while I read the
chat session so far, access your account, and assess the issue. 
analyst Robin has left room
Ron(Mon Jul 28 2008 11:11:04 GMT-0700 (PDT))> Hello Tom. 
Tom_(Mon Jul 28 2008 08:11:09 GMT-0700 (PDT))> For full technical details,
including packet traces/dumps
Tom_(Mon Jul 28 2008 08:11:17 GMT-0700 (PDT))> please review:
https://bugzilla.redhat.com/show_bug.cgi?id=454206
Tom_(Mon Jul 28 2008 08:11:36 GMT-0700 (PDT))> hi
Ron(Mon Jul 28 2008 11:16:45 GMT-0700 (PDT))> While I understand what you are
requesting, Comcast does not offer technical support for it. The DNS servers are
updated on a regular bases. Well within the two week guidelines set forth by
industry standards. 
Tom_(Mon Jul 28 2008 08:17:13 GMT-0700 (PDT))> Sorry, but you don't understand.
Tom_(Mon Jul 28 2008 08:17:24 GMT-0700 (PDT))> Nothing to do with updating the
servers for new entries.
Tom_(Mon Jul 28 2008 08:17:46 GMT-0700 (PDT))> Your servers are not properly
configured to correctly respond to DNS reqeusts from IPv6 DNS clients.
Tom_(Mon Jul 28 2008 08:17:55 GMT-0700 (PDT))> Perhaps you can escalate this?
Tom_(Mon Jul 28 2008 08:18:37 GMT-0700 (PDT))> One of your DNS servers works
properly.  The other 2 do not.
Tom_(Mon Jul 28 2008 08:18:53 GMT-0700 (PDT))> the packet traces provided in the
link above show the details.
Ron(Mon Jul 28 2008 11:21:41 GMT-0700 (PDT))> One moment please. 
Tom_(Mon Jul 28 2008 08:23:41 GMT-0700 (PDT))> Drat.  This is not efficient.  Is
there another way for me to report this/get this fixed?
Tom_(Mon Jul 28 2008 08:25:54 GMT-0700 (PDT))> Can I have a case#/ticket# to
refer to if/when I try to "call" back?
Ron(Mon Jul 28 2008 11:30:03 GMT-0700 (PDT))> I apologize for the delay. 
Ron(Mon Jul 28 2008 11:30:26 GMT-0700 (PDT))> I send the information you gave me
to our networking group. The said they will look into it. 
Ron(Mon Jul 28 2008 11:30:45 GMT-0700 (PDT))> Since I did this via an internal
email to them, there isn't a ticket number. 
Tom_(Mon Jul 28 2008 08:32:35 GMT-0700 (PDT))> Any ETA for any kind of response?
 Will they contact me?
Ron(Mon Jul 28 2008 11:34:07 GMT-0700 (PDT))> They normally look into the issues
I email them the same day as I send them. However, they do have up to 72 hours. 
Ron(Mon Jul 28 2008 11:34:41 GMT-0700 (PDT))> They just responded to me. 
Ron(Mon Jul 28 2008 11:35:24 GMT-0700 (PDT))> Currently the Comcast network is
designed to use IPv4. Updates with be made when needed for future support of IPv6. 


Comment 38 Ulrich Drepper 2008-07-29 05:48:25 UTC
(In reply to comment #34)
> Created an attachment (id=312482) [edit]
> wireshark capture of "time getent ahosts mirrors.fedoraproject.org"
> 
> Wireshark capture from stock F9 system on same LAN as mis-behaving rawhide
> system.

This trace shows that IPv6 lookups are not performed.  No wonder it's faster in
the presence of a name server which doesn't respond to IPv6 queries.

Comment 39 Ulrich Drepper 2008-07-29 05:56:13 UTC
(In reply to comment #27)
> Created an attachment (id=311486) [edit]
> wireshark capture showing "5 second" 'getent ahosts mirrors.fedoraproject.org'
> 
> This one has only one DNS response packet.  getent takes 5 seconds and fails.

OK, this one does an IPv6 lookup but there is no answer.  Therefore the 5 sec delay.

This means there is nothing wrong with the resolver.  This is the expected
behaviour.  If you want the "old" behaviour in the presence of the broken name
servers you have to disable all IPv6 addresses on your machine.


Note You need to log in before you can comment on or make changes to this bug.