Bug 209954

Summary: Caching nameserver setup gets no responses from root servers
Product: [Fedora] Fedora
Reporter: Axel Thimm <axel.thimm>
Component: bind
Assignee: Adam Tkac <atkac>
Status: CLOSED RAWHIDE
QA Contact: Ben Levenson <benl>
Severity: medium
Priority: medium
Version: 6
CC: ovasik
Keywords: Reopened
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-04-10 17:27:13 UTC
Attachments: test config file

Description Axel Thimm 2006-10-08 22:01:28 UTC
Description of problem:
When using the caching nameserver configuration, a query for some random name
results in a query to one of the root servers, e.g.:

Capturing on Pseudo-device that captures on all interfaces
  0.000000    127.0.0.1 -> 127.0.0.1    DNS Standard query A test.domain.tld
  0.001936 <ip> -> 198.41.0.4   DNS Standard query A test.domain.tld
  3.005514 <ip> -> 192.228.79.201 DNS Standard query A test.domain.tld
  5.005269    127.0.0.1 -> 127.0.0.1    DNS Standard query A test.domain.tld
  6.009322 <ip> -> 192.33.4.12  DNS Standard query A test.domain.tld
  9.017250 <ip> -> 128.8.10.90  DNS Standard query A test.domain.tld
 12.021372 <ip> -> 192.112.36.4 DNS Standard query A test.domain.tld
 15.025353 <ip> -> 192.203.230.10 DNS Standard query A test.domain.tld
 18.029274 <ip> -> 192.5.5.241  DNS Standard query A test.domain.tld
 21.033230 <ip> -> 128.63.2.53  DNS Standard query A test.domain.tld
 24.037666 <ip> -> 192.36.148.17 DNS Standard query A test.domain.tld
 27.041205 <ip> -> 198.32.64.12 DNS Standard query A test.domain.tld
 30.005357    127.0.0.1 -> 127.0.0.1    DNS Standard query response, Server failure
 30.005644    127.0.0.1 -> 127.0.0.1    DNS Standard query response, Server failure

Version-Release number of selected component (if applicable):
bind-9.3.2-41.fc6
caching-nameserver-9.3.2-41.fc6

How reproducible:
always

Steps to Reproduce:
1. Install the two packages above
2. service named start
3. host test.domain.tld 127.0.0.1
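Step 3 can also be scripted for repeated testing. The Python sketch below is a rough, hypothetical stand-in for `host`, not part of bind: it sends one A query with recursion desired over UDP to 127.0.0.1:53 and reports the response code (RCODE) from the reply header as defined in RFC 1035 (0 NOERROR, 2 SERVFAIL, 3 NXDOMAIN). The names `build_query`, `rcode` and `query_udp` and the 5-second timeout are illustrative assumptions.

```python
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Build a minimal RFC 1035 query: 12-byte header plus one IN-class question."""
    # Header: ID, flags (only RD set, asking for recursion), QDCOUNT=1, AN/NS/AR=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels terminated by a zero byte.
    qname = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            qname += bytes([len(raw)]) + raw
    qname += b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def rcode(message: bytes) -> int:
    """The RCODE is the low four bits of the second flags byte."""
    return message[3] & 0x0F


def query_udp(name: str, server: str = "127.0.0.1", port: int = 53,
              timeout: float = 5.0) -> str:
    """Send one query and return the RCODE name, or 'timeout' if nothing answers."""
    qid = 0x1234  # fixed ID for simplicity; a real client would randomize it
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, port))
        try:
            while True:
                data, _ = sock.recvfrom(4096)
                # Ignore anything that is not a reply to our query ID.
                if len(data) >= 12 and struct.unpack("!H", data[:2])[0] == qid:
                    code = rcode(data)
                    return RCODES.get(code, str(code))
        except socket.timeout:
            return "timeout"


# Usage (requires a running named on 127.0.0.1):
#   print(query_udp("test.domain.tld"))
```

The fixed query ID only keeps the sketch short; it is not how a production client should behave.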
  
Actual results:
# host test.domain.tld 127.0.0.1
;; connection timed out; no servers could be reached
(see also dump above)


Expected results:
(on FC5 for example)
# host test.domain.tld 127.0.0.1
Using domain server:
Name: 127.0.0.1
Address: 127.0.0.1#53
Aliases: 

Host test.domain.tld not found: 3(NXDOMAIN)


Additional info:
test.domain.tld is just an example; existing names like www.google.com are
treated the same way, i.e. there is no response from the root servers. It almost
looks as if the root servers ignore this version of bind (???).

Comment 1 Axel Thimm 2006-10-08 22:05:14 UTC
I forgot to finish the description:

When using the caching nameserver configuration a query on some random name
results in a query to one of the root servers which is never replied to.

Comment 2 Martin Stransky 2006-10-24 13:02:04 UTC
Have you pinged these servers?
Please attach named's messages from /var/log/messages (starting from named's startup).
BTW, do you use NetworkManager?

Comment 3 Axel Thimm 2006-10-24 22:56:07 UTC
Yes, the servers pinged fine, and no, there was no NetworkManager involved. I
replaced the config files with the config files from a working FC5 installation
and everything started working, so it looks like an issue with the default
config files. Here are the log messages from back then:

Oct  8 22:54:38 fifty named[14218]: starting BIND 9.3.2 -u named -c /etc/named.caching-nameserver.conf
Oct  8 22:54:38 fifty named[14218]: found 2 CPUs, using 2 worker threads
Oct  8 22:54:38 fifty named[14218]: loading configuration from '/etc/named.caching-nameserver.conf'
Oct  8 22:54:38 fifty named[14218]: listening on IPv6 interface lo, ::1#53
Oct  8 22:54:38 fifty named[14218]: listening on IPv4 interface lo, 127.0.0.1#53
Oct  8 22:54:38 fifty named[14218]: command channel listening on 127.0.0.1#953
Oct  8 22:54:38 fifty named[14218]: command channel listening on ::1#953
Oct  8 22:54:38 fifty named[14218]: zone 0.in-addr.arpa/IN/localhost_resolver: loaded serial 42
Oct  8 22:54:38 fifty named[14218]: zone 0.0.127.in-addr.arpa/IN/localhost_resolver: loaded serial 1997022700
Oct  8 22:54:38 fifty named[14218]: zone 255.in-addr.arpa/IN/localhost_resolver: loaded serial 42
Oct  8 22:54:38 fifty named[14218]: zone 0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/IN/localhost_resolver: loaded serial 1997022700
Oct  8 22:54:38 fifty named[14218]: zone localdomain/IN/localhost_resolver: loaded serial 42
Oct  8 22:54:38 fifty named[14218]: zone localhost/IN/localhost_resolver: loaded serial 42
Oct  8 22:54:38 fifty named[14218]: running

Comment 4 Martin Stransky 2006-10-25 06:13:50 UTC
Which version (from FC5) works for you?

Comment 5 Axel Thimm 2006-10-25 20:58:37 UTC
The config files I copied over from FC5 were not the unmodified ones; I had
added local zones around May/June, so these are config files matching the
beginning of FC5. I haven't tried with pure FC5 config files from recent bind
updates.

I just switched back to the default FC6 config files and the issue is still
there, i.e. no response from any root server. Switching to the working setup
yields immediate responses from the root servers. So in the default setup
something must be getting into the query packets that causes them to be dropped
on the root servers' side.

Comment 6 Martin Stransky 2006-10-25 21:23:19 UTC
Hm, I'm asking because I can't reproduce it with any configuration, so it's
quite hard to fix...

Comment 7 Axel Thimm 2006-10-25 22:34:53 UTC
Have you tried on ppc hardware? (I don't see a reason for it to be ppc
specific, but the system I'm testing this on is FC6/ppc.)

Do you want root access to this system? You can do with the nameserver on it as
you please; I'll point resolv.conf elsewhere. If you'd like to look at it on
the system itself, contact me by PM. Other than that I can only offer captured dumps.


Comment 8 Martin Stransky 2006-10-31 09:16:59 UTC
It could be a dupe of this one: 

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=211282

Thanks for your offers, I've found one box where I can reproduce it.

Comment 9 Martin Stransky 2006-11-22 10:12:45 UTC
Could you please check bind-9.3.3-6.fc6? There's a new option, enable-edns; please
try disabling it (see /usr/share/doc/bind-9.3.3/misc/options).
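For context on what such an option toggles: with EDNS(0) (RFC 2671 at the time, since obsoleted by RFC 6891), a resolver appends an OPT pseudo-record to the additional section of its query, advertising a UDP payload size larger than the classic 512-byte limit. The DNSSEC "DO" flag also lives in this record, which is why turning EDNS off affects DNSSEC. The Python sketch below builds such a query on the wire. The helper names are hypothetical, 4096 is assumed as the advertised size (BIND's usual `edns-udp-size` default), and the sketch does not reproduce the Fedora `enable-edns` option itself, which is described in the options file mentioned above.

```python
import struct


def build_query(name: str, qid: int, edns: bool = True,
                payload_size: int = 4096, dnssec_ok: bool = False) -> bytes:
    """Build an RFC 1035 A/IN query, optionally carrying an EDNS(0) OPT record."""
    arcount = 1 if edns else 0
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT.
    msg = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, arcount)
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            msg += bytes([len(raw)]) + raw
    msg += b"\x00" + struct.pack("!HH", 1, 1)  # end of QNAME, QTYPE=A, QCLASS=IN
    if edns:
        msg += opt_record(payload_size, dnssec_ok)
    return msg


def opt_record(payload_size: int = 4096, dnssec_ok: bool = False) -> bytes:
    """EDNS(0) OPT pseudo-RR: root owner name, TYPE=41, CLASS=UDP payload size.

    The 32-bit TTL field is reused as extended RCODE (8 bits), version (8 bits,
    0 here) and flags (16 bits, whose top bit is DO); RDLENGTH=0 means no options.
    """
    ttl = 0x00008000 if dnssec_ok else 0
    return b"\x00" + struct.pack("!HHIH", 41, payload_size, ttl, 0)
```

The only difference between the two query variants is these 11 extra bytes and ARCOUNT=1, so a middlebox that mishandles EDNS reacts to exactly this record or to the larger replies it invites.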

Comment 10 Axel Thimm 2007-02-07 08:19:29 UTC
Shouldn't the option be off by default to ensure proper operation?


Comment 11 Martin Stransky 2007-02-07 12:12:00 UTC
Definitely not. At least it'll break DNSSEC.

Comment 12 Axel Thimm 2007-02-07 13:09:46 UTC
But now it breaks on standard router hardware in the path to the root servers. I
think this is more severe, isn't it?

Comment 13 Martin Stransky 2007-02-07 16:34:09 UTC
EDNS is a really good thing that should be used... and it has been around for many
years. I don't think we should cripple our package because of misconfigured
routers/firewalls. BTW, the generic upstream package doesn't even allow
disabling EDNS for all queries...

Comment 14 Axel Thimm 2007-02-07 19:18:29 UTC
In that case this bug is unrelated to EDNS (and therefore not a sibling of bug
#211282), i.e. EDNS is not the issue, since this network segment has been
running DNS services for more than two decades now.

It also works with a config setup that didn't/doesn't have to enable/disable
EDNS; in fact, it was running before bind was patched to allow turning off EDNS
(see comment #5).


Comment 15 Adam Tkac 2007-03-28 09:30:00 UTC
I did some tests, and it looks like the problem is in your firewall configuration.
Are you sure your firewall doesn't drop responses from the root servers? (When I
was behind a firewall I got no response, and when I completely disabled the
firewall everything worked fine.) I tried it with rawhide's
caching-nameserver-9.4.0-3.fc7. Please tell me your results.

Regards, -A-

Comment 16 Axel Thimm 2007-03-28 12:20:10 UTC
Adam, the bug seems to be in the default config, not in the firewalling or the
code. See comment #3, where I copied over the config of an FC5 system and the
queries worked again.

I'll try again and report back; after all, there have been two minor upstream
releases since then, and many package updates as well.

Comment 17 Adam Tkac 2007-03-29 11:22:19 UTC
Created attachment 151186
test config file

I don't believe this is a bug. Please try this config file. If bind doesn't
work correctly with this configuration, then something is blocking DNS
responses and this isn't a bind problem. If this configuration works correctly
and caching-nameserver's doesn't, we can discuss a proposed fix.

Regards, -A-

Comment 18 Adam Tkac 2007-04-06 08:25:58 UTC
After thinking about this bug some more: could you please try to telnet to port
53 on the affected computer (from outside the network, of course)? If you can't,
something must be throwing DNS responses away. Please tell me your results.

-A-
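A note on this test: a successful `telnet` to port 53 only shows that a TCP connection can be opened. Ordinary queries and their replies, including every packet in the captures in this report, travel over UDP; TCP is mainly used when a UDP reply is truncated and for zone transfers. Over TCP, each DNS message is preceded by a two-byte length field (RFC 1035, section 4.2.2). The Python sketch below goes one step past `telnet` by sending a real query over TCP and reading the framed reply. The helper names are hypothetical, and even a working TCP path says nothing about whether UDP replies get through.

```python
import socket
import struct


def build_query(name: str, qid: int) -> bytes:
    """Minimal RFC 1035 A/IN query with recursion desired."""
    msg = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            msg += bytes([len(raw)]) + raw
    return msg + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its 16-bit length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        buf += chunk
    return buf


def query_tcp(name: str, server: str, port: int = 53, timeout: float = 5.0) -> int:
    """Send one query over TCP and return the RCODE of the reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(frame(build_query(name, 0x1234)))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        reply = recv_exact(sock, length)
    return reply[3] & 0x0F  # RCODE: low four bits of the second flags byte


# Usage, e.g. against 198.41.0.4, one of the root server addresses in the captures:
#   print(query_tcp("test.domain.tld", "198.41.0.4"))
```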

Comment 19 Axel Thimm 2007-04-06 09:29:32 UTC
The default caching-nameserver setup yields again the same issue as the original
report:

  0.000000    127.0.0.1 -> 127.0.0.1    DNS Standard query A test.domain.tld
  0.003267 <ip> -> 192.228.79.201 DNS Standard query A test.domain.tld
  0.003267 <ip> -> 192.228.79.201 DNS Standard query A test.domain.tld
  0.003290 <ip> -> 192.228.79.201 DNS Standard query A test.domain.tld
  2.007135 <ip> -> 128.63.2.53  DNS Standard query A test.domain.tld
  2.007135 <ip> -> 128.63.2.53  DNS Standard query A test.domain.tld
  2.007147 <ip> -> 128.63.2.53  DNS Standard query A test.domain.tld
  4.011329 <ip> -> 193.0.14.129 DNS Standard query A test.domain.tld
  4.011329 <ip> -> 193.0.14.129 DNS Standard query A test.domain.tld
  4.011341 <ip> -> 193.0.14.129 DNS Standard query A test.domain.tld
  5.003525    127.0.0.1 -> 127.0.0.1    DNS Standard query A test.domain.tld
  6.015534 <ip> -> 198.32.64.12 DNS Standard query A test.domain.tld
  6.015534 <ip> -> 198.32.64.12 DNS Standard query A test.domain.tld
  6.015546 <ip> -> 198.32.64.12 DNS Standard query A test.domain.tld
  8.019744 <ip> -> 128.8.10.90  DNS Standard query A test.domain.tld
  8.019744 <ip> -> 128.8.10.90  DNS Standard query A test.domain.tld
  8.019757 <ip> -> 128.8.10.90  DNS Standard query A test.domain.tld
 10.023971 <ip> -> 192.5.5.241  DNS Standard query A test.domain.tld
 10.023971 <ip> -> 192.5.5.241  DNS Standard query A test.domain.tld
 10.023983 <ip> -> 192.5.5.241  DNS Standard query A test.domain.tld
 12.028143 <ip> -> 192.112.36.4 DNS Standard query A test.domain.tld
 12.028143 <ip> -> 192.112.36.4 DNS Standard query A test.domain.tld
 12.028156 <ip> -> 192.112.36.4 DNS Standard query A test.domain.tld
 14.032346 <ip> -> 202.12.27.33 DNS Standard query A test.domain.tld
 14.032346 <ip> -> 202.12.27.33 DNS Standard query A test.domain.tld
 14.032358 <ip> -> 202.12.27.33 DNS Standard query A test.domain.tld
 16.036555 <ip> -> 198.41.0.4   DNS Standard query A test.domain.tld
 16.036555 <ip> -> 198.41.0.4   DNS Standard query A test.domain.tld
 16.036567 <ip> -> 198.41.0.4   DNS Standard query A test.domain.tld
 18.040757 <ip> -> 192.33.4.12  DNS Standard query A test.domain.tld
 18.040757 <ip> -> 192.33.4.12  DNS Standard query A test.domain.tld
 18.040769 <ip> -> 192.33.4.12  DNS Standard query A test.domain.tld
 20.044960 <ip> -> 192.203.230.10 DNS Standard query A test.domain.tld
 20.044960 <ip> -> 192.203.230.10 DNS Standard query A test.domain.tld
 20.044972 <ip> -> 192.203.230.10 DNS Standard query A test.domain.tld
 22.049166 <ip> -> 192.58.128.30 DNS Standard query A test.domain.tld
 22.049166 <ip> -> 192.58.128.30 DNS Standard query A test.domain.tld
 22.049177 <ip> -> 192.58.128.30 DNS Standard query A test.domain.tld
 24.053367 <ip> -> 192.36.148.17 DNS Standard query A test.domain.tld
 24.053367 <ip> -> 192.36.148.17 DNS Standard query A test.domain.tld
 24.053379 <ip> -> 192.36.148.17 DNS Standard query A test.domain.tld
 26.057647 <ip> -> 192.228.79.201 DNS Standard query A test.domain.tld
 26.057647 <ip> -> 192.228.79.201 DNS Standard query A test.domain.tld
 26.057659 <ip> -> 192.228.79.201 DNS Standard query A test.domain.tld
 28.061779 <ip> -> 128.63.2.53  DNS Standard query A test.domain.tld
 28.061779 <ip> -> 128.63.2.53  DNS Standard query A test.domain.tld
 28.061791 <ip> -> 128.63.2.53  DNS Standard query A test.domain.tld
 30.010001    127.0.0.1 -> 127.0.0.1    DNS Standard query response, Server failure
 30.010067    127.0.0.1 -> 127.0.0.1    DNS Standard query response, Server failure

Telnetting works OK:

# telnet 192.228.79.201 53
Trying 192.228.79.201...
Connected to b.root-servers.net (192.228.79.201).
Escape character is '^]'.

I also tried the named.conf in attachment #151186, but it didn't work either.

I also just tried a bare-metal install of RHEL5 on another system and pulled in
virgin caching-nameserver and bind packages, started named and had the same issue.

Comment 20 Adam Tkac 2007-04-10 15:22:28 UTC
After investigation, the problem turned out to be the query-source option. It will
be disabled in the next release.
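Comment 20 does not quote the offending line. Assuming the default configuration pinned the source port of outgoing queries (in BIND syntax, something like `query-source port 53;`), every query to the root servers would leave from UDP port 53 and the replies would come back addressed to port 53 on the host. One plausible explanation consistent with comment 18 is a filter on the path that drops inbound traffic to port 53, which would discard exactly those replies while queries from a random port get through. The Python sketch below, built around a hypothetical `probe` helper, compares the two cases by binding the client socket to a chosen source port. Binding to port 53 needs root privileges and a free port 53, so named should be stopped first.

```python
import socket
import struct


def build_query(name: str, qid: int) -> bytes:
    """Minimal RFC 1035 A/IN query; RD is left clear since root servers do not recurse."""
    msg = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, 0)
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            msg += bytes([len(raw)]) + raw
    return msg + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server: str, name: str, src_port: int = 0, port: int = 53,
          timeout: float = 3.0) -> bool:
    """Return True if `server` (an IPv4 address) answers a UDP query sent from src_port.

    src_port=0 lets the OS pick an ephemeral port; src_port=53 imitates a
    resolver whose query source port is fixed at 53.
    """
    qid = 0x2A2A
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Bind outside the try block so permission errors are reported, not hidden.
        sock.bind(("", src_port))
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qid), (server, port))
            while True:
                data, addr = sock.recvfrom(4096)
                if addr[0] == server and data[:2] == struct.pack("!H", qid):
                    return True
        except OSError:  # timeout (socket.timeout is an OSError) or ICMP error
            return False


# Usage (as root, with named stopped), e.g. against 198.41.0.4:
#   print("ephemeral port:", probe("198.41.0.4", "test.domain.tld"))
#   print("source port 53:", probe("198.41.0.4", "test.domain.tld", src_port=53))
```

If the first call succeeds and the second does not, that would point at source-port filtering; the sketch cannot tell where on the path the packets are dropped.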