Bug 1560223

Summary: Regression: unbound unable to resolve when new WLAN connection is made
Product: [Fedora] Fedora
Reporter: Dimitris <dimitris.on.linux>
Component: unbound
Assignee: Paul Wouters <pwouters>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 27
CC: bjorn, cra, dimitris.on.linux, dominik, fedora, mrunge, pemensik, pj.pandit, pwouters, theo148
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Last Closed: 2018-06-11 17:40:47 UTC
Type: Bug
Attachments:
unbound.conf under 1.6.8 (flags: none)
unbound.conf under 1.7.0 (flags: none)
unbound.conf under 1.7.0 with aggressive-nsec: yes and auth-zone: removed (flags: none)

Description Dimitris 2018-03-25 00:38:12 UTC
Description of problem:
Whenever I (re)connect to a WLAN (after a reboot, a manual disconnect/reconnect, or a resume from suspend), unbound is unable to resolve names. I have to manually issue a reload command before it starts resolving again.

Version-Release number of selected component (if applicable):
Regression starts with 1.7.0-2.fc27

How reproducible:
Every time

Steps to Reproduce:

1. /etc/NetworkManager/NetworkManager.conf specifies dns=unbound

2. From disconnected state, connect to WLAN.

3. 'dig www.google.com' returns no answer (ANSWER: 0, only NS records for com. in the authority section):
; <<>> DiG 9.11.3-RedHat-9.11.3-2.fc27 <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39856
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.google.com.			IN	A

;; AUTHORITY SECTION:
com.			172615	IN	NS	l.gtld-servers.net.
com.			172615	IN	NS	m.gtld-servers.net.
com.			172615	IN	NS	a.gtld-servers.net.
com.			172615	IN	NS	b.gtld-servers.net.
com.			172615	IN	NS	c.gtld-servers.net.
com.			172615	IN	NS	d.gtld-servers.net.
com.			172615	IN	NS	e.gtld-servers.net.
com.			172615	IN	NS	f.gtld-servers.net.
com.			172615	IN	NS	g.gtld-servers.net.
com.			172615	IN	NS	h.gtld-servers.net.
com.			172615	IN	NS	i.gtld-servers.net.
com.			172615	IN	NS	j.gtld-servers.net.
com.			172615	IN	NS	k.gtld-servers.net.

;; ADDITIONAL SECTION:
a.gtld-servers.net.	172615	IN	A	192.5.6.30
a.gtld-servers.net.	172615	IN	AAAA	2001:503:a83e::2:30
b.gtld-servers.net.	172615	IN	A	192.33.14.30
b.gtld-servers.net.	172615	IN	AAAA	2001:503:231d::2:30
c.gtld-servers.net.	172615	IN	A	192.26.92.30
c.gtld-servers.net.	172615	IN	AAAA	2001:503:83eb::30
d.gtld-servers.net.	172615	IN	A	192.31.80.30
d.gtld-servers.net.	172615	IN	AAAA	2001:500:856e::30
e.gtld-servers.net.	172615	IN	A	192.12.94.30
e.gtld-servers.net.	172615	IN	AAAA	2001:502:1ca1::30
f.gtld-servers.net.	172615	IN	A	192.35.51.30
f.gtld-servers.net.	172615	IN	AAAA	2001:503:d414::30
g.gtld-servers.net.	172615	IN	A	192.42.93.30
g.gtld-servers.net.	172615	IN	AAAA	2001:503:eea3::30
h.gtld-servers.net.	172615	IN	A	192.54.112.30
h.gtld-servers.net.	172615	IN	AAAA	2001:502:8cc::30
i.gtld-servers.net.	172615	IN	A	192.43.172.30
i.gtld-servers.net.	172615	IN	AAAA	2001:503:39c1::30
j.gtld-servers.net.	172615	IN	A	192.48.79.30
j.gtld-servers.net.	172615	IN	AAAA	2001:502:7094::30
k.gtld-servers.net.	172615	IN	A	192.52.178.30
k.gtld-servers.net.	172615	IN	AAAA	2001:503:d2d::30
l.gtld-servers.net.	172615	IN	A	192.41.162.30
l.gtld-servers.net.	172615	IN	AAAA	2001:500:d937::30
m.gtld-servers.net.	172615	IN	A	192.55.83.30
m.gtld-servers.net.	172615	IN	AAAA	2001:501:b1f9::30

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Mar 24 17:32:18 PDT 2018
;; MSG SIZE  rcvd: 839

4. At this point "systemctl status unbound.service" shows several entries like:
unbound[11585]: [11585:3] info: validation failure <domain> A IN
unbound[11585]: [11585:3] info: validation failure <domain> AAAA IN

5. After "sudo unbound-control reload", I can resolve names again, and systemctl status no longer shows validation failures.
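
The dig output in step 3 can be checked mechanically. The following is a minimal sketch, assuming dig's usual text header format as pasted above; the function name `summarize_dig` and the `referral_like` heuristic are illustrative and not part of any tool mentioned in this report.

```python
import re

def summarize_dig(output):
    """Extract the status and section counts from dig's text output."""
    status = re.search(r"status: (\w+)", output)
    counts = re.search(
        r"ANSWER: (\d+), AUTHORITY: (\d+), ADDITIONAL: (\d+)", output
    )
    if status is None or counts is None:
        raise ValueError("no dig header found in output")
    answer, authority, additional = (int(n) for n in counts.groups())
    # Heuristic (assumption): NOERROR, zero answers, and NS records present
    # looks like a referral rather than a final answer.
    has_ns = re.search(r"\sIN\s+NS\s", output) is not None
    return {
        "status": status.group(1),
        "answer": answer,
        "authority": authority,
        "additional": additional,
        "referral_like": status.group(1) == "NOERROR" and answer == 0 and has_ns,
    }

# Shortened copy of the output from step 3.
sample = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39856
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; AUTHORITY SECTION:
com.  172615  IN  NS  l.gtld-servers.net.
"""
print(summarize_dig(sample))
```

A result with `referral_like` set matches the symptom described here: the local resolver replies, but only with the com. delegation instead of an address.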

Actual results:
Cannot resolve hostnames

Expected results:
Name resolution works across network changes without manual intervention, as it did up to the previous version (the one currently in stable).

Additional info:

Comment 1 Paul Wouters 2018-03-25 21:07:33 UTC
As a workaround, please try setting

aggressive-nsec: no

in unbound.conf

Comment 2 Charles R. Anderson 2018-03-26 05:17:02 UTC
I get this with dnssec-trigger:

Mar 26 01:00:16 gauge unbound-checkconf[1335]: unbound-checkconf: no errors in /etc/unbound/unbound.conf
Mar 26 01:00:16 gauge unbound-anchor[1351]: [1522040416] libunbound[1351:0] error: can't bind socket: Permission denied for 0.0.0.0
Mar 26 01:00:16 gauge audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=unbound comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] notice: init module 0: ipsecmod
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] notice: init module 1: validator
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] notice: init module 2: iterator
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] info: start of service (unbound 1.7.0).
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master k.root-servers.net
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master g.root-servers.net
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master f.root-servers.net
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master e.root-servers.net
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master c.root-servers.net
Mar 26 01:00:16 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master b.root-servers.net
Mar 26 01:00:19 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master k.root-servers.net
Mar 26 01:00:19 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master g.root-servers.net
Mar 26 01:00:19 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master f.root-servers.net
Mar 26 01:00:19 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master e.root-servers.net
Mar 26 01:00:19 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master c.root-servers.net
Mar 26 01:00:19 gauge unbound[1372]: [1372:0] error: .: failed lookup, cannot probe to master b.root-servers.net
Mar 26 01:00:25 gauge unbound[1372]: [1372:1] info: generate keytag query _ta-4a5c-4f66. NULL IN
Mar 26 01:00:39 gauge unbound[1372]: [1372:1] info: generate keytag query _ta-4a5c-4f66. NULL IN

Comment 3 Dimitris 2018-03-26 05:50:51 UTC
Tried aggressive-nsec: no, didn't help.  Same steps as description.

Note for anyone trying this: save your old unbound.conf first. After a distro-sync back to 1.6.8, unbound won't start because the config file contains stanzas introduced by 1.7 that 1.6.8 does not recognize.
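
One way to keep a copy before switching versions is sketched below. This is only an illustration; the helper name `backup_config` and the version suffix are assumptions, and a plain `cp` does the same job.

```python
import shutil
from pathlib import Path

def backup_config(path, suffix):
    """Copy a config file next to itself as <name>.<suffix>; never overwrite."""
    src = Path(path)
    dst = src.with_name(src.name + "." + suffix)
    if dst.exists():
        raise FileExistsError(str(dst))
    shutil.copy2(src, dst)  # copy2 also preserves timestamps and mode bits
    return dst

# Hypothetical usage (needs write access to /etc/unbound):
# backup_config("/etc/unbound/unbound.conf", "1.7.0")
```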

I'm attaching my 1.7 and 1.6.8 config files.

Comment 4 Dimitris 2018-03-26 05:51:48 UTC
Created attachment 1412968 [details]
unbound.conf under 1.6.8

Comment 5 Dimitris 2018-03-26 05:52:20 UTC
Created attachment 1412969 [details]
unbound.conf under 1.7.0

Comment 6 Dimitris 2018-03-26 06:10:53 UTC
FWIW, also running dnssec-triggerd here, and seeing the same as Charles:

Mar 25 22:33:46 vimes unbound[1318]: [1318:0] error: .: failed lookup, cannot probe to master k.root-servers.net
Mar 25 22:33:46 vimes unbound[1318]: [1318:0] error: .: failed lookup, cannot probe to master g.root-servers.net
Mar 25 22:33:46 vimes unbound[1318]: [1318:0] error: .: failed lookup, cannot probe to master f.root-servers.net
Mar 25 22:33:46 vimes unbound[1318]: [1318:0] error: .: failed lookup, cannot probe to master e.root-servers.net
Mar 25 22:33:46 vimes unbound[1318]: [1318:0] error: .: failed lookup, cannot probe to master c.root-servers.net
Mar 25 22:33:46 vimes unbound[1318]: [1318:0] error: .: failed lookup, cannot probe to master b.root-servers.net

Comment 7 Charles R. Anderson 2018-03-27 14:13:32 UTC
The failures above name exactly the root servers that the 1.7.0 config lists as masters of the "." auth-zone, so this seems to be the cause of the problem:

+# Authority zones
+# The data for these zones is kept locally, from a file or downloaded.
+# The data can be served to downstream clients, or used instead of the
+# upstream (which saves a lookup to the upstream).  The first example
+# has a copy of the root for local usage.  The second serves example.org
+# authoritatively.  zonefile: reads from file (and writes to it if you also
+# download it), master: fetches with AXFR and IXFR, or url to zonefile.
+auth-zone:
+       name: "."
+       for-downstream: no
+       for-upstream: yes
+       fallback-enabled: yes
+       master: b.root-servers.net
+       master: c.root-servers.net
+       master: e.root-servers.net
+       master: f.root-servers.net
+       master: g.root-servers.net
+       master: k.root-servers.net
+# auth-zone:
+#      name: "example.org"
+#      for-downstream: yes
+#      for-upstream: yes
+#      zonefile: "example.org.zone"
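
The comparison between the log and the config can be sketched as follows. This is a rough illustration, assuming the log line format shown in the comments above; both function names are made up for this example, and `auth_zone_masters` simply collects every uncommented `master:` line without separating multiple zones.

```python
import re

def auth_zone_masters(conf_text):
    """Collect hostnames from uncommented 'master:' lines in a config."""
    masters = set()
    for line in conf_text.splitlines():
        m = re.match(r"\s*master:\s*(\S+)", line)
        if m:
            masters.add(m.group(1))
    return masters

def failed_probe_masters(log_text):
    """Collect hostnames from 'cannot probe to master <host>' log entries."""
    return set(re.findall(r"cannot probe to master (\S+)", log_text))

conf = """\
auth-zone:
       name: "."
       master: b.root-servers.net
       master: c.root-servers.net
       master: e.root-servers.net
       master: f.root-servers.net
       master: g.root-servers.net
       master: k.root-servers.net
"""
log = "\n".join(
    "unbound[1372]: [1372:0] error: .: failed lookup, "
    "cannot probe to master %s.root-servers.net" % letter
    for letter in "kgfecb"
)
print(auth_zone_masters(conf) == failed_probe_masters(log))
```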

Comment 8 Charles R. Anderson 2018-03-27 14:36:21 UTC
I confirmed that name resolution works again after commenting out the auth-zone: configuration.  This is with aggressive-nsec: yes.

I have a theory why it works for some people--maybe some people are using unbound as a forwarder to their ISP or router's DNS server, but with dnssec-trigger it is making direct DNS queries starting with the root-servers.
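
For anyone who wants to script the auth-zone workaround, here is a minimal sketch. It assumes that a clause header in unbound.conf is a line with a keyword, a colon and no value (for example `server:` or `auth-zone:`), which is a simplification of the real parser; `comment_out_auth_zones` is a hypothetical helper, not something shipped with unbound.

```python
import re

# Assumption: a clause header is "<keyword>:" with no value after the colon.
CLAUSE = re.compile(r"^\s*([a-z0-9-]+):\s*(#.*)?$")

def comment_out_auth_zones(conf_text):
    """Return the config text with every active auth-zone clause commented out."""
    out = []
    in_auth_zone = False
    for line in conf_text.splitlines():
        m = CLAUSE.match(line)
        if m:
            # A new clause starts; track whether it is an auth-zone.
            in_auth_zone = m.group(1) == "auth-zone"
        is_active = bool(line.strip()) and not line.lstrip().startswith("#")
        if in_auth_zone and is_active:
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"

sample = """\
server:
    aggressive-nsec: yes
auth-zone:
    name: "."
    master: b.root-servers.net
remote-control:
    control-enable: yes
"""
print(comment_out_auth_zones(sample))
```

After writing the result back, the config can be checked with unbound-checkconf and loaded with "sudo unbound-control reload", as used elsewhere in this report.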

Comment 9 Charles R. Anderson 2018-03-27 14:39:06 UTC
Created attachment 1413755 [details]
unbound.conf under 1.7.0 with aggressive-nsec: yes and auth-zone: removed

Working unbound.conf under 1.7.0 with aggressive-nsec: yes and auth-zone: removed.

Comment 10 Christian Stadelmann 2018-03-29 18:25:27 UTC
(In reply to Charles R. Anderson from comment #8)
> I confirmed that after commenting out the auth-zone: configuration that name
> resolution once again works.  This is with aggressive-nsec: yes.

I can confirm this behavior. When the "auth-zone:" part is commented out, name resolution works fine. When it is present in the config file (i.e. not commented out), name resolution breaks.

(In reply to Charles R. Anderson from comment #8)
> I have a theory why it works for some people--maybe some people are using
> unbound as a forwarder to their ISP or router's DNS server, but with
> dnssec-trigger it is making direct DNS queries starting with the
> root-servers.

Is there an easy command you could provide to check this theory?
According to GNOME Control Center, my DNS server is at an address in the 192.168.0.0/16 range.

$ nmcli
[…]
DNS configuration:
	servers: 192.168.178.1
	domains: fritz.box
	interface: wlp3s0

	servers: fd00::eadf:70ff:fe4b:a52a
	interface: wlp3s0
[…]

Comment 11 Dimitris 2018-04-10 03:25:42 UTC
1.7.0-4.fc27 seems to fix this for me.  Using default config installed by `dnf upgrade`.

Comment 12 Christian Stadelmann 2018-04-14 20:00:40 UTC
(In reply to Dimitris from comment #11)
> 1.7.0-4.fc27 seems to fix this for me.  Using default config installed by
> `dnf upgrade`.

+1, works for me too.