Bug 1367772

Summary: dns not updated after sleep and resume laptop
Product: Red Hat Enterprise Linux 7
Reporter: Jeff Bastian <jbastian>
Component: dnsmasq
Assignee: Pavel Šimerda (pavlix) <psimerda>
Status: CLOSED ERRATA
QA Contact: Vaclav Danek <vdanek>
Severity: medium
Priority: high
Version: 7.3
CC: atragler, bgalvani, bjoernv, dgibson, jistone, jpazdziora, jscotka, lrintel, ovasik, psimerda, rkhan, sgraf, thaller, thozza, vdanek
Target Milestone: rc
Keywords: Patch, Regression
Hardware: All
OS: Linux
Clones: 1373485
Last Closed: 2016-11-04 06:15:05 UTC
Type: Bug
Bug Blocks: 1373485
Attachments:
  dnsmasq system logs (flags: none)
  upstream patch adapted to rhel 7.3 code base (flags: none)

Description Jeff Bastian 2016-08-17 12:32:45 UTC
Description of problem:
I have NetworkManager on my laptop configured to run dnsmasq for DNS caching:

$ cat /etc/NetworkManager/NetworkManager.conf
[main]
plugins=ifcfg-rh
dns=dnsmasq

If I put my laptop to sleep, then wake it up and re-connect to the VPN, the dnsmasq process does not get updated with the new DNS information for the VPN.  I have to kill dnsmasq and let NM launch a new process in order for DNS to work.

$ ping host.inside.vpn.com
ping: host.inside.vpn.com: Name or service not known
$ pgrep -laf dnsmasq
15451 /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts ...
$ sudo kill 15451
$ pgrep -laf dnsmasq
19317 /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts ...
$ ping host.inside.vpn.com
PING host.inside.vpn.com (10.1.2.3) 56(84) bytes of data.
64 bytes from host.inside.vpn.com (10.1.2.3): icmp_seq=1 ttl=252 time=62.2 ms
...


Version-Release number of selected component (if applicable):
NetworkManager-1.4.0-0.5.beta1.el7.x86_64
NetworkManager-openvpn-1.0.8-1.el7.x86_64
dnsmasq-2.66-17.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. connect to VPN, work for a while, then put laptop to sleep
2. resume laptop, re-connect to VPN

Actual results:
the VPN is connected, but DNS lookups for hosts inside the VPN fail

Expected results:
DNS for VPN hosts works

Additional info:

Comment 2 Jeff Bastian 2016-08-17 12:37:48 UTC
Created attachment 1191625
dnsmasq system logs

Comment 3 Beniamino Galvani 2016-08-18 15:05:07 UTC
Looking at the logs, NetworkManager correctly adds the nameservers to
dnsmasq after the VPN connection is re-established, but dnsmasq doesn't
seem to use those servers. I think dnsmasq doesn't properly handle the
destruction and re-creation of the VPN tun interface when the servers
have an "@ifname" suffix.

The dnsmasq issue can be reproduced in the following way:

 * start 'dnsmasq -i'
 * set upstream server specifying the egress interface:

   busctl call uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.dnsmasq SetServersEx aas 1 1  "192.168.1.1@ens3"

 * check that name resolution works
 * destroy the ens3 interface (for example, by unloading the NIC module)
 * recreate and configure the interface
 * clear and re-add the nameserver:

   busctl call uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.dnsmasq SetServersEx aas 0
   busctl call uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.dnsmasq SetServersEx aas 1 1  "192.168.1.1@ens3"

 * check whether name resolution works: this time it fails

I suspect that the problem lies in the caching of sockets bound to
interfaces inside dnsmasq. From the code it seems that the socket
bound to ens3 is cached and reused later, when the interface is a
different one (albeit with the same name).
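The suspected failure mode can be illustrated with a small, self-contained C model. This is not dnsmasq code: the struct and both lookup functions are hypothetical, and the kernel interface index is passed in as a plain number instead of being read with if_nametoindex(). The point is only that a cache keyed on the interface name alone cannot tell a recreated ens3 from the old one, while also comparing the index can.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not dnsmasq's data structures: one cached socket that
 * was bound to a named interface. ifindex stands in for the kernel interface
 * index, which changes when an interface is destroyed and recreated even if
 * the new interface gets the same name. */
struct cached_sock {
    char ifname[16];      /* IFNAMSIZ on Linux */
    unsigned int ifindex; /* index of the interface when the socket was bound */
    int fd;               /* stand-in for the bound socket descriptor */
};

/* Name-only lookup: a socket bound to the previous incarnation of "ens3"
 * is handed back for the new "ens3" as well. */
const struct cached_sock *lookup_by_name(const struct cached_sock *cache,
                                         size_t n, const char *ifname)
{
    assert(ifname != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(cache[i].ifname, ifname) == 0)
            return &cache[i];
    return NULL; /* miss: the caller would bind a new socket */
}

/* Name-and-index lookup: a recreated interface has a new index, so the
 * stale entry no longer matches and the caller has to bind a fresh socket. */
const struct cached_sock *lookup_by_name_and_index(const struct cached_sock *cache,
                                                   size_t n, const char *ifname,
                                                   unsigned int ifindex)
{
    assert(ifname != NULL);
    for (size_t i = 0; i < n; i++)
        if (cache[i].ifindex == ifindex && strcmp(cache[i].ifname, ifname) == 0)
            return &cache[i];
    return NULL;
}
```

For example, with a cache holding {"ens3", 2, 7} and ens3 recreated with index 5, lookup_by_name() still returns the entry whose fd is 7, while lookup_by_name_and_index(..., "ens3", 5) returns NULL, which a caller could treat as the signal to bind a new socket. Whether the upstream fix uses exactly this check has not been verified here.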

I'm reassigning this to the dnsmasq component for analysis. Also
bumping the priority, as this affects every VPN reconnection made by
NetworkManager.

Comment 11 Daniel van Rossum 2016-09-01 15:19:29 UTC
(In reply to Beniamino Galvani from comment #9)
> Upstream fix:
> 
> http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=2675f2061525bc954be14988d64384b74aa7bf8b

I can confirm that this patch fixes the issue for me.

Comment 22 Pavel Šimerda (pavlix) 2016-09-06 21:52:45 UTC
Created attachment 1198427
upstream patch adapted to rhel 7.3 code base

Comment 23 Jan Pazdziora (Red Hat) 2016-09-07 06:18:04 UTC
Could we get Fedora 24 packages respun to give it some testing in the Fedora land, for bug 1373485?

Comment 24 Beniamino Galvani 2016-09-07 07:53:08 UTC
(In reply to Pavel Šimerda (pavlix) from comment #22)
> Created attachment 1198427
> upstream patch adapted to rhel 7.3 code base

The backport looks OK to me. There is also a follow-up patch that fixes a potential crash, and it should be backported too:

http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=16800ea072dd0cdf14d951c4bb8d2808b3dfe53d

Comment 25 Beniamino Galvani 2016-09-08 09:28:22 UTC
(In reply to Beniamino Galvani from comment #3)
> The dnsmasq issue can be reproduced in the following way:
> 
>  * start 'dnsmasq -i'

Note for QE: there is a typo. This should be 'dnsmasq -1' (to enable dnsmasq to accept updates through D-Bus).

Instead of using D-Bus to perform the updates, it's possible to pass the --servers-file= option to specify a configuration file, and then send SIGHUP to make dnsmasq reload it.
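The --servers-file= plus SIGHUP alternative could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the file path and dnsmasq's PID are supplied by the caller, and each entry uses the "server=ADDR@IFNAME" syntax that dnsmasq accepts in configuration files.

```c
#define _POSIX_C_SOURCE 200809L /* for kill() under strict ISO C modes */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: writes one "server=..." line per entry (for example
 * "192.168.1.1@ens3") to the file given to dnsmasq via --servers-file=.
 * Returns 0 on success, -1 on I/O error or if an entry contains a newline,
 * which would otherwise inject an extra configuration line. */
int write_servers_file(const char *path, const char *const *servers, size_t n)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (strchr(servers[i], '\n') != NULL ||
            fprintf(f, "server=%s\n", servers[i]) < 0) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: asks the dnsmasq process with the given PID to re-read
 * the servers file. The PID is assumed to be known to the caller. */
int reload_dnsmasq(pid_t pid)
{
    return kill(pid, SIGHUP); /* 0 on success, -1 with errno set on failure */
}
```

The test steps then become: write the file with the new server list, call reload_dnsmasq() with dnsmasq's PID, and repeat the lookup. Writing an empty list followed by a reload plays the role of the "SetServersEx aas 0" call.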

Comment 37 Beniamino Galvani 2016-09-10 10:05:01 UTC
The test passes when the last 'reconfigure_downstream_server' call is
removed only because removing it also drops the ClearCache call, so the
last lookup is answered from the cache.

Comparing version 2.66 with upstream, there is a difference in how new
servers added through D-Bus are handled. Both try to recycle an old
entry in add_update_server(), but the upstream version always memset()s
the recycled entry (thus clearing the sfd), while 2.66 only overwrites
some fields. This is buggy, as the values left over from the old server
can be totally unrelated to the new server.

I think we only need to ensure that the sfd (and other fields) are
cleared when we add the new server, instead of reusing possibly bogus
values:

--- a/src/dbus.c
+++ b/src/dbus.c
@@ -161,6 +161,10 @@ static void add_update_server(union mysockaddr *addr,

   if (serv)
     {
+      serv->sfd = NULL;
+      serv->queries = 0;
+      serv->failed_queries = 0;
+
       if (interface)
        strcpy(serv->interface, interface);
       else
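The difference between the two recycling strategies can be reduced to a toy C model. The structs below are hypothetical and much smaller than dnsmasq's own; only the field names sfd, queries, failed_queries and interface are taken from the patch above. recycle_partial() mimics the 2.66 behavior described in this comment (only some fields are overwritten), and recycle_cleared() mimics the proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for dnsmasq's structures. */
struct serverfd_model {
    int fd; /* socket cached for a given source address/interface */
};

struct server_model {
    char addr[46];              /* INET6_ADDRSTRLEN */
    char interface[16];         /* IFNAMSIZ */
    struct serverfd_model *sfd; /* may still point at an old socket */
    unsigned int queries, failed_queries;
};

/* Fills in the new target; an absent interface becomes an empty string. */
static void set_target(struct server_model *serv, const char *addr, const char *iface)
{
    snprintf(serv->addr, sizeof serv->addr, "%s", addr);
    snprintf(serv->interface, sizeof serv->interface, "%s", iface ? iface : "");
}

/* 2.66-like recycling: only the address and interface are overwritten, so
 * sfd and the counters from the previous server survive. */
void recycle_partial(struct server_model *serv, const char *addr, const char *iface)
{
    assert(serv != NULL && addr != NULL);
    set_target(serv, addr, iface);
}

/* Recycling as in the proposed patch: sfd and the counters are cleared
 * before the new address and interface are filled in. */
void recycle_cleared(struct server_model *serv, const char *addr, const char *iface)
{
    assert(serv != NULL && addr != NULL);
    serv->sfd = NULL;
    serv->queries = 0;
    serv->failed_queries = 0;
    set_target(serv, addr, iface);
}
```

Starting from an entry whose sfd points at an old socket and whose counters are 10 and 3, recycle_partial() leaves all three untouched, whereas recycle_cleared() resets sfd to NULL and both counters to 0. The upstream memset() approach clears every field at once, which gives the same result for these fields.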

Comment 40 errata-xmlrpc 2016-11-04 06:15:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2421.html