1338731 – DNS not flushed on VPN up/down.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	NetworkManager
Sub Component:
Version:	24
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Lubomir Rintel
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-05-23 11:00 UTC by David Woodhouse
Modified:	2016-09-06 11:56 UTC
CC List:	7 users
Fixed In Version:	NetworkManager-1.2.2-2.fc24
Clone Of:
Environment:
Last Closed:	2016-06-18 18:46:48 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
[PATCH] dns: clear dnsmasq cache after an update (3.88 KB, patch) 2016-05-23 15:29 UTC, Beniamino Galvani
[PATCH v2 1/2] dns/dnsmasq: cancel pending update on dispose (3.73 KB, patch) 2016-05-24 12:35 UTC, Beniamino Galvani
[PATCH v2 2/2] dns: clear dnsmasq cache after an update (2.71 KB, patch) 2016-05-24 12:36 UTC, Beniamino Galvani

Description David Woodhouse 2016-05-23 11:00:02 UTC

When I join the company VPN, DNS results (mostly NXDOMAIN) are not flushed from the cache until/unless I manually kick dnsmasq to forget them...

[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
Host viggo.jf.intel.com not found: 3(NXDOMAIN)
[dwoodhou@i7 ~]$ nmcli con up 'Intel AnyConnect VPN'
A password is required to connect to 'Intel AnyConnect VPN'.
Warning: password for 'vpn.secrets.gateway' not given in 'passwd-file' and nmcli cannot ask without '--ask' option.
VPN connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/15)
[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
Host viggo.jf.intel.com not found: 3(NXDOMAIN)
[dwoodhou@i7 ~]$ sudo killall -HUP dnsmasq
[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
viggo.jf.intel.com has address 10.54.39.121


The converse problem occurs when I take the VPN down.

[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
viggo.jf.intel.com has address 10.54.39.121
[dwoodhou@i7 ~]$ nmcli con down 'Intel AnyConnect VPN'
Connection 'Intel AnyConnect VPN' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/15)
[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
viggo.jf.intel.com has address 10.54.39.121
[dwoodhou@i7 ~]$ sudo killall -HUP dnsmasq
[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
Host viggo.jf.intel.com not found: 3(NXDOMAIN)


We can debate whether this is a dnsmasq or a NetworkManager bug. Personally, I think schizoDNS (where you get a different view of the *same* domain depending on where the query comes from) is an utterly stupid setup, and people should be using non-public domains for internal stuff ($COMPANY.internal). And thus, dnsmasq shouldn't necessarily *expect* to throw away its cache for a given domain, just because you point it at a different nameserver for that domain. (And also... what if you just *add* a new nameserver? What if you just take one away? When do we flush the cache? Every time anything is changed?)

Might be nice if dnsmasq did have an option to do a selective flush though, and allowed NetworkManager to flush *only* the domains which it's changing.

Comment 1 Beniamino Galvani 2016-05-23 15:29:08 UTC

Created attachment 1160687
[PATCH] dns: clear dnsmasq cache after an update

(In reply to David Woodhouse from comment #0)
> We can debate whether this is a dnsmasq or a NetworkManager bug. Personally,
> I think schizoDNS (where you get a different view of the *same* domain
> depending on where the query comes from) is an utterly stupid setup, and
> people should be using non-public domains for internal stuff
> ($COMPANY.internal). And thus, dnsmasq shouldn't necessarily *expect* to
> throw away its cache for a given domain, just because you point it at a
> different nameserver for that domain. (And also... what if you just *add* a
> new nameserver? What if you just take one away? When do we flush the cache?
> Every time anything is changed?)

I don't think it's so bad to clear the cache every time there is a
change (before switching to D-Bus updates, dnsmasq was restarted every
time, so the cache was always implicitly flushed).

Maybe in the future it would be nice to implement a smarter update
logic, but for the moment I think the attached patch is enough.

Comment 2 David Woodhouse 2016-05-23 23:16:10 UTC

Comment on attachment 1160687
[PATCH] dns: clear dnsmasq cache after an update

There's a ClearCache D-Bus method. Might it be nicer (and perhaps less likely to trigger SELinux and other permissions issues) to use that instead?

Comment 3 Beniamino Galvani 2016-05-24 12:35:58 UTC

Created attachment 1161027
[PATCH v2 1/2] dns/dnsmasq: cancel pending update on dispose

Comment 4 Beniamino Galvani 2016-05-24 12:36:34 UTC

Created attachment 1161028
[PATCH v2 2/2] dns: clear dnsmasq cache after an update

Comment 5 Beniamino Galvani 2016-05-24 12:39:58 UTC

(In reply to David Woodhouse from comment #2)
> Comment on attachment 1160687
> [PATCH] dns: clear dnsmasq cache after an update
> 
> There's a ClearCache D-Bus method. Might it be nicer (and perhaps less
> likely to trigger SELinux and other permissions issues) to use that instead?

Yeah, good point, requesting the flush through D-Bus is indeed a better idea; I've updated the patches.
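
For illustration only, here is a minimal shell sketch of requesting the same flush over D-Bus instead of sending SIGHUP. It is not taken from the patches. The bus name, object path, and interface name below are assumptions about how NetworkManager registers its dnsmasq instance and should be verified by introspecting the system bus; `clear_dnsmasq_cache` and `DBUS_SEND` are hypothetical names.

```shell
# Assumed names (verify on your system before relying on them):
#   bus/interface name that NetworkManager is believed to pass to dnsmasq,
#   and dnsmasq's default object path.
DNSMASQ_BUS="org.freedesktop.NetworkManager.dnsmasq"
DNSMASQ_OBJ="/uk/org/thekelleys/dnsmasq"

# Hypothetical helper: ask dnsmasq to drop its cache via the ClearCache method.
# DBUS_SEND can be overridden (for example with "echo") to get a dry run.
clear_dnsmasq_cache() {
    "${DBUS_SEND:-dbus-send}" --system --print-reply \
        --dest="$DNSMASQ_BUS" "$DNSMASQ_OBJ" "$DNSMASQ_BUS.ClearCache"
}
```

Running `DBUS_SEND=echo; clear_dnsmasq_cache` prints the arguments without touching the bus. A real call typically needs root, because the system bus policy usually restricts who may talk to dnsmasq.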

Comment 6 Thomas Haller 2016-05-25 16:28:34 UTC

Both patches lgtm.


Except, I wonder... we always call SetServerEx when there is a DNS update, even if the actual configuration didn't change.
Can it not happen, that with this change we flush the cache too often?

Maybe we should remember the settings that we applied last time, and if they didn't change, not flush the cache?


Maybe in update() we could do:


-    g_clear_pointer (&priv->set_server_ex_args, g_variant_unref);
-    priv->set_server_ex_args = g_variant_ref_sink (g_variant_new ("(aas)...
+    args = g_variant_new ("(aas)", ...
+    if (  priv->set_server_ex_args
+        && g_variant_equal (args, priv->set_server_ex_args))
+        g_variant_unref (args);
+    else {
+        g_clear_pointer (&priv->set_server_ex_args, g_variant_unref);
+        priv->set_server_ex_args = g_variant_ref_sink (args);
+    }

?

Comment 7 Thomas Haller 2016-05-25 16:30:08 UTC

(In reply to Thomas Haller from comment #6)
> +    if (  priv->set_server_ex_args
> +        && g_variant_equal (args, priv->set_server_ex_args))

Ah, of course, we should not compare with priv->set_server_ex_args, but somehow remember the last argument of a successful SetServerEx...

Comment 8 Beniamino Galvani 2016-05-25 16:45:14 UTC

(In reply to Thomas Haller from comment #6)
> Both patches lgtm.
> 
> Except, I wonder... we always call SetServerEx when there is a DNS update,
> even if the actual configuration didn't change.
> Can it not happen, that with this change we flush the cache too often?

The DNS manager computes a hash of the configuration to detect if it really changed or not, and doesn't call update() if it's not necessary; so I think we don't have to care about that in the plugin.

OTOH, if the user forces a re-write of DNS configuration through SIGUSR1, the DNS manager skips the hash check and forces a dnsmasq update, and now a flush of the cache. In my opinion this is also desirable. What do you think?
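
The logic described here (skip the update when the configuration hash is unchanged, unless the update is forced) can be sketched roughly as follows. This is a hypothetical shell illustration, not NetworkManager's code: `apply_dns_config`, `last_hash`, and `dns_result` are made-up names, and `cksum` stands in for whatever hash the DNS manager really uses.

```shell
# Hash of the configuration that was applied last (empty means none yet).
last_hash=""

# apply_dns_config CONFIG [force]
# Sets dns_result to "updated" when dnsmasq would be reconfigured (and its
# cache flushed), or to "skipped" when the configuration hash is unchanged.
apply_dns_config() {
    config=$1
    force=${2:-}
    new_hash=$(printf '%s' "$config" | cksum)
    # An unforced update with an identical hash is a no-op.
    if [ "$force" != force ] && [ "$new_hash" = "$last_hash" ]; then
        dns_result=skipped
        return 0
    fi
    last_hash=$new_hash
    dns_result=updated
}
```

With this shape, applying the same configuration twice flushes only once, while a forced re-write (the SIGUSR1 case) always goes through and flushes again.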

Comment 9 Thomas Haller 2016-05-25 16:50:36 UTC

(In reply to Beniamino Galvani from comment #8)
> (In reply to Thomas Haller from comment #6)
> > Both patches lgtm.
> > 
> > Except, I wonder... we always call SetServerEx when there is a DNS update,
> > even if the actual configuration didn't change.
> > Can it not happen, that with this change we flush the cache too often?
> 
> The DNS manager computes a hash of the configuration to detect if it really
> changed or not, and doesn't call update() if it's not necessary; so I think
> we don't have to care about that in the plugin.
> 
> OTOH, if the user forces a re-write of DNS configuration through SIGUSR1,
> the DNS manager skips the hash check and forces a dnsmasq update, and now a
> flush of the cache. In my opinion this is also desirable. What do you think?

Yeah, that sounds all right.

ACK to both patches.

Comment 10 Dan Williams 2016-05-27 18:03:40 UTC

ACK on both patches.

Comment 11 Beniamino Galvani 2016-05-28 07:47:19 UTC

Patches applied to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d376787ce1a9e8c4990ed98be143ab892c9d29ed
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=4feb58b50b9fd6caceda83bab907ad107ad8ed01

and nm-1-2:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-2&id=7541ca0692668070e48adfc5fa8e4c6501600e16
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-2&id=a701e5b7ba35a0730d756ab0c1b15f0414bee592

Comment 12 Fedora Update System 2016-06-02 19:58:17 UTC

NetworkManager-1.2.2-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-5bbc872851

Comment 13 Fedora Update System 2016-06-03 09:25:36 UTC

NetworkManager-1.2.2-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-5bbc872851

Comment 14 Fedora Update System 2016-06-18 18:46:44 UTC

NetworkManager-1.2.2-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Jan Pazdziora (Red Hat) 2016-09-05 14:18:17 UTC

Is it possible that this fix somehow got lost in NetworkManager-1.2.4-2.fc24?

Even though I see "dnsmasq[11494]: cleared cache" in the journal, the "queries answered locally" count keeps rising and no query is sent to the upstream nameserver behind the VPN.

This happens when I set up the VPN, stop it (the new NetworkManager applet (I'm on XFCE) no longer has a separate menu item for this, so I just uncheck the VPN entry), and set up the VPN again. The second time the VPN is enabled, DNS just seems to act weird.

Comment 16 Beniamino Galvani 2016-09-05 14:42:51 UTC

(In reply to Jan Pazdziora from comment #15)
> Is it possible that this fix somehow got lost in NetworkManager-1.2.4-2.fc24?
> 
> Even though I see "dnsmasq[11494]: cleared cache" in the journal, the
> "queries answered locally" count keeps rising and no query is sent to the
> upstream nameserver behind the VPN.
> 
> This happens when I set up the VPN, stop it (the new NetworkManager applet
> (I'm on XFCE) no longer has a separate menu item for this, so I just uncheck
> the VPN entry), and set up the VPN again. The second time the VPN is enabled,
> DNS just seems to act weird.

If you execute:

 echo log-queries > /etc/NetworkManager/dnsmasq.d/log-queries

then restart NM and connect to the VPN, you should see that dnsmasq queries the upstream servers again after the cache gets cleared:

  dnsmasq[21395]: cleared cache
  ...
  dnsmasq[21395]: query[A] test.com from 127.0.0.1
  dnsmasq[21395]: forwarded test.com to 192.168.1.1

as opposed to:

  dnsmasq[21395]: cached test.com is 1.2.3.4

Is this correct?

Comment 17 Jan Pazdziora (Red Hat) 2016-09-05 16:20:18 UTC

(In reply to Beniamino Galvani from comment #16)
> 
> If you execute:
> 
>  echo log-queries > /etc/NetworkManager/dnsmasq.d/log-queries

Useful setting, thank you.

> then restart NM and connect to the VPN, you should see that dnsmasq queries
> the upstream servers again after the cache gets cleared:
> 
>   dnsmasq[21395]: cleared cache
>   ...
>   dnsmasq[21395]: query[A] test.com from 127.0.0.1
>   dnsmasq[21395]: forwarded test.com to 192.168.1.1
> 
> as opposed to:
> 
>   dnsmasq[21395]: cached test.com is 1.2.3.4
> 
> Is this correct?

Actually, I get

   dnsmasq[21395]: query[A] test.com from 127.0.0.1

and nothing else, while the VPN is on.

When I turn the VPN off, I get back to the

    dnsmasq[18748]: query[A] test.com from 127.0.0.1
    dnsmasq[18748]: forwarded test.com to 85.86.87.88
    dnsmasq[18748]: reply test.com is NODATA-IPv4

So on the second VPN activation, I get neither a "forwarded" nor a "cached" line in the journal.
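
To spot this pattern in a longer log, a small filter along these lines could list the names that were queried but never got a "forwarded", "cached" or "reply" line afterwards. It is only a sketch: `unanswered_queries` is a hypothetical helper, and it assumes the `log-queries` line layout quoted in this report (keyword followed by the name).

```shell
# Read dnsmasq log lines on stdin and print, sorted, every name whose most
# recent query[...] line has no later forwarded/cached/reply line.
unanswered_queries() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                # A new query marks the name as pending.
                if ($i ~ /^query\[/) { pending[$(i + 1)] = 1; break }
                # Any of these lines means dnsmasq acted on the name.
                if ($i == "forwarded" || $i == "cached" || $i == "reply") {
                    delete pending[$(i + 1)]
                    break
                }
            }
        }
        END { for (name in pending) print name }
    ' | sort
}
```

For example, `journalctl -u NetworkManager | unanswered_queries` would be one way to feed it, assuming dnsmasq's messages end up in that unit's journal.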

Comment 18 Jan Pazdziora (Red Hat) 2016-09-05 16:23:40 UTC

I see this behaviour with both dnsmasq-2.75-4.fc24.x86_64 and dnsmasq-2.76-1.fc24.x86_64, in case dnsmasq is the culprit.

Comment 19 Beniamino Galvani 2016-09-05 17:10:05 UTC

Hi, do you see NM sending the upstream VPN server list to dnsmasq in
the logs?

If the servers are correctly added, I think you are hitting bug
1367772, which makes dnsmasq fail to send requests out when a virtual
device is destroyed and recreated (like in case of a VPN reconnection)
due to wrong caching of request sockets.

Comment 20 Jan Pazdziora (Red Hat) 2016-09-05 17:57:01 UTC

(In reply to Beniamino Galvani from comment #19)
> Hi, do you see NM sending the upstream VPN server list to dnsmasq in
> the logs?

Yes, there are records like

dnsmasq[24083]: using nameserver 10.11.12.13#53 for domain example.com

in journal.

> If the servers are correctly added, I think you are hitting bug
> 1367772, which makes dnsmasq fail to send requests out when a virtual
> device is destroyed and recreated (like in case of a VPN reconnection)
> due to wrong caching of request sockets.

Thanks.

I've only been experiencing the problematic behaviour for the past few days, since I upgraded my Fedora 24 installation. The Fedora 24 GA had dnsmasq-2.75-4.fc24.x86_64.rpm in it and I'm pretty sure things were working fine for a couple of weeks / months. Bug 1367772 says the issue is present on dnsmasq-2.66-17.el7.x86_64 ... but on Fedora 24 it was working fine for a while.

I do not rule out that it's the same bug ... but we had a newer version of dnsmasq working fine for some time.

Comment 21 Thomas Haller 2016-09-05 18:09:22 UTC

(In reply to Jan Pazdziora from comment #20)
> 
> I've only been experiencing the problematic behaviour for the past few days,
> since I upgraded my Fedora 24 installation. The Fedora 24 GA had
> dnsmasq-2.75-4.fc24.x86_64.rpm in it and I'm pretty sure things were
> working fine for a couple of weeks / months. Bug 1367772 says the issue is
> present on dnsmasq-2.66-17.el7.x86_64 ... but on Fedora 24 it was working fine
> for a while.
> 
> I do not rule out that it's the same bug ... but we had a newer version of
> dnsmasq working fine for some time.

What are the versions of NetworkManager where you didn't have the issue, and which versions have the issue for you?

In the past, NM would not reconfigure dnsmasq via D-Bus but respawn it every time. That only changed with releases 1.2.2 and 1.4.0.

Comment 22 Jan Pazdziora (Red Hat) 2016-09-06 05:23:00 UTC

I don't know the exact versions. But it worked fine on Fedora 24 for some time, and I upgraded to Fedora 24 the day after GA. Looking at http://dl.fedoraproject.org/pub/fedora/linux/releases/24/Everything/x86_64/os/Packages/n/ the GA version was NetworkManager-1.2.2-1.fc24.x86_64.rpm, so I'd say that it worked with 1.2.2.

Is there a way to go back to respawning dnsmasq with current NetworkManager?

Comment 23 Beniamino Galvani 2016-09-06 07:41:31 UTC

Hi, I think the issue started after we added the interface name to
nameservers sent to dnsmasq to prevent responses from the wrong
interface. That was in 1.2.4.

Currently there isn't a way to revert to the old behavior. For Fedora,
either the dnsmasq package needs to be rebuilt with the fix, or we
should revert the dnsmasq-interface commit.

In the meantime, an ugly workaround would be to add a dispatcher
script to reconfigure DNS when a VPN goes up:

cat <<'EOF' > /etc/NetworkManager/dispatcher.d/99-dnsmasq-vpn.sh
#!/bin/sh
[ "$2" = vpn-up ] && killall -HUP NetworkManager
exit 0
EOF
chmod +x /etc/NetworkManager/dispatcher.d/99-dnsmasq-vpn.sh

Comment 24 Jan Pazdziora (Red Hat) 2016-09-06 07:51:37 UTC

(In reply to Beniamino Galvani from comment #23)
> Hi, I think the issue started after we added the interface name to
> nameservers sent to dnsmasq to prevent responses from the wrong
> interface. That was in 1.2.4.

I guess we need a new Bugzilla bug, right? No matter if it ends up on dnsmasq or on NetworkManager. Would you like to file that, since you seem to have some insight into what is actually wrong?

Thank you, Jan

Comment 25 Beniamino Galvani 2016-09-06 11:56:06 UTC

(In reply to Jan Pazdziora from comment #24)
> (In reply to Beniamino Galvani from comment #23)
> > Hi, I think the issue started after we added the interface name to
> > nameservers sent to dnsmasq to prevent responses from the wrong
> > interface. That was in 1.2.4.
> 
> I guess we need a new Bugzilla bug, right? No matter if it ends up on dnsmasq or
> on NetworkManager. Would you like to file that, since you seem to have some
> insight into what is actually wrong?

Hi, I think this must be fixed in dnsmasq. I've filed bug 1373485 for that.
