When I join the company VPN, DNS results (mostly NXDOMAIN) are not flushed from the cache until/unless I manually kick dnsmasq to forget them...

[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
Host viggo.jf.intel.com not found: 3(NXDOMAIN)
[dwoodhou@i7 ~]$ nmcli con up 'Intel AnyConnect VPN'
A password is required to connect to 'Intel AnyConnect VPN'.
Warning: password for 'vpn.secrets.gateway' not given in 'passwd-file' and nmcli cannot ask without '--ask' option.
VPN connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/15)
[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
Host viggo.jf.intel.com not found: 3(NXDOMAIN)
[dwoodhou@i7 ~]$ sudo killall -HUP dnsmasq
[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
viggo.jf.intel.com has address 10.54.39.121

The converse problem occurs when I take the VPN down.

[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
viggo.jf.intel.com has address 10.54.39.121
[dwoodhou@i7 ~]$ nmcli con down 'Intel AnyConnect VPN'
Connection 'Intel AnyConnect VPN' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/15)
[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
viggo.jf.intel.com has address 10.54.39.121
[dwoodhou@i7 ~]$ sudo killall -HUP dnsmasq
[dwoodhou@i7 ~]$ host -t a viggo.jf.intel.com
Host viggo.jf.intel.com not found: 3(NXDOMAIN)

We can debate whether this is a dnsmasq or a NetworkManager bug. Personally, I think schizoDNS (where you get a different view of the *same* domain depending on where the query comes from) is an utterly stupid setup, and people should be using non-public domains for internal stuff ($COMPANY.internal). And thus, dnsmasq shouldn't necessarily *expect* to throw away its cache for a given domain, just because you point it at a different nameserver for that domain. (And also... what if you just *add* a new nameserver? What if you just take one away? When do we flush the cache? Every time anything is changed?)
Might be nice if dnsmasq did have an option to do a selective flush though, and allowed NetworkManager to flush *only* the domains which it's changing.
Created attachment 1160687
[PATCH] dns: clear dnsmasq cache after an update

(In reply to David Woodhouse from comment #0)
> We can debate whether this is a dnsmasq or a NetworkManager bug. Personally,
> I think schizoDNS (where you get a different view of the *same* domain
> depending on where the query comes from) is an utterly stupid setup, and
> people should be using non-public domains for internal stuff
> ($COMPANY.internal). And thus, dnsmasq shouldn't necessarily *expect* to
> throw away its cache for a given domain, just because you point it at a
> different nameserver for that domain. (And also... what if you just *add* a
> new nameserver? What if you just take one away? When do we flush the cache?
> Every time anything is changed?)

I don't think it's so bad to clear the cache every time there is a change (before switching to D-Bus updates, dnsmasq was restarted every time, and thus the cache was always implicitly flushed). Maybe in the future it would be nice to implement a smarter update logic, but for the moment I think the attached patch is enough.
Comment on attachment 1160687
[PATCH] dns: clear dnsmasq cache after an update

There's a ClearCache D-Bus method. Might it be nicer (and perhaps less likely to trigger SELinux and other permissions issues) to use that instead?
Created attachment 1161027
[PATCH v2 1/2] dns/dnsmasq: cancel pending update on dispose
Created attachment 1161028
[PATCH v2 2/2] dns: clear dnsmasq cache after an update
(In reply to David Woodhouse from comment #2)
> Comment on attachment 1160687
> [PATCH] dns: clear dnsmasq cache after an update
>
> There's a ClearCache D-Bus method. Might it be nicer (and perhaps less
> likely to trigger SELinux and other permissions issues) to use that instead?

Yeah, good point, requesting the flush through D-Bus is indeed a better idea; I've updated the patches.
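For anyone who wants to request the same flush by hand instead of sending SIGHUP, here is a minimal sketch of a ClearCache call with dbus-send. The bus name, object path and interface below are assumptions (the name NetworkManager is believed to pass to dnsmasq via --enable-dbus, and dnsmasq's default object path); check them with D-Bus introspection on your system before relying on them. The DBUS_SEND variable and the function name are hypothetical helpers for this example only.

```shell
#!/bin/sh
# Assumed D-Bus coordinates of the dnsmasq instance spawned by NetworkManager.
# These are NOT taken from this bug report; override them if your setup differs.
DNSMASQ_BUS_NAME="${DNSMASQ_BUS_NAME:-org.freedesktop.NetworkManager.dnsmasq}"
DNSMASQ_OBJ_PATH="${DNSMASQ_OBJ_PATH:-/uk/org/thekelleys/dnsmasq}"
# Assumption: the interface name is the same as the bus name.
DNSMASQ_IFACE="${DNSMASQ_IFACE:-$DNSMASQ_BUS_NAME}"

# Ask dnsmasq to drop its cache through the ClearCache method.
# DBUS_SEND can be set to "echo" to print the arguments instead of calling.
clear_dnsmasq_cache() {
    "${DBUS_SEND:-dbus-send}" --system --print-reply \
        --dest="$DNSMASQ_BUS_NAME" "$DNSMASQ_OBJ_PATH" \
        "${DNSMASQ_IFACE}.ClearCache"
}
```

Calling clear_dnsmasq_cache as root should be roughly equivalent to the manual "killall -HUP dnsmasq" from comment #0 as far as the cache is concerned, without needing permission to signal the process.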
Both patches lgtm.

Except, I wonder... we always call SetServerEx when there is a DNS update, even if the actual configuration didn't change.
Can it not happen, that with this change we flush the cache too often?

Maybe we should remember the settings that we applied last time, and if they didn't change, not flush the cache?

Maybe in update() we could do:

-    g_clear_pointer (&priv->set_server_ex_args, g_variant_unref);
-    priv->set_server_ex_args = g_variant_ref_sink (g_variant_new ("(aas)...
+    args = g_variant_new ("(aas)", ...
+    if (   priv->set_server_ex_args
+        && g_variant_equal (args, priv->set_server_ex_args))
+        g_variant_unref (args);
+    else {
+        g_clear_pointer (&priv->set_server_ex_args, g_variant_unref);
+        priv->set_server_ex_args = g_variant_ref_sink (args);
+    }

?
(In reply to Thomas Haller from comment #6)
> +    if (   priv->set_server_ex_args
> +        && g_variant_equal (args, priv->set_server_ex_args))

Ah, of course, we should not compare with priv->set_server_ex_args, but somehow remember the last argument for a successful SetServerEx...
(In reply to Thomas Haller from comment #6)
> Both patches lgtm.
>
> Except, I wonder... we always call SetServerEx when there is a DNS update,
> even if the actual configuration didn't change.
> Can it not happen, that with this change we flush the cache too often?

The DNS manager computes a hash of the configuration to detect if it really changed or not, and doesn't call update() if it's not necessary; so I think we don't have to care about that in the plugin.

OTOH, if the user forces a re-write of DNS configuration through SIGUSR1, the DNS manager skips the hash check and forces a dnsmasq update, and now a flush of the cache. In my opinion this is also desirable. What do you think?
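To make the "hash check, unless forced" logic concrete, here is a small shell sketch of the idea. It is only an illustration and not NetworkManager's actual code: the function names, the LAST_HASH variable and the use of sha256sum are all assumptions made for this example.

```shell
#!/bin/sh
# Hash of the most recently applied configuration (empty until first update).
LAST_HASH=""

# Hypothetical helper: hash a configuration given as a single string.
config_hash() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Return 0 when dnsmasq should be updated (and its cache flushed):
# either the configuration hash changed, or the caller forces it
# (second argument "1", standing in for the SIGUSR1 case).
needs_update() {
    new_hash=$(config_hash "$1")
    force="${2:-0}"
    if [ "$force" != 1 ] && [ "$new_hash" = "$LAST_HASH" ]; then
        return 1    # unchanged and not forced: skip update and flush
    fi
    LAST_HASH=$new_hash
    return 0
}
```

With this shape, repeated calls with an identical configuration flush the cache only once, while a forced call always goes through, which matches the behaviour described above.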
(In reply to Beniamino Galvani from comment #8)
> (In reply to Thomas Haller from comment #6)
> > Both patches lgtm.
> >
> > Except, I wonder... we always call SetServerEx when there is a DNS update,
> > even if the actual configuration didn't change.
> > Can it not happen, that with this change we flush the cache too often?
>
> The DNS manager computes a hash of the configuration to detect if it really
> changed or not, and doesn't call update() if it's not necessary; so I think
> we don't have to care about that in the plugin.
>
> OTOH, if the user forces a re-write of DNS configuration through SIGUSR1,
> the DNS manager skips the hash check and forces a dnsmasq update, and now a
> flush of the cache. In my opinion this is also desirable. What do you think?

Yeah, that sounds all right. ACK to both patches.
ACK on both patches.
Patches applied to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d376787ce1a9e8c4990ed98be143ab892c9d29ed
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=4feb58b50b9fd6caceda83bab907ad107ad8ed01

and nm-1-2:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-2&id=7541ca0692668070e48adfc5fa8e4c6501600e16
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-2&id=a701e5b7ba35a0730d756ab0c1b15f0414bee592
NetworkManager-1.2.2-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-5bbc872851
NetworkManager-1.2.2-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-5bbc872851
NetworkManager-1.2.2-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
Is it possible that this fix somehow got lost in NetworkManager-1.2.4-2.fc24?

Even though I see

dnsmasq[11494]: cleared cache

in the journal, the "queries answered locally" count keeps rising, with no query sent to the upstream nameserver reached through the VPN.

This happens when I set up the VPN, stop it (in the new NetworkManager applet (I'm on XFCE) there is no longer a separate item in the menu, so I just uncheck the VPN entry), and set up the VPN again. The second time the VPN is enabled, DNS just seems to act weird.
(In reply to Jan Pazdziora from comment #15)
> Is it possible that this fix somehow got lost in NetworkManager-1.2.4-2.fc24?
>
> Even though I see dnsmasq[11494]: cleared cache in the journal, the
> "queries answered locally" count keeps rising, with no query sent to the
> upstream nameserver reached through the VPN.
>
> This happens when I set up the VPN, stop it (in the new NetworkManager applet
> (I'm on XFCE) there is no longer a separate item in the menu, so I just
> uncheck the VPN entry), and set up the VPN again. The second time the VPN is
> enabled, DNS just seems to act weird.

If you execute:

  echo log-queries > /etc/NetworkManager/dnsmasq.d/log-queries

then restart NM and connect to the VPN, you should see that dnsmasq queries the upstream servers again after the cache gets cleared:

  dnsmasq[21395]: cleared cache
  ...
  dnsmasq[21395]: query[A] test.com from 127.0.0.1
  dnsmasq[21395]: forwarded test.com to 192.168.1.1

as opposed to:

  dnsmasq[21395]: cached test.com is 1.2.3.4

Is this correct?
(In reply to Beniamino Galvani from comment #16)
>
> If you execute:
>
>   echo log-queries > /etc/NetworkManager/dnsmasq.d/log-queries

Useful setting, thank you.

> then restart NM and connect to the VPN, you should see that dnsmasq queries
> the upstream servers again after the cache gets cleared:
>
>   dnsmasq[21395]: cleared cache
>   ...
>   dnsmasq[21395]: query[A] test.com from 127.0.0.1
>   dnsmasq[21395]: forwarded test.com to 192.168.1.1
>
> as opposed to:
>
>   dnsmasq[21395]: cached test.com is 1.2.3.4
>
> Is this correct?

Actually, I get

  dnsmasq[21395]: query[A] test.com from 127.0.0.1

and nothing else, while the VPN is on. When I turn the VPN off, I get back to

  dnsmasq[18748]: query[A] test.com from 127.0.0.1
  dnsmasq[18748]: forwarded test.com to 85.86.87.88
  dnsmasq[18748]: reply test.com is NODATA-IPv4

So in the case of the second VPN activation, I get neither a "forwarded" nor a "cached" line in the journal.
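The three outcomes discussed here (forwarded upstream, answered from the cache, or a query with no follow-up line) can be told apart mechanically once log-queries is enabled. The following is a rough sketch only; the function name and the output labels are made up for this example, and it assumes the log lines look like the ones pasted in this bug.

```shell
#!/bin/sh
# Read dnsmasq log lines on stdin and report how queries for NAME were
# handled: "forwarded", "cached", "unanswered" (a query line with neither
# a forwarded nor a cached line), or "no-query".
classify_queries() {
    name=$1
    awk -v name="$name" '
        # Fixed-string matches, so dots in the name are not treated as wildcards.
        index($0, "forwarded " name " to ")                     { fwd++ }
        index($0, "cached " name " is ")                        { cached++ }
        index($0, "query[") && index($0, "] " name " from ")    { asked++ }
        END {
            if (fwd)         print "forwarded"
            else if (cached) print "cached"
            else if (asked)  print "unanswered"
            else             print "no-query"
        }
    '
}
```

A possible use is "journalctl -t dnsmasq | classify_queries test.com", assuming the dnsmasq messages reach the journal under that identifier. An "unanswered" result corresponds to the behaviour reported above for the second VPN activation.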
I see this behaviour with both dnsmasq-2.75-4.fc24.x86_64 and dnsmasq-2.76-1.fc24.x86_64, in case dnsmasq is the culprit.
Hi, do you see NM sending the upstream VPN server list to dnsmasq in the logs? If the servers are correctly added, I think you are hitting bug 1367772, which makes dnsmasq fail to send requests out when a virtual device is destroyed and recreated (like in case of a VPN reconnection) due to wrong caching of request sockets.
(In reply to Beniamino Galvani from comment #19)
> Hi, do you see NM sending the upstream VPN server list to dnsmasq in
> the logs?

Yes, there are records like

  dnsmasq[24083]: using nameserver 10.11.12.13#53 for domain example.com

in the journal.

> If the servers are correctly added, I think you are hitting bug
> 1367772, which makes dnsmasq fail to send requests out when a virtual
> device is destroyed and recreated (like in case of a VPN reconnection)
> due to wrong caching of request sockets.

Thanks.

I've only been experiencing the problematic behaviour for the past few days, since I upgraded my Fedora 24 installation. The Fedora 24 GA had dnsmasq-2.75-4.fc24.x86_64.rpm in it and I'm pretty sure things were working fine for a couple of weeks / months. Bug 1367772 says the issue is present on dnsmasq-2.66-17.el7.x86_64 ... but on Fedora 24 it was working fine for a while.

I do not rule out that it's the same bug ... but we had a newer version of dnsmasq working fine for some time.
(In reply to Jan Pazdziora from comment #20)
>
> I've only been experiencing the problematic behaviour for the past few days,
> since I upgraded my Fedora 24 installation. The Fedora 24 GA had
> dnsmasq-2.75-4.fc24.x86_64.rpm in it and I'm pretty sure things were
> working fine for a couple of weeks / months. Bug 1367772 says the issue is
> present on dnsmasq-2.66-17.el7.x86_64 ... but on Fedora 24 it was working
> fine for a while.
>
> I do not rule out that it's the same bug ... but we had a newer version of
> dnsmasq working fine for some time.

What are the versions of NetworkManager where you didn't have the issue, and which versions have the issue for you?

In the past, NM would not reconfigure dnsmasq via D-Bus but respawn it every time. That only changed with releases 1.2.2 and 1.4.0.
I don't know the exact versions. But it worked fine on Fedora 24 for some time, and I upgraded to Fedora 24 the day after GA. Looking at

http://dl.fedoraproject.org/pub/fedora/linux/releases/24/Everything/x86_64/os/Packages/n/

the GA version was NetworkManager-1.2.2-1.fc24.x86_64.rpm, so I'd say that it worked with 1.2.2.

Is there a way to go back to respawning dnsmasq with the current NetworkManager?
Hi, I think the issue started after we added the interface name to the nameservers sent to dnsmasq to prevent responses from the wrong interface. That was in 1.2.4.

Currently there isn't a way to revert to the old behavior. For Fedora, either the dnsmasq package needs to be rebuilt with the fix, or we should revert the dnsmasq-interface commit.

In the meantime, an ugly workaround would be to add a dispatcher script to reconfigure DNS when a VPN goes up:

  cat <<'EOF' > /etc/NetworkManager/dispatcher.d/99-dnsmasq-vpn.sh
  #!/bin/sh
  [ "$2" = vpn-up ] && killall -HUP NetworkManager
  exit 0
  EOF
  chmod +x /etc/NetworkManager/dispatcher.d/99-dnsmasq-vpn.sh
(In reply to Beniamino Galvani from comment #23)
> Hi, I think the issue started after we added the interface name to
> nameservers sent to dnsmasq to prevent responses from the wrong
> interface. That was in 1.2.4.

I guess we need a new bugzilla, right? No matter if it ends up on dnsmasq or on NetworkManager. Would you like to file that, since you seem to have some insight into what is actually wrong?

Thank you,

Jan
(In reply to Jan Pazdziora from comment #24)
> (In reply to Beniamino Galvani from comment #23)
> > Hi, I think the issue started after we added the interface name to
> > nameservers sent to dnsmasq to prevent responses from the wrong
> > interface. That was in 1.2.4.
>
> I guess we need new bugzilla, right? No matter if it ends up on dnsmasq or
> on NetworkManager. Would you like to file that, since you seem to have some
> insight into what is actually wrong?

Hi, I think this must be fixed in dnsmasq. I've filed bug 1373485 for that.