Bug 1400909
| Field | Value |
|---|---|
| Summary | DNS response SERVFAIL fails resolving, instead of trying the next DNS server |
| Product | [Community] Virtualization Tools |
| Component | libvirt |
| Status | CLOSED DEFERRED |
| Reporter | Kamil Páral <kparal> |
| Assignee | Libvirt Maintainers <libvirt-maint> |
| Severity | unspecified |
| Priority | unspecified |
| Version | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| CC | agedosier, berrange, clalancette, crobinso, itamar, kparal, laine, libvirt-maint, simon, sjr, veillard, virt-maint |
| Last Closed | 2024-12-17 12:12:39 UTC |
| Type | Bug |
Description
Kamil Páral
2016-12-02 09:35:44 UTC
laine, any thoughts on this?

So the guests are of course only pointing at 192.168.122.1 for DNS, meaning that the guest can't possibly look to a secondary DNS if the primary fails. Are you saying, then, that the dnsmasq that is responding to the query from the guest should look to a secondary? Is there a dnsmasq option to control that? If so, possibly libvirt should set that option, but beyond that it's really out of libvirt's control. Or did I misunderstand your issue?

The service that is running on the host (is that dnsmasq?) should try the next DNS server if the first one fails, and only then return the result to the guest. I don't know how libvirt communicates with the host regarding DNS. But it seems very weird that if the primary DNS fails, I can still ping/wget/use Firefox just fine on the host, but I have a completely non-functional network in the guest.

All DNS duties for libvirt networks are handled by a dnsmasq process started by libvirt, so any idiosyncrasies in responses to DNS requests would be a result of the dnsmasq conf file created by libvirt and dnsmasq's own code.

Looking through dnsmasq.conf, I see the "strict-order" option, which seems to control how dnsmasq deals with multiple servers. libvirt always adds this option, and has done so since libvirt-0.2.3 (released sometime in 2007). Here's what is said about strict-order in dnsmasq.conf:

    # By default, dnsmasq will send queries to any of the upstream
    # servers it knows about and tries to favour servers to are known
    # to be up. Uncommenting this forces dnsmasq to try each query
    # with each server strictly in the order they appear in
    # /etc/resolv.conf
    #strict-order

and here's what was said in the comments of libvirt commit 6a12fee1, which added --strict-order to libvirt's invocations of dnsmasq:

    + /*
    +  * Needed to ensure dnsmasq uses same algorithm for processing
    +  * multiple nameserver entries in /etc/resolv.conf as GLibC.
    +  */
    + APPEND_ARG(*argv, i++, "--strict-order");

To see if changing this option causes the behavior you desire, can you try doing the following:

1) edit the file /var/lib/libvirt/dnsmasq/${netname}.conf (where ${netname} is the name of the libvirt network you're connecting to), remove the line that says "strict-order", and save the file.
2) ps -AlF | grep dnsmasq | grep ${netname} to learn the pid and full commandline of the dnsmasq process.
3) kill ${dnsmasq-pid}
4) re-run exactly the same commandline you saw in the ps output

Your network will be in a strange mode where libvirt will no longer know the pid of dnsmasq, so if you attempt to destroy the network it won't be able to kill dnsmasq, *but* in the meantime you'll have a dnsmasq running without "strict-order", and you can try the same query that was failing before.

Note that it's also possible you're just seeing a bug in dnsmasq behavior. I'm Cc'ing Simon (dnsmasq author) to see if he has anything to add. Also, maybe Daniel Berrange has something to say (he's the person who added "--strict-order" to libvirt's invocations of dnsmasq all the way back in 2007). It could be that we can't even consider removing strict-order for some reason I'm not aware of.

(In reply to Laine Stump from comment #4)
> + /*
> +  * Needed to ensure dnsmasq uses same algorithm for processing
> +  * multiple nameserver entries in /etc/resolv.conf as GLibC.
> +  */
> + APPEND_ARG(*argv, i++, "--strict-order");

Assuming all the system tools like ping or even web browsers use the default glibc behavior, the comment doesn't seem to be correct, or there's indeed a bug in the --strict-order implementation (or in the description of what it means). It seems that the current implementation returns the response of the first DNS server, even if it's a failure, and doesn't try the next one.

> To see if changing this option causes the behavior you desire, can you try
> doing the following:

I'm sorry, our DNS server is no longer misconfigured (it was a temporary issue, due to which I discovered this problem), and I have no idea how to emulate that (I have very little networking knowledge).

I believe the option you're looking for is:

    --all-servers
    By default, when dnsmasq has more than one upstream server available, it will send queries to just one server. Setting this flag forces dnsmasq to send all queries to all available servers. The reply from the server which answers first will be returned to the original requester.

This problem also happens on OCP: when one of the servers doesn't have the answer, dnsmasq returns that response and doesn't query the next DNS server. I haven't tried the above option yet, but it sounds like the solution.

Thank you for reporting this issue to the libvirt project. Unfortunately we have been unable to resolve this issue due to insufficient maintainer capacity and it will now be closed. This is not a reflection on the possible validity of the issue, merely the lack of resources to investigate and address it, for which we apologise.

If you nonetheless feel the issue is still important, you may choose to report it again at the new project issue tracker https://gitlab.com/libvirt/libvirt/-/issues

The project also welcomes contribution from anyone who believes they can provide a solution.
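
For anyone who wants to retry the manual test from this thread (removing "strict-order" from the libvirt-generated dnsmasq config), the first two steps could be sketched roughly as below. This is only a sketch under assumptions: the function names are hypothetical helpers, not part of libvirt or dnsmasq, the optional directory argument exists only so the helper can be tried on a scratch copy, and the `grep -v grep` filter is an addition to the `ps` pipeline quoted above. The config path layout is the one given in the thread.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not libvirt or dnsmasq tools.

# Copy a dnsmasq config from stdin to stdout, dropping any line that
# consists solely of "strict-order" (optionally surrounded by whitespace).
strip_strict_order() {
    sed '/^[[:space:]]*strict-order[[:space:]]*$/d'
}

# Step 1: rewrite the config of libvirt network $1, keeping a .bak copy.
# The default directory is the one named in the thread; the optional second
# argument is an assumption added so the function can run on a test copy.
remove_strict_order_for_network() {
    netname=$1
    confdir=${2:-/var/lib/libvirt/dnsmasq}
    conf="$confdir/$netname.conf"
    cp "$conf" "$conf.bak" || return 1
    strip_strict_order < "$conf.bak" > "$conf"
}

# Step 2: show the PID and full command line of the dnsmasq process
# serving network $1, so it can be killed and re-run by hand.
show_dnsmasq_for_network() {
    ps -AlF | grep dnsmasq | grep -- "$1" | grep -v grep
}
```

With these defined, `remove_strict_order_for_network default` followed by `show_dnsmasq_for_network default` would cover steps 1 and 2 for a network named "default" (an assumed name); steps 3 and 4 (kill the PID, then re-run the exact command line shown) are left manual, since libvirt loses track of the dnsmasq PID at that point.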