Bug 2160466
| Summary: | TCP Queries hang forever when an upstream server is not reachable | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> | |
| Component: | dnsmasq | Assignee: | Petr Menšík <pemensik> | |
| Status: | CLOSED MIGRATED | QA Contact: | rhel-cs-infra-services-qe <rhel-cs-infra-services-qe> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 8.7 | CC: | fkrska, horst.thaller, jfindysz, jorton, tmihinto | |
| Target Milestone: | rc | Keywords: | MigratedToJIRA, Reproducer, Triaged | |
| Target Release: | --- | Flags: | pm-rhel: mirror+ | |
| Hardware: | All | |||
| OS: | Linux | |||
| URL: | https://github.com/InfrastructureServices/dnsmasq/tree/dns-tcp-timeout | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2181244 (view as bug list) | Environment: | ||
| Last Closed: | 2023-09-21 18:20:09 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2181244 | |||
Clearly there is no parallelism in the tcp_request() code, servers are queried sequentially and there is no timeout handling either (socket is in blocking mode in particular).
Checking Upstream, I can see a complete rewrite of this code, which now brings concurrency:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
commit 12a9aa7c628e2d7dcd34949603848a3fb53fce9c
Author: Simon Kelley <simon.uk>
Date: Tue Jun 8 22:10:55 2021 +0100
Major rewrite of the DNS server and domain handling code.
...
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(In reply to Renaud Métrich from comment #1)
> Checking Upstream, I can see a complete rewrite of this code, which now
> brings concurrency:
> commit 12a9aa7c628e2d7dcd34949603848a3fb53fce9c

This rewrite introduced several regressions, and I doubt it is safe to backport it or rebase to it in RHEL9 or RHEL8.

Parallel queries work over UDP, and that is also covered by our tests. But sure, the TCP-only case is not tested well, and it is quite possible that it has the issues mentioned. Yes, handling of queries over TCP and over UDP follows very different code paths.

It seems to me those versions behave very similarly. There is one difference, however. Both version 2.79 in RHEL8 and 2.85 in RHEL9 use the last --server first, then try the others in reverse order. On the other hand, 2.88 from Fedora tries the first --server first, then continues in forward order. If the same parameters are passed and only one server works well, the results differ.

Test setup: 127.0.0.1 has a listening forwarder, 127.0.0.83 has a closed port, and 10.0.137.114 drops all incoming requests, so connections to it time out.
Tried it on an example:
# 2.79+2.85 times out
src/dnsmasq -d --log-queries --port 2053 --no-resolv --server=127.0.0.1 --server=127.0.0.83 --server=10.0.137.114
# 2.88 responds in time
dnsmasq -d --log-queries --port 2053 --no-resolv --server=127.0.0.1 --server=127.0.0.83 --server=10.0.137.114
# 2.88 times out too
dnsmasq -d --log-queries --port 2053 --no-resolv --server=127.0.0.83 --server=10.0.137.114 --server=127.0.0.1

I do not think --all-servers makes any difference here. The problem is that no short timeout is used, and I have seen no code that allows multiple TCP queries to be done in parallel. Queries are always sent sequentially, which does not work well over TCP. Dnsmasq does not seem able to do this even in the latest master branch of the upstream git repository.

I think there are 2 main issues:
- The forwarded TCP queries use the default system timeout for a TCP connection. That takes over 130 seconds on my Fedora 36 system, which is way too much. It should be lowered significantly, to a maximum of 30 seconds, and ideally be configurable at runtime. For common deployments a 5s timeout should be enough.
- Unlike UDP failures, failed TCP-only queries do not trigger use of another server next time. The next query will start with the same IP the previous one started with. Only a UDP query failure can trigger a different forwarder order. TCP is handled in its own forked instance, and the results of failed queries are not sent to the main instance. If clients used TCP-only queries for any reason, dnsmasq would not prefer a working resolver over a failing one the way it does for UDP.

So far it seems these issues are not handled significantly better in the most recent dnsmasq release. For networks with many clients I think a more complex DNS resolver such as unbound would do a better job. Maybe only authoritative requests for the domain handled by dnsmasq should be redirected to it, while the recursive part on the internet should be handled by a better implementation.
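To illustrate the first issue (forwarders tried one after another with no short timeout), here is a minimal Python sketch. It is not dnsmasq code: the function name `connect_first_reachable` and its `connect` parameter are hypothetical, and the 5 second default is only the value suggested above for common deployments.

```python
import socket


def connect_first_reachable(servers, timeout=5.0, connect=socket.create_connection):
    """Try each (host, port) in order and return (index, socket) for the
    first one that accepts a TCP connection within `timeout` seconds.

    `connect` is injectable so the logic can be exercised without a network;
    by default it is socket.create_connection from the standard library.
    """
    errors = []
    for index, address in enumerate(servers):
        try:
            sock = connect(address, timeout=timeout)
        except OSError as exc:
            # Covers refused connections as well as timeouts
            # (socket.timeout / TimeoutError are OSError subclasses).
            errors.append((address, exc))
            continue
        return index, sock
    raise ConnectionError(f"no forwarder reachable: {errors}")
```

Even with a bounded timeout the servers are still tried sequentially, so the worst case is roughly the number of unreachable servers multiplied by the timeout. That is why the order of forwarders matters so much in the three commands above.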
According to IPA maintainers, Kerberos libraries use res_search calls from libresolv. It should handle "options edns0" in /etc/resolv.conf, which is not the default. That would reduce the need for TCP queries, because even responses larger than 512 bytes would be accepted. Therefore it would not need to fall back to TCP so often for common responses. That might help, but it would have to be configured on the client machines doing kinit.

I have been digging in dnsmasq and found how it forwards items to be inserted into the cache from the forked TCP processes. The first catch is that this is done differently when the -d parameter is used: in that case forked processes are not created, and blocking handling happens in the main thread instead. In src/cache.c, the cache_end_insert() function serializes data received over TCP back to the parent process, which handles UDP packets. I think this mechanism could be reused to also report failures of TCP servers, so the parent could then switch to another server. The first forwarded query would still fail, but a retry on a new connection would try another server first. Next time it would try the working server.

What limits the easy implementation is that F_IMMORTAL and the other flags in struct crec already use all available bits in the flags field. So just adding a new flag to the list of forwarded entries in the existing format seems neither easy nor self-contained.

An alternative approach, which should improve the reported behaviour significantly, would be to generate a UDP query and send it from the TCP handler process to the main process. If that is done on connection failure during TCP forwarding, it would ensure UDP responsiveness is re-evaluated for all forwarders, and a responding one should be chosen. There are corner cases where the problem affects just TCP but not UDP, such as a misconfigured firewall. But for delays caused by normal network problems it should work well, while remaining a relatively simple, self-contained change.
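The idea of reporting TCP forwarder failures from the child back to the parent can be sketched as follows. This is an illustration only, in Python rather than dnsmasq's C: the record format, the function names, and the use of a plain `os.pipe()` are all assumptions, and the simple forward rotation in `next_first_server` is a stand-in for whatever selection logic the parent really applies.

```python
import os
import struct

# Hypothetical wire format: one unsigned 32-bit index of the failed forwarder.
FAILURE_RECORD = struct.Struct("!I")


def report_failure(write_fd, failed_index):
    """Child side: tell the parent which forwarder could not be reached."""
    os.write(write_fd, FAILURE_RECORD.pack(failed_index))


def read_failure(read_fd):
    """Parent side: read one failure record; return None on EOF or short read."""
    data = os.read(read_fd, FAILURE_RECORD.size)
    if len(data) < FAILURE_RECORD.size:
        return None
    return FAILURE_RECORD.unpack(data)[0]


def next_first_server(failed_index, server_count):
    """Pick the forwarder to try first on the next query (simple rotation)."""
    return (failed_index + 1) % server_count
```

In a real daemon the write end would belong to the forked TCP handler and the read end to the main process. The first TCP query would still fail, but the next one would start with a different forwarder.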
The current implementation requires clients to send a UDP query themselves; if that does not happen, the order of tried servers does not change.

Workaround patch sent upstream:
https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2023q2/017097.html

Simplified reproducer:
Let's create a hypothetical network issue with one forwarder, which worked fine a while ago.
$ sudo iptables -I INPUT -i lo -d 127.0.0.255 -j DROP
Now start dnsmasq and send a TCP query to it
# in 2.89 put the broken server to be tried first.
$ dnsmasq -d --log-queries --port 2053 --no-resolv --conf-file=/dev/null --server=127.0.0.255 --server=127.0.0.1
$ dig +tcp @localhost -p 2053 test
# retry a few times
$ time for TRY in {1..8}; do dig +tcp @localhost -p 2053 test; done
# it should answer eventually, because we are using -d parameter. With -k parameter instead, it will never get an answer!
$ dnsmasq -k --log-queries --port 2053 --no-resolv --conf-file=/dev/null --server=127.0.0.255 --server=127.0.0.1
$ time for TRY in {1..8}; do dig +tcp @localhost -p 2053 test; done
This will wait 5 minutes, without ever getting any answer. Not even SERVFAIL, just nothing.
Sent another explanation upstream, hopefully more detailed. It is just comment #13 with details on top.
https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2023q2/017118.html

Pushed a work-in-progress branch, where I try to serialize the domain, address and array index of the new last_server.
https://github.com/InfrastructureServices/dnsmasq/tree/dns-tcp-timeout

Sending a UDP query to the master process will not work, because if the answer is already cached, it does not trigger forwarding to upstream. Hacks to force forwarding anyway, for example setting the CD bit, might help, but that is just a hack.

I found that TCP reading overwrites the query in the stored buffer. If for some reason only a partial response arrives and then the connection breaks, I think dnsmasq will retry with an invalid DNS query on another forwarder. I haven't tried to reproduce that.

Anyway, I serialize both name and address in addition to the index. In the rare case when servers were updated at runtime during processing of a TCP request, this should prevent switching to an invalid server entry if they have changed in the meantime. Not yet tested properly.

Sent a set of 7 patches upstream for review. Multiple issues found. Also added a separate TCP last server, which may need tweaks.
https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2023q2/017131.html

Created a version for the c9s branch. It differs in some significant parts and depends on downstream-only changes we have, but seems to solve the problem in a similar way. I would first like some feedback on the original proposal for the upstream master branch, but at least we have a working prototype.

Simon Kelley implemented timeout reduction a different way:
https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;h=50adf82199c362da6c542f1d22be2eeab7481211
TCP_SYNCNT is used to reduce the number of SYN packets sent.

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.
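The TCP_SYNCNT approach mentioned above can be sketched in Python (the upstream change itself is in C and is not reproduced here). `make_forward_socket` and `syn_wait_seconds` are hypothetical helper names. The arithmetic assumes a 1 second initial retransmission timeout that doubles after every unanswered SYN, which is a common Linux default but not guaranteed, so treat the numbers as rough estimates.

```python
import socket


def syn_wait_seconds(retransmits, initial_rto=1.0):
    """Estimate how long connect() waits before giving up: the initial SYN
    plus `retransmits` retransmissions, each waiting twice as long as the
    previous one (assumed exponential backoff)."""
    return sum(initial_rto * 2 ** i for i in range(retransmits + 1))


def make_forward_socket(syn_retransmits=2):
    """Create a TCP socket whose connect() gives up after fewer SYN
    retransmits than the system default (Linux only; skipped elsewhere)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp_syncnt = getattr(socket, "TCP_SYNCNT", None)  # absent on non-Linux
    if tcp_syncnt is not None:
        sock.setsockopt(socket.IPPROTO_TCP, tcp_syncnt, syn_retransmits)
    return sock
```

Under these assumptions, 6 retransmits give `syn_wait_seconds(6)` = 127 seconds, the same order of magnitude as the two-minute hangs reported in this bug, while 2 retransmits give 7 seconds.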
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:
"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
Description of problem:
A customer is using dnsmasq with 3 upstream servers. When one of them is not reachable, queries hang until they time out.
This happens even though --all-servers is used, which is supposed to send the query to all servers concurrently, at least according to the manpage:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
--all-servers
    By default, when dnsmasq has more than one upstream server available, it will send queries to just one server. Setting this flag forces dnsmasq to send all queries to all available servers. The reply from the server which answers first will be returned to the original requester.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Stracing dnsmasq, we can indeed see that it hangs on connect() until the daemon is killed:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
8731 14:27:08.458095 connect(13<TCP:[432416]>, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("1.2.3.4")}, 16 <unfinished ...>
:
8731 14:29:05.145298 <... connect resumed>) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <116.687148>
8731 14:29:05.145373 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Version-Release number of selected component (if applicable):
dnsmasq-2.79-24.el8.x86_64 (also seen on RHEL9 dnsmasq-2.85-5.el9.x86_64)

How reproducible:
Always

Steps to Reproduce:
1. Set up dnsmasq with upstream servers 192.168.122.1 (my VM gateway) and 1.2.3.4 (not reachable)
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# dnsmasq -k --conf-file=/dev/null --port 2053 --server 192.168.122.1 --server 1.2.3.4 -i lo -z --all-servers --no-resolv --no-hosts
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
2. Query using *dig*
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# dig +tcp @localhost -p 2053 srv foo.bar

; <<>> DiG 9.11.36-RedHat-9.11.36-5.el8_7.2 <<>> +tcp @localhost -p 2053 srv foo.bar
; (2 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Actual results:
Time out, no result

Expected results:
Some result

Additional info:
When reversing the --server options (--server 1.2.3.4 --server 192.168.122.1), we see the query being answered immediately, which "proves" that 192.168.122.1 is queried first, and for sure nothing is queried in parallel.
With the original order, ss shows that both children query the same server, which is not reachable:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# ss -anp | grep SYN
tcp SYN-SENT 0 1 192.168.122.184:56355 1.2.3.4:53 users:(("dnsmasq",pid=9373,fd=13))
tcp SYN-SENT 0 1 192.168.122.184:57173 1.2.3.4:53 users:(("dnsmasq",pid=9370,fd=13))
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
There are two children because dig internally retries the query upon not getting any result.
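For comparison, the concurrent behaviour that the --all-servers manpage text describes could look roughly like this when applied to TCP connections. This is a hypothetical Python sketch of the expected semantics, not of what dnsmasq does: `race_connect`, `_close_if_connected` and the injectable `connect` parameter are illustrative names, and a thread pool is just one of several ways to run the attempts in parallel.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed


def _close_if_connected(future):
    """Close a connection that completed after the race was already decided."""
    if future.exception() is None:
        future.result().close()


def race_connect(servers, timeout=5.0, connect=socket.create_connection):
    """Connect to all (host, port) servers at once and return
    (address, socket) for whichever connection succeeds first."""
    if not servers:
        raise ValueError("at least one server is required")
    pool = ThreadPoolExecutor(max_workers=len(servers))
    futures = {pool.submit(connect, address, timeout): address for address in servers}
    winner = None
    try:
        for future in as_completed(futures):
            if future.exception() is None:
                winner = future
                return futures[future], future.result()
        raise ConnectionError("no forwarder accepted a TCP connection")
    finally:
        # Losers keep running in the background; close their sockets
        # once they finish, and do not block the caller on them.
        for future in futures:
            if future is not winner:
                future.add_done_callback(_close_if_connected)
        pool.shutdown(wait=False)
```

With the servers from this report, an unreachable 1.2.3.4 would no longer delay the answer, because the connection to 192.168.122.1 wins the race regardless of option order.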