2160466 – TCP Queries hang forever when an upstream server is not reachable

This bug has been migrated to another issue tracking site. It has been closed here and may no longer be being monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.

Keywords:
Status:	CLOSED MIGRATED
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	dnsmasq
Sub Component:
Version:	8.7
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Petr Menšík
QA Contact:	rhel-cs-infra-services-qe
Docs Contact:
URL:	https://github.com/InfrastructureServ...
Whiteboard:
Depends On:
Blocks:	2181244

Reported:	2023-01-12 13:49 UTC by Renaud Métrich
Modified:	2024-01-20 04:25 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2181244
Environment:
Last Closed:	2023-09-21 18:20:09 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	pm-rhel: mirror+

Links
System	ID	Priority	Status	Summary	Last Updated
Gitlab	redhat/centos-stream/rpms dnsmasq merge_requests 17	None	opened	Draft: Report TCP changed server to master process	2023-06-01 19:43:38 UTC
Red Hat Issue Tracker	RHEL-6513	None	Migrated	None	2023-09-21 18:20:07 UTC
Red Hat Issue Tracker	RHELPLAN-144958	None	None	None	2023-01-12 13:52:41 UTC
Red Hat Knowledge Base (Solution)	6325261	None	None	None	2023-04-17 12:36:06 UTC

Description Renaud Métrich 2023-01-12 13:49:53 UTC

Description of problem:

A customer is using dnsmasq with 3 upstream servers.
When one of them is not reachable, queries hang until they time out.
This happens even though --all-servers is used, which, at least according to the man page, is supposed to send the query to all servers concurrently:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
       --all-servers
              By  default,  when  dnsmasq has more than one upstream server available, it will send queries to just
              one server. Setting this flag forces dnsmasq to send all queries to all available servers. The  reply
              from the server which answers first will be returned to the original requester.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

An strace of dnsmasq shows that it indeed hangs in connect() until the daemon is killed:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
8731  14:27:08.458095 connect(13<TCP:[432416]>, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("1.2.3.4")}, 16 <unfinished ...>
 :
8731  14:29:05.145298 <... connect resumed>) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <116.687148>
8731  14:29:05.145373 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Version-Release number of selected component (if applicable):

dnsmasq-2.79-24.el8.x86_64 (also seen on RHEL9 dnsmasq-2.85-5.el9.x86_64)

How reproducible:

Always

Steps to Reproduce:
1. Set up dnsmasq with upstream servers 192.168.122.1 (my VM gateway) and 1.2.3.4 (not reachable)

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  # dnsmasq -k --conf-file=/dev/null --port 2053 --server 192.168.122.1 --server 1.2.3.4 -i lo -z --all-servers --no-resolv --no-hosts
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Query using *dig*

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  # dig +tcp @localhost -p 2053 srv foo.bar
  
  ; <<>> DiG 9.11.36-RedHat-9.11.36-5.el8_7.2 <<>> +tcp @localhost -p 2053 srv foo.bar
  ; (2 servers found)
  ;; global options: +cmd
  ;; connection timed out; no servers could be reached
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Actual results:

Time out, no result

Expected results:

Some result

Additional info:

When the --server options are reversed (--server 1.2.3.4 --server 192.168.122.1), the query is answered immediately, which "proves" that 192.168.122.1 is queried first in that case and that nothing is queried in parallel.

ss shows that both children query the same server, which is not reachable:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# ss -anp | grep SYN
tcp   SYN-SENT   0      1                                             192.168.122.184:56355             1.2.3.4:53     users:(("dnsmasq",pid=9373,fd=13))                                                                                                                             
tcp   SYN-SENT   0      1                                             192.168.122.184:57173             1.2.3.4:53     users:(("dnsmasq",pid=9370,fd=13))
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Two children appear because dig internally retries the query when it gets no result.

Comment 1 Renaud Métrich 2023-01-12 14:46:12 UTC

Clearly there is no parallelism in the tcp_request() code, servers are queried sequentially and there is no timeout handling either (socket is in blocking mode in particular).
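
As an illustration of the missing timeout handling, here is a minimal sketch in Python (not dnsmasq's C code) of a connect() bounded by an explicit timeout instead of the kernel's SYN retry schedule. The function name `connect_with_timeout` is hypothetical and exists only for this example.

```python
import socket

def connect_with_timeout(host, port, timeout):
    """Return a connected TCP socket, or None on timeout or any other error."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # bounds connect() instead of waiting for all SYN retries
    try:
        sock.connect((host, port))
    except OSError:  # covers socket.timeout and ConnectionRefusedError
        sock.close()
        return None
    sock.settimeout(None)  # back to blocking mode for the actual query
    return sock
```

With a 5 second timeout, a call such as `connect_with_timeout("1.2.3.4", 53, 5.0)` would return None after roughly 5 seconds on a network that silently drops the SYN packets, rather than hanging for minutes.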

Checking Upstream, I can see a complete rewrite of this code, which now brings concurrency:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
commit 12a9aa7c628e2d7dcd34949603848a3fb53fce9c
Author: Simon Kelley <simon.uk>
Date:   Tue Jun 8 22:10:55 2021 +0100

    Major rewrite of the DNS server and domain handling code.
...
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 4 Petr Menšík 2023-01-23 14:24:00 UTC

(In reply to Renaud Métrich from comment #1)
> Clearly there is no parallelism in the tcp_request() code, servers are
> queried sequentially and there is no timeout handling either (socket is in
> blocking mode in particular).
> 
> Checking Upstream, I can see a complete rewrite of this code, which now
> brings concurrency:
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8<
> --------
> commit 12a9aa7c628e2d7dcd34949603848a3fb53fce9c
> Author: Simon Kelley <simon.uk>
> Date:   Tue Jun 8 22:10:55 2021 +0100
> 
>     Major rewrite of the DNS server and domain handling code.
> ...
> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8<
> --------

This rewrite introduced several regressions, and I doubt it is safe to backport it or rebase to it in RHEL9 or RHEL8. Parallel queries work over UDP, and our tests cover that. But sure, the TCP-only case is not tested well and quite possibly has the issues mentioned. Yes, queries over TCP and UDP are handled by very different code paths.

Comment 6 Petr Menšík 2023-01-23 18:14:25 UTC

It seems to me those versions work very similarly. There is one difference, however. Both version 2.79 in RHEL8 and version 2.85 in RHEL9 use the last --server first, then try the others in reverse order. Version 2.88 from Fedora, on the other hand, tries the first --server first, then the others in forward order. If the same parameters are passed and only one server works well, the results differ.

127.0.0.1 has a listening forwarder,
127.0.0.83 has a closed port,
10.0.137.114 drops all incoming requests, so queries to it time out.

I tried it with this example:
# 2.79 and 2.85 time out
src/dnsmasq -d --log-queries --port 2053 --no-resolv --server=127.0.0.1 --server=127.0.0.83 --server=10.0.137.114
# 2.88 responds in time
dnsmasq -d --log-queries --port 2053 --no-resolv --server=127.0.0.1 --server=127.0.0.83 --server=10.0.137.114
# 2.88 times out too
dnsmasq -d --log-queries --port 2053 --no-resolv --server=127.0.0.83 --server=10.0.137.114 --server=127.0.0.1

I do not think --all-servers makes any difference here. The problem is that no short timeout is used, and I have seen no code that allows multiple TCP queries to run in parallel. dnsmasq always sends those queries sequentially, which does not work well for TCP. It does not seem to be able to do this even in the latest master branch of the upstream git repository.
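
For contrast with the sequential behaviour described above, the following is a rough sketch of what racing TCP connections to several forwarders could look like. It is written in Python with threads purely for illustration; it is not how dnsmasq works (the comment above notes that no such code exists there), and the names `try_connect` and `first_reachable` are hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

def try_connect(addr, timeout):
    """Connect to (host, port); return the socket, or None on any failure."""
    try:
        return socket.create_connection(addr, timeout=timeout)
    except OSError:
        return None

def _close_result(future):
    """Close a connection that finished after another server already won."""
    sock = future.result()
    if sock is not None:
        sock.close()

def first_reachable(servers, timeout=5.0):
    """Connect to all servers at once; return (addr, socket) of the first success."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(servers)))
    futures = {pool.submit(try_connect, addr, timeout): addr for addr in servers}
    try:
        for future in as_completed(futures):
            sock = future.result()
            if sock is not None:
                for other in futures:
                    if other is not future:
                        other.add_done_callback(_close_result)
                return futures[future], sock
        return None  # every server failed or timed out
    finally:
        pool.shutdown(wait=False)  # do not wait for the slow servers
```

In this sketch an unreachable server only costs its own worker thread until the timeout expires; the caller gets the first working connection as soon as it is established.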

Comment 7 Petr Menšík 2023-01-24 12:30:44 UTC

I think there are 2 main issues:

- Forwarded TCP queries use the default system timeout for TCP connections. That takes over 130 seconds on my Fedora 36 system, which is far too long. It should be lowered significantly, to a maximum of 30 seconds, and ideally be configurable at runtime. For common deployments a 5s timeout should be enough.
- Unlike UDP failures, failed TCP-only queries do not trigger the use of another server next time. The next query starts with the same IP the previous one started with. Only a UDP query failure can change the forwarder order. TCP is handled in its own forked instance, and the results of failed queries are not sent to the main instance. If clients used TCP-only queries for any reason, dnsmasq would not prefer a working resolver over a failing one the way it does for UDP.
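
The second issue can be pictured with a toy model. This is a sketch only: `ForwarderState` and `fail_in_forked_child` are hypothetical names, and the class is a simplified stand-in for the parent's notion of a preferred server, not dnsmasq's actual data structure. It shows that a failure recorded inside a forked child never reaches the parent unless it is explicitly sent back.

```python
import os

class ForwarderState:
    """Toy model: a server list plus the index tried first."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.last_server = 0

    def order(self):
        """Servers in the order they would be tried."""
        n = len(self.servers)
        return [self.servers[(self.last_server + i) % n] for i in range(n)]

    def report_failure(self, server):
        """Start with the next server if the preferred one failed."""
        if self.servers[self.last_server] == server:
            self.last_server = (self.last_server + 1) % len(self.servers)

def fail_in_forked_child(state, server):
    """Record a failure in a forked child, as a TCP handler process would."""
    pid = os.fork()
    if pid == 0:
        state.report_failure(server)  # changes only the child's copy
        os._exit(0)
    os.waitpid(pid, 0)
```

After `fail_in_forked_child(state, ...)` the parent's `state.order()` is unchanged, so the next TCP query starts with the same failing server; only a call to `report_failure` in the parent itself changes the order.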

So far it seems the most recent dnsmasq release does not handle these issues significantly better. For networks with many clients, I think a more complex DNS resolver such as unbound would do a better job. Maybe it should redirect only the authoritative requests for the domain handled by dnsmasq, while the recursive part on the internet should be handled by a better implementation.

Comment 9 Petr Menšík 2023-01-24 14:14:45 UTC

According to the IPA maintainers, the Kerberos libraries use res_search calls from libresolv.

libresolv should honor "options edns0" when it is set in /etc/resolv.conf, which is not the default. That would reduce the need for TCP queries, because even responses larger than 512 bytes would be accepted over UDP. It therefore would not need to fall back to TCP so often for common responses. That might help, but it would have to be configured on the client machines running kinit.
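
A small sketch of how one might check client machines for this setting. The helper `has_edns0` is hypothetical and only looks at the text of a resolv.conf file; it does not consider other ways the option could be set.

```python
def has_edns0(resolv_conf_text):
    """Return True if an 'options' line in the given text enables edns0."""
    for line in resolv_conf_text.splitlines():
        if line.startswith(("#", ";")):  # comment lines
            continue
        fields = line.split()
        if fields[:1] == ["options"] and "edns0" in fields[1:]:
            return True
    return False

# Example content; the nameserver address is an assumption for illustration.
EXAMPLE = "nameserver 127.0.0.1\noptions edns0\n"
```

On a client this could be used as `has_edns0(open("/etc/resolv.conf").read())`.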

Comment 11 Petr Menšík 2023-05-17 12:38:30 UTC

I have been digging in dnsmasq and found how it forwards items to be inserted into the cache from the forked TCP processes. The first catch is that this works differently when the -d parameter is used. In that case no forked processes are created; the request is handled in the main thread instead, blocking it.

In src/cache.c, the cache_end_insert() function serializes data received on the TCP socket back to the parent process, which handles UDP packets. I think this socket could also be reused to report failures of TCP servers, so that the parent could then switch to another server. The first forwarded query would still fail, but a retry on a new connection would try another server first, and subsequent queries would go to a working server. What limits the easy implementation is that F_IMMORTAL and the other flags in struct crec already use every available bit of the flags field. So just adding a new flag to the list of forwarded entries in the existing format seems neither easy nor self-contained.

An alternative approach, which should improve the reported behaviour significantly, would be to generate a UDP query and send it from the TCP handler process to the main process. If that is done on connection failure during TCP forwarding, it would ensure that UDP responsiveness is re-evaluated for all forwarders and that a responding one is chosen. There are corner cases where the issue affects only TCP and not UDP, such as a misconfigured firewall. But for delays caused by ordinary network problems it should work well, while remaining a relatively simple, self-contained change. The current implementation requires clients to send a UDP query themselves; if that does not happen, the order in which servers are tried does not change.

Comment 12 Petr Menšík 2023-05-22 08:58:53 UTC

Workaround patch sent upstream:
https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2023q2/017097.html

Comment 13 Petr Menšík 2023-05-25 19:29:41 UTC

Simplified reproducer:

Let's create a hypothetical network issue with one forwarder that worked fine a while ago.

$ sudo iptables -I INPUT -i lo -d 127.0.0.255 -j DROP

Now start dnsmasq and send a TCP query to it:

# in 2.89 put the broken server to be tried first.
$ dnsmasq -d --log-queries --port 2053 --no-resolv --conf-file=/dev/null --server=127.0.0.255 --server=127.0.0.1
$ dig +tcp @localhost -p 2053 test

# retry a few times
$ time for TRY in {1..8}; do dig +tcp @localhost -p 2053 test; done

# it should answer eventually, because we are using the -d parameter. With the -k parameter instead, it will never get an answer!
$ dnsmasq -k --log-queries --port 2053 --no-resolv --conf-file=/dev/null --server=127.0.0.255 --server=127.0.0.1
$ time for TRY in {1..8}; do dig +tcp @localhost -p 2053 test; done

This will wait 5 minutes, without ever getting any answer. Not even SERVFAIL, just nothing.

Comment 14 Petr Menšík 2023-05-25 19:36:57 UTC

Sent another explanation to upstream, hopefully more detailed. Just comment #13 with details on top.
https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2023q2/017118.html

Comment 15 Petr Menšík 2023-05-26 22:57:31 UTC

Pushed work-in-progress branch, where I try to serialize domain, address and array index of new last_server.
https://github.com/InfrastructureServices/dnsmasq/tree/dns-tcp-timeout

Sending a UDP query to the master process will not work, because when the answer is already cached it does not trigger forwarding to upstream. Hacks to force a forwarded query anyway, for example setting the CD bit, might help, but that is still just a hack. I also found that TCP reading overwrites the query in the stored buffer. If for some reason only a partial response arrives and the connection then breaks, I think dnsmasq will retry with an invalid DNS query on another forwarder. I have not tried to reproduce that.

Anyway, I serialize both the name and the address in addition to the index. In the rare case that the servers are updated at runtime while a TCP request is being processed, this should prevent switching to an invalid server entry. I have not tested it properly yet.
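
The idea of sending the index together with the name and address, so the parent can detect a changed server list, can be sketched as follows. The wire format, the IPv4-only address, and the function names are all assumptions made for this example; they are not taken from the work-in-progress branch.

```python
import socket
import struct

# Hypothetical record: index (16 bit), IPv4 address, domain length (16 bit).
HEADER = struct.Struct("!H4sH")

def pack_last_server(index, address, domain):
    """Child side: serialize the new preferred server."""
    name = domain.encode("ascii")
    return HEADER.pack(index, socket.inet_aton(address), len(name)) + name

def unpack_last_server(data):
    """Parent side: decode a record into (index, address, domain)."""
    index, raw_addr, name_len = HEADER.unpack_from(data)
    name = data[HEADER.size:HEADER.size + name_len].decode("ascii")
    return index, socket.inet_ntoa(raw_addr), name

def apply_last_server(servers, data):
    """Return the index only if it still names the same (domain, address)."""
    index, address, domain = unpack_last_server(data)
    if index < len(servers) and servers[index] == (domain, address):
        return index
    return None  # server list changed in the meantime; ignore the report
```

Here `servers` is assumed to be the parent's current list of `(domain, address)` tuples. A report built against an older list is rejected instead of pointing the parent at the wrong entry.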

Comment 16 Petr Menšík 2023-05-31 11:42:26 UTC

Sent a set of 7 patches upstream for review. Multiple issues found. Also added a separate TCP last server, which may need tweaks.

https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2023q2/017131.html

Comment 17 Petr Menšík 2023-06-01 19:43:39 UTC

Created a version for the c9s branch. It differs in some significant parts and depends on downstream-only changes we have, but it seems to solve the problem in a similar way.

I would first like some feedback on the original proposal for the upstream master branch, but at least we have a working prototype.

Comment 18 Petr Menšík 2023-06-02 15:21:45 UTC

Simon Kelley implemented the timeout reduction in a different way:
https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;h=50adf82199c362da6c542f1d22be2eeab7481211

TCP_SYNCNT is used to reduce the number of SYN packets sent.
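
To show why limiting SYN packets shortens the hang, here is a Linux-only Python sketch. It assumes an initial retransmission timeout of 1 second that doubles after every attempt and is capped at 120 seconds, which only approximates real kernel behaviour. The function names and the retry count of 2 are illustrative, not the values chosen upstream.

```python
import socket

def syn_timeout_seconds(syn_retries, initial_rto=1.0, max_rto=120.0):
    """Approximate time until connect() gives up: one wait per SYN sent."""
    waits = [min(initial_rto * 2 ** i, max_rto) for i in range(syn_retries + 1)]
    return sum(waits)

def socket_with_syncnt(syn_retries):
    """Create a TCP socket that retransmits SYN at most syn_retries times."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT, syn_retries)
    return sock
```

Under these assumptions, 6 retries (the usual Linux default for tcp_syn_retries) add up to 127 seconds, which is in the same range as the delays reported in this bug, while 2 retries add up to 7 seconds.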

Comment 19 RHEL Program Management 2023-09-21 18:19:49 UTC

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 20 RHEL Program Management 2023-09-21 18:20:09 UTC

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

Comment 21 Red Hat Bugzilla 2024-01-20 04:25:37 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
