Bug 2188545
| Summary: | dnsmasq does not forward all of the received queries although it does in previous versions. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | rynakamu | ||||
| Component: | dnsmasq | Assignee: | Petr Menšík <pemensik> | ||||
| Status: | CLOSED MIGRATED | QA Contact: | rhel-cs-infra-services-qe <rhel-cs-infra-services-qe> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 8.6 | CC: | pemensik, qguo | ||||
| Target Milestone: | rc | Keywords: | MigratedToJIRA, SupportQuestion | ||||
| Target Release: | --- | Flags: | pm-rhel: mirror+ | ||||
| Hardware: | Unspecified | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2023-09-21 19:07:11 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | |||||||
Description
rynakamu
2023-04-21 06:25:54 UTC
That behaviour is intentional and is not an error. dnsmasq is a DNS cache. It intentionally joins outgoing requests that are received in parallel until the server responds, even when no cache is used. The same answer is then forwarded to all joined clients. That is not going to change. It was introduced as a fix for a vulnerability; I do not have the exact ID at hand. The slower the upstream servers answer, the more often this happens.

A lower count of forwarded queries than received queries is okay. The only problem would be if any client did not receive a proper reply in such a case. Unless that is the case, I would close this as not a bug.

Is there any specific reason why you want to ensure that the outgoing query count matches? Is there another issue being tracked down?

Hello Petr,
Thanks for your reply. In fact, I also tried to reproduce the issue on Fedora 38 with the following packages:
---------------8< ---------------8< ---------------8< ---------------8< ---------------
# rpm -q dnsmasq bind-utils systemd kernel
dnsmasq-2.89-2.fc38.x86_64
bind-utils-9.18.13-1.fc38.x86_64
systemd-253.2-1.fc38.x86_64
kernel-6.2.9-300.fc38.x86_64
---------------8< ---------------8< ---------------8< ---------------8< ---------------
## Testing Procedure.
---------------8< ---------------8< ---------------8< ---------------8< ---------------
# tcpdump -n -nn -w /tmp/dump.cap port 53
# systemctl stop dnsmasq
# systemctl start dnsmasq
# systemctl status dnsmasq | grep Main
==> In my case, the PID is 800
# strace -ttTfvyy -s 2048 -o /tmp/dnsmasq.out -p 800
# for i in {1..5}; do dig google.com & done
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep google | grep "$i" | wc -l; done
# for i in {1..5}; do dig google.com & done
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep google | grep "$i" | wc -l; done
# for i in {1..5}; do dig google.com & done
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep google | grep "$i" | wc -l; done
# for i in {1..5}; do dig google.com & done
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep google | grep "$i" | wc -l; done
# for i in {1..5}; do dig google.com & done
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep google | grep "$i" | wc -l; done
# for i in {1..5}; do dig google.com & done
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep google | grep "$i" | wc -l; done
# for i in {1..5}; do dig google.com & done
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep google | grep "$i" | wc -l; done
# for i in {1..5}; do dig google.com & done
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep google | grep "$i" | wc -l; done
---------------8< ---------------8< ---------------8< ---------------8< ---------------
## My Observation
It seems dnsmasq basically follows the '1 query, 1 forwarded' pattern when it is running in a *stable* state. However, during startup and error handling, this rule is broken.
## From the strace output, the counts of 'recvmsg' and 'sendto' syscalls differ:
---------------8< ---------------8< ---------------8< ---------------8< ---------------
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep "$i" | wc -l; done
196
189
---------------8< ---------------8< ---------------8< ---------------8< ---------------
## If we look into the dnsmasq logs, both are recorded:
---------------8< ---------------8< ---------------8< ---------------8< ---------------
# for i in recvmsg sendto; do cat /tmp/dnsmasq.out | grep "$i" | wc -l; done
196
189
# cat /tmp/dnsmasq.out | grep 'write(11' | grep query | wc -l
196
# cat /tmp/dnsmasq.out | grep 'write(11' | grep forwarded | wc -l
189
# cat /tmp/dnsmasq.out | grep 'write(11' | grep -v -e forwarded -e query
800 15:42:29.772770 write(11</var/log/dnsmasq.log>, "Apr 23 15:42:29 dnsmasq[800]: Maximum number of concurrent DNS queries reached (max: 150)\n", 90) = 90 <0.000019>
800 15:42:29.773097 write(11</var/log/dnsmasq.log>, "Apr 23 15:42:29 dnsmasq[800]: config error is REFUSED\n", 54) = 54 <0.000020>
---------------8< ---------------8< ---------------8< ---------------8< ---------------
## Digging into the log: when dnsmasq starts, there are 5 queries but only 1 is forwarded. After that, everything matches the '1 query, 1 forwarded' pattern until 'Maximum number of concurrent DNS queries reached (max: 150)' occurs. After that message, there is one incident of 3 queries vs. 1 forwarded. Then everything returns to normal.
---------------8< ---------------8< ---------------8< ---------------8< ---------------
# cat /tmp/dnsmasq.log | awk '{print $5}' | uniq -c
1 started,
1 compile
1 read
1 using
1 query[A]
1 forwarded
5 query[A] <<<<<< 5 query
1 forwarded <<<<< 1 forwarded, gap=4
1 query[A]
1 forwarded
...
1 query[A]
1 forwarded
1 query[A]
1 Maximum <<<<<< maximum message displayed, no forwarded, gap=1
1 config
1 query[A]
1 forwarded
3 query[A] <<<<<< 3 query
1 forwarded <<<<< 1 forwarded, gap=2
1 query[A]
1 forwarded
...
1 query[A]
1 forwarded
---------------8< ---------------8< ---------------8< ---------------8< ---------------
total gap=4+1+2=7, matches 196-189=7
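The same bookkeeping can be sketched in Python. This is only an illustration of the arithmetic above, not part of the original test procedure: the helper names (`events_from_log`, `runs`, `forwarding_gap`) and the `sample` event list are hypothetical, and the sample only mimics the shape of the excerpt with the elided `...` runs left out, so its totals are not the 196 and 189 measured above.

```python
from itertools import groupby


def events_from_log(lines):
    """Extract the 5th whitespace-separated field, like `awk '{print $5}'`."""
    events = []
    for line in lines:
        fields = line.split()
        events.append(fields[4] if len(fields) > 4 else "")
    return events


def runs(events):
    """Collapse consecutive identical events into (count, name), like `uniq -c`."""
    return [(len(list(group)), name) for name, group in groupby(events)]


def forwarding_gap(events):
    """Return (queries, forwarded, gap) for a sequence of event names."""
    queries = sum(1 for e in events if e.startswith("query["))
    forwarded = sum(1 for e in events if e == "forwarded")
    return queries, forwarded, queries - forwarded


# Hypothetical event stream shaped like the excerpt (elided runs omitted).
sample = (
    ["query[A]", "forwarded"]
    + ["query[A]"] * 5 + ["forwarded"]        # gap of 4 at startup
    + ["query[A]", "Maximum", "config"]       # gap of 1 at the limit message
    + ["query[A]", "forwarded"]
    + ["query[A]"] * 3 + ["forwarded"]        # gap of 2 afterwards
)

print(runs(sample))
print(forwarding_gap(sample))  # (11, 4, 7)
```

On a real log, `forwarding_gap(events_from_log(open(path)))` would give the overall difference, assuming the log lines have the same field layout as the ones shown in the strace output.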
In this case, it seems the '1 query vs. 1 forwarded' operation is the normal behavior for a stably running dnsmasq. This is what the latest dnsmasq on RHEL 7.9 and the latest dnsmasq on Fedora 38 follow. However, it is not the case for the latest dnsmasq on RHEL 8.
Could you please explain the differences here? After comparing the source code, we see that the following patches may change the behavior:
- https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=141a26f979b4bc959d8e866a295e24f8cf456920
- https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=305cb79c5754d5554729b18a2c06fe7ce699687a
Thank you.
Best regards,
Flos
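For readers following the thread, the joining behaviour described in the first reply can be illustrated with a toy model. This is a simplified sketch and not dnsmasq's implementation: the class `JoiningForwarder` and its methods are hypothetical, and it keys pending queries on the name and type for readability, which is an assumption made only for this example.

```python
class JoiningForwarder:
    """Toy model: identical questions pending at the same time share one upstream query."""

    def __init__(self):
        self.pending = {}    # (name, qtype) -> list of clients waiting for the answer
        self.forwarded = 0   # number of queries actually sent upstream

    def on_query(self, client, name, qtype="A"):
        """Register a client query; return True if it had to be forwarded upstream."""
        key = (name.lower(), qtype)
        if key in self.pending:
            # An identical query is already in flight: join it, send nothing.
            self.pending[key].append(client)
            return False
        self.pending[key] = [client]
        self.forwarded += 1
        return True

    def on_reply(self, name, qtype, answer):
        """Deliver one upstream answer to every client that joined the query."""
        clients = self.pending.pop((name.lower(), qtype), [])
        return [(client, answer) for client in clients]


fw = JoiningForwarder()
for client in range(5):                 # five parallel clients, as in the dig loop
    fw.on_query(client, "google.com")
print(fw.forwarded)                     # 1
print(len(fw.on_reply("google.com", "A", "answer")))  # 5
```

In this model, five queries arriving before the upstream reply produce one forwarded query, which is the kind of gap counted in the log excerpt above, while every client still receives an answer.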
Ah, yes, those changes introduce roughly the same number of queries again. But their primary purpose is *NOT* to have exactly the same count of incoming and outgoing queries. These changes bring back correct retries in case the initial query forwarded upstream was dropped by the network. Because of its design, dnsmasq does not do retries itself. It cannot even do that, because it stores just a hash of the incoming query, not the query name and type themselves. So it relies on clients to do retries. The commits you mention ensure that those retries are forwarded again too, making dnsmasq more reliable on unreliable networks between dnsmasq and the upstream servers.

The later commit 64a16cb [1] again reduced the number of forwarded queries. It prevents dnsmasq clients from bombarding any upstream server with queries sent to dnsmasq, because dnsmasq is (again) rate limiting outgoing queries, as it should. It still ensures that retries are forwarded unless the query was already retried less than 2 seconds ago. I think we also want commit 305cb79 [2].

So yes, we need to ensure retries work and are not ignored, but the query count on the internal network does not have to match the forwarded count. That won't be fixed, because we want to prevent that.

[1] http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=64a16cb376a5248a53fb55e81a8df4d61630d120
[2] http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=305cb79c5754d5554729b18a2c06fe7ce699687a

Created attachment 1962321
python script to emulate packet drops
Attaching a simple script which emulates a forwarder with an unreliable connection on address 127.0.0.2, port 2053. It forwards packets to localhost, port domain, but drops the first packet, passes the next one, then drops again.
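The attachment itself is not reproduced in this report. The following is a minimal sketch of what such a lossy forwarder could look like, written from the description above only; it is not the attached script. The function names, the 4096-byte buffer, the 2-second upstream timeout and the `max_packets` parameter are assumptions made for illustration.

```python
import socket


def should_drop(packet_index):
    """Drop packets 0, 2, 4, ... and pass packets 1, 3, 5, ..."""
    return packet_index % 2 == 0


def run(listen=("127.0.0.2", 2053), upstream=("127.0.0.1", 53),
        timeout=2.0, max_packets=None):
    """Relay UDP DNS packets to `upstream`, silently dropping every other one."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(listen)
    count = 0
    try:
        while max_packets is None or count < max_packets:
            data, client = server.recvfrom(4096)
            drop = should_drop(count)
            count += 1
            if drop:
                continue  # emulate a packet lost on the way upstream
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as up:
                up.settimeout(timeout)
                up.sendto(data, upstream)
                try:
                    reply, _ = up.recvfrom(4096)
                except socket.timeout:
                    continue  # upstream did not answer in time
            server.sendto(reply, client)
    finally:
        server.close()
```

Calling `run()` starts the relay on 127.0.0.2 port 2053. To exercise the retry behaviour, dnsmasq would presumably be pointed at it as its upstream, for example with `server=127.0.0.2#2053`, while a separate resolver answers on localhost port 53.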
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.