Bug 1923913
| Summary: | dnsmasq-2.76-16.el7_9.1 update breaks name lookups | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Laszlo Ersek <lersek> |
| Component: | dnsmasq | Assignee: | Petr Menšík <pemensik> |
| Status: | CLOSED ERRATA | QA Contact: | Petr Sklenar <psklenar> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.9 | CC: | aegorenk, fkrska, jorton, jreznik, pemensik, psklenar, qguo |
| Target Milestone: | rc | Keywords: | AutoVerified, Regression, Triaged, ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1953093 | Environment: | |
| Last Closed: | 2021-07-21 01:09:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1896055 | | |
| Bug Blocks: | 1953093 | | |
| Attachments: | | | |

Description
Laszlo Ersek
2021-02-02 08:32:48 UTC
Created attachment 1754273
tarball of pcap files for comment#0

Additional Info: (4) File list:
rhbz-1923913/
rhbz-1923913/dnsmasq-2.76-16.el7.x86_64/
rhbz-1923913/dnsmasq-2.76-16.el7.x86_64/enp0s25.pcap
rhbz-1923913/dnsmasq-2.76-16.el7.x86_64/wlp3s0.pcap
rhbz-1923913/dnsmasq-2.76-16.el7_9.1.x86_64/
rhbz-1923913/dnsmasq-2.76-16.el7_9.1.x86_64/enp0s25.pcap
rhbz-1923913/dnsmasq-2.76-16.el7_9.1.x86_64/wlp3s0.pcap

Thanks for the recorded traffic. I will try to reproduce it on a VM and debug the socket handling. According to the dumps, responses were received right away, but for some reason they were not accepted by dnsmasq. It might be related to the hardening in the recent CVE fix, which checks the incoming socket more strictly than previous versions did. My attempts to reproduce it have not yet been successful.

There was another report on the upstream mailing list [1] about broken retries, which might be related. The fixes are commits [2] and [3], but they are not necessarily related.

1. https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q1/014697.html
2. http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=305cb79c5754d5554729b18a2c06fe7ce699687a
3. http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=141a26f979b4bc959d8e866a295e24f8cf456920

Hi Petr, could you brew a scratch build with these patches backported? I'd be happy to test it.
Thanks!
Laszlo

Hi Petr, unfortunately the latest backports don't help in my case (taskid 35194733 from comment 8). The symptom remains unchanged.
Thanks
Laszlo

Thanks for the quick verification. I finally tried installing RHEL 7.9 on my Lenovo and was able to reliably reproduce the issue you saw. It will take some time to examine it in a debugger, but with a debuggable reproducer I should find the reason for it in a few days. Please have a bit more patience.

Created attachment 1761055
Patch enabling again resolution on two connected interfaces

Candidate patch fixing this issue. It relaxes the checks when bound outgoing sockets are used. When the configured servers also specify an interface name, outgoing queries are bound to that interface. This change accepts a reply on any bound socket if a bound socket was used when sending. The list of servers used is not available, so strictness is reduced a bit.

The problem has two parts. dnsmasq sends queries to both servers, one for each interface. But each send overwrites sentto in the frec record, so only a reply to the last send is accepted. For a yet unknown reason the last (wifi) response does not reach the response socket, so dnsmasq does not receive it. It only receives the response over eth, but that one is ignored by dnsmasq. The reason is a mismatch: frec->sentto->sfd->fd points only to the last send, which is wifi only.
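
The overwrite described above can be illustrated with a small model. This is only a sketch in Python, not dnsmasq's C code: the names ForwardRecord, forward, accept_strict and accept_relaxed are hypothetical, the fd numbers are example values, and only the single sentto slot and the socket comparison are taken from the comments. The relaxed check is an assumed reading of the candidate patch ("reply on any bound socket, if a bound socket was used when sending").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    name: str    # illustrative label for an upstream server
    fd: int      # descriptor of the socket used to reach it
    bound: bool  # True if that socket is bound to an interface


@dataclass
class ForwardRecord:
    # One slot only, mirroring the single sentto pointer in the frec record.
    sentto: Optional[Server] = None


def forward(frec, servers):
    """Send the query to every server; each send overwrites sentto."""
    for server in servers:
        frec.sentto = server


def accept_strict(frec, reply_fd):
    """Accept a reply only on the socket of the last send."""
    return frec.sentto is not None and frec.sentto.fd == reply_fd


def accept_relaxed(frec, reply_fd, bound_fds):
    """Also accept a reply on any bound socket if the last send used one."""
    if accept_strict(frec, reply_fd):
        return True
    return (frec.sentto is not None and frec.sentto.bound
            and reply_fd in bound_fds)


if __name__ == "__main__":
    eth = Server("eth", 11, True)
    wifi = Server("wifi", 12, True)
    frec = ForwardRecord()
    forward(frec, [eth, wifi])  # wifi is sent last, so sentto points at wifi
    # Strict check: the reply that arrives over eth is rejected.
    print(accept_strict(frec, eth.fd))
    # Relaxed check: the same reply is accepted because both sockets are bound.
    print(accept_relaxed(frec, eth.fd, {eth.fd, wifi.fd}))
```

In this model the strict check fails exactly when the only reply that arrives comes over the interface that was not used for the last send, which matches the symptom reported here.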

(In reply to Petr Menšík from comment #12)
> Problem lies in two parts. dnsmasq sends queries to both servers for each
> interface. But each send overwrites sentto in frec record, therefore only
> the last send is accepted. But for yet unknown reason the last (wifi)
> response does not reach response socket, dnsmasq does not receive it. It
> only receives response over eth, but that is ignored by dnsmasq. Reason is
> mismatching frec->sentto->sfd->fd points only to last send, which is wifi
> only.

Is this perhaps a consequence of the RHEL kernels using Strict Reverse Path filtering by default? See: <https://access.redhat.com/solutions/53031>.

(In reply to Petr Menšík from comment #12)
> Created attachment 1761055
> Patch enabling again resolution on two connected interfaces

Tested-by: Laszlo Ersek <lersek>
(using the scratch build from comment 13)

Sent a slightly modified patch upstream:
https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q1/014789.html

(In reply to Laszlo Ersek from comment #14)
> (In reply to Petr Menšík from comment #12)
>
> Is this perhaps a consequence of the RHEL kernels using Strict Reverse Path
> filtering by default? See: <https://access.redhat.com/solutions/53031>.

I don't think it should fail this test on RHEL7. I think it is more likely an unfixed bug in the kernel; SO_BINDTODEVICE handling seems the primary suspect to me. The dnsmasq userspace code is similar in the related parts, so I doubt it was fixed in later versions. It works on RHEL8 and Fedora, unless I block responses over wifi via iptables. But when I do, it cannot switch to resolution over ethernet like it should. According to strace, dnsmasq receives the reply only on the first socket, where it is ignored. For some reason it is never received over the other socket, the one used by wifi.

But it would not hurt to test it. It should be something different from RHEL8 and Fedora.
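
To test the reverse-path filtering question, the current setting for an interface can be read from /proc. The Python sketch below assumes the rule from the kernel's ip-sysctl documentation that the larger of the conf/all and per-interface rp_filter values is used for source validation; the function names are illustrative and not part of any tool mentioned in this bug.

```python
from pathlib import Path

# Standard location of the per-interface IPv4 settings on Linux.
CONF_ROOT = Path("/proc/sys/net/ipv4/conf")

# rp_filter values as documented for the kernel sysctl.
MODES = {0: "off", 1: "strict", 2: "loose"}


def effective_mode(all_value, iface_value):
    """Return the mode the kernel applies: the max of the two settings."""
    return MODES[max(all_value, iface_value)]


def read_rp_filter(name, root=CONF_ROOT):
    """Read rp_filter for 'all', 'default' or an interface name."""
    return int((Path(root) / name / "rp_filter").read_text().strip())
```

For example, effective_mode(read_rp_filter("all"), read_rp_filter(iface)) reports the mode for the interface name held in iface. A result of "strict" on the interface without the default route would be consistent with replies being dropped there before they reach the bound socket.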

(In reply to Laszlo Ersek from comment #14)
>
> Is this perhaps a consequence of the RHEL kernels using Strict Reverse Path
> filtering by default? See: <https://access.redhat.com/solutions/53031>.

Okay, I was wrong and you were right. It seems that turning off rp_filter on the wireless device lets packets be received on the second socket as well. When I did:

    sudo sysctl -w net/ipv4/conf/wlp0s20f3/rp_filter=0

and made a few DNS requests, replies also arrived on fd 12 from time to time. When set to 1, they again arrived only over fd 11, which is eth with the active default route.

This was merged upstream together with the CVE-2021-3448 [1] fix; unfortunately, no separate commit just for this issue exists.

1. http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=74d4fcd756a85bc1823232ea74334f7ccfb9d5d2

This bug is present only on version with CVE-2020-25684.

(In reply to Petr Menšík from comment #24)
> This bug is present only on version with CVE-2020-25684.

I meant with the CVE-2020-25684 fix, at least dnsmasq-2.76-16.el7_9.1.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (dnsmasq bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2806