Bug 1923913 - dnsmasq-2.76-16.el7_9.1 update breaks name lookups
Summary: dnsmasq-2.76-16.el7_9.1 update breaks name lookups
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: dnsmasq
Version: 7.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Petr Menšík
QA Contact: Petr Sklenar
URL:
Whiteboard:
Depends On: 1896055
Blocks: 1953093
Reported: 2021-02-02 08:32 UTC by Laszlo Ersek
Modified: 2023-04-20 03:53 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1953093
Environment:
Last Closed: 2021-07-21 01:09:55 UTC
Target Upstream Version:
Embargoed:


Attachments
tarball of pcap files for comment#0 | Additional Info | (4) (1.11 KB, application/x-xz), 2021-02-02 08:36 UTC, Laszlo Ersek
Patch enabling again resolution on two connected interfaces (1.88 KB, patch), 2021-03-05 20:57 UTC, Petr Menšík


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:2806 0 None None None 2021-07-21 01:09:59 UTC

Description Laszlo Ersek 2021-02-02 08:32:48 UTC
* Description of problem:

After upgrading dnsmasq on my RHEL-7.9 Workstation laptop:

> Updated dnsmasq-2.76-16.el7.x86_64     @rhel-7-workstation-rpms
> Update          2.76-16.el7_9.1.x86_64 @rhel-7-workstation-rpms

name lookups no longer work.


* Version-Release number of selected component (if applicable):

dnsmasq-2.76-16.el7_9.1.x86_64


* How reproducible:

100%


* Steps to Reproduce:

1. Run the command "ping redhat.com".


* Actual results:

> ping: redhat.com: Name or service not known


* Expected results:

> PING redhat.com (209.132.183.105) 56(84) bytes of data.
> 64 bytes from redirect.redhat.com (209.132.183.105): icmp_seq=1 ttl=230 time=176 ms


* Additional info:

(1) The symptom can be worked around by either one of the following
    steps:

(1.1) downgrading dnsmasq to 2.76-16.el7.x86_64, or

(1.2) changing the NetworkManager configuration to not use dnsmasq
      (i.e., removing "dns=dnsmasq" from the [main] section in
      "/etc/NetworkManager/NetworkManager.conf").

      In other words, it's only DNS that's broken. Given a known numeric
      IPv4 address, IPv4 traffic works fine.

(2) My laptop has two upstream links configured, Ethernet and WiFi:

> enp0s25: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 192.168.0.7  netmask 255.255.255.0  broadcast 192.168.0.255
>         inet6 fe80::56ee:75ff:fe65:72c8  prefixlen 64  scopeid 0x20<link>
>         ether 54:ee:75:65:72:c8  txqueuelen 1000  (Ethernet)
>         RX packets 84659  bytes 96374871 (91.9 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 35654  bytes 5418882 (5.1 MiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 20  memory 0xb3a00000-b3a20000
>
> wlp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 192.168.0.8  netmask 255.255.255.0  broadcast 192.168.0.255
>         inet6 fe80::124a:7dff:febe:944d  prefixlen 64  scopeid 0x20<link>
>         ether 10:4a:7d:be:94:4d  txqueuelen 1000  (Ethernet)
>         RX packets 29  bytes 2042 (1.9 KiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 21  bytes 2862 (2.7 KiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         192.168.0.1     0.0.0.0         UG    100    0        0 enp0s25
> 0.0.0.0         192.168.0.1     0.0.0.0         UG    600    0        0 wlp3s0

(3) The contents of "/etc/resolv.conf" are (regardless of dnsmasq
    version):

> # Generated by NetworkManager
> search usersys.redhat.com
> nameserver 127.0.0.1

(4) As instructed by Petr in
    <https://bugzilla.redhat.com/show_bug.cgi?id=1896055#c18>, I'm going
    to attach a tarball of pcap files in the next comment. The
    to-be-attached sets of packets were captured as follows:

(4.1) With the functional version (dnsmasq-2.76-16.el7.x86_64), start
      the following two commands in parallel:

> # tcpdump -i enp0s25 -n -w enp0s25.pcap port 53
> # tcpdump -i wlp3s0 -n -w wlp3s0.pcap port 53

(4.2) Run "ping redhat.com".

(4.3) Once "ping" reports the first ICMP ECHO REPLY, interrupt the
      "ping" command and both "tcpdump" commands (^C).

(4.4) Save the pcap files in a safe place.

(4.5) Upgrade dnsmasq to "dnsmasq-2.76-16.el7_9.1.x86_64" with "yum".

(4.6) Reboot the laptop.

(4.7) Start the same two "tcpdump" commands in parallel as seen in
      (4.1).

(4.8) Run "ping redhat.com", same as in (4.2).

(4.9) Once "ping" reports name lookup failure (i.e., "ping: redhat.com:
      Name or service not known"), interrupt both "tcpdump" commands
      (^C).

(4.10) Save the pcap files in a safe place, separately from those saved
       in (4.4).
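
Workaround (1.2) above can be sketched as a small shell helper. This is only an illustration: the function name "disable_nm_dnsmasq" is made up for this example, it comments the line out instead of removing it, it does not check that the line sits in the [main] section, and "sed -i" assumes GNU sed as shipped on RHEL. NetworkManager presumably has to be restarted afterwards for the change to take effect.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): comment out "dns=dnsmasq"
# in a NetworkManager configuration file.
# $1 is the file to edit; it defaults to the path named in (1.2).
disable_nm_dnsmasq() {
    conf=${1:-/etc/NetworkManager/NetworkManager.conf}
    # Keep a backup next to the file, then prefix the matching line with "#"
    # so that the change is easy to revert.
    cp -p "$conf" "$conf.bak" &&
    sed -i 's/^[[:space:]]*dns[[:space:]]*=[[:space:]]*dnsmasq[[:space:]]*$/#&/' "$conf"
}

# Example (as root); try it on a copy of the file first:
#   disable_nm_dnsmasq /etc/NetworkManager/NetworkManager.conf
#   systemctl restart NetworkManager
```

Commenting the line out rather than deleting it keeps the original setting visible, which helps when switching back after the fixed package is installed.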

Comment 3 Laszlo Ersek 2021-02-02 08:36:53 UTC
Created attachment 1754273 [details]
tarball of pcap files for comment#0 | Additional Info | (4)

File list:

rhbz-1923913/
rhbz-1923913/dnsmasq-2.76-16.el7.x86_64/
rhbz-1923913/dnsmasq-2.76-16.el7.x86_64/enp0s25.pcap
rhbz-1923913/dnsmasq-2.76-16.el7.x86_64/wlp3s0.pcap
rhbz-1923913/dnsmasq-2.76-16.el7_9.1.x86_64/
rhbz-1923913/dnsmasq-2.76-16.el7_9.1.x86_64/enp0s25.pcap
rhbz-1923913/dnsmasq-2.76-16.el7_9.1.x86_64/wlp3s0.pcap

Comment 5 Petr Menšík 2021-02-03 08:50:14 UTC
Thanks for the recorded traffic. I will try to reproduce it on a VM and debug the socket handling. According to the dumps, the responses were received right away, but for some reason dnsmasq did not accept them. It might be related to the hardening in the recent CVE fix, which checks the incoming socket more strictly than previous versions did.

Comment 6 Petr Menšík 2021-02-26 18:14:59 UTC
My attempts to reproduce it have not been successful yet. There was another report on the upstream mailing list [1] about broken retries, which might be related. The fixes are commits [2] and [3], but they are not necessarily related.

1. https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q1/014697.html
2. http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=305cb79c5754d5554729b18a2c06fe7ce699687a
3. http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=141a26f979b4bc959d8e866a295e24f8cf456920

Comment 7 Laszlo Ersek 2021-03-01 15:32:12 UTC
Hi Petr,

could you brew a scratch build with these patches backported? I'd be happy to test it.

Thanks!
Laszlo

Comment 9 Laszlo Ersek 2021-03-02 10:37:51 UTC
Hi Petr,

unfortunately the latest backports don't help in my case (taskid 35194733 from comment 8). The symptom remains unchanged.

Thanks
Laszlo

Comment 10 Petr Menšík 2021-03-03 23:48:09 UTC
Thanks for the quick verification. I finally installed RHEL 7.9 on my Lenovo and was able to reliably reproduce the issue you saw. It will take some time to examine it in a debugger, but with a debuggable reproducer I should find the reason within a few days. Please be patient a bit longer.

Comment 12 Petr Menšík 2021-03-05 20:57:15 UTC
Created attachment 1761055 [details]
Patch enabling again resolution on two connected interfaces

Candidate patch fixing this issue. It relaxes the checks in the situation where bound outgoing sockets are used. When the servers in use also specify an interface name, outgoing queries are bound to that interface. This change accepts a reply on any bound socket if a bound socket was used for sending. The list of servers used is not available, so strictness is reduced a bit.

The problem has two parts. dnsmasq sends queries to both servers, one for each interface. But each send overwrites sentto in the frec record, so only a reply to the last send is accepted. For a yet unknown reason, the last (wifi) response does not reach the response socket, so dnsmasq never receives it. It only receives the response over eth, but ignores it. The reason is the mismatch: frec->sentto->sfd->fd points only to the last send, which is wifi.

Comment 14 Laszlo Ersek 2021-03-08 15:11:06 UTC
(In reply to Petr Menšík from comment #12)

> Problem lies in two parts. dnsmasq sends queries to both servers for each
> interface. But each send overwrites sentto in frec record, therefore only
> the last send is accepted. But for yet unknown reason the last (wifi)
> response does not reach response socket, dnsmasq does not receive it. It
> only receives response over eth, but that is ignored by dnsmasq. Reason is
> mismatching frec->sentto->sfd->fd points only to last send, which is wifi
> only.

Is this perhaps a consequence of the RHEL kernels using Strict Reverse Path filtering by default? See: <https://access.redhat.com/solutions/53031>.

Comment 15 Laszlo Ersek 2021-03-08 15:20:50 UTC
(In reply to Petr Menšík from comment #12)
> Created attachment 1761055 [details]
> Patch enabling again resolution on two connected interfaces

Tested-by: Laszlo Ersek <lersek>

(using the scratch build from comment 13)

Comment 16 Petr Menšík 2021-03-09 00:27:06 UTC
Sent a slightly modified patch upstream:

https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q1/014789.html

Comment 17 Petr Menšík 2021-03-09 00:35:25 UTC
(In reply to Laszlo Ersek from comment #14)
> (In reply to Petr Menšík from comment #12)
> 
> Is this perhaps a consequence of the RHEL kernels using Strict Reverse Path
> filtering by default? See: <https://access.redhat.com/solutions/53031>.

I don't think it should fail this test on RHEL7. I rather think it might be an unfixed bug in the kernel; SO_BINDTODEVICE handling seems the primary suspect to me. The dnsmasq userspace code is similar in the related parts, so I doubt it was fixed in later versions. It works on RHEL8 and Fedora, unless I block responses over wifi via iptables. But when I do, it cannot switch to resolution over ethernet like it should. According to strace, dnsmasq receives the reply only on the first socket, where it is ignored. For some reason it is never received on the other socket, the one used for wifi. But testing it would not hurt. It should be something that differs from RHEL8 and Fedora.

Comment 18 Petr Menšík 2021-03-09 11:46:56 UTC
(In reply to Laszlo Ersek from comment #14)
> 
> Is this perhaps a consequence of the RHEL kernels using Strict Reverse Path
> filtering by default? See: <https://access.redhat.com/solutions/53031>.

Okay, I was wrong and you were right. It seems that turning off rp_filter on the wireless device allows packets to be received on the second socket as well.

When I did:
sudo sysctl -w net/ipv4/conf/wlp0s20f3/rp_filter=0

and made a few DNS requests, replies also arrived from time to time on fd 12. When set to 1, they again arrived only on fd 11, which is eth with the active default route.
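
To check which reverse-path filtering mode actually applies to an interface, a sketch like the following may help. It relies on the kernel's documented rule that the larger of conf/all/rp_filter and conf/<iface>/rp_filter is used for source validation. The function name "effective_rp_filter" and its optional second argument (an alternative root directory, only useful for trying it out) are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the effective rp_filter mode for one interface
# (0 = off, 1 = strict, 2 = loose).
# $1 is the interface name; $2 optionally overrides the sysctl directory.
effective_rp_filter() {
    iface=$1
    root=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$root/all/rp_filter") || return 1
    own=$(cat "$root/$iface/rp_filter") || return 1
    # The kernel applies the maximum of the "all" and per-interface values.
    if [ "$all" -gt "$own" ]; then
        echo "$all"
    else
        echo "$own"
    fi
}

# Example:
#   effective_rp_filter wlp0s20f3
```

If this prints 1 for the wifi device while the default route with the lower metric points at Ethernet, replies arriving on the wifi socket may be dropped by the kernel before dnsmasq sees them, which would match the behavior described above.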

Comment 19 Petr Menšík 2021-03-31 07:30:21 UTC
This was merged upstream together with the CVE-2021-3448 [1] fix; unfortunately, no separate commit just for this issue exists.

1. http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=74d4fcd756a85bc1823232ea74334f7ccfb9d5d2

Comment 24 Petr Menšík 2021-06-10 10:54:53 UTC
This bug is present only on version with CVE-2020-25684.

Comment 26 Petr Menšík 2021-06-10 12:35:34 UTC
(In reply to Petr Menšík from comment #24)
> This bug is present only on version with CVE-2020-25684.

I meant with the CVE-2020-25684 fix, i.e. at least dnsmasq-2.76-16.el7_9.1

Comment 38 errata-xmlrpc 2021-07-21 01:09:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (dnsmasq bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2806

