Version-Release number of selected component:
dnsmasq-2.86-1.fc34

Additional info:
reporter:         libreport-2.15.2
backtrace_rating: 4
cmdline:          /usr/sbin/dnsmasq
crash_function:   whine_malloc
executable:       /usr/sbin/dnsmasq
journald_cursor:  s=be6b2551834d4cd9a0f0ac7a5060cb92;i=bfaa;b=9a694a8b781046a7894c8dfffd73c989;m=5d0042c1c;t=5cc73728c6873;x=72e80c5246b823c2
kernel:           5.13.16-200.fc34.x86_64
rootdir:          /
runlevel:         N 5
type:             CCpp
uid:              994

Truncated backtrace:
Thread no. 1 (5 frames)
 #6 whine_malloc at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/util.c:316
 #7 get_new_frec at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:2478
 #8 forward_query at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:281
 #9 receive_query at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:1640
 #10 check_dns_listeners at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/dnsmasq.c:1817
Created attachment 1824954 [details] File: backtrace
Created attachment 1824955 [details] File: cgroup
Created attachment 1824956 [details] File: core_backtrace
Created attachment 1824957 [details] File: cpuinfo
Created attachment 1824958 [details] File: dso_list
Created attachment 1824959 [details] File: environ
Created attachment 1824961 [details] File: limits
Created attachment 1824962 [details] File: maps
Created attachment 1824963 [details] File: mountinfo
Created attachment 1824964 [details] File: open_fds
Created attachment 1824965 [details] File: proc_pid_status
Created attachment 1824966 [details] File: var_log_messages
Cockpit's CI just found the same, in the Fedora CoreOS image refresh [1].

systemd-coredump[14614]: Process 14290 (dnsmasq) of user 985 dumped core.

Stack trace of thread 14290:
#0  0x000055b86e5ce256 lookup_domain (dnsmasq + 0x53256)
#1  0x000055b86e59ea3a forward_query.lto_priv.0 (dnsmasq + 0x23a3a)
#2  0x000055b86e5a35d0 check_dns_listeners (dnsmasq + 0x285d0)
#3  0x000055b86e587b00 main (dnsmasq + 0xcb00)
#4  0x00007f1bfd9e7b75 __libc_start_main (libc.so.6 + 0x27b75)
#5  0x000055b86e58848e _start (dnsmasq + 0xd48e)

[1] https://github.com/cockpit-project/bots/pull/2434
It seems serious. Do we have any coredump file to analyse? It seems memory allocation is corrupted, which would be quite tricky to find. It might be related to Dominik's analysis [1] reported upstream. The included backtrace is not sufficient, but at least the known errors should be caught.

1. https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q3/015594.html
Can someone remove my "mountinfo" attachment? Then I could remove the "private" flag from this bug. Sorry, I should have spotted this before submitting.

I have 3 core dumps that were generated in the last 3 days:

$ ls -hgo /var/lib/systemd/coredump/core.dnsmasq*
-rw-r-----. 1 144K Sep 22 22:00 /var/lib/systemd/coredump/core.dnsmasq.994.96590cd8b0ea42d29287cf0373b18048.395655.1632340828000000.zst
-rw-r-----. 1 148K Sep 22 19:19 /var/lib/systemd/coredump/core.dnsmasq.994.96590cd8b0ea42d29287cf0373b18048.366314.1632331150000000.zst
-rw-r-----. 1 142K Sep 20 22:51 /var/lib/systemd/coredump/core.dnsmasq.994.9a694a8b781046a7894c8dfffd73c989.45881.1632171098000000.zst
Thanks! Quite useful. I have hidden them from public.

Coredump 3 from comment #16

(gdb) bt
#0  allocate_rfd (fdlp=fdlp@entry=0x55c1a4bb9380, serv=serv@entry=0x0) at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:2238
#1  0x000055c1a461d563 in forward_query (udpfd=4, udpaddr=0x7fff04da1220, dst_addr=0x7fff04da11f0, dst_iface=1, header=0x55c1a4bb55e0, plen=49, limit=0x55c1a4bb57e0 "", now=1632340828, forward=0x55c1a4bb9330, ad_reqd=0, do_bit=0) at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:451
#2  0x000055c1a46225d0 in receive_query (now=1632340828, listen=0x55c1a4bb42d0) at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:1640
#3  check_dns_listeners (now=1632340828) at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/dnsmasq.c:1817
#4  0x000055c1a4606b00 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/dnsmasq.c:1244

These crashes are clearly related to the linked issue. I think upstream commit de372d6 [1] should fix it. The code evidently extracts the wrong record from serverarray, because start is after last, so serverarray[start] is NULL and allocate_rfd is called with serv=0x0.

(gdb) p start
$5 = 36
(gdb) p last
$6 = 35
(gdb) p /x dnsmasq_daemon->serverarray[last]->flags
$8 = 0x900
(gdb) p /x dnsmasq_daemon->serverarray[start]
$9 = 0x0

Last points to a SERV_IS_LOCAL record, which should never have been passed to allocate_rfd.

1. http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=de372d6914ae20a1f9997815f258efbf3b14c39b
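To make the failing condition from the gdb session concrete, here is a minimal sketch of the kind of range check that would reject start=36, last=35 before any record is dereferenced. This is not the upstream patch and not dnsmasq's real data structures: `struct example_server`, its `is_local` field, and both function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a server record; NOT dnsmasq's struct server. */
struct example_server {
    int is_local; /* nonzero: record is answered locally, has no upstream */
};

/* Returns 1 only if [start, last] is a non-empty, in-bounds range whose
 * entries are all non-NULL and non-local; returns 0 otherwise. */
int example_range_is_forwardable(struct example_server *const *array,
                                 size_t count, size_t start, size_t last)
{
    if (array == NULL || start > last || last >= count)
        return 0; /* empty or out-of-bounds range, e.g. start=36, last=35 */

    for (size_t i = start; i <= last; i++) {
        if (array[i] == NULL || array[i]->is_local)
            return 0; /* nothing here that could be forwarded to */
    }
    return 1;
}

/* Debug-build style accessor: aborts on a bad range instead of handing a
 * NULL or local record to the caller. */
struct example_server *example_first_upstream(struct example_server *const *array,
                                              size_t count, size_t start,
                                              size_t last)
{
    assert(example_range_is_forwardable(array, count, start, last));
    return array[start];
}
```

With the values from the coredump, the first condition (start > last) already fails, so a caller guarded this way would not reach the point where a NULL serv is used.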
FEDORA-2021-03b9f525f0 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-03b9f525f0
FEDORA-2021-5945df5d64 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-5945df5d64
Christian, would it be a problem to also attach the dnsmasq configuration that generated these errors? Were any block lists used?
FEDORA-2021-03b9f525f0 has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-03b9f525f0` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-03b9f525f0 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2021-5945df5d64 has been pushed to the Fedora 35 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-5945df5d64` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-5945df5d64 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
(In reply to Petr Menšík from comment #23)
> Christian, Would it be problem to attach also configuration of dnsmasq,
> which generated these errors? Were any block lists used?

Unfortunately I still don't know how to reproduce this crash. The crashes mentioned above were from a freshly installed F34 system, but the system has now been running for a few days with no more crashes. I have upgraded to FEDORA-2021-03b9f525f0 (for F34) just now and will of course report any new crashes.

The (somewhat edited) configuration for this dnsmasq instance is:

user=dnsmasq
group=dnsmasq
interface=lo
conf-dir=/etc/dnsmasq.d/,*.conf

--- and in /etc/dnsmasq.d/foo.conf this:

listen-address=127.0.0.1
bind-dynamic
read-ethers
log-queries
log-facility=/var/log/dnsmasq.log
server=1.1.1.1
cache-size=1500
address=/foobar.lan/::
server=/somename.com/10.0.1.2
server=/somename.corp/10.0.1.2
server=/somename.de/10.0.1.2
[...] (more server=/somename.tld/10.0.1.2 entries after this)
FEDORA-2021-5945df5d64 has been pushed to the Fedora 35 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2021-03b9f525f0 has been pushed to the Fedora 34 stable repository. If problem still persists, please make note of it in this bug report.
Unfortunately this still happens with dnsmasq-2.86-2.fc35.x86_64 -- our CI just saw it [1] on the most recent Fedora 35 image. Full journal is at [2], stack trace is identical to the ones above:

#0  0x000055c972bbd2be lookup_domain (dnsmasq + 0x532be)
#1  0x000055c972b8da2a forward_query.lto_priv.0 (dnsmasq + 0x23a2a)
#2  0x000055c972b925c0 check_dns_listeners (dnsmasq + 0x285c0)
#3  0x000055c972b76d71 main (dnsmasq + 0xcd71)
#4  0x00007fb972daa560 __libc_start_call_main (libc.so.6 + 0x2d560)
#5  0x00007fb972daa60c __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2d60c)
#6  0x000055c972b77475 _start (dnsmasq + 0xd475)

[1] https://logs.cockpit-project.org/logs/pull-16517-20211026-120740-ff058897-fedora-35/log.html#71
[2] https://logs.cockpit-project.org/logs/pull-16517-20211026-120740-ff058897-fedora-35/TestTeam-testBasic-fedora-35-127.0.0.2-2201-FAIL.log.gz
In fact, our OS regression tracker in https://github.com/cockpit-project/bots/issues/2435 continues to see this all the time, so this was not just a one-off.
Okay, this is a known issue and is now tracked under bug #2009975. It would help if you had some coredumps to revalidate, especially if they are from automated CI. Upstream does not seem to be responding at the moment, so I will prepare a version with some candidate patches.

*** This bug has been marked as a duplicate of bug 2009975 ***