2006367 – [abrt] dnsmasq: whine_malloc()/forward_query()/lookup_domain(): dnsmasq killed by SIGABRT

Keywords:
Status:	CLOSED DUPLICATE of bug 2009975
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	dnsmasq
Sub Component:
Version:	35
Hardware:	x86_64
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Petr Menšík
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	https://retrace.fedoraproject.org/faf...
Whiteboard:	abrt_hash:8a4b1422902c3110e7b5b7c03d7...
Depends On:
Blocks:

Reported:	2021-09-21 14:31 UTC by Christian Kujau
Modified:	2021-10-27 16:16 UTC
CC List:	8 users
Fixed In Version:	dnsmasq-2.86-2.fc35 dnsmasq-2.86-2.fc34
Clone Of:
Environment:
Last Closed:	2021-10-27 14:50:45 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
File: backtrace (19.03 KB, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details
File: cgroup (45 bytes, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details
File: core_backtrace (2.92 KB, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details
File: cpuinfo (2.52 KB, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details
File: dso_list (766 bytes, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details
File: environ (306 bytes, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details
File: limits (1.29 KB, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details
File: maps (3.97 KB, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details
File: open_fds (965 bytes, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details
File: proc_pid_status (1.37 KB, text/plain) 2021-09-21 14:31 UTC, Christian Kujau	no flags	Details

Description Christian Kujau 2021-09-21 14:31:33 UTC

Version-Release number of selected component:
dnsmasq-2.86-1.fc34

Additional info:
reporter:       libreport-2.15.2
backtrace_rating: 4
cmdline:        /usr/sbin/dnsmasq
crash_function: whine_malloc
executable:     /usr/sbin/dnsmasq
journald_cursor: s=be6b2551834d4cd9a0f0ac7a5060cb92;i=bfaa;b=9a694a8b781046a7894c8dfffd73c989;m=5d0042c1c;t=5cc73728c6873;x=72e80c5246b823c2
kernel:         5.13.16-200.fc34.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            994

Truncated backtrace:
Thread no. 1 (5 frames)
 #6 whine_malloc at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/util.c:316
 #7 get_new_frec at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:2478
 #8 forward_query at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:281
 #9 receive_query at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:1640
 #10 check_dns_listeners at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/dnsmasq.c:1817

Comment 1 Christian Kujau 2021-09-21 14:31:37 UTC

Created attachment 1824954 [details]
File: backtrace

Comment 2 Christian Kujau 2021-09-21 14:31:38 UTC

Created attachment 1824955 [details]
File: cgroup

Comment 3 Christian Kujau 2021-09-21 14:31:40 UTC

Created attachment 1824956 [details]
File: core_backtrace

Comment 4 Christian Kujau 2021-09-21 14:31:41 UTC

Created attachment 1824957 [details]
File: cpuinfo

Comment 5 Christian Kujau 2021-09-21 14:31:42 UTC

Created attachment 1824958 [details]
File: dso_list

Comment 6 Christian Kujau 2021-09-21 14:31:43 UTC

Created attachment 1824959 [details]
File: environ

Comment 7 Christian Kujau 2021-09-21 14:31:44 UTC

Created attachment 1824961 [details]
File: limits

Comment 8 Christian Kujau 2021-09-21 14:31:45 UTC

Created attachment 1824962 [details]
File: maps

Comment 9 Christian Kujau 2021-09-21 14:31:46 UTC

Created attachment 1824963 [details]
File: mountinfo

Comment 10 Christian Kujau 2021-09-21 14:31:47 UTC

Created attachment 1824964 [details]
File: open_fds

Comment 11 Christian Kujau 2021-09-21 14:31:48 UTC

Created attachment 1824965 [details]
File: proc_pid_status

Comment 12 Christian Kujau 2021-09-21 14:31:49 UTC

Created attachment 1824966 [details]
File: var_log_messages

Comment 14 Martin Pitt 2021-09-22 07:50:03 UTC

Cockpit's CI just found the same, in the Fedora CoreOS image refresh [1].

systemd-coredump[14614]: Process 14290 (dnsmasq) of user 985 dumped core.
                                                                      
Stack trace of thread 14290:
 #0  0x000055b86e5ce256 lookup_domain (dnsmasq + 0x53256)
 #1  0x000055b86e59ea3a forward_query.lto_priv.0 (dnsmasq + 0x23a3a)
 #2  0x000055b86e5a35d0 check_dns_listeners (dnsmasq + 0x285d0)
 #3  0x000055b86e587b00 main (dnsmasq + 0xcb00)
 #4  0x00007f1bfd9e7b75 __libc_start_main (libc.so.6 + 0x27b75)
 #5  0x000055b86e58848e _start (dnsmasq + 0xd48e)

[1] https://github.com/cockpit-project/bots/pull/2434

Comment 15 Petr Menšík 2021-09-22 16:10:43 UTC

It seems serious. Do we have any coredump file to analyse?

It seems memory allocation is corrupted, which would be quite tricky to track down. It might be related to Dominik's analysis [1], which was reported upstream. The included backtrace is not sufficient, but at least the known errors should be caught.

1. https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q3/015594.html

Comment 19 Christian Kujau 2021-09-23 13:13:27 UTC

Can someone remove my "mountinfo" attachment? Then I could remove the "private" flag from this bug. Sorry, I should have spotted this before submitting. I have 3 core dumps that were generated in the last 3 days:

$ ls -hgo /var/lib/systemd/coredump/core.dnsmasq*
-rw-r-----. 1 144K Sep 22 22:00 /var/lib/systemd/coredump/core.dnsmasq.994.96590cd8b0ea42d29287cf0373b18048.395655.1632340828000000.zst
-rw-r-----. 1 148K Sep 22 19:19 /var/lib/systemd/coredump/core.dnsmasq.994.96590cd8b0ea42d29287cf0373b18048.366314.1632331150000000.zst
-rw-r-----. 1 142K Sep 20 22:51 /var/lib/systemd/coredump/core.dnsmasq.994.9a694a8b781046a7894c8dfffd73c989.45881.1632171098000000.zst
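
For readers matching these files against the other crash data in this report, the file names can be split into fields. The sketch below assumes systemd-coredump's usual naming scheme, core.<comm>.<uid>.<boot id>.<pid>.<timestamp in microseconds>, and that the command name contains no dots. The function name parse_coredump_name is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def parse_coredump_name(path):
    """Split a systemd-coredump file name into its fields.

    Assumed layout: core.<comm>.<uid>.<boot_id>.<pid>.<usec>[.<ext>]
    """
    name = PurePosixPath(path).name
    parts = name.split(".")
    if len(parts) < 6 or parts[0] != "core":
        raise ValueError(f"unexpected coredump file name: {name}")
    comm, uid, boot_id, pid, usec = parts[1:6]
    return {
        "comm": comm,
        "uid": int(uid),
        "boot_id": boot_id,
        "pid": int(pid),
        # The last numeric field is microseconds since the Unix epoch.
        "time": datetime.fromtimestamp(int(usec) / 1_000_000, tz=timezone.utc),
    }
```

For the first file listed above this yields pid 395655 and a time whose epoch value is 1632340828, the same value as now=1632340828 in the gdb backtrace in comment 20.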

Comment 20 Petr Menšík 2021-09-23 15:34:59 UTC

Thanks! Quite useful. I have hidden them from public view. Coredump 3 from comment #16:

(gdb) bt
#0  allocate_rfd (fdlp=fdlp@entry=0x55c1a4bb9380, serv=serv@entry=0x0)
    at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:2238
#1  0x000055c1a461d563 in forward_query (udpfd=4, udpaddr=0x7fff04da1220, dst_addr=0x7fff04da11f0, dst_iface=1, 
    header=0x55c1a4bb55e0, plen=49, limit=0x55c1a4bb57e0 "", now=1632340828, forward=0x55c1a4bb9330, ad_reqd=0, do_bit=0)
    at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:451
#2  0x000055c1a46225d0 in receive_query (now=1632340828, listen=0x55c1a4bb42d0)
    at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/forward.c:1640
#3  check_dns_listeners (now=1632340828) at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/dnsmasq.c:1817
#4  0x000055c1a4606b00 in main (argc=<optimized out>, argv=<optimized out>)
    at /usr/src/debug/dnsmasq-2.86-1.fc34.x86_64/src/dnsmasq.c:1244

They are clearly related to the linked issue. I think upstream commit de372d6 [1] should fix it. The code apparently extracts the wrong record from serverarray, because start is after last.

(gdb) p start
$5 = 36
(gdb) p last
$6 = 35

(gdb) p /x dnsmasq_daemon->serverarray[last]->flags
$8 = 0x900
(gdb) p /x dnsmasq_daemon->serverarray[start]
$9 = 0x0

last points to a SERV_IS_LOCAL record, which should never have been passed to allocate_rfd.

1. http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=de372d6914ae20a1f9997815f258efbf3b14c39b
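
The failure mode described above (start past last, with a local-only record at last and an empty slot at start) can be modelled in a few lines. This is only an illustrative sketch, not the dnsmasq code and not the upstream fix: the Server class, the is_local field and forwardable_servers are hypothetical, and treating last as an inclusive bound is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    """Hypothetical stand-in for one serverarray entry."""
    name: str
    is_local: bool = False  # models a SERV_IS_LOCAL-style record


def forwardable_servers(serverarray: List[Optional[Server]],
                        start: int, last: int) -> List[Server]:
    """Return the entries in [start, last] that a query may be forwarded to.

    An inverted range (start > last) is treated as "nothing to forward to"
    instead of being indexed, and local-only or empty slots are skipped.
    """
    if start > last:
        return []
    picked = []
    for entry in serverarray[start:last + 1]:
        if entry is None or entry.is_local:
            continue
        picked.append(entry)
    return picked
```

With the values printed by gdb (start = 36, last = 35) such a guard returns an empty list rather than handing an empty slot or a local-only record to the code that opens a socket.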

Comment 21 Fedora Update System 2021-09-23 17:22:25 UTC

FEDORA-2021-03b9f525f0 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-03b9f525f0

Comment 22 Fedora Update System 2021-09-23 17:27:37 UTC

FEDORA-2021-5945df5d64 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-5945df5d64

Comment 23 Petr Menšík 2021-09-23 18:40:54 UTC

Christian, would it be a problem to also attach the dnsmasq configuration that generated these errors? Were any block lists used?

Comment 24 Fedora Update System 2021-09-23 19:41:57 UTC

FEDORA-2021-03b9f525f0 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-03b9f525f0`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-03b9f525f0

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 25 Fedora Update System 2021-09-24 02:51:06 UTC

FEDORA-2021-5945df5d64 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-5945df5d64`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-5945df5d64

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 26 Christian Kujau 2021-09-28 18:41:43 UTC

(In reply to Petr Menšík from comment #23)
> Christian, would it be a problem to also attach the dnsmasq configuration
> that generated these errors? Were any block lists used?

Unfortunately I still don't know how to reproduce this crash. The crashes mentioned above were from a freshly installed F34 system, but the system has now been running for a few days with no more crashes. I have upgraded to FEDORA-2021-03b9f525f0 (for F34) just now and will of course report any new crashes.

The (somewhat edited) configuration for this dnsmasq instance is:

user=dnsmasq
group=dnsmasq
interface=lo
conf-dir=/etc/dnsmasq.d/,*.conf

--- and in /etc/dnsmasq.d/foo.conf this:

listen-address=127.0.0.1
bind-dynamic
read-ethers
log-queries
log-facility=/var/log/dnsmasq.log
server=1.1.1.1
cache-size=1500
address=/foobar.lan/::
server=/somename.com/10.0.1.2
server=/somename.corp/10.0.1.2
server=/somename.de/10.0.1.2
[...]

(more server=/somename.tld/10.0.1.2 entries after this)
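
Because the crash involves lookups in serverarray, the number of domain-specific server= entries may be relevant when trying to reproduce it. The sketch below lists them from a configuration like the one above. It assumes the documented server=/domain/.../address form, and the helper name domain_servers is hypothetical.

```python
def domain_servers(config_text):
    """Collect (domain, upstream) pairs from server=/domain/address lines."""
    pairs = []
    for raw in config_text.splitlines():
        line = raw.strip()
        # Global upstreams (server=1.1.1.1), comments and other options
        # do not start with "server=/" and are skipped.
        if not line.startswith("server=/"):
            continue
        # "/dom1/dom2/addr" splits into ['', 'dom1', 'dom2', 'addr'].
        fields = line[len("server="):].split("/")
        upstream = fields[-1]
        for domain in fields[1:-1]:
            pairs.append((domain, upstream))
    return pairs
```

Running it over the three visible domain lines gives three pairs, each pointing at 10.0.1.2; the elided entries would have to be included to get the real count.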

Comment 27 Fedora Update System 2021-10-04 00:15:15 UTC

FEDORA-2021-5945df5d64 has been pushed to the Fedora 35 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 28 Fedora Update System 2021-10-09 00:21:25 UTC

FEDORA-2021-03b9f525f0 has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 29 Martin Pitt 2021-10-26 15:57:31 UTC

Unfortunately this still happens with dnsmasq-2.86-2.fc35.x86_64 -- our CI just saw it [1] on the most recent Fedora 35 image. Full journal is at [2], stack trace is identical to the ones above:

#0  0x000055c972bbd2be lookup_domain (dnsmasq + 0x532be)
#1  0x000055c972b8da2a forward_query.lto_priv.0 (dnsmasq + 0x23a2a)
#2  0x000055c972b925c0 check_dns_listeners (dnsmasq + 0x285c0)
#3  0x000055c972b76d71 main (dnsmasq + 0xcd71)
#4  0x00007fb972daa560 __libc_start_call_main (libc.so.6 + 0x2d560)
#5  0x00007fb972daa60c __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2d60c)
#6  0x000055c972b77475 _start (dnsmasq + 0xd475)


[1] https://logs.cockpit-project.org/logs/pull-16517-20211026-120740-ff058897-fedora-35/log.html#71
[2] https://logs.cockpit-project.org/logs/pull-16517-20211026-120740-ff058897-fedora-35/TestTeam-testBasic-fedora-35-127.0.0.2-2201-FAIL.log.gz

Comment 30 Martin Pitt 2021-10-26 15:58:20 UTC

In fact, our OS regression tracker in https://github.com/cockpit-project/bots/issues/2435 continues to see this all the time, so this was not just a one-off.

Comment 31 Petr Menšík 2021-10-27 14:50:45 UTC

Okay, this is a known issue and is now tracked under bug #2009975. It would help if you had some coredumps to revalidate, especially if they come from automated CI. Upstream does not seem to be responding at the moment, so I will prepare a version with some candidate patches.

*** This bug has been marked as a duplicate of bug 2009975 ***
