Description of problem: I'm using dnsmasq to provide local DNS caching with NetworkManager so NM is responsible for starting and managing dnsmasq. This is an IPv4-only system. These are the related configuration files in the setup: # cat /etc/resolv.conf nameserver 127.0.0.1 search example.com options edns0 no-aaaa trust-ad # cat /etc/NetworkManager/conf.d/50-dns.conf [main] dns=dnsmasq # cat /etc/NetworkManager/conf.d/99-rc-manager.conf [main] rc-manager=unmanaged # cat /etc/NetworkManager/dnsmasq.d/zz-local.conf addn-hosts=/etc/hosts domain-needed bogus-priv local=/example.com/ local-ttl=10 min-cache-ttl=10 neg-ttl=10 When booting up the system or restarting NM I see new core files appearing: # file /var/lib/systemd/coredump/core.5835 /var/lib/systemd/coredump/core.5835: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/dnsmasq', platform: 'x86_64' I'm not sure if this is the same issue as bug 2182342 but filing just in case and will attach a core dump. Thanks. Version-Release number of selected component (if applicable): dnsmasq-2.85-5.el9.x86_64
I should add that, interestingly enough, the setup described above still works, I think after the crash NM restarts it and then it's running: # ps auxwww | grep dnsmasq dnsmasq 1666 0.0 0.0 10456 4096 ? S 16:15 0:00 /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces --pid-file=/run/NetworkManager/dnsmasq.pid --listen-address=127.0.0.1 --cache-size=400 --clear-on-reload --conf-file=/dev/null --proxy-dnssec --enable-dbus=org.freedesktop.NetworkManager.dnsmasq --conf-dir=/etc/NetworkManager/dnsmasq.d SELinux is Permissive, the only audit log entry is: # grep -i dns /var/log/audit/audit.log type=ANOM_ABEND msg=audit(1681218907.965:646): auid=4294967295 uid=985 gid=980 ses=4294967295 subj=system_u:system_r:dnsmasq_t:s0 pid=5835 comm="dnsmasq" exe="/usr/sbin/dnsmasq" sig=11 res=1AUID="unset" UID="dnsmasq" GID="dnsmasq" The most relevant log entries in the system log are: Apr 11 16:15:43 $HOST NetworkManager[1536]: <info> [1681218943.9239] dnsmasq: starting /usr/sbin/dnsmasq Apr 11 16:15:43 $HOST dnsmasq[1666]: started, version 2.85 cachesize 400 Apr 11 16:15:43 $HOST dnsmasq[1666]: compile time options: IPv6 GNU-getopt DBus no-UBus no-i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth cryptohash DNSSEC loop-detect inotify dumpfile Apr 11 16:15:43 $HOST dnsmasq[1666]: chown of PID file /run/NetworkManager/dnsmasq.pid failed: Operation not permitted Apr 11 16:15:43 $HOST dnsmasq[1666]: DBus support enabled: connected to system bus Apr 11 16:15:43 $HOST dnsmasq[1666]: using only locally-known addresses for domain example.com Apr 11 16:15:43 $HOST dnsmasq[1666]: read /etc/hosts - 27 addresses Apr 11 16:15:43 $HOST dnsmasq[1666]: setting upstream servers from DBus Apr 11 16:15:43 $HOST dnsmasq[1666]: using only locally-known addresses for domain example.com Apr 11 16:15:43 $HOST dnsmasq[1666]: using nameserver 192.168.1.1#53(via wlan0) Apr 11 16:15:43 $HOST dnsmasq[1666]: using nameserver 192.168.1.1#53 for domain 1.168.192.in-addr.arpa Apr 11 16:15:43 $HOST dnsmasq[1666]: read /etc/hosts - 27 addresses Thanks.
I filed a bug about the tiny chown failure at https://bugzilla.redhat.com/show_bug.cgi?id=2185880.
FWIW, I tried to reproduce this on a fresh all-defaults RHEL 9 VM installation+updates but there dnsmasq didn't crash. I can't say what's different on the system where it crashed so hopefully the coredump will contain some clues. Thanks.
Interesting issue, thanks for the core dump. Core was generated by `/usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces'. Program terminated with signal SIGSEGV, Segmentation fault. #0 dbus_watch_get_enabled (watch=0x555f992406ac) at ../../dbus/dbus-watch.c:707 707 return watch->enabled; (gdb) bt #0 dbus_watch_get_enabled (watch=0x555f992406ac) at ../../dbus/dbus-watch.c:707 #1 0x0000555aca9c8c45 in check_dbus_listeners () at /usr/src/debug/dnsmasq-2.85-5.el9.x86_64/src/dbus.c:804 #2 main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/dnsmasq-2.85-5.el9.x86_64/src/dnsmasq.c:1202 (gdb) p watch $1 = (DBusWatch *) 0x555f992406ac (gdb) p *watch Cannot access memory at address 0x555f992406ac (gdb) p dnsmasq_daemon->watches $8 = (struct watch *) 0x0 It is on unusual place. It seems the pointer got invalidated, but during the for cycle invalid data were used after that. Most likely remove_watch() were called in reaction to some dbus event, which made the pointer invalid. Even upstream looks the same. I guess similar issue is possible also on Fedora. Are you able to trigger this in a reliable way? How often does it happen?
Thanks for looking into this. > Are you able to trigger this in a reliable way? How often does it happen? This happens always when NM is started on two RHEL 9 systems I have. Thanks.
Found out simple reproducer, restarting NetworkManager does not do anything to me. But restarting dbus.service crashes dnsmasq on very similar backtrace. Breakpoint 1, remove_watch (watch=0x555c8ecc4a60, data=0x0) at /usr/src/debug/dnsmasq-2.85-6.el9.x86_64/src/dbus.c:122 122 { (gdb) bt #0 remove_watch (watch=0x555c8ecc4a60, data=0x0) at /usr/src/debug/dnsmasq-2.85-6.el9.x86_64/src/dbus.c:122 #1 0x00007f4261df2247 in _dbus_watch_list_remove_watch () from target:/lib64/libdbus-1.so.3 #2 0x00007f4261df246d in free_watches () from target:/lib64/libdbus-1.so.3 #3 0x00007f4261df24ed in socket_disconnect.lto_priv () from target:/lib64/libdbus-1.so.3 #4 0x00007f4261dee45b in do_io_error () from target:/lib64/libdbus-1.so.3 #5 0x00007f4261df85f8 in do_reading.part () from target:/lib64/libdbus-1.so.3 #6 0x00007f4261df89ff in socket_handle_watch.lto_priv () from target:/lib64/libdbus-1.so.3 #7 0x00007f4261dddf90 in _dbus_connection_handle_watch () from target:/lib64/libdbus-1.so.3 #8 0x0000555c8df5dc9e in check_dbus_listeners () at /usr/src/debug/dnsmasq-2.85-6.el9.x86_64/src/dbus.c:819 #9 main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/dnsmasq-2.85-6.el9.x86_64/src/dnsmasq.c:1202 It is obvious dbus_watch_handle happens to cause remove_watch call. Because there are two watches of dbus, the removal of the first one causes following pointer invalidated. But the loop does not catch any changes and tries to continue. It makes the next loop iteration to access invalid pointer.
Created attachment 1957345 [details] Candidate-dbus.patch Candidate patch adjusted for latest dnsmasq version.
Candidate patch merged into dnsmasq. Request: https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2023q2/017007.html
Hi! would you know as a dbus maintainer, are there any cases where dbus watches might change during runtime? Is there any case, where watch can be removed but the dbus service will continue to provide dbus service? It seems to be dbus-broker does not have supported restart operation and services using dbus do not have any good behaviour for dbus service disconnection. Is there any good behaviour I should implement in dnsmasq for those cases?