Bug 2186481
Summary: | Crash when --server=/example/# is used [rhel8] | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Tomas Kopecek <tkopecek> | |
Component: | dnsmasq | Assignee: | Petr Menšík <pemensik> | |
Status: | CLOSED ERRATA | QA Contact: | Petr Sklenar <psklenar> | |
Severity: | unspecified | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 8.7 | CC: | jorton, psklenar, sbroz, sgraf | |
Target Milestone: | rc | Keywords: | AutoVerified, Regression, TestCaseProvided, Triaged, ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | dnsmasq-2.79-30.el8 | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2188712 2192898 2192899 2192900 (view as bug list) | Environment: | ||
Last Closed: | 2023-11-14 15:36:15 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1919894 | |||
Bug Blocks: | 2188712, 2192898, 2192899, 2192900 |
Comment 1
Tomas Kopecek
2023-04-13 12:59:44 UTC
(gdb) bt #0 allocate_rfd (fdlp=0xaaaaaab31e00, serv=0xaaaaaab20010) at forward.c:2097 #1 0x0000aaaaaaabd528 in forward_query (udpfd=4, udpaddr=udpaddr@entry=0xfffffffff078, dst_addr=dst_addr@entry=0xfffffffff058, dst_iface=dst_iface@entry=1, header=header@entry=0xaaaaaab2b1a0, plen=plen@entry=54, now=now@entry=1681389382, forward=0xaaaaaab31db0, forward@entry=0x0, ad_reqd=ad_reqd@entry=1, do_bit=do_bit@entry=0) at forward.c:500 #2 0x0000aaaaaaabe160 in receive_query (listen=0xaaaaaab24150, now=1681389382) at forward.c:1490 #3 0x0000aaaaaaac24d4 in check_dns_listeners (now=1681389382) at dnsmasq.c:1619 #4 0x0000aaaaaaaa9620 in main (argc=<optimized out>, argv=<optimized out>) at dnsmasq.c:1070 (gdb) frame 1 #1 0x0000aaaaaaabd528 in forward_query (udpfd=4, udpaddr=udpaddr@entry=0xfffffffff078, dst_addr=dst_addr@entry=0xfffffffff058, dst_iface=dst_iface@entry=1, header=header@entry=0xaaaaaab2b1a0, plen=plen@entry=54, now=now@entry=1681389382, forward=0xaaaaaab31db0, forward@entry=0x0, ad_reqd=ad_reqd@entry=1, do_bit=do_bit@entry=0) at forward.c:500 500 if (type == (start->flags & SERV_TYPE) && (gdb) info local fd = <optimized out> firstsentto = 0xaaaaaab20010 oph = <optimized out> pheader = 0xaaaaaab2b1c2 "\020" subnet = 0 forwarded = 0 edns0_len = 23 domain = 0x0 type = 0 norebind = 0 addrp = 0x0 flags = 0 fwd_flags = <optimized out> start = 0xaaaaaab20010 sd = 0xaaaaaab21d00 hash = <optimized out> do_dnssec = <optimized out> gotname = 128 oph = <optimized out> (gdb) frame 0 #0 allocate_rfd (fdlp=0xaaaaaab31e00, serv=0xaaaaaab20010) at forward.c:2097 2097 return serv->sfd->fd; (gdb) info local i = <optimized out> rfl = <optimized out> finger = 0 rfl_poll = <optimized out> (gdb) p serv->sfd $10 = (struct serverfd *) 0x1000100000000 (gdb) p *serv->sfd Cannot access memory at address 0x1000100000000 (gdb) p *serv $11 = {addr = {sa = {sa_family = 6, sa_data = "\a\000\006\000\006\000\003\000\a\000\005\000\a"}, in = {sin_family = 6, sin_port = 7, sin_addr = {s_addr = 393222}, sin_zero = "\003\000\a\000\005\000\a"}, in6 = {sin6_family = 6, sin6_port = 7, sin6_flowinfo = 393222, sin6_addr = {__in6_u = {__u6_addr8 = "\003\000\a\000\005\000\a\000\005\000\005\000\005\000\a", __u6_addr16 = {3, 7, 5, 7, 5, 5, 5, 7}, __u6_addr32 = {458755, 458757, 327685, 458757}}}, sin6_scope_id = 458759}}, source_addr = {sa = {sa_family = 1, sa_data = "\000\000\003\000\002\000\001\000\000\000\000\000\000"}, in = {sin_family = 1, sin_port = 0, sin_addr = {s_addr = 131075}, sin_zero = "\001\000\000\000\000\000\000"}, in6 = {sin6_family = 1, sin6_port = 0, sin6_flowinfo = 131075, sin6_addr = {__in6_u = { __u6_addr8 = "\001", '\000' <repeats 14 times>, __u6_addr16 = {1, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {1, 0, 0, 0}}}, sin6_scope_id = 0}}, interface = "\000\000\001\000\000\000\a\000\001\000\004\000\000\000\001\000\002", ifindex = 65536, sfd = 0x1000100000000, domain = 0x1 <error: Cannot access memory at address 0x1>, flags = 0, tcpfd = 0, edns_pktsz = 65538, pktsz_reduced = 562954248454144, queries = 1, failed_queries = 0, uid = 2863813776, serv_domain = 0xaaaaaab32170, next = 0xaaaaaab32f90} There were two forwarded queries at the time. First seem okay, another one is obviously corrupted. It is not yet obvious how that happened. (gdb) set $f = $daemon->frec_list (gdb) p *$f $29 = {frec_src = {source = {sa = {sa_family = 0, sa_data = '\000' <repeats 13 times>}, in = {sin_family = 0, sin_port = 0, sin_addr = { s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = { __in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}}, dest = {addr = {addr4 = {s_addr = 0}, addr6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = { 0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, log = {keytag = 0, algo = 0, digest = 0}, dnssec = {class = 0, type = 0}}}, iface = 0, log_id = 0, fd = 0, orig_id = 0, next = 0x0}, sentto = 0x0, rfds = 0x0, new_id = 0, forwardall = 0, flags = 0, time = 1681389374, hash = {0x0 <repeats 32 times>}, class = 0, work_counter = 0, stash = 0x0, stash_len = 0, dependent = 0x0, blocking_query = 0x0, next = 0xaaaaaab31db0} (gdb) p *$f->next $30 = {frec_src = {source = {sa = {sa_family = 2, sa_data = "\324\376\177\000\000\001\000\000\000\000\000\000\000"}, in = {sin_family = 2, sin_port = 65236, sin_addr = {s_addr = 16777343}, sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 2, sin6_port = 65236, sin6_flowinfo = 16777343, sin6_addr = {__in6_u = { __u6_addr8 = "\000\000\000\000\000\000\000\000h\362\377\377\377\377\000", __u6_addr16 = {0, 0, 0, 0, 62056, 65535, 65535, 0}, __u6_addr32 = {0, 0, 4294963816, 65535}}}, sin6_scope_id = 1681389374}}, dest = {addr = {addr4 = {s_addr = 16777343}, addr6 = { __in6_u = {__u6_addr8 = "\177\000\000\001\252\252\000\000@\034\261\252\252\252\000", __u6_addr16 = {127, 256, 43690, 0, 7232, 43697, 43690, 0}, __u6_addr32 = {16777343, 43690, 2863733824, 43690}}}, log = {keytag = 127, algo = 256, digest = 43690}, dnssec = { class = 127, type = 256}}}, iface = 1, log_id = 31, fd = 4, orig_id = 51516, next = 0x0}, sentto = 0x0, rfds = 0x0, new_id = 38919, forwardall = 0, flags = 1568, time = 1681389382, hash = {0x62881b9fffdaaf63 <error: Cannot access memory at address 0x62881b9fffdaaf63>, 0x490cbb3edf6d061c <error: Cannot access memory at address 0x490cbb3edf6d061c>, 0x321ced3370ce1fc8 <error: Cannot access memory at address 0x321ced3370ce1fc8>, 0x678576a3dfa214ec <error: Cannot access memory at address 0x678576a3dfa214ec>, 0x0 <repeats 28 times>}, class = 0, work_counter = 50, stash = 0x0, stash_len = 0, dependent = 0x0, blocking_query = 0x0, next = 0x0} After analysing provided core dump, I think this is caused by a change for the bug #1919894. It added a new structure, which seems not behaving well in this specific configuration. It seems the required change happens when existing record TTL is reduced to 0 and new one is fetched instead. It happens faster on names with short TTL. Minimal reproducer: server=/example.com/# max-cache-ttl=1 address=/#/ Then retry query a few after delay: dig @localhost example.com && sleep 1 && dig @localhost example.com && sleep 1 && dig @localhost example.com Created candidate fix at MR https://gitlab.com/redhat/centos-stream/rpms/dnsmasq/-/merge_requests/12 The fix improves freeing at cleanup_servers function. It ensures even servers not used as last_server are unassigned in case server_domain is released. At the same time it adds check to handle also case where a domain is using common resolvers. That is marked by SERV_USE_RESOLV flag in struct server. server_domain_check() is done also in this case, ensuring the domain used in this case is not released. Additional check helps to not create server_domain duplicates. Backport to RHEL 8.6 requested and later included. The regression is present since bug #1919894 change. There is nothing strictly internal in public comments, opening this bug to public to be viewable from bug #2182342. Additional change is needed for passing test Regression/domain-query-the-last-known-server with DBus part. Additional change makes sure server_domain records are reused more often, prevents creation of duplicate entries for the same domain. Ensures also DBus added entries do not have to wait until check_servers to have serv_domain pointer set properly. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: dnsmasq security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:7046 |