Bug 1275626
| Summary: | dnsmasq crash with coredump on infiniband network with OpenStack | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Moshe Levi <moshele> | ||||
| Component: | dnsmasq | Assignee: | Pavel Šimerda (pavlix) <psimerda> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Vaclav Danek <vdanek> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 7.1 | CC: | ichute, ihrachys, jlibosva, jscotka, moshele, noama, thozza, vdanek | ||||
| Target Milestone: | beta | Keywords: | OtherQA, Patch | ||||
| Target Release: | 7.3 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | dnsmasq-2.66-17.el7 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-11-04 06:14:30 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1171868, 1255429, 1289025, 1289204, 1295829, 1313485, 1364088 | ||||||
| Attachments: |
Description
Moshe Levi
2015-10-27 11:31:04 UTC
I compiled the latest dnsmasq
commit 98079ea89851da1df4966dfdfa1852a98da02912
Author: Simon Kelley <simon.uk>
Date: Tue Oct 13 20:30:32 2015 +0100
Catch errors from sendmsg in DHCP code.
Logs, eg, iptables DROPS of dest 255.255.255.255
and we no longer experience the coredump. Is it possible to build a newer version of dnsmasq for EL/CentOS 7, or at least in the RDO Delorean repository for OpenStack?
I suspect we need http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=53c4c5c85942d4733f4723531c4d325235448326, which is included in the upstream 2.67 release.
Yes, this patch should solve the issue. I also confirmed it with the dnsmasq community:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
This bug was fixed in the 2.67 release. The fix is here:
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=53c4c5c85942d4733f4723531c4d325235448326
the patch should apply fine to the version you're using, if that suits you best.
Cheers,
Simon.
I won't describe exact steps to reproduce it, but I believe dnsmasq maintainers should be able to deduce them from the code.
So, the bug was introduced by the 'Support IPv6 assignment based on MAC for DHCPv6' patch that we previously backported for OpenStack Neutron (dnsmasq 2.66-13).
Looking at the code, the following should occur to trigger the trace:
- DHCP should be enabled for the service:
if (daemon->dhcp || daemon->doing_dhcp6)
{
...
lease_update_from_configs();
}
- there should be an existing lease with a client ID:
void lease_update_from_configs(void)
{
for (lease = leases; lease; lease = lease->next)
{
if ((config = find_config(daemon->dhcp_conf, NULL, lease->clid, lease->clid_len, lease->hwaddr, lease->hwaddr_len, lease->hwaddr_type, NULL)) ...
}
}
- no existing configuration should match for the client ID;
- there should be at least one lease configuration entry that matches based on client ID and does NOT match an existing lease.
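The conditions above can be illustrated with a small, self-contained model. This is only a sketch, not dnsmasq's code: `struct context`, `struct config`, `config_fits_context()` and `find_config_by_clid()` are hypothetical stand-ins, and the only detail taken from the snippet is that the lookup is called with `NULL` in the context position. It assumes the crash comes from a client-ID match reaching code that reads the context without checking it, and shows one way such a lookup could tolerate a missing context. Whether the upstream fix takes this exact form is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT dnsmasq's real structures. */
struct context { int net_id; };

struct config {
    const unsigned char *clid;  /* client ID this entry is keyed on */
    size_t clid_len;
    int net_id;                 /* network the entry belongs to */
    struct config *next;
};

/* Returns 1 when cfg is usable in the given context. A NULL context is
 * treated as "no particular network", so the entry is accepted instead of
 * dereferencing the pointer. Without the first check, context->net_id
 * would be read through a NULL pointer. */
static int config_fits_context(const struct context *context,
                               const struct config *cfg)
{
    assert(cfg != NULL);
    if (context == NULL)
        return 1;
    return context->net_id == cfg->net_id;
}

/* Find the first entry whose client ID equals clid and fits the context. */
static const struct config *find_config_by_clid(const struct config *configs,
                                                const struct context *context,
                                                const unsigned char *clid,
                                                size_t clid_len)
{
    const struct config *cfg;

    if (clid == NULL)
        return NULL;  /* a lease without a client ID never gets this far */

    for (cfg = configs; cfg != NULL; cfg = cfg->next) {
        if (cfg->clid_len == clid_len &&
            memcmp(cfg->clid, clid, clid_len) == 0 &&
            config_fits_context(context, cfg))
            return cfg;
    }
    return NULL;
}
```

In this model, a caller that walks the lease list and passes `NULL` as the context only reaches `config_fits_context()` when a lease carries a client ID that some configuration entry also uses, which mirrors why the problem would show up only with client-ID-based setups.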
I also see we have config files attached to the bug. Has anyone actually tried to HUP the service when using the files? I suspect it could just reveal the issue.
After an analysis of the code I can say that both the problem and the fix are obvious. There is a single call to `find_config()` in the code with the context explicitly set to `NULL`, and that call leads to code that doesn't work for a `NULL` context.
(In reply to Ihar Hrachyshka from comment #13)
> I also see we have config files attached to the bug. Has anyone actually
> tried to HUP the service when using the files? I suspect it could just
> reveal the issue.
I second that suspicion. Entering the code path seems to be feasible.
Thanks for your help! The following script is enough to show the issue. It manipulates the clid variable using gdb to avoid the need to test on infiniband.
#!/bin/bash -xe
interface=ens9
# prepare
mkdir -p tmp
cat > tmp/leases << EOF
2000000000 02:00:00:00:00:00 192.0.2.2 host *
EOF
cat > tmp/script << EOF
start
advance dhcp-common.c:308
set clid = "*"
continue
quit
EOF
# run
gdb -x tmp/script \
--args dnsmasq \
--no-daemon --no-hosts --no-resolv --conf-file= \
--dhcp-leasefile=tmp/leases \
--dhcp-range=192.0.2.0,static \
--dhcp-host=fa:16:3e:3c:ac:55,id:ff:00:00:00:00:00:02:00:00:02:c9:00:fa:16:3e:00:00:3c:ac:55,host,192.0.2.2
# cleanup
rm tmp/leases
rm tmp/script
rmdir tmp
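For readers unfamiliar with the line the script writes to `tmp/leases`, here is a minimal sketch of how such a line could be split into fields. It assumes the lease file layout is `<expiry> <MAC> <IP> <hostname> <client-id>`, with `*` in the last field meaning that no client ID is stored. That layout is an assumption for illustration, and `struct lease_line`, `parse_lease_line()` and `lease_has_clid()` are hypothetical names, not dnsmasq functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one lease file line (assumed field order). */
struct lease_line {
    unsigned long expires;  /* expiry time, seconds since the epoch */
    char hwaddr[64];        /* hardware (MAC) address as text */
    char addr[64];          /* leased IP address as text */
    char hostname[256];
    char clid[768];         /* hex client ID, or "*" when none is stored */
};

/* Split one line into its five whitespace-separated fields.
 * Returns 1 on success, 0 if the line does not contain five fields. */
static int parse_lease_line(const char *line, struct lease_line *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%lu %63s %63s %255s %767s",
                  &out->expires, out->hwaddr, out->addr,
                  out->hostname, out->clid) == 5;
}

/* A "*" in the last field is taken to mean "no client ID recorded". */
static int lease_has_clid(const struct lease_line *l)
{
    return strcmp(l->clid, "*") != 0;
}
```

Applied to the line from the script, `parse_lease_line()` would yield a lease for `192.0.2.2` whose client ID field is `*`, which is presumably why the reproducer sets `clid` through gdb rather than relying on the lease file alone.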
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-2421.html