Bug 1301659
| Summary: | Bulk introspection fails because of collisions produced by dnsmasq | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Filip Hubík <fhubik> | ||||||
| Component: | openstack-puppet-modules | Assignee: | Dmitry Tantsur <dtantsur> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Filip Hubík <fhubik> | ||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | urgent | ||||||||
| Version: | 7.0 (Kilo) | CC: | dtantsur, ebarrera, fhubik, jguiditt, kbasil, mburns, mcornea, mgould, michele, mkovacik, rbartal, rhel-osp-director-maint, sasha, srevivo, tkammer, vcojot | ||||||
| Target Milestone: | ga | Keywords: | Automation, AutomationBlocker, TestOnly, ZStream | ||||||
| Target Release: | 8.0 (Liberty) | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | openstack-puppet-modules-7.0.14-1.el7ost | Doc Type: | Bug Fix | ||||||
| Doc Text: |
The dnsmasq server assigns an IP address to a client based on the hash of the client's MAC address. However, this behavior leads to numerous collisions which are not always properly processed by the client's PXE or iPXE firmware, causing introspection to fail. With this update, ironic-inspector puppet modules now set the "dhcp-sequential-ip" option in the dnsmasq configuration. As a result, IP addresses are now assigned sequentially, which eliminates potential conflicts.
|
Story Points: | --- | ||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2016-04-28 13:51:32 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
Description
Filip Hubík
2016-01-25 15:46:22 UTC
*** Bug 1301663 has been marked as a duplicate of this bug. ***

Is there any chance you could retry this with OSPd8?

*** Bug 1312020 has been marked as a duplicate of this bug. ***

The same issue shows up with OSPd8: in a virtual environment with 8 VMs and the default undercloud configuration, one of the VMs times out while booting. I see the following in the openstack-ironic-inspector-dnsmasq.service journal:

DHCPNAK(br-ctlplane) 192.0.2.108 00:c7:f4:cf:58:7a address in use
DHCPNAK(br-ctlplane) 192.0.2.108 00:c7:f4:cf:58:7a address in use
DHCPNAK(br-ctlplane) 192.0.2.108 00:c7:f4:cf:58:7a address in use
DHCPNAK(br-ctlplane) 192.0.2.108 00:c7:f4:cf:58:7a address in use

The MAC address in these messages is that of the VM that failed to boot.

FWIW, I have seen the same happen in a recent RDO Mitaka. I have now switched to introspecting node by node, which avoids the bulk problem. If there are specific logs or tests that we want to have done, I can try to reproduce on Mitaka again; just let me know what you need.

*** Bug 1306417 has been marked as a duplicate of this bug. ***

Dnsmasq changed its hashing algorithm in version 2.53 to ameliorate the problem described in that link; AFAICT the new algorithm (https://github.com/guns/dnsmasq/blob/nerv/src/dhcp.c#L649) shouldn't produce the same output for two MACs which differ only in the final byte. So I think the problem has more to do with the tiny range of available IPs, though dnsmasq's stateless hash-of-MAC design does mean it will happen deterministically.

I'll try using --dhcp-sequential-ip as suggested by Milan in one ML thread.

Could someone please check whether adding dhcp-sequential-ip to their /etc/ironic-inspector/dnsmasq.conf fixes the issue?

Needinfo is still in effect, see comment 11.

Created attachment 1132900 [details]
Reproducer python script
This is indeed a dnsmasq hash-collision issue.
I'm attaching a reproducer script.
It works by generating fake random DHCP discoveries in a burst, exercising the dnsmasq server.
It requires scapy; pip install should resolve this dependency.
Please adjust as needed (edit the conf.iface and address_count variables); the default values work OK on devstack.
I've observed that some 15% of DHCP offers collide for me with the default settings.
As for working around the dnsmasq hash collision, running dnsmasq with the --dhcp-sequential-ip command line option solves it.
Please note that dnsmasq doesn't rotate the address pool and will reject discoveries once the pool is exhausted.
This workaround doesn't involve inspector.
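
A rough way to see why stateless hash-of-MAC offers collide in a burst, while sequential offers do not, is a small simulation. The sketch below is illustrative only and is not the attached reproducer: SHA-256 stands in for dnsmasq's real hash (which is not reproduced here), only the offer phase is modelled (no lease has been committed yet), the pool size of 154 corresponds to the dhcp-range 172.24.42.100-172.24.42.253 quoted in this report, and the burst of 30 clients and all function names are assumptions.

```python
import hashlib
import random


def random_mac(rng):
    """Return a random MAC address string such as '52:54:00:ab:cd:ef'."""
    return ":".join(f"{rng.randrange(256):02x}" for _ in range(6))


def hashed_offer(mac, pool_size):
    """Stateless offer: map the MAC to a pool index via a hash.

    SHA-256 is a stand-in here, NOT dnsmasq's actual hash function.
    """
    digest = hashlib.sha256(mac.encode()).digest()
    return int.from_bytes(digest[:4], "big") % pool_size


def count_colliding_offers(macs, pool_size, sequential=False):
    """Count clients in one burst that are offered an already-offered index."""
    if sequential and len(macs) > pool_size:
        raise ValueError("pool exhausted: more clients than addresses")
    offered = set()
    collisions = 0
    for position, mac in enumerate(macs):
        # Sequential mode hands out indexes in arrival order;
        # hashed mode derives the index from the MAC alone.
        index = position if sequential else hashed_offer(mac, pool_size)
        if index in offered:
            collisions += 1
        offered.add(index)
    return collisions


rng = random.Random(0)
burst = [random_mac(rng) for _ in range(30)]  # hypothetical burst size
print("hashed offers colliding:", count_colliding_offers(burst, 154))
print("sequential offers colliding:",
      count_colliding_offers(burst, 154, sequential=True))  # always 0
```

The hashed count depends on the hash and the MACs chosen, so it should not be read as a measurement of dnsmasq itself; the point is only that the sequential variant cannot offer the same index twice until the pool runs out.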
Resetting needinfo flag based on Comment #13.

The attachment #1132900 [details] was tested with scapy-2.2.0-5.fc22.noarch. See also Comment #13.

Created attachment 1132908 [details]
Reproducer python script

Update to Comment #13:
dnsmasq --dhcp-sequential-ip solves the issue
dnsmasq rotates the address pool with expiring leases

Attached new version of reproducer.

To reproduce the issue:
* use the default settings of dnsmasq and execute the script with sudo
* the script will exit once it detects a lease collision (usually after the first iteration)

To prove the workaround solves the issue:
* update the DHCP pool lease time to 2 minutes in /etc/ironic-inspector/dnsmasq.conf:
  dhcp-range=172.24.42.100,172.24.42.253,2m
* dnsmasq --dhcp-sequential-ip --config-file=/etc/ironic-inspector/dnsmasq.conf
* run the script through sudo
* the script will try to detect a collision 10 times
* the script will exit after ~22min with no collision detected

Thanks for the confirmation! The patches merged upstream, so we'll get the fix soon.

Hello, reproduced and confirmed that https://bugzilla.redhat.com/show_bug.cgi?id=1301659#c10 works on an OSP7 baremetal deployment with MACs emulated using QEMU (method B I mentioned in the first post).
Output of $ journalctl -f -u openstack-ironic-discoverd-dnsmasq after the workaround:

dnsmasq-dhcp[61273]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.100 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.101 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.100 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.101 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPREQUEST(br-ctlplane) 10.200.200.100 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPACK(br-ctlplane) 10.200.200.100 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPREQUEST(br-ctlplane) 10.200.200.101 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPACK(br-ctlplane) 10.200.200.101 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.100 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.101 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.100 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.101 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPREQUEST(br-ctlplane) 10.200.200.100 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPACK(br-ctlplane) 10.200.200.100 34:17:eb:e6:45:32
dnsmasq-dhcp[61273]: DHCPREQUEST(br-ctlplane) 10.200.200.101 34:17:eb:e6:45:ef
dnsmasq-dhcp[61273]: DHCPACK(br-ctlplane) 10.200.200.101 34:17:eb:e6:45:ef

The same IP is no longer offered to MACs that are similar enough, so the NAKs are gone.
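
The check made by eye on these journal excerpts (is any IP offered to more than one MAC, and are there any NAKs?) can be automated. The following is a minimal sketch, not part of the attached reproducer: the regular expression and the `summarize` helper are assumptions based only on the line format visible in this report, and the sample lines are copied from the journal excerpts quoted in this bug.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   dnsmasq-dhcp[61749]: DHCPOFFER(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:ef
# The IP is optional because DHCPDISCOVER lines carry only a MAC.
LINE_RE = re.compile(
    r"(?P<kind>DHCP[A-Z]+)\([^)]*\)\s+"
    r"(?:(?P<ip>\d+\.\d+\.\d+\.\d+)\s+)?"
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def summarize(lines):
    """Return (IPs offered to more than one MAC, number of DHCPNAK lines)."""
    offers = defaultdict(set)
    naks = 0
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a dnsmasq DHCP event line
        kind = match.group("kind").upper()
        if kind == "DHCPOFFER" and match.group("ip"):
            offers[match.group("ip")].add(match.group("mac").lower())
        elif kind == "DHCPNAK":
            naks += 1
    shared = {ip: sorted(macs) for ip, macs in offers.items() if len(macs) > 1}
    return shared, naks


with_workaround = [
    "dnsmasq-dhcp[61273]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:32",
    "dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.100 34:17:eb:e6:45:32",
    "dnsmasq-dhcp[61273]: DHCPOFFER(br-ctlplane) 10.200.200.101 34:17:eb:e6:45:ef",
]
without_workaround = [
    "dnsmasq-dhcp[61749]: DHCPOFFER(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:ef",
    "dnsmasq-dhcp[61749]: DHCPOFFER(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32",
    "dnsmasq-dhcp[61749]: DHCPNAK(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32 address in use",
]

print(summarize(with_workaround))     # ({}, 0)
print(summarize(without_workaround))
# ({'10.200.200.115': ['34:17:eb:e6:45:32', '34:17:eb:e6:45:ef']}, 1)
```

In practice the lines would come from the journal of the relevant dnsmasq unit rather than from hard-coded lists.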
Without this workaround, the output is still:

dnsmasq-dhcp[61749]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:ef
dnsmasq-dhcp[61749]: DHCPOFFER(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:ef
dnsmasq-dhcp[61749]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:32
dnsmasq-dhcp[61749]: DHCPOFFER(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32
dnsmasq-dhcp[61749]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:ef
dnsmasq-dhcp[61749]: DHCPOFFER(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:ef
dnsmasq-dhcp[61749]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:32
dnsmasq-dhcp[61749]: DHCPOFFER(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32
dnsmasq-dhcp[61749]: DHCPREQUEST(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:ef
dnsmasq-dhcp[61749]: DHCPACK(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:ef
dnsmasq-dhcp[61749]: DHCPREQUEST(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32
dnsmasq-dhcp[61749]: DHCPNAK(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32 address in use
dnsmasq-dhcp[61749]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:ef
dnsmasq-dhcp[61749]: DHCPOFFER(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:ef
dnsmasq-dhcp[61749]: DHCPREQUEST(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32
dnsmasq-dhcp[61749]: DHCPNAK(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32 address in use
dnsmasq-dhcp[61749]: DHCPREQUEST(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32
dnsmasq-dhcp[61749]: DHCPNAK(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32 address in use
dnsmasq-dhcp[61749]: DHCPREQUEST(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32
dnsmasq-dhcp[61749]: DHCPNAK(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:32 address in use
dnsmasq-dhcp[61749]: DHCPDISCOVER(br-ctlplane) 34:17:eb:e6:45:ef
dnsmasq-dhcp[61749]: DHCPOFFER(br-ctlplane) 10.200.200.115 34:17:eb:e6:45:ef

OSPd7 baremetal deployment:

dnsmasq-2.66-14.el7_1.x86_64
dnsmasq-utils-2.66-14.el7_1.x86_64
instack-undercloud-2.1.2-39.el7ost.noarch
python-rdomanager-oscplugin-0.0.10-28.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-10.el7ost.noarch
openstack-tripleo-0.0.7-0.1.1664e566.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-5.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-123.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-6.git49b57eb.el7ost.noarch
openstack-ironic-api-2015.1.2-2.el7ost.noarch
openstack-ironic-common-2015.1.2-2.el7ost.noarch
openstack-ironic-conductor-2015.1.2-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-8.el7ost.noarch

Note that this is more of an OSPd7 issue than an OSPd8 one, as the latter has mechanisms to prevent these issues (e.g. one can add a delay to the introspection process).

Thanks.

Filip, would it be possible for you to run your reproducer with the suggested workaround (see Comment #11)? The workaround should resolve the issue regardless of product version; it's a dnsmasq configuration option.
Thanks a lot!
milan

(In reply to mkovacik from comment #19)
...

Hello, adding "dhcp-sequential-ip" at the end of /etc/ironic-discoverd/dnsmasq.conf fixes the problem.
Filip
|
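
The confirmed workaround is a one-line configuration change, so it is easy to apply idempotently. The helper below is a hedged sketch of that step, not what the puppet modules actually do: `ensure_option` is a hypothetical name, it works on the configuration text only, and writing the result back to the file and restarting the dnsmasq service are left to the operator.

```python
def ensure_option(config_text, option="dhcp-sequential-ip"):
    """Return config_text with `option` on its own line, added only if absent."""
    for line in config_text.splitlines():
        # Ignore trailing comments and surrounding whitespace when comparing;
        # a fully commented-out option does not count as present.
        if line.split("#", 1)[0].strip() == option:
            return config_text
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + option + "\n"


# Sample content using the dhcp-range line quoted earlier in this report.
sample = "dhcp-range=172.24.42.100,172.24.42.253,2m\n"
updated = ensure_option(sample)
print(updated, end="")
# dhcp-range=172.24.42.100,172.24.42.253,2m
# dhcp-sequential-ip

# Running it again changes nothing.
print(ensure_option(updated) == updated)  # True
```

The path to edit depends on the release: this report mentions /etc/ironic-discoverd/dnsmasq.conf for OSPd7 and /etc/ironic-inspector/dnsmasq.conf for later versions.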