Bug 1969908
| Summary: | Deployment failed: dnsmasq DHCPOFFERing the same address twice | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Raviv Bar-Tal <rbartal> |
| Component: | dnsmasq | Assignee: | Petr Menšík <pemensik> |
| Status: | CLOSED WONTFIX | QA Contact: | rhel-cs-infra-services-qe <rhel-cs-infra-services-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 8.4 | CC: | abhijadh, aegorenk, arivkin, derekh, lshilin, pemensik, rpittau, ydalal |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | --- | Flags: | rpittau: needinfo+, pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-01-01 07:26:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2028704 | | |
| Bug Blocks: | | | |
| Attachments: | | | |

Description
Raviv Bar-Tal
2021-06-09 12:57:31 UTC
Created attachment 1789563 [details]
master-0-1 screenshot
Created attachment 1789564 [details]
conductor and inspector logs
I assume it's a virtual environment? Could you please watch the node booting to see what is actually happening (e.g. if it fails to PXE boot and just falls back to the disk)? Is there anything special about this environment? Can you reproduce the same on other environments?

This is a virtual environment; unfortunately it is no longer available. We use this environment to run our CI jobs, so there is nothing special about it. I will keep a note about rebooting the machine and see what happens for the next environment we have.

Also, can you include the httpd and dnsmasq logs the next time? They might help us figure out if the node attempted to PXE boot.

httpd and dnsmasq logs were attached

(In reply to Raviv Bar-Tal from comment #9)
> httpd and dnsmasq logs were attached

Thanks. Looking at the httpd logs I can see that 2 of the nodes downloaded the PXE config as expected:

172.22.0.233 - - [07/Jun/2021:12:09:08 +0000] "GET /dualboot.ipxe HTTP/1.1" 200 741 "-" "iPXE/1.0.0+"
172.22.0.104 - - [07/Jun/2021:12:09:13 +0000] "GET /dualboot.ipxe HTTP/1.1" 200 741 "-" "iPXE/1.0.0+"

But one didn't. The dnsmasq logs show that it did attempt DHCP; it looks to me like dnsmasq replied to a DHCPDISCOVER from 2 masters with the same IP:

dnsmasq-dhcp: 3004428367 DHCPDISCOVER(ens4) 52:54:00:2e:bc:9e
dnsmasq-dhcp: 3004428367 DHCPOFFER(ens4) 172.22.0.104 52:54:00:2e:bc:9e
dnsmasq-dhcp: 3004428367 DHCPDISCOVER(ens4) 52:54:00:a2:5a:82
dnsmasq-dhcp: 3004428367 DHCPOFFER(ens4) 172.22.0.104 52:54:00:a2:5a:82

Then, when each node requested the same IP, one got ACK'd and the other NAK'd:

dnsmasq-dhcp: 3004428367 DHCPREQUEST(ens4) 172.22.0.104 52:54:00:2e:bc:9e
dnsmasq-dhcp: 3004428367 DHCPACK(ens4) 172.22.0.104 52:54:00:2e:bc:9e
dnsmasq-dhcp: 3004428367 DHCPREQUEST(ens4) 172.22.0.104 52:54:00:a2:5a:82
dnsmasq-dhcp: 3004428367 DHCPNAK(ens4) 172.22.0.104 52:54:00:a2:5a:82 address in use

My guess then is that because PXE failed it fell back to booting from HD.

I'm surprised dnsmasq would DHCPOFFER the same address twice. I suggest you attach the dnsmasq version, config and command line. We can then assign the bug to the dnsmasq component to assess if this is a legitimate bug.

Hey, I attached the dnsmasq inspect file, which has the version and CreateCommand in it.
Can you please assign it to dnsmasq? I don't seem to find it

(In reply to Raviv Bar-Tal from comment #12)
> Hey, I attached the dnsmasq inspect file, which has the version and
> CreateCommand in it.
> Can you please assign it to dnsmasq? I don't seem to find it

This isn't the file I was thinking of; you'll have to exec into the dnsmasq container to find it. In the meantime I'll move this bug to dnsmasq, as there might be enough here to assess it.

This is OpenShift 4.8, so the version of dnsmasq would be dnsmasq-2.79-15.el8.x86_64

Ah, of course. I meant bug #1998448, which is dealing with a similar issue for IPv6.

The DHCP RFC [1] recommends not offering an address that has already been offered. But it is not required, so dnsmasq does not violate the RFC. Citation:

> While not required for correct operation of DHCP, the server SHOULD NOT reuse the selected network address before the client responds to the server's DHCPOFFER message. The server may choose to record the address as offered to the client.

I think solving this issue properly would mean creating short-term reservations after sending DHCPOFFER. Such a change would be complex and could introduce new regressions. I am not sure what configuration exactly was passed to dnsmasq; I am guessing just from the logs.

It would require creating a "soft" lease, which would not be offered again on DISCOVER but, if no free addresses were available, could still be leased on DHCPREQUEST. No similar concept exists for either IPv6 or IPv4 now.

1. https://www.rfc-editor.org/rfc/rfc2131#section-4.3.1

This is very similar to bug #2028704, detected on OpenStack. Because that bug already has a reproducer and more details, the work will be done there. I think a non-trivial change to upstream is required; I have not found a simpler fix so far. It needs to be discussed upstream. Not closing as duplicate (yet), because this bug was filed sooner and is on a different component.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
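
For anyone triaging a similar deployment failure, the log analysis in this bug can be automated. The following is a minimal sketch, not part of dnsmasq or of any tooling mentioned in this bug: it scans dnsmasq log lines in the format quoted above and reports every IPv4 address that was offered to more than one MAC address. The function name `find_duplicate_offers` and the `SAMPLE` constant are hypothetical. Because the quoted excerpt carries no timestamps, the sketch ignores time entirely, so an address legitimately re-offered to a different client after a lease expired would also be flagged.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   dnsmasq-dhcp: 3004428367 DHCPOFFER(ens4) 172.22.0.104 52:54:00:2e:bc:9e
# The pattern is an assumption based only on the log excerpt in this bug.
OFFER_RE = re.compile(
    r"DHCPOFFER\((?P<iface>[^)]+)\)\s+"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)


def find_duplicate_offers(lines):
    """Return {ip: [mac, ...]} for each address offered to more than one MAC.

    MACs are listed in the order they first appear in the log.
    """
    offered = defaultdict(list)
    for line in lines:
        match = OFFER_RE.search(line)
        if match is None:
            continue  # not a DHCPOFFER line
        ip = match.group("ip")
        mac = match.group("mac").lower()
        if mac not in offered[ip]:
            offered[ip].append(mac)
    return {ip: macs for ip, macs in offered.items() if len(macs) > 1}


# Log lines copied from the excerpt in this bug.
SAMPLE = """\
dnsmasq-dhcp: 3004428367 DHCPDISCOVER(ens4) 52:54:00:2e:bc:9e
dnsmasq-dhcp: 3004428367 DHCPOFFER(ens4) 172.22.0.104 52:54:00:2e:bc:9e
dnsmasq-dhcp: 3004428367 DHCPDISCOVER(ens4) 52:54:00:a2:5a:82
dnsmasq-dhcp: 3004428367 DHCPOFFER(ens4) 172.22.0.104 52:54:00:a2:5a:82
""".splitlines()

if __name__ == "__main__":
    for ip, macs in find_duplicate_offers(SAMPLE).items():
        print(f"{ip} offered to {len(macs)} clients: {', '.join(macs)}")
```

To check a real log, pass an open file object instead of `SAMPLE`, for example `find_duplicate_offers(open("dnsmasq.log"))`, where the file name is a placeholder.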