Bug 1188423
| Summary: | RHEL / Centos 7-based instances lose their default IPv4 gateway | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Joe <joe> |
| Component: | dhcp | Assignee: | Pavel Zhukov <pzhukov> |
| Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-daemons |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 | CC: | ihrachys, kbsingh, srevivo |
| Target Milestone: | pre-dev-freeze | Keywords: | FastFix |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-07-18 08:27:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1380362 | | |
| Attachments: | | | |
Description
Joe
2015-02-02 20:41:14 UTC
dhclient-script belongs to the dhcp package in RHEL, not RDO. Moving to the appropriate component.

(In reply to Joe from comment #0)
> The patch that I have used for existing instances can be found here:
> https://gist.github.com/jtopjian/589217cee0ba8f09825c

I don't want to remove the life-times completely; they were added because of bug #1032809.

> Run RHEL or CentOS 7 in an OpenStack environment, best with the default DHCP
> lease of 60 seconds, and just wait. You can also run `watch ip a` and when
> the connection drops, you'll see `valid_lft` between 0-2 seconds.
>
> Actual results:
>
> The instance loses its IPv4 address and thus its gateway. The DHCP lease is
> renewed soon after, but the gateway is never re-added.

I still don't understand what's happening there. The only idea I have is that the address is not properly renewed (the client gets no response to its unicast DHCPREQUESTs) and then, during rebinding (sending broadcast DHCPREQUESTs), the address is removed before rebinding finishes. I'd need to see either dhclient output or a packet dump to know more. But if giving the address some more life-time works around the problem, then I'm probably fine with that.

Could you add the following line somewhere after '# ### MAIN' to see if that helps?

    [[ "${new_dhcp_lease_time}" -lt "4294967235" ]] && new_dhcp_lease_time=$((new_dhcp_lease_time + 60))

I've added this commit upstream (Fedora):
http://pkgs.fedoraproject.org/cgit/dhcp.git/commit/?id=d12e0eb05e510268ce9b8dcb839e27d5eca9aff5

But it'd still be nice to see some dhclient output or a packet dump from when the problem occurs.

Hi Jiri,

Thank you for looking this over and for adding the additional time.

I'm setting up a test instance in my OpenStack cloud, will run tcpdump, and wait for this issue to happen. It usually manifests itself within 24 hours.

From reviewing my notes, the core issue is that when the timeout happens, the default gateway of the instance / VM is dropped. After the timeout has lapsed and the late DHCP renewal arrives at the instance, the instance re-adds its IP but not its default gateway.

I'll post some log entries and a packet dump once I have them.

Thanks,
Joe

Created attachment 1017114: dhcp pcap
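For readers following the thread, the one-line workaround suggested above can be unpacked roughly as follows. This is only a sketch: `extend_lease`, `MAX_LEASE` and `PAD` are hypothetical names introduced here for illustration (dhclient-script modifies `new_dhcp_lease_time` in place), and the portable `[` test is used instead of the `[[` in the original line. The constant in the original, 4294967235, equals 4294967295 - 60, so the guard appears to be there to keep the padded value within the 32-bit unsigned range of a DHCP lease time.

```shell
# Sketch of the lease-time padding from the suggested dhclient-script line.
# extend_lease, MAX_LEASE and PAD are illustrative names, not part of dhclient-script.

MAX_LEASE=4294967295   # largest 32-bit unsigned value
PAD=60                 # extra seconds of address life-time

extend_lease() {
    lease="$1"
    # Pad only when lease < 4294967235, so lease + 60 still fits in 32 bits.
    if [ "${lease}" -lt $((MAX_LEASE - PAD)) ]; then
        lease=$((lease + PAD))
    fi
    echo "${lease}"
}

extend_lease 60          # prints 120
extend_lease 4294967295  # prints 4294967295 (left unchanged)
```

With the 60-second lease mentioned in the report, the address life-time would become 120 seconds, which should give a late renewal more room to arrive before the kernel expires the address.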
Hi Jiri,

I was able to semi-reproduce this problem. While leaving "watch ip a" running, my session was cut and the last output on the screen was a valid_lft and preferred_lft of 1 second... I think it's safe to assume that there was a timeout.

What's odd about this case is that when I logged back in, I had a default route. Normally the default route doesn't exist and I have to log in through an out-of-band method to re-add it.

I used the latest CentOS 7 image from here:
http://cloud.centos.org/centos/7/images/

I haven't yet checked whether there has been a dhclient update since I first ran into this issue.

I'm currently re-running the tests to see if I will run into an occurrence where the default gateway is _not_ re-added.

Attached is a pcap file of DHCP traffic. AFAICT, the bump in connectivity happened at packet 1523.

Thanks,
Joe

I was able to reproduce this problem entirely yesterday -- lost gateway and all. It was at the end of the day when I noticed it had happened, so I just killed tcpdump and went home. When I came back in today, I noticed the gateway had returned! I have no idea when, though.

I even have a screenshot of my console showing both no default route and a default route if you'd like to double-check. :)

The packet dump looks the same as the last one I uploaded. I'd be happy to still upload it, though.

This bug was evaluated by the sub-system and was not considered a priority for the release, so it's being closed now as WONTFIX. Feel free to re-open the bug if there is a business reason to deliver a fix for this issue.

Note that the workaround for this issue is included in Red Hat Enterprise Linux 8. Please check whether the issue exists there and open a new bug if that is the case.
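Since the thread never pins down when the default route disappears or returns, a small check like the one below could be run periodically to timestamp the transition. It is a sketch under stated assumptions: `has_default_route` is a hypothetical helper that reads `ip -4 route show` style output on stdin, and the sample routes use documentation addresses (192.0.2.0/24) and an `eth0` interface name, not values from the affected instances.

```shell
# has_default_route: hypothetical helper. Succeeds if the routing table text
# on stdin contains a line beginning with "default".
has_default_route() {
    grep -q '^default '
}

# Illustrative samples only; not taken from the reporter's instances.
sample_ok='default via 192.0.2.1 dev eth0
192.0.2.0/24 dev eth0 proto kernel scope link src 192.0.2.10'
sample_lost='192.0.2.0/24 dev eth0 proto kernel scope link src 192.0.2.10'

printf '%s\n' "$sample_ok"   | has_default_route && echo "gateway present"
printf '%s\n' "$sample_lost" | has_default_route || echo "gateway missing"
```

On a live instance the same helper could be fed from `ip -4 route show default` and combined with `date` to log when the gateway goes missing, which might supply the timing information the packet dumps alone did not.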