Bug 1391530 - RHV engine loses IP address after DHCP lease expires
Summary: RHV engine loses IP address after DHCP lease expires
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - RHEV
Version: 1.1
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 1.1
Assignee: Fabian von Feilitzsch
QA Contact: Tasos Papaioannou
Docs Contact: Dan Macpherson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-03 13:39 UTC by Tasos Papaioannou
Modified: 2017-02-28 01:40 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-28 01:40:48 UTC
Target Upstream Version:



Links
System: Red Hat Product Errata
ID: RHEA-2017:0335
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Quickstart Installer 1.1
Last Updated: 2017-02-28 06:36:13 UTC

Description Tasos Papaioannou 2016-11-03 13:39:14 UTC
Description of problem:

12 hours after an RHV engine+hypervisor deployment, the provisioning NIC on the RHV engine loses its IP address.

Version-Release number of selected component (if applicable):

QCI-1.1-RHEL-7-20161101.t.0

How reproducible:

100%

Steps to Reproduce:
1.) Deploy RHV.
2.) Wait 12 hours (the default lease time of the Satellite's dhcpd service).
3.) Verify that the RHV engine/manager can no longer be accessed by its provisioning network IP address. Accessing the system via console shows that dhclient is no longer running:

# ps -eF|grep dhclient|grep -v grep
#
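The `ps | grep` check above can also be done by reading `/proc/<pid>/cmdline` directly, which sidesteps the `grep -v grep` step. The following is a minimal, Linux-only sketch; `find_processes` and its `proc_root` parameter (there only so it can be pointed at a test directory) are hypothetical names, not part of any existing tool.

```python
import os
from typing import Dict, List


def find_processes(name: str, proc_root: str = "/proc") -> Dict[int, List[str]]:
    """Map PID -> argv for every process whose executable basename equals `name`."""
    found: Dict[int, List[str]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/self or /proc/net
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited meanwhile, or we lack permission
        # cmdline is NUL-separated; kernel threads have an empty cmdline.
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if argv and os.path.basename(argv[0]) == name:
            found[int(entry)] = argv
    return found


if __name__ == "__main__":
    procs = find_processes("dhclient")
    if not procs:
        print("dhclient is not running")
    for pid, argv in sorted(procs.items()):
        print(pid, " ".join(argv))
```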


Actual results:

RHV engine loses its IP address 12 hours after a successful deployment.
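For context on the timing: the 12-hour figure is the Satellite dhcpd's default lease time (step 2 above). RFC 2131, section 4.4.5, says that when the server does not supply them, the renewal timer T1 defaults to 0.5 times the lease duration and the rebinding timer T2 to 0.875 times the lease duration, and that clients SHOULD add some random fuzz to both. The sketch below assumes those defaults with no fuzz, so it shows only nominal values; it does not claim these are the exact timers dhclient used on this engine.

```python
from datetime import timedelta
from typing import Tuple

# RFC 2131, section 4.4.5: default fractions of the lease duration used for
# T1 (renewal) and T2 (rebinding) when the server does not send them.
T1_FRACTION = 0.5
T2_FRACTION = 0.875


def lease_timers(lease: timedelta) -> Tuple[timedelta, timedelta, timedelta]:
    """Return nominal (T1, T2, expiry) offsets from the moment the lease was bound."""
    return lease * T1_FRACTION, lease * T2_FRACTION, lease


if __name__ == "__main__":
    # Assumption: the 12-hour default lease of the Satellite's dhcpd.
    t1, t2, expiry = lease_timers(timedelta(hours=12))
    print(f"T1 (renew):  {t1}")
    print(f"T2 (rebind): {t2}")
    print(f"expiry:      {expiry}")
```

Under these assumptions, the first renewal attempt falls roughly 6 hours into the lease, well before the 12-hour mark at which the address is reported lost.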

Expected results:

No loss of IP address.

Additional info:

After the deployment completes, but before the DHCP lease expires, dhclient on the RHV engine is still using the NetworkManager DHCP helper script, config file, and lease file, even though NetworkManager was disabled during the deployment:

# ps -eF|grep dhclient
root       771     1  0 28204 15836   0 19:24 ?        00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0
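Whether a running dhclient is still tied to NetworkManager can be checked from its arguments alone: the instance above passes `-sf /usr/libexec/nm-dhcp-helper` and keeps its lease (`-lf`) and config (`-cf`) files under `/var/lib/NetworkManager/`. The sketch below assumes those markers are sufficient and takes only the command portion of the `ps` output; `uses_networkmanager_files` is a hypothetical helper that parses text and does not query NetworkManager.

```python
import shlex
from typing import List

NM_HELPER = "/usr/libexec/nm-dhcp-helper"
NM_STATE_DIR = "/var/lib/NetworkManager/"


def uses_networkmanager_files(cmdline: str) -> bool:
    """Return True if a dhclient command line references NetworkManager's helper or files."""
    argv: List[str] = shlex.split(cmdline)
    # Walk adjacent (flag, value) pairs of the argument list.
    for flag, value in zip(argv, argv[1:]):
        if flag == "-sf" and value == NM_HELPER:
            return True  # NM's DHCP helper script runs on every lease event
        if flag in ("-lf", "-cf") and value.startswith(NM_STATE_DIR):
            return True  # lease or config file kept in NetworkManager's state dir
    return False


if __name__ == "__main__":
    nm_spawned = (
        "/sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper "
        "-pf /var/run/dhclient-eth0.pid "
        "-lf /var/lib/NetworkManager/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0.lease "
        "-cf /var/lib/NetworkManager/dhclient-eth0.conf eth0"
    )
    print(uses_networkmanager_files(nm_spawned))
```

Fed the command line that dhclient uses after a reboot or interface restart (with `-lf /var/lib/dhclient/dhclient--eth0.lease` and no `-sf`), the same function returns `False`.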

An strace of this dhclient process shows that dhclient fails when trying to renew the lease because it can't communicate back to the NetworkManager service:

****
771   20:08:30 select(22, [5 6], [], NULL, {18591, 385808}) = 0 (Timeout)
771   01:18:22 sendto(3, "<30>Nov  3 01:18:22 dhclient[771]: DHCPREQUEST on eth0 to 192.168.100.1 port 67 (xid=0x6ba6d292)", 96, MSG_NOSIGNAL, NULL, 0) = 96
[...]
771   01:18:22 sendto(3, "<30>Nov  3 01:18:22 dhclient[771]: DHCPACK from 192.168.100.1 (xid=0x6ba6d292)", 78, MSG_NOSIGNAL, NULL, 0) = 78
771   01:18:22 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fac9b1d6b50) = 32090
[...]
32090 01:18:22 execve("/usr/libexec/nm-dhcp-helper", ["/usr/libexec/nm-dhcp-helper"], [...]
[...]
32090 01:18:22 write(2, "Error: could not connect to NetworkManager D-Bus socket: Could not connect: Connection refused\n", 95) = 95
32090 01:18:22 write(2, "Fatal error occured, killing dhclient instance with pid 771.\n", 61) = 61
32090 01:18:22 kill(771, SIGTERM)       = 0
****
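The timestamps suggest dhclient was simply sleeping until its renewal timer fired: the `select()` call at 20:08:30 was given a timeout of `{18591, 385808}` (18591 seconds plus 385808 microseconds), and adding that to 20:08:30 lands at about 01:18:21, just before the 01:18:22 DHCPREQUEST. A quick check, assuming the `select()` call happened on the evening before the "Nov  3" log entries (the date is inferred, not shown in the trace):

```python
from datetime import datetime, timedelta

# select() entered at 20:08:30 with timeout {18591, 385808}, i.e. seconds and
# microseconds. Assumed date: 2016-11-02, the evening before "Nov  3".
select_start = datetime(2016, 11, 2, 20, 8, 30)
timeout = timedelta(seconds=18591, microseconds=385808)

wakeup = select_start + timeout
print(timeout)  # 5:09:51.385808
print(wakeup)   # 2016-11-03 01:18:21.385808
```

Because the trace timestamps show whole seconds only, the real wake-up falls somewhere between about 01:18:21.4 and 01:18:22.4, which matches the logged DHCPREQUEST.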

Stopping and disabling NetworkManager is not enough to ensure that dhclient stops using these files. After rebooting or manually restarting eth0, dhclient no longer uses the NetworkManager files:

root     14465     1  0 28204 12780   0 13:25 ?        00:00:00 /sbin/dhclient -H mac525400843f0d -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid eth0

If the deployment rebooted the engine or restarted the interface after disabling NetworkManager, that would prevent the IP address from disappearing 12 hours after a successful deployment.

I don't see this issue on the hypervisor or on a self-hosted engine.

Comment 2 Fabian von Feilitzsch 2016-11-16 21:09:58 UTC
https://github.com/fusor/ansible-ovirt/pull/11

Restarting the network service seems to make dhclient pick up the default (non-NetworkManager) scripts and files.
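The pull request is not reproduced here, but the approach described (keeping NetworkManager stopped and disabled, then restarting the network service so the interface comes back with a fresh dhclient) can be sketched in Python. This is a hypothetical illustration, not the contents of the Ansible change: it assumes a RHEL 7 host where the legacy `network` service manages eth0, uses plain `systemctl` calls, and defaults to a dry run that only returns the commands it would execute.

```python
import subprocess
from typing import List

# Hypothetical remediation order, assuming a RHEL 7 host where the legacy
# "network" service (initscripts) manages eth0 once NetworkManager is gone.
COMMANDS: List[List[str]] = [
    ["systemctl", "stop", "NetworkManager"],
    ["systemctl", "disable", "NetworkManager"],
    # Per the report, after a restart dhclient uses
    # /var/lib/dhclient/dhclient--eth0.lease and no NetworkManager helper.
    ["systemctl", "restart", "network"],
]


def remediate(dry_run: bool = True) -> List[List[str]]:
    """Run (or, by default, only return) the commands in order; stop at the first failure."""
    if not dry_run:
        for cmd in COMMANDS:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return COMMANDS


if __name__ == "__main__":
    for cmd in remediate(dry_run=True):
        print(" ".join(cmd))
```

Restarting networking can briefly interrupt remote sessions, so a real run is safer from the console or from a tool that tolerates the interruption.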

Comment 3 John Matthews 2016-11-22 13:39:27 UTC
Expected in 11/21 ISO

Comment 4 Tasos Papaioannou 2016-11-22 19:34:01 UTC
Verified on QCI-1.1-RHEL-7-20161121.t.0. After a successful RHV (engine+hypervisor) deployment, the engine is using the default non-NetworkManager dhclient configuration:

# ps -eF | grep dhclient | grep -v grep
root     14355     1  0 28206 12780   0 18:28 ?        00:00:00 /sbin/dhclient -H mac52540099e2a1 -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid eth0
#

Comment 7 errata-xmlrpc 2017-02-28 01:40:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0335

