Description of problem:
12 hours after a RHV engine+hypervisor deployment, the provisioning NIC on the RHV engine loses its IP address.

Version-Release number of selected component (if applicable):
QCI-1.1-RHEL-7-20161101.t.0

How reproducible:
100%

Steps to Reproduce:
1.) Deploy RHV.
2.) Wait 12 hours (the default lease time of the Satellite's dhcpd service).
3.) Verify that the RHV engine/manager can no longer be accessed by its provisioning network IP address.

Accessing the system via console shows that dhclient is no longer running:

# ps -eF|grep dhclient|grep -v grep
#

Actual results:
RHV engine loses IP address 12 hours after successful deployment.

Expected results:
No loss of IP address.

Additional info:
After the deployment completes, but before the DHCP lease expires, dhclient on the RHV engine is using the NetworkManager dhcp helper script, config file, pid file, and lease file, even though NetworkManager was disabled during the deployment:

# ps -eF|grep dhclient
root 771 1 0 28204 15836 0 19:24 ? 00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0

An strace of this dhclient process shows that dhclient fails when trying to renew the lease because it can't communicate back to the NetworkManager service:

****
771 20:08:30 select(22, [5 6], [], NULL, {18591, 385808}) = 0 (Timeout)
771 01:18:22 sendto(3, "<30>Nov 3 01:18:22 dhclient[771]: DHCPREQUEST on eth0 to 192.168.100.1 port 67 (xid=0x6ba6d292)", 96, MSG_NOSIGNAL, NULL, 0) = 96
[...]
771 01:18:22 sendto(3, "<30>Nov 3 01:18:22 dhclient[771]: DHCPACK from 192.168.100.1 (xid=0x6ba6d292)", 78, MSG_NOSIGNAL, NULL, 0) = 78
771 01:18:22 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fac9b1d6b50) = 32090
[...]
32090 01:18:22 execve("/usr/libexec/nm-dhcp-helper", ["/usr/libexec/nm-dhcp-helper"], [...]
[...]
32090 01:18:22 write(2, "Error: could not connect to NetworkManager D-Bus socket: Could not connect: Connection refused\n", 95) = 95
32090 01:18:22 write(2, "Fatal error occured, killing dhclient instance with pid 771.\n", 61) = 61
32090 01:18:22 kill(771, SIGTERM) = 0
****

Stopping and disabling NetworkManager is not enough to ensure that dhclient stops using these files. After rebooting or manually restarting eth0, dhclient no longer uses the NetworkManager files:

root 14465 1 0 28204 12780 0 13:25 ? 00:00:00 /sbin/dhclient -H mac525400843f0d -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid eth0

If the deployment reboots or restarts the interface after NetworkManager is disabled on the engine, this will prevent the IP address from disappearing 12 hours after a successful deployment.

I don't see this issue on the hypervisor or on a self-hosted engine.
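
The two dhclient command lines quoted above differ in whether they reference /usr/libexec/nm-dhcp-helper and /var/lib/NetworkManager. A minimal sketch of a check for the affected state follows; the function names classify_dhclient and check_running_dhclients are hypothetical and not part of any QCI or RHV tooling, and the check assumes that matching on those two paths is sufficient.

```shell
#!/bin/sh
# Classify a dhclient command line (hypothetical helper, not shipped anywhere).
# Prints "networkmanager" if the command line references the NetworkManager
# helper script or lease directory, "initscripts" otherwise.
classify_dhclient() {
    case "$1" in
        *nm-dhcp-helper*|*/var/lib/NetworkManager/*) echo "networkmanager" ;;
        *) echo "initscripts" ;;
    esac
}

# Report every running dhclient process; prints nothing if none is running.
check_running_dhclients() {
    # pgrep -a prints the PID followed by the full command line.
    pgrep -a dhclient | while read -r pid cmdline; do
        echo "$pid $(classify_dhclient "$cmdline")"
    done
}
```

Calling check_running_dhclients on the engine shortly after deployment would be expected to report "networkmanager" for the affected process, and "initscripts" after a reboot or interface restart.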
https://github.com/fusor/ansible-ovirt/pull/11

Restarting the network service seems to make dhclient pick up different networking scripts.
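
As a rough illustration of that idea, the sequence below stops and disables NetworkManager and then restarts the legacy network service so that dhclient is relaunched without the NetworkManager helper. This is a sketch under assumptions, not the content of the linked pull request: the ordering, the function names run and switch_to_network_service, and the DRY_RUN variable are all hypothetical, and the commands are only printed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed (DRY_RUN is hypothetical).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Assumed order of operations; the actual fix may differ.
switch_to_network_service() {
    run systemctl stop NetworkManager
    run systemctl disable NetworkManager
    # Restarting the network service relaunches dhclient for the interface,
    # which is what the report observes clears the NetworkManager files.
    run systemctl restart network
}
```

Running switch_to_network_service with DRY_RUN unset only lists the three commands; on a RHEL 7 engine, DRY_RUN=0 would execute them as root.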
Expected in 11/21 ISO
Verified on QCI-1.1-RHEL-7-20161121.t.0. After a successful RHV (engine+hypervisor) deployment, the engine is using the default non-NetworkManager dhclient configuration:

# ps -eF | grep dhclient | grep -v grep
root 14355 1 0 28206 12780 0 18:28 ? 00:00:00 /sbin/dhclient -H mac52540099e2a1 -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid eth0
#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:0335