Description of problem:
Trying to add the hypervisor to RHEV-M, the installation fails with "Command returned failure code 1 during SSH connection". On further investigation, we believe the vdsmd service is unable to start on the hypervisor and is failing with a dependency error.

Version-Release number of selected component (if applicable):
RHEV-M 3.6.3.4-0.1.el6
RHEV-H 7.2-20160302.el7ev
vdsm-4.17.23-0.el7ev.noarch

How reproducible:
Very consistent.

Steps to Reproduce:
1. Run the latest RHEV-M and RHEV-H.
2. Add your RHEV-H to RHEV-M.
3. Installation of RHEV-H in RHEV-M fails.

Actual results:

[root@ictm1612s01h2 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

Apr 01 19:31:22 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Stopped Virtual Desktop Server Manager.
Apr 01 19:31:49 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Apr 01 19:31:49 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'.
Apr 01 19:52:16 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Apr 01 19:52:16 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'.
Apr 01 20:00:43 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Apr 01 20:00:43 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'.
Apr 04 16:27:22 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Stopped Virtual Desktop Server Manager.
Apr 04 16:27:58 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Apr 04 16:27:58 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'.

[root@ictm1612s01h2 ~]# systemctl start vdsmd
A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
[root@ictm1612s01h2 ~]# journalctl -xe
Apr 04 16:41:26 ictm1612s01h2.ict.englab.netapp.com python[16031]: DIGEST-MD5 parse_server_challenge()
Apr 04 16:41:26 ictm1612s01h2.ict.englab.netapp.com python[16031]: DIGEST-MD5 ask_user_info()
Apr 04 16:41:26 ictm1612s01h2.ict.englab.netapp.com python[16031]: DIGEST-MD5 client step 2
Apr 04 16:41:26 ictm1612s01h2.ict.englab.netapp.com python[16031]: DIGEST-MD5 ask_user_info()
Apr 04 16:41:26 ictm1612s01h2.ict.englab.netapp.com python[16031]: DIGEST-MD5 make_client_response()
Apr 04 16:41:26 ictm1612s01h2.ict.englab.netapp.com python[16031]: DIGEST-MD5 client step 3
Apr 04 16:41:26 ictm1612s01h2.ict.englab.netapp.com python[16031]: DIGEST-MD5 client mech dispose
Apr 04 16:41:26 ictm1612s01h2.ict.englab.netapp.com python[16031]: DIGEST-MD5 common mech dispose
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: Traceback (most recent call last):
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: File "/usr/share/vdsm/vdsm-restore-net-config", line 429, in <module>
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: restore(args)
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: File "/usr/share/vdsm/vdsm-restore-net-config", line 392, in restore
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: unified_restoration()
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: File "/usr/share/vdsm/vdsm-restore-net-config", line 114, in unified_restoration
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: _remove_networks_in_running_config()
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: File "/usr/share/vdsm/vdsm-restore-net-config", line 199, in _remove_networks_in_running_config
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: _inRollback=True)
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: File "/usr/share/vdsm/network/api.py", line 932, in setupNetworks
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: "system" % network)
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: network.errors.ConfigNetworkError: (27, "Cannot delete network 'ovirtmgmt': It doesn't exist in the system")
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: Traceback (most recent call last):
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: File "/usr/bin/vdsm-tool", line 219, in main
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: return tool_command[cmd]["command"](*args)
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: File "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 41, in restore_command
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: exec_restore(cmd)
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: File "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 54, in exec_restore
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: raise EnvironmentError('Failed to restore the persisted networks')
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com vdsm-tool[16024]: EnvironmentError: Failed to restore the persisted networks
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com systemd[1]: vdsm-network.service: main process exited, code=exited, status=1/FAILURE
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Failed to start Virtual Desktop Server Manager network restoration.
-- Subject: Unit vdsm-network.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit vdsm-network.service has failed.
--
-- The result is failed.
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
-- Subject: Unit vdsmd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit vdsmd.service has failed.
--
-- The result is dependency.
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Dependency failed for MOM instance configured for VDSM purposes.
-- Subject: Unit mom-vdsm.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mom-vdsm.service has failed.
--
-- The result is dependency.
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Job mom-vdsm.service/start failed with result 'dependency'.
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'.
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com systemd[1]: Unit vdsm-network.service entered failed state.
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com systemd[1]: vdsm-network.service failed.
Apr 04 16:41:27 ictm1612s01h2.ict.englab.netapp.com polkitd[18527]: Unregistered Authentication Agent for unix-process:16010:25191529 (system bus name :1.6163, object path /org/freedesktop/PolicyKit1/Authent

Expected results:
RHEV-H, when attached to RHEV-M, should start and be running.

Additional info:
We believe the defects below, reported against oVirt, look the same as the above:
http://permalink.gmane.org/gmane.comp.emulators.ovirt.user/32750
http://lists.ovirt.org/pipermail/users/2015-January.txt
https://www.mail-archive.com/users@ovirt.org/msg31519.html
Would you please attach /var/log/vdsm/supervdsm.log?
Created attachment 1146940 [details]
supervdsm log attached
Investigation of the supervdsm log showed at least two issues.

- The first is related to IPv6 being used and reported at boot. The persistent and kernel configurations are compared and wrongly seen as different, due to the IPv6 address present in the kernel. The difference causes these networks to be removed and re-added. This issue has been resolved in vdsm-4.17.25 by ignoring IPv6 addresses in the comparison. As a workaround, IPv6 can be disabled on the host if that is possible (a sketch of that workaround follows below). This issue is not severe, as it is limited to the initial host boot and causes just a reconfiguration of existing networks.

- The second issue is caused by a failure to restore or reconfigure a network after the host boot. In our case, the attempt to reconfigure the ovirtmgmt network fails after ifup times out (no link?). This failure causes the running config to contain non-existent networks, and retries to restore the networks will always fail because the restore is unable to remove them. A fix for this issue has been submitted for review.
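A minimal sketch of the IPv6 workaround mentioned above, assuming the standard RHEL 7 sysctl knobs are usable on the host (persisting the change on RHEV-H may need extra steps, e.g. persisting the edited file with the node's persist tooling):

# Disable IPv6 at runtime (lost on reboot):
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1

# To keep it across reboots, add the same keys to /etc/sysctl.conf:
echo 'net.ipv6.conf.all.disable_ipv6 = 1' >> /etc/sysctl.conf
echo 'net.ipv6.conf.default.disable_ipv6 = 1' >> /etc/sysctl.conf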
Is there a tentative date for when the fix will be released?
It's completely blocking our testing and the release of support for our products. If you can provide a tentative date for the fix coming with oVirt 4.0, we can move on to other things and come back to it at the right time.
Prasanth, the next beta build is going to have a fix for the second issue, but I wonder how this blocks your further testing. Does it reproduce often on other hosts? Can you not disable IPv6 on your host?

You may also clear the contents of /var/lib/vdsm/persistence and /etc/libvirt/qemu/networks/vdsm-* and reboot to start afresh (see the sketch below).
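A sketch of the "start afresh" steps suggested above, using the paths named in the comment; run as root and reboot afterwards:

# Remove vdsm's persisted network configuration and the libvirt network
# definitions it created, then reboot so vdsm starts from a clean state:
rm -rf /var/lib/vdsm/persistence/*
rm -f /etc/libvirt/qemu/networks/vdsm-*
reboot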
(In reply to Prasanth from comment #6)
> Its completely blocking our testing and release for support with our
> products.If you can provide us a tentative date on the fix coming with
> ovirt4.0, we can move on with other things and come back to it at the right
> time.

Can you try to work with the nightly builds? It should be fixed there.
(In reply to Dan Kenigsberg from comment #7)
> Prasanth, Next beta build is going to have a fix for the secondary issue,
> but I wonder how does this block your further testing? Does it reproduce
> often on other hosts? Can you not disable ipv6 on your host?
>
> You may also clear the content of /var/lib/vdsm/persistence and
> /etc/libvirt/qemu/networks/vdsm-* and reboot to start afresh.

Dan,

It's very persistent on all three hypervisors (7.2, 7.1, and 7.0) I'm running with. Also, IPv6 is not getting disabled using the GUI option, and I had problems with sysctl as well, but once set to static it has been going fine so far.
Three independent reproductions sound very odd (and worrying). What happens when you remove /var/lib/vdsm/persistence/* and /etc/libvirt/qemu/networks/vdsm-* and reboot?

You can also add ipv6.disable=1 to your kernel command line (see the sketch below).
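A sketch of adding ipv6.disable=1 to the kernel command line with grubby, assuming a standard RHEL 7 boot configuration; on RHEV-H the boot arguments may be managed differently, so treat this as illustrative:

# Append ipv6.disable=1 to every installed kernel's boot entry:
grubby --update-kernel=ALL --args="ipv6.disable=1"
# Verify the argument now appears in the kernel arguments, then reboot:
grubby --info=ALL | grep args
reboot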
(In reply to Dan Kenigsberg from comment #10)
> Three independent reproductions sounds very odd (and worrying). What happens
> when you remove /var/lib/vdsm/persistence/*
> /etc/libvirt/qemu/networks/vdsm-* and reboot?
>
> You can also add ipv6.disable=1 to your kernel command line.

Dan,

The hypervisors are up and optimal after removing the files you mentioned and disabling IPv6. As I said before, it was very consistent before that. I'm also building a completely new configuration in a different lab; let me see if I hit this issue again without using the workaround provided. Will keep you posted.
Verified on 4.0.0.2-0.1.el7ev with:
vdsm-4.18.1-11.gita92976e.el7ev.x86_64
rhevh7-ng-4.0-0.20160607.0+1
I ran an update to RHEV-M 3.6.6.2-0.1.el6 and installed a new RHEV-H 7.2 20160516.0.el7ev. I disabled IPv6 on the kernel command line at boot, and the hypervisor still fails to add. I retried after removing /var/lib/vdsm/persistence/* and /etc/libvirt/qemu/networks/vdsm-* and rebooting with IPv6 disabled; that didn't work either.
Created attachment 1167972 [details]
Supervdsm.log_new
This now looks like a different problem; please open a new bug if the below is not helpful.

The setup of the ovirtmgmt network fails because there was no response from the DHCP server for 120 seconds; VDSM then reverts the changes and restores the original configuration/ifcfg file.

The initial em1 interface had no DHCP set on it (I would expect BOOTPROTO=dhcp), so I am not sure what was there before:
DEVICE="em1"
HWADDR="b8:ca:3a:68:3b:a0"
IPV6INIT="no"
IPV6_AUTOCONF="no"
NM_CONTROLLED="no"
ONBOOT="yes"
PEERNTP="yes"

Perhaps you should try a static IP and check if it works for you, and then come back and debug DHCP (if the static one worked).
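For reference, a minimal static ifcfg-em1 sketch based on the file above, for testing the static-IP suggestion; the address, netmask and gateway values are placeholders for illustration, not taken from this setup:

# /etc/sysconfig/network-scripts/ifcfg-em1 (example values only)
DEVICE="em1"
HWADDR="b8:ca:3a:68:3b:a0"
BOOTPROTO="none"
IPADDR="192.0.2.10"
NETMASK="255.255.255.0"
GATEWAY="192.0.2.1"
IPV6INIT="no"
IPV6_AUTOCONF="no"
NM_CONTROLLED="no"
ONBOOT="yes"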
(In reply to Edward Haas from comment #15)
> This now looks like a different problem, please open a new bug if the below
> is not helpful.
>
> The setup of the ovirtmgmt network fails because there was no response from
> the dhcp server for 120sec, then VDSM reverts the changes and returns the
> original configuration/ifcfg file.
>
> The initial em1 iface had no dhcp set on it (I would expect BOOTPROTO=dhcp),
> so I am not sure what was there before.
> DEVICE="em1"
> HWADDR="b8:ca:3a:68:3b:a0"
> IPV6INIT="no"
> IPV6_AUTOCONF="no"
> NM_CONTROLLED="no"
> ONBOOT="yes"
> PEERNTP="yes"
>
> Perhaps you should try a static IP and check if it works for you and then
> return back and debug dhcp (if the static one worked).

It seems to work with a static configuration. We have plenty of other setups working without any issues on DHCP, so I'm not sure what is going on here. I will try debugging and keep you posted.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1671.html