Description of problem: Failed to Establish Libvirt Connection on RHEV-H 7.2 for 3.6.0 after several times reboot, maybe 2 times reboot, maybe 3 times, maybe 4 times... Version-Release number of selected component (if applicable): # rpm -qa ovirt-node vdsm libvirt libvirt-1.2.17-11.el7.x86_64 vdsm-4.17.8-1.el7ev.noarch ovirt-node-3.3.0-0.10.20150928gite7ee3f1.el7ev.noarch # cat /etc/redhat-release Red Hat Enterprise Virtualization Hypervisor release 7.2 (20150928.0.el7ev) Red Hat Enterprise Virtualization Manager Version: 3.6.0-0.18.el6 How reproducible: 100% Steps to Reproduce: 1. TUI clean installed rhevh 7.2 successful 2. Login TUI 3. Setup network via dhcp 4. Add rhevh via rhevm portal successful, and host is UP. 5. Maintenance the Host 6. Login RHEV-H TUI 7. Restart RHEV-H check whether failed to establish libvirt connection. reproducible: <5% 8. Restart RHEV-H twice to check whether failed to establish libvirt connection. reproducible: 70% 9. Restart RHEV-H third times to check whether failed to establish libvirt connection. reproducible: 95% ... Actual results: Failed to Establish Libvirt Connection. Expected results: without such error Test machine: All machines HP and Dell server. Additional info: # systemctl status libvirtd vdsmd ● libvirtd.service - Virtualization daemon Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled) Drop-In: /etc/systemd/system/libvirtd.service.d └─unlimited-core.conf Active: failed (Result: start-limit) since Fri 2015-10-09 07:42:58 UTC; 1h 34min ago Docs: man:libvirtd(8) http://libvirt.org Process: 13855 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=6) Main PID: 13855 (code=exited, status=6) Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: Unit libvirtd.service entered failed state. Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: libvirtd.service failed. Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: libvirtd.service holdoff time over, scheduling restart. Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: start request repeated too quickly for libvirtd.service Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: Failed to start Virtualization daemon. Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: Unit libvirtd.service entered failed state. Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: libvirtd.service failed. ● vdsmd.service - Virtual Desktop Server Manager Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled) Active: inactive (dead) Oct 09 07:42:57 dhcp-9-114.nay.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager. Oct 09 07:42:57 dhcp-9-114.nay.redhat.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'. Oct 09 07:42:57 dhcp-9-114.nay.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager. Oct 09 07:42:57 dhcp-9-114.nay.redhat.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'. Oct 09 07:42:57 dhcp-9-114.nay.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager. Oct 09 07:42:57 dhcp-9-114.nay.redhat.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'. Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager. Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'. Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager. Oct 09 07:42:58 dhcp-9-114.nay.redhat.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'. # systemctl restart libvirtd Job for libvirtd.service failed because the control process exited with error code. See "systemctl status libvirtd.service" and "journalctl -xe" for details. # journalctl -xe -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit vdsm-network.service has failed. -- -- The result is dependency. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager. -- Subject: Unit vdsmd.service has failed -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit vdsmd.service has failed. -- -- The result is dependency. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: Dependency failed for MOM instance configured for VDSM purposes. -- Subject: Unit mom-vdsm.service has failed -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit mom-vdsm.service has failed. -- -- The result is dependency. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: Job mom-vdsm.service/start failed with result 'dependency'. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: Job vdsm-network.service/start failed with result 'dependency'. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: Unit libvirtd.service entered failed state. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: libvirtd.service failed. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: Stopping Auxiliary vdsm service for running helper functions as root... -- Subject: Unit supervdsmd.service has begun shutting down -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit supervdsmd.service has begun shutting down. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: start request repeated too quickly for supervdsmd.service Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: Failed to start Auxiliary vdsm service for running helper functions as root. -- Subject: Unit supervdsmd.service has failed -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit supervdsmd.service has failed. -- -- The result is failed. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: Unit supervdsmd.service entered failed state. Oct 09 09:17:48 dhcp-9-114.nay.redhat.com systemd[1]: supervdsmd.service failed. # systemctl status supervdsmd ● supervdsmd.service - Auxiliary vdsm service for running helper functions as root Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled) Active: failed (Result: start-limit) since Fri 2015-10-09 09:21:54 UTC; 23s ago Process: 17402 ExecStart=/usr/share/vdsm/daemonAdapter /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock (code=killed, signal=TERM) Main PID: 17402 (code=killed, signal=TERM) Oct 09 09:21:54 dhcp-9-114.nay.redhat.com systemd[1]: Started Auxiliary vdsm service for running helper functions as root. Oct 09 09:21:54 dhcp-9-114.nay.redhat.com systemd[1]: Starting Auxiliary vdsm service for running helper functions as root... Oct 09 09:21:54 dhcp-9-114.nay.redhat.com systemd[1]: Stopping Auxiliary vdsm service for running helper functions as root... Oct 09 09:21:54 dhcp-9-114.nay.redhat.com systemd[1]: start request repeated too quickly for supervdsmd.service Oct 09 09:21:54 dhcp-9-114.nay.redhat.com systemd[1]: Failed to start Auxiliary vdsm service for running helper functions as root. Oct 09 09:21:54 dhcp-9-114.nay.redhat.com systemd[1]: Unit supervdsmd.service entered failed state. Oct 09 09:21:54 dhcp-9-114.nay.redhat.com systemd[1]: supervdsmd.service failed. Oct 09 09:22:01 dhcp-9-114.nay.redhat.com systemd[1]: start request repeated too quickly for supervdsmd.service Oct 09 09:22:01 dhcp-9-114.nay.redhat.com systemd[1]: Failed to start Auxiliary vdsm service for running helper functions as root. Oct 09 09:22:01 dhcp-9-114.nay.redhat.com systemd[1]: supervdsmd.service failed.
Created attachment 1081233 [details] rhevh_var_log
Created attachment 1081234 [details] sosreport_on_host
Created attachment 1081235 [details] pic
Additional info: 1. Once we encountered this error, whatever reboot rhevh how many times, we still encountered this error always. 2. can reproduce this issue without rhevm registration, just need to reboot rhevh several times to reproduce this error.
(In reply to Ying Cui from comment #4) > Additional info: > 1. Once we encountered this error, whatever reboot rhevh how many times, we > still encountered this error always. I tested 4 times on two machines for above.
This looks a lot like bug 1251151
*** This bug has been marked as a duplicate of bug 1251151 ***
We have to re-open this bug, because the bug 1251151 is verified, but this bug are still here on build rhev-hypervisor7-7.2-20151009.0(ovirt-node-3.3.0-0.13.20151008git03eefb5.el7ev.noarch) and rhev-hypervisor7-7.2-20151013.76.iso(ovirt-node-3.3.0-0.14.20151013git5f84da0.el7ev.noarch) so reopen this urgent issue now. log will be provided later with build rhev-hypervisor7-7.2-20151013.76.iso(ovirt-node-3.3.0-0.14.20151013git5f84da0.el7ev.noarch)
In the logs I see: Oct 9 07:30:01 dhcp-9-114 journal: libvirt version: 1.2.17, package: 11.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-09-25-04:15:16, x86-036.build.eng.bos.redhat.com) Oct 9 07:30:01 dhcp-9-114 journal: Cannot read certificate '/etc/pki/libvirt/servercert.pem': No such file or directory Oct 9 07:30:01 dhcp-9-114 systemd: libvirtd.service: main process exited, code=exited, status=6/NOTCONFIGURED Oct 9 07:30:01 dhcp-9-114 systemd: Failed to start Virtualization daemon. In my testing that dir is also missing, but libvirtd comes up.
Created attachment 1082318 [details] freshlog_varlog_coment8
Created attachment 1082319 [details] freshlog_sosreport_comment8
Hi, My findings so far. I have reproduced the report, in the 3th or 4th reboot after installation, not even needed to setup network or register the node. In the second reboot VDSM was able to setup itself and configured/persisted the files: /etc/libvirt/libvirtd.conf, /etc/libvirt/qemu-sanlock.conf and /etc/libvirt/qemu.conf. However, in one of the reboots I have noticed that /etc/libvirt/libvirtd.conf was not persisted anymore and in the next reboot it returned to the original state, without the VDSM configuration affecting the system. The libvirt error in the logs, shows an error finding the cacert.pem as VDSM configuration is missing to setup the right path to cacert.pem. ============================================================================== 2015-10-13 15:26:24.596+0000: 17114: error : virNetTLSContextCheckCertFile:120 : Cannot read CA certificate '/etc/pki/CA/cacert.pem': No such file or directory # cat /etc/redhat-release Red Hat Enterprise Virtualization Hypervisor release 7.2 (20151009.0.el7ev) # rpm -qa | grep -i vdsm vdsm-python-4.17.8-1.el7ev.noarch vdsm-infra-4.17.8-1.el7ev.noarch vdsm-yajsonrpc-4.17.8-1.el7ev.noarch vdsm-xmlrpc-4.17.8-1.el7ev.noarch vdsm-jsonrpc-4.17.8-1.el7ev.noarch vdsm-4.17.8-1.el7ev.noarch ovirt-node-plugin-vdsm-0.6.1-1.el7ev.noarch vdsm-cli-4.17.8-1.el7ev.noarch vdsm-hook-ethtool-options-4.17.8-1.el7ev.noarch # rpm -qa | grep -i libvirt libvirt-python-1.2.17-2.el7.x86_64 libvirt-daemon-1.2.17-13.el7.x86_64 libvirt-daemon-driver-nodedev-1.2.17-13.el7.x86_64 libvirt-1.2.17-13.el7.x86_64 libvirt-client-1.2.17-13.el7.x86_64 libvirt-daemon-driver-nwfilter-1.2.17-13.el7.x86_64 libvirt-daemon-driver-interface-1.2.17-13.el7.x86_64 libvirt-daemon-driver-secret-1.2.17-13.el7.x86_64 libvirt-daemon-driver-network-1.2.17-13.el7.x86_64 libvirt-daemon-config-network-1.2.17-13.el7.x86_64 libvirt-daemon-driver-qemu-1.2.17-13.el7.x86_64 libvirt-daemon-driver-storage-1.2.17-13.el7.x86_64 libvirt-cim-0.6.3-19.el7.x86_64 libvirt-daemon-config-nwfilter-1.2.17-13.el7.x86_64 libvirt-lock-sanlock-1.2.17-13.el7.x86_64 libvirt-daemon-driver-lxc-1.2.17-13.el7.x86_64 libvirt-daemon-kvm-1.2.17-13.el7.x86_64
Hi Dan, Could VDSM configurator removing itself just in libvirtd.conf?
(In reply to Douglas Schilling Landgraf from comment #15) > Hi Dan, > > Could VDSM configurator removing itself just in libvirtd.conf? I can happen only if someone explicitly asked to do so via the vdsm-tool command line, which I don't think has happened here.
Working in this report, investigating what's causing the vdsm-tool configurator failure (to unpersist?). I don't see it happening in previous versions, like RHEV-H 7.1. Oct 15 21:20:19 Running handler: /usr/libexec/ovirt-node/hooks/on-boot/01-vdsm-configure Traceback (most recent call last): File "/bin/vdsm-tool", line 219, in main return tool_command[cmd]["command"](*args) File "/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py", line 38, in wrapper File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 156, in configure File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 103, in _configure File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line 65, in configure File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line 104, in removeConf File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line 225, in _removeSection File "/usr/lib/python2.7/site-packages/vdsm/tool/configfile.py", line 179, in __exit__ OSError: [Errno 16] Device or resource busy
(In reply to Douglas Schilling Landgraf from comment #17) > Working in this report, investigating what's causing the vdsm-tool > configurator failure (to unpersist?). I don't see it happening in previous > versions, like RHEV-H 7.1. > > Oct 15 21:20:19 Running handler: > /usr/libexec/ovirt-node/hooks/on-boot/01-vdsm-configure > Traceback (most recent call last): > File "/bin/vdsm-tool", line 219, in main > return tool_command[cmd]["command"](*args) > File "/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py", line 38, in > wrapper > File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line > 156, in configure > File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line > 103, in _configure > File > "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line > 65, in configure > File > "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line > 104, in removeConf > File > "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line > 225, in _removeSection > File "/usr/lib/python2.7/site-packages/vdsm/tool/configfile.py", line 179, > in __exit__ > OSError: [Errno 16] Device or resource busy Note: I have also experienced this with --force, but not without. I hope to find the time to check the vdsm-tool source to find the difference later tonight.
Appears on 3.5.6 as well.
But maybe a different cause (in comment 19 the ntpd namespace patch was not applied correctly)
Followed the bug description test steps, this bug is fixed and verified on build # rpm -qa ovirt-node ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch # cat /etc/redhat-release Red Hat Enterprise Virtualization Hypervisor release 7.2 (20151104.0.el7ev) Test steps: 1. TUI clean installed rhevh 7.2 successful 2. Login TUI 3. Setup network via dhcp 4. add rhevh via rhevm 3.6.0.3-0.1 portal successful, and host is UP. 5. Maintenance the Host 6. Login RHEV-H TUI 7. Restart RHEV-H more than 7 times, every time there is NO "Failed to Establish Libvirt Connection " error.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0378.html