Bug 1270203
| Summary: | Failed to Establish Libvirt Connection on RHEV-H 7.2 after several times reboot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Ying Cui <ycui> |
| Component: | ovirt-node | Assignee: | Douglas Schilling Landgraf <dougsland> |
| Status: | CLOSED ERRATA | QA Contact: | Ying Cui <ycui> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.6.0 | CC: | cshao, cwu, danken, dougsland, fdeutsch, gklein, huiwa, huzhao, leiwang, lsurette, mgoldboi, rbarry, ycui, ykaul |
| Target Milestone: | ovirt-3.6.0-rc3 | Keywords: | Regression, Reopened, TestBlocker |
| Target Release: | 3.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-node-3.3.0-0.18.20151022git82dc52c.el7ev | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1276125 (view as bug list) | Environment: | |
| Last Closed: | 2016-03-09 14:40:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1264065, 1276125 | | |
| Attachments: | | | |
Description
Ying Cui
2015-10-09 09:24:19 UTC
Created attachment 1081233 [details]
rhevh_var_log
Created attachment 1081234 [details]
sosreport_on_host
Created attachment 1081235 [details]
pic
Additional info:
1. Once we encountered this error, it persisted no matter how many times we rebooted RHEV-H.
2. The issue can be reproduced without RHEV-M registration; rebooting RHEV-H several times is enough to trigger the error.

(In reply to Ying Cui from comment #4)
> Additional info:
> 1. Once we encountered this error, it persisted no matter how many times we
> rebooted RHEV-H.

I tested the above 4 times on two machines.

This looks a lot like bug 1251151

*** This bug has been marked as a duplicate of bug 1251151 ***

We have to re-open this bug: bug 1251151 is verified, but this bug is still present on builds rhev-hypervisor7-7.2-20151009.0 (ovirt-node-3.3.0-0.13.20151008git03eefb5.el7ev.noarch) and rhev-hypervisor7-7.2-20151013.76.iso (ovirt-node-3.3.0-0.14.20151013git5f84da0.el7ev.noarch), so I am reopening this urgent issue now. Logs from build rhev-hypervisor7-7.2-20151013.76.iso (ovirt-node-3.3.0-0.14.20151013git5f84da0.el7ev.noarch) will be provided later.

In the logs I see:

Oct 9 07:30:01 dhcp-9-114 journal: libvirt version: 1.2.17, package: 11.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-09-25-04:15:16, x86-036.build.eng.bos.redhat.com)
Oct 9 07:30:01 dhcp-9-114 journal: Cannot read certificate '/etc/pki/libvirt/servercert.pem': No such file or directory
Oct 9 07:30:01 dhcp-9-114 systemd: libvirtd.service: main process exited, code=exited, status=6/NOTCONFIGURED
Oct 9 07:30:01 dhcp-9-114 systemd: Failed to start Virtualization daemon.

In my testing that dir is also missing, but libvirtd comes up.

Created attachment 1082318 [details]
freshlog_varlog_coment8
Created attachment 1082319 [details]
freshlog_sosreport_comment8
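
The certificate failure quoted in the log excerpt above can be picked out of a larger log mechanically. The following is only a minimal sketch in Python: the function name `find_cert_errors` and the regular expression are assumptions made for illustration (they are not part of libvirt, VDSM, or ovirt-node), and the pattern is written only against the two message shapes that appear in this report.

```python
import re

# Hypothetical pattern, modelled only on the two messages quoted in this
# report ("Cannot read certificate ..." and "Cannot read CA certificate ...").
CERT_ERROR = re.compile(
    r"Cannot read (?P<kind>CA certificate|certificate) "
    r"'(?P<path>[^']+)': (?P<reason>.+)$"
)


def find_cert_errors(log_text):
    """Return (kind, path, reason) tuples for each certificate read error."""
    results = []
    for line in log_text.splitlines():
        match = CERT_ERROR.search(line)
        if match:
            results.append(
                (match.group("kind"), match.group("path"),
                 match.group("reason").strip())
            )
    return results


# Sample lines copied from the comments in this report.
SAMPLE = (
    "Oct 9 07:30:01 dhcp-9-114 journal: Cannot read certificate "
    "'/etc/pki/libvirt/servercert.pem': No such file or directory\n"
    "Oct 9 07:30:01 dhcp-9-114 systemd: Failed to start Virtualization daemon.\n"
    "2015-10-13 15:26:24.596+0000: 17114: error : "
    "virNetTLSContextCheckCertFile:120 : Cannot read CA certificate "
    "'/etc/pki/CA/cacert.pem': No such file or directory\n"
)

if __name__ == "__main__":
    for kind, path, reason in find_cert_errors(SAMPLE):
        print(kind, path, reason)
```

Run against the logs in the attachments, a scan like this would show which certificate path libvirtd tried to read on each boot, which helps tell a boot that used default paths apart from one that used a VDSM-configured path.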
Hi,

My findings so far. I reproduced the issue on the 3rd or 4th reboot after installation, without even setting up the network or registering the node. On the second reboot VDSM was able to set itself up and configured/persisted the files /etc/libvirt/libvirtd.conf, /etc/libvirt/qemu-sanlock.conf and /etc/libvirt/qemu.conf. However, on one of the later reboots I noticed that /etc/libvirt/libvirtd.conf was no longer persisted, and on the next reboot it returned to its original state, without the VDSM configuration taking effect. The libvirt error in the logs shows a failure to find cacert.pem, because the VDSM configuration that sets the right path to cacert.pem is missing.

==============================================================================
2015-10-13 15:26:24.596+0000: 17114: error : virNetTLSContextCheckCertFile:120 : Cannot read CA certificate '/etc/pki/CA/cacert.pem': No such file or directory

# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 7.2 (20151009.0.el7ev)

# rpm -qa | grep -i vdsm
vdsm-python-4.17.8-1.el7ev.noarch
vdsm-infra-4.17.8-1.el7ev.noarch
vdsm-yajsonrpc-4.17.8-1.el7ev.noarch
vdsm-xmlrpc-4.17.8-1.el7ev.noarch
vdsm-jsonrpc-4.17.8-1.el7ev.noarch
vdsm-4.17.8-1.el7ev.noarch
ovirt-node-plugin-vdsm-0.6.1-1.el7ev.noarch
vdsm-cli-4.17.8-1.el7ev.noarch
vdsm-hook-ethtool-options-4.17.8-1.el7ev.noarch

# rpm -qa | grep -i libvirt
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-1.2.17-13.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.17-13.el7.x86_64
libvirt-1.2.17-13.el7.x86_64
libvirt-client-1.2.17-13.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-13.el7.x86_64
libvirt-daemon-driver-interface-1.2.17-13.el7.x86_64
libvirt-daemon-driver-secret-1.2.17-13.el7.x86_64
libvirt-daemon-driver-network-1.2.17-13.el7.x86_64
libvirt-daemon-config-network-1.2.17-13.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7.x86_64
libvirt-daemon-driver-storage-1.2.17-13.el7.x86_64
libvirt-cim-0.6.3-19.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.17-13.el7.x86_64
libvirt-lock-sanlock-1.2.17-13.el7.x86_64
libvirt-daemon-driver-lxc-1.2.17-13.el7.x86_64
libvirt-daemon-kvm-1.2.17-13.el7.x86_64

Hi Dan,

Could the VDSM configurator be removing itself just from libvirtd.conf?

(In reply to Douglas Schilling Landgraf from comment #15)
> Hi Dan,
>
> Could the VDSM configurator be removing itself just from libvirtd.conf?

It can happen only if someone explicitly asked for it via the vdsm-tool command line, which I don't think has happened here.

Working on this report, investigating what's causing the vdsm-tool configurator failure (to unpersist?). I don't see it happening in previous versions, like RHEV-H 7.1.

Oct 15 21:20:19 Running handler: /usr/libexec/ovirt-node/hooks/on-boot/01-vdsm-configure
Traceback (most recent call last):
  File "/bin/vdsm-tool", line 219, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py", line 38, in wrapper
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 156, in configure
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 103, in _configure
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line 65, in configure
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line 104, in removeConf
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line 225, in _removeSection
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configfile.py", line 179, in __exit__
OSError: [Errno 16] Device or resource busy

(In reply to Douglas Schilling Landgraf from comment #17)
> Working on this report, investigating what's causing the vdsm-tool
> configurator failure (to unpersist?). I don't see it happening in previous
> versions, like RHEV-H 7.1.
>
> Oct 15 21:20:19 Running handler:
> /usr/libexec/ovirt-node/hooks/on-boot/01-vdsm-configure
> Traceback (most recent call last):
>   File "/bin/vdsm-tool", line 219, in main
>     return tool_command[cmd]["command"](*args)
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py", line 38, in wrapper
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 156, in configure
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 103, in _configure
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line 65, in configure
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line 104, in removeConf
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/libvirt.py", line 225, in _removeSection
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/configfile.py", line 179, in __exit__
> OSError: [Errno 16] Device or resource busy

Note: I have also experienced this with --force, but not without. I hope to find the time to check the vdsm-tool source to find the difference later tonight.

Appears on 3.5.6 as well, but maybe with a different cause (in comment 19 the ntpd namespace patch was not applied correctly).

Followed the test steps from the bug description; this bug is fixed and verified on the following build:

# rpm -qa ovirt-node
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 7.2 (20151104.0.el7ev)

Test steps:
1. Clean-install RHEV-H 7.2 via the TUI successfully.
2. Log in to the TUI.
3. Set up the network via DHCP.
4. Add RHEV-H via the RHEV-M 3.6.0.3-0.1 portal successfully; the host is UP.
5. Put the host into maintenance.
6. Log in to the RHEV-H TUI.
7. Restart RHEV-H more than 7 times; the "Failed to Establish Libvirt Connection" error never appears.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html
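
For anyone triaging a similar report, the finding earlier in this thread (libvirtd.conf reverting to its original state, so libvirtd looks for certificates in its default locations) can be checked with a short script. This is a minimal sketch in Python under stated assumptions: the helper names `effective_tls_paths` and `keys_using_defaults` are hypothetical, the parser only handles simple `key = "value"` lines, and the VDSM certificate paths in the sample text are illustrative values, not taken from this report. The two default paths are the ones shown in the error messages above.

```python
# Default paths libvirtd falls back to when libvirtd.conf does not set
# ca_file / cert_file. Both paths appear in the errors quoted in this report.
DEFAULT_TLS_PATHS = {
    "ca_file": "/etc/pki/CA/cacert.pem",
    "cert_file": "/etc/pki/libvirt/servercert.pem",
}


def effective_tls_paths(conf_text):
    """Return the TLS paths in effect for the given libvirtd.conf text.

    Simplified parser (an assumption of this sketch): blank lines and
    lines starting with '#' are ignored, values are key = "value" pairs.
    """
    paths = dict(DEFAULT_TLS_PATHS)
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in paths:
            paths[key] = value.strip().strip('"')
    return paths


def keys_using_defaults(conf_text):
    """List the TLS keys that are still at libvirtd's default path."""
    effective = effective_tls_paths(conf_text)
    return [k for k in sorted(effective) if effective[k] == DEFAULT_TLS_PATHS[k]]


# Illustrative configuration text; the VDSM paths here are assumed examples.
CONFIGURED = (
    'ca_file="/etc/pki/vdsm/certs/cacert.pem"\n'
    'cert_file="/etc/pki/vdsm/certs/vdsmcert.pem"\n'
)
UNCONFIGURED = '#ca_file = "/etc/pki/CA/cacert.pem"\n'

if __name__ == "__main__":
    print(keys_using_defaults(CONFIGURED))
    print(keys_using_defaults(UNCONFIGURED))
```

If a host that was previously configured by VDSM reports both keys at their defaults after a reboot, that would be consistent with the unpersisted libvirtd.conf described in this bug, though other causes are possible.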