Bug 1281363 - Failed to start vm on RHEV-H after upgrade
Summary: Failed to start vm on RHEV-H after upgrade
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node
Version: 3.5.6
Hardware: x86_64
OS: Linux
urgent
urgent
Target Milestone: ovirt-3.6.1
: 3.6.0
Assignee: Fabian Deutsch
QA Contact: Ying Cui
URL:
Whiteboard: node
Depends On:
Blocks: 1282820
TreeView+ depends on / blocked
 
Reported: 2015-11-12 11:49 UTC by Artyom
Modified: 2016-02-10 19:18 UTC (History)
21 users (show)

Fixed In Version: ovirt-node-3.6.0-0.22.20151126git72acbb2.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1282820 (view as bug list)
Environment:
Last Closed: 2015-11-27 10:07:12 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
sos report (7.84 MB, application/x-xz)
2015-11-12 13:21 UTC, Artyom
no flags Details
output of ps eZ (4.06 KB, text/plain)
2015-11-12 13:22 UTC, Artyom
no flags Details
output of ps -eZ (11.65 KB, text/plain)
2015-11-12 13:40 UTC, Artyom
no flags Details
sosreport_iSCSStorage_rhevh67_20151112.0 (7.10 MB, application/x-xz)
2015-11-13 10:32 UTC, Ying Cui
no flags Details
freshlog_varlog_rhevh67_20151112.0 (384.22 KB, application/x-bzip)
2015-11-13 10:32 UTC, Ying Cui
no flags Details
screenshot_VM_Failed_Start (146.03 KB, image/png)
2015-11-13 10:33 UTC, Ying Cui
no flags Details
engine.log (360.53 KB, text/plain)
2015-11-13 11:00 UTC, Ying Cui
no flags Details
ps -eZ output (36.79 KB, text/plain)
2015-11-13 11:07 UTC, Ying Cui
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 48532 0 None MERGED hooks: Run hooks from neutral context to allow SELinux transitions 2020-02-09 10:02:35 UTC

Comment 1 Fabian Deutsch 2015-11-12 11:57:26 UTC
How high is the reproducebility?

Comment 2 Artyom 2015-11-12 12:00:59 UTC
For me it happen always, I also want to specify that host pass upgrade from
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151028.0.el6ev) to
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151029.0.el6ev), so maybe it reason why it failed, because I found that we encountered the same problem in past
http://comments.gmane.org/gmane.comp.emulators.ovirt.user/27482

Comment 3 Fabian Deutsch 2015-11-12 12:59:39 UTC
Please attach a complete sosreport and the output of "ps eZ"

Comment 4 Sandro Bonazzola 2015-11-12 13:16:05 UTC
vdsm-4.16.28-1.el6ev.x86_64 should have been 4.16.29. Please check the RHEV-H content.

Comment 5 Sandro Bonazzola 2015-11-12 13:17:23 UTC
moving downstream being this downstream only bug

Comment 6 Artyom 2015-11-12 13:21:22 UTC
Created attachment 1093267 [details]
sos report

Just take in account we now have problem with VNX storage, and it will be many errors about it in logs.

Comment 7 Artyom 2015-11-12 13:22:12 UTC
Created attachment 1093268 [details]
output of ps eZ

Comment 10 Fabian Deutsch 2015-11-12 13:34:38 UTC
Sorry, ps -eZ please

Comment 11 Artyom 2015-11-12 13:35:45 UTC
See in attachments

Comment 12 Artyom 2015-11-12 13:40:04 UTC
Created attachment 1093276 [details]
output of ps -eZ

Comment 13 Fabian Deutsch 2015-11-12 13:44:36 UTC
libvirtd was started in the wrong context:

system_u:system_r:ovirt_t:s0    14682 ?        00:00:04 libvirtd

It should be in system_u:system_r:virtd_t:s0-s0:c0.c1023 or alike.

Comment 14 Fabian Deutsch 2015-11-12 14:48:48 UTC
I actually wonder if this is the root cause, because that message appears somewhere in the middle, at the end there are several other errors:

Nov 12 10:39:31 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Nov 12 10:39:31 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Nov 12 10:39:31 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Nov 12 10:39:41 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Nov 12 10:39:41 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Nov 12 10:39:51 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Nov 12 10:39:51 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>

Comment 15 Simone Tiraboschi 2015-11-12 15:01:24 UTC
Here it seams that the storage wasn't available:

Nov 12 10:39:41 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt

Comment 16 Fabian Deutsch 2015-11-12 15:02:20 UTC
Thanks.

Artyom, can you please retry to reproduce this bug on other hardware and ensure that the storage is reachable?

Comment 17 Ying Cui 2015-11-13 08:27:50 UTC
I reproduced this bug after upgrade to hypervisor6-6.7-20151029.0 or hypervisor6-6.7-20151112.0, but not 100% reproduce. I will give more info soon because testing is in process, I need to try more scenarios and different hardware.

Comment 19 Ying Cui 2015-11-13 10:16:46 UTC
(In reply to Fabian Deutsch from comment #16)
> Thanks.
> 
> Artyom, can you please retry to reproduce this bug on other hardware and
> ensure that the storage is reachable?

Let me instead on this needinfo.
Yes, storage is reachable, Host and Storage is active.

Comment 20 Ying Cui 2015-11-13 10:32:11 UTC
Created attachment 1093584 [details]
sosreport_iSCSStorage_rhevh67_20151112.0

Comment 21 Ying Cui 2015-11-13 10:32:58 UTC
Created attachment 1093585 [details]
freshlog_varlog_rhevh67_20151112.0

Comment 22 Ying Cui 2015-11-13 10:33:47 UTC
Created attachment 1093586 [details]
screenshot_VM_Failed_Start

Comment 23 Ying Cui 2015-11-13 11:00:30 UTC
Created attachment 1093588 [details]
engine.log

Comment 24 Ying Cui 2015-11-13 11:07:36 UTC
Created attachment 1093589 [details]
ps -eZ output

Comment 25 Ying Cui 2015-11-13 11:08:46 UTC
comment 20 - 24 for fresh logs info, for further investigation.

Comment 26 Fabian Deutsch 2015-11-13 11:53:44 UTC
The cause is probably (as noted above) that libvirtd is running in the wrong context:

[root@dhcp-test-9-205 ~]# ps -eZ | grep libvi
system_u:system_r:ovirt_t:s0    14740 ?        00:00:00 libvirtd

The correct context would be like:

[root@dhcp-test-9-205 ~]# ps -eZ | grep libvirtd
system_u:system_r:virtd_t:s0-s0:c0.c1023 14708 ? 00:00:00 libvirtd

I modified the x.x.x.205 host and the VMs now come up again.

The attached patch fixes this in the code.

Comment 30 Ying Cui 2015-11-18 03:05:58 UTC
This bug only happened on RHEV-H el6, no need clone to zstream. 
so I verified this bug on rhevh 6.7 for 3.5.6

# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151117.0.el6ev)

# rpm -qa ovirt-node
ovirt-node-3.2.3-28.el6.noarch

Test servers:
Local Storage - 2 times, PASS
iSCSI Storage - 2 times, PASS


Test Steps:
1. Install RHEV-H 6.7-20151015.1 the last released version
2. Register to RHEVM 3.5.6.2-0.1.el6ev successful
3. Connect storage
4. Create VM, VM can start successful.
5. Shutdown VM
6. Make Host to maintenance
7. Upgrade RHEVH 6.7 20151117.0 via RHEVM
8. Upgrade successful.
9. Host and Storage is active yet.
10. Start VM which created in step 4 - PASS.
11. Create new VM, and start VM - PASS.

I will change the bug status once the bug is ON_QA.

Comment 32 Ying Cui 2015-11-27 04:34:29 UTC
This bug only happened on RHEV-H 6.7 el6, so we can not verify it on RHEV-H 7.2 for 3.6.0 build.

This issue was verified on RHEV-H 6.7 for 3.5.6 build yet.

Suggest to close this bug. Thanks

Comment 33 Fabian Deutsch 2015-11-27 10:07:12 UTC
Closing this bug according to comment 32.

Thanks Ying.

Comment 34 Ying Cui 2015-11-27 10:17:51 UTC
This bug is valid bug when bug is reported, but just for 3.5.z, So updated the status to current release instead of notabug.


Note You need to log in before you can comment on or make changes to this bug.