Bug 1281363

Summary: Failed to start vm on RHEV-H after upgrade
Product: Red Hat Enterprise Virtualization Manager Reporter: Artyom <alukiano>
Component: ovirt-nodeAssignee: Fabian Deutsch <fdeutsch>
Status: CLOSED CURRENTRELEASE QA Contact: Ying Cui <ycui>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.5.6CC: alukiano, bugs, cshao, cwu, dfediuck, ecohen, fdeutsch, gklein, huiwa, huzhao, leiwang, lsurette, mavital, mgoldboi, pstehlik, sbonazzo, stirabos, virt-bugs, yaniwang, ycui, yeylon
Target Milestone: ovirt-3.6.1Keywords: Regression, ZStream
Target Release: 3.6.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: node
Fixed In Version: ovirt-node-3.6.0-0.22.20151126git72acbb2.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1282820 (view as bug list) Environment:
Last Closed: 2015-11-27 10:07:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Node RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1282820    
Attachments:
Description Flags
sos report
none
output of ps eZ
none
output of ps -eZ
none
sosreport_iSCSStorage_rhevh67_20151112.0
none
freshlog_varlog_rhevh67_20151112.0
none
screenshot_VM_Failed_Start
none
engine.log
none
ps -eZ output none

Comment 1 Fabian Deutsch 2015-11-12 11:57:26 UTC
How high is the reproducebility?

Comment 2 Artyom 2015-11-12 12:00:59 UTC
For me it happen always, I also want to specify that host pass upgrade from
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151028.0.el6ev) to
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151029.0.el6ev), so maybe it reason why it failed, because I found that we encountered the same problem in past
http://comments.gmane.org/gmane.comp.emulators.ovirt.user/27482

Comment 3 Fabian Deutsch 2015-11-12 12:59:39 UTC
Please attach a complete sosreport and the output of "ps eZ"

Comment 4 Sandro Bonazzola 2015-11-12 13:16:05 UTC
vdsm-4.16.28-1.el6ev.x86_64 should have been 4.16.29. Please check the RHEV-H content.

Comment 5 Sandro Bonazzola 2015-11-12 13:17:23 UTC
moving downstream being this downstream only bug

Comment 6 Artyom 2015-11-12 13:21:22 UTC
Created attachment 1093267 [details]
sos report

Just take in account we now have problem with VNX storage, and it will be many errors about it in logs.

Comment 7 Artyom 2015-11-12 13:22:12 UTC
Created attachment 1093268 [details]
output of ps eZ

Comment 10 Fabian Deutsch 2015-11-12 13:34:38 UTC
Sorry, ps -eZ please

Comment 11 Artyom 2015-11-12 13:35:45 UTC
See in attachments

Comment 12 Artyom 2015-11-12 13:40:04 UTC
Created attachment 1093276 [details]
output of ps -eZ

Comment 13 Fabian Deutsch 2015-11-12 13:44:36 UTC
libvirtd was started in the wrong context:

system_u:system_r:ovirt_t:s0    14682 ?        00:00:04 libvirtd

It should be in system_u:system_r:virtd_t:s0-s0:c0.c1023 or alike.

Comment 14 Fabian Deutsch 2015-11-12 14:48:48 UTC
I actually wonder if this is the root cause, because that message appears somewhere in the middle, at the end there are several other errors:

Nov 12 10:39:31 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Nov 12 10:39:31 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Nov 12 10:39:31 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Nov 12 10:39:41 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Nov 12 10:39:41 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Nov 12 10:39:51 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Nov 12 10:39:51 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>

Comment 15 Simone Tiraboschi 2015-11-12 15:01:24 UTC
Here it seams that the storage wasn't available:

Nov 12 10:39:41 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt

Comment 16 Fabian Deutsch 2015-11-12 15:02:20 UTC
Thanks.

Artyom, can you please retry to reproduce this bug on other hardware and ensure that the storage is reachable?

Comment 17 Ying Cui 2015-11-13 08:27:50 UTC
I reproduced this bug after upgrade to hypervisor6-6.7-20151029.0 or hypervisor6-6.7-20151112.0, but not 100% reproduce. I will give more info soon because testing is in process, I need to try more scenarios and different hardware.

Comment 19 Ying Cui 2015-11-13 10:16:46 UTC
(In reply to Fabian Deutsch from comment #16)
> Thanks.
> 
> Artyom, can you please retry to reproduce this bug on other hardware and
> ensure that the storage is reachable?

Let me instead on this needinfo.
Yes, storage is reachable, Host and Storage is active.

Comment 20 Ying Cui 2015-11-13 10:32:11 UTC
Created attachment 1093584 [details]
sosreport_iSCSStorage_rhevh67_20151112.0

Comment 21 Ying Cui 2015-11-13 10:32:58 UTC
Created attachment 1093585 [details]
freshlog_varlog_rhevh67_20151112.0

Comment 22 Ying Cui 2015-11-13 10:33:47 UTC
Created attachment 1093586 [details]
screenshot_VM_Failed_Start

Comment 23 Ying Cui 2015-11-13 11:00:30 UTC
Created attachment 1093588 [details]
engine.log

Comment 24 Ying Cui 2015-11-13 11:07:36 UTC
Created attachment 1093589 [details]
ps -eZ output

Comment 25 Ying Cui 2015-11-13 11:08:46 UTC
comment 20 - 24 for fresh logs info, for further investigation.

Comment 26 Fabian Deutsch 2015-11-13 11:53:44 UTC
The cause is probably (as noted above) that libvirtd is running in the wrong context:

[root@dhcp-test-9-205 ~]# ps -eZ | grep libvi
system_u:system_r:ovirt_t:s0    14740 ?        00:00:00 libvirtd

The correct context would be like:

[root@dhcp-test-9-205 ~]# ps -eZ | grep libvirtd
system_u:system_r:virtd_t:s0-s0:c0.c1023 14708 ? 00:00:00 libvirtd

I modified the x.x.x.205 host and the VMs now come up again.

The attached patch fixes this in the code.

Comment 30 Ying Cui 2015-11-18 03:05:58 UTC
This bug only happened on RHEV-H el6, no need clone to zstream. 
so I verified this bug on rhevh 6.7 for 3.5.6

# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151117.0.el6ev)

# rpm -qa ovirt-node
ovirt-node-3.2.3-28.el6.noarch

Test servers:
Local Storage - 2 times, PASS
iSCSI Storage - 2 times, PASS


Test Steps:
1. Install RHEV-H 6.7-20151015.1 the last released version
2. Register to RHEVM 3.5.6.2-0.1.el6ev successful
3. Connect storage
4. Create VM, VM can start successful.
5. Shutdown VM
6. Make Host to maintenance
7. Upgrade RHEVH 6.7 20151117.0 via RHEVM
8. Upgrade successful.
9. Host and Storage is active yet.
10. Start VM which created in step 4 - PASS.
11. Create new VM, and start VM - PASS.

I will change the bug status once the bug is ON_QA.

Comment 32 Ying Cui 2015-11-27 04:34:29 UTC
This bug only happened on RHEV-H 6.7 el6, so we can not verify it on RHEV-H 7.2 for 3.6.0 build.

This issue was verified on RHEV-H 6.7 for 3.5.6 build yet.

Suggest to close this bug. Thanks

Comment 33 Fabian Deutsch 2015-11-27 10:07:12 UTC
Closing this bug according to comment 32.

Thanks Ying.

Comment 34 Ying Cui 2015-11-27 10:17:51 UTC
This bug is valid bug when bug is reported, but just for 3.5.z, So updated the status to current release instead of notabug.