Bug 1281363 - Failed to start vm on RHEV-H after upgrade
Failed to start vm on RHEV-H after upgrade
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node (Show other bugs)
3.5.6
x86_64 Linux
urgent Severity urgent
: ovirt-3.6.1
: 3.6.0
Assigned To: Fabian Deutsch
Ying Cui
node
: Regression, ZStream
Depends On:
Blocks: 1282820
  Show dependency treegraph
 
Reported: 2015-11-12 06:49 EST by Artyom
Modified: 2016-02-10 14:18 EST (History)
21 users (show)

See Also:
Fixed In Version: ovirt-node-3.6.0-0.22.20151126git72acbb2.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1282820 (view as bug list)
Environment:
Last Closed: 2015-11-27 05:07:12 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
sos report (7.84 MB, application/x-xz)
2015-11-12 08:21 EST, Artyom
no flags Details
output of ps eZ (4.06 KB, text/plain)
2015-11-12 08:22 EST, Artyom
no flags Details
output of ps -eZ (11.65 KB, text/plain)
2015-11-12 08:40 EST, Artyom
no flags Details
sosreport_iSCSStorage_rhevh67_20151112.0 (7.10 MB, application/x-xz)
2015-11-13 05:32 EST, Ying Cui
no flags Details
freshlog_varlog_rhevh67_20151112.0 (384.22 KB, application/x-bzip)
2015-11-13 05:32 EST, Ying Cui
no flags Details
screenshot_VM_Failed_Start (146.03 KB, image/png)
2015-11-13 05:33 EST, Ying Cui
no flags Details
engine.log (360.53 KB, text/plain)
2015-11-13 06:00 EST, Ying Cui
no flags Details
ps -eZ output (36.79 KB, text/plain)
2015-11-13 06:07 EST, Ying Cui
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 48532 None None None Never

  None (edit)
Comment 1 Fabian Deutsch 2015-11-12 06:57:26 EST
How high is the reproducebility?
Comment 2 Artyom 2015-11-12 07:00:59 EST
For me it happen always, I also want to specify that host pass upgrade from
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151028.0.el6ev) to
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151029.0.el6ev), so maybe it reason why it failed, because I found that we encountered the same problem in past
http://comments.gmane.org/gmane.comp.emulators.ovirt.user/27482
Comment 3 Fabian Deutsch 2015-11-12 07:59:39 EST
Please attach a complete sosreport and the output of "ps eZ"
Comment 4 Sandro Bonazzola 2015-11-12 08:16:05 EST
vdsm-4.16.28-1.el6ev.x86_64 should have been 4.16.29. Please check the RHEV-H content.
Comment 5 Sandro Bonazzola 2015-11-12 08:17:23 EST
moving downstream being this downstream only bug
Comment 6 Artyom 2015-11-12 08:21 EST
Created attachment 1093267 [details]
sos report

Just take in account we now have problem with VNX storage, and it will be many errors about it in logs.
Comment 7 Artyom 2015-11-12 08:22 EST
Created attachment 1093268 [details]
output of ps eZ
Comment 10 Fabian Deutsch 2015-11-12 08:34:38 EST
Sorry, ps -eZ please
Comment 11 Artyom 2015-11-12 08:35:45 EST
See in attachments
Comment 12 Artyom 2015-11-12 08:40 EST
Created attachment 1093276 [details]
output of ps -eZ
Comment 13 Fabian Deutsch 2015-11-12 08:44:36 EST
libvirtd was started in the wrong context:

system_u:system_r:ovirt_t:s0    14682 ?        00:00:04 libvirtd

It should be in system_u:system_r:virtd_t:s0-s0:c0.c1023 or alike.
Comment 14 Fabian Deutsch 2015-11-12 09:48:48 EST
I actually wonder if this is the root cause, because that message appears somewhere in the middle, at the end there are several other errors:

Nov 12 10:39:31 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Nov 12 10:39:31 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Nov 12 10:39:31 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Nov 12 10:39:41 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Nov 12 10:39:41 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Nov 12 10:39:51 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Nov 12 10:39:51 localhost vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/share/vdsm/API.py", line 1707, in _getHaInfo#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 177, in _configure_broker_conn#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '09035925-5377-4024-919e-880eee080fa0'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Comment 15 Simone Tiraboschi 2015-11-12 10:01:24 EST
Here it seams that the storage wasn't available:

Nov 12 10:39:41 localhost ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=09035925-5377-4024-919e-880eee080fa0'#012Traceback (most recent call last):#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 65, in set_storage_domain#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 450, in connect#012  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012BackendFailureException: path to storage domain 09035925-5377-4024-919e-880eee080fa0 not found in /rhev/data-center/mnt
Comment 16 Fabian Deutsch 2015-11-12 10:02:20 EST
Thanks.

Artyom, can you please retry to reproduce this bug on other hardware and ensure that the storage is reachable?
Comment 17 Ying Cui 2015-11-13 03:27:50 EST
I reproduced this bug after upgrade to hypervisor6-6.7-20151029.0 or hypervisor6-6.7-20151112.0, but not 100% reproduce. I will give more info soon because testing is in process, I need to try more scenarios and different hardware.
Comment 19 Ying Cui 2015-11-13 05:16:46 EST
(In reply to Fabian Deutsch from comment #16)
> Thanks.
> 
> Artyom, can you please retry to reproduce this bug on other hardware and
> ensure that the storage is reachable?

Let me instead on this needinfo.
Yes, storage is reachable, Host and Storage is active.
Comment 20 Ying Cui 2015-11-13 05:32 EST
Created attachment 1093584 [details]
sosreport_iSCSStorage_rhevh67_20151112.0
Comment 21 Ying Cui 2015-11-13 05:32 EST
Created attachment 1093585 [details]
freshlog_varlog_rhevh67_20151112.0
Comment 22 Ying Cui 2015-11-13 05:33 EST
Created attachment 1093586 [details]
screenshot_VM_Failed_Start
Comment 23 Ying Cui 2015-11-13 06:00 EST
Created attachment 1093588 [details]
engine.log
Comment 24 Ying Cui 2015-11-13 06:07 EST
Created attachment 1093589 [details]
ps -eZ output
Comment 25 Ying Cui 2015-11-13 06:08:46 EST
comment 20 - 24 for fresh logs info, for further investigation.
Comment 26 Fabian Deutsch 2015-11-13 06:53:44 EST
The cause is probably (as noted above) that libvirtd is running in the wrong context:

[root@dhcp-test-9-205 ~]# ps -eZ | grep libvi
system_u:system_r:ovirt_t:s0    14740 ?        00:00:00 libvirtd

The correct context would be like:

[root@dhcp-test-9-205 ~]# ps -eZ | grep libvirtd
system_u:system_r:virtd_t:s0-s0:c0.c1023 14708 ? 00:00:00 libvirtd

I modified the x.x.x.205 host and the VMs now come up again.

The attached patch fixes this in the code.
Comment 30 Ying Cui 2015-11-17 22:05:58 EST
This bug only happened on RHEV-H el6, no need clone to zstream. 
so I verified this bug on rhevh 6.7 for 3.5.6

# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20151117.0.el6ev)

# rpm -qa ovirt-node
ovirt-node-3.2.3-28.el6.noarch

Test servers:
Local Storage - 2 times, PASS
iSCSI Storage - 2 times, PASS


Test Steps:
1. Install RHEV-H 6.7-20151015.1 the last released version
2. Register to RHEVM 3.5.6.2-0.1.el6ev successful
3. Connect storage
4. Create VM, VM can start successful.
5. Shutdown VM
6. Make Host to maintenance
7. Upgrade RHEVH 6.7 20151117.0 via RHEVM
8. Upgrade successful.
9. Host and Storage is active yet.
10. Start VM which created in step 4 - PASS.
11. Create new VM, and start VM - PASS.

I will change the bug status once the bug is ON_QA.
Comment 32 Ying Cui 2015-11-26 23:34:29 EST
This bug only happened on RHEV-H 6.7 el6, so we can not verify it on RHEV-H 7.2 for 3.6.0 build.

This issue was verified on RHEV-H 6.7 for 3.5.6 build yet.

Suggest to close this bug. Thanks
Comment 33 Fabian Deutsch 2015-11-27 05:07:12 EST
Closing this bug according to comment 32.

Thanks Ying.
Comment 34 Ying Cui 2015-11-27 05:17:51 EST
This bug is valid bug when bug is reported, but just for 3.5.z, So updated the status to current release instead of notabug.

Note You need to log in before you can comment on or make changes to this bug.