Bug 1459156

Summary: HA VM moves to a new host when blocking the connection to the host and to the storage domain with the VM's disk, even though the lease domain is not blocked (iSCSI only)
Product: [oVirt] ovirt-engine
Reporter: Carlos Mestre González <cmestreg>
Component: BLL.Storage
Assignee: Tal Nisan <tnisan>
Status: CLOSED DUPLICATE
QA Contact: Raz Tamir <ratamir>
Severity: high
Priority: high
Version: 4.2.0
CC: amureini, bugs, nsoffer, ylavi
Target Milestone: ovirt-4.1.7
Keywords: Automation
Target Release: ---
Flags: rule-engine: ovirt-4.1+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-04 14:44:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: engine and vdsm logs for both hosts

Description Carlos Mestre González 2017-06-06 12:31:51 UTC
Created attachment 1285363 [details]
engine and vdsm logs for both host

Description of problem:
An HA VM has its disk and its lease on different storage domains. After blocking the connection to the storage domain holding the disk (iSCSI), the lease is no longer held on the host and the VM starts on a different host.

Version-Release number of selected component (if applicable):
4.2.0-0.0.master.20170531203202.git1bf6667.el7.centos

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a disk on an iSCSI domain
2. Add a lease to the VM on a different domain (I chose NFS or Gluster)
3. Block the connection from the engine to the host
4. Block the connection from the host to the iSCSI domain that contains the disk
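Steps 3 and 4 can be reproduced with firewall rules on the host. This is a hedged sketch, assuming iptables is used to drop the traffic; the addresses are placeholders, and 3260 is the standard iSCSI portal port:

```shell
# Hypothetical addresses -- substitute the real engine and iSCSI portal IPs.
ENGINE_IP=10.0.0.1
ISCSI_PORTAL=10.0.0.2

# Step 3: on the host, drop all traffic coming from the engine.
iptables -A INPUT -s "$ENGINE_IP" -j DROP

# Step 4: on the host, drop outgoing traffic to the iSCSI portal (TCP 3260).
iptables -A OUTPUT -d "$ISCSI_PORTAL" -p tcp --dport 3260 -j DROP
```

Deleting the rules (`iptables -D ...` with the same arguments) restores connectivity after the test.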

Actual results:
The VM starts on a different host.

Expected results:
The VM stays in a non-responsive state and keeps running on the original host, since the lease should still be held there (it is not).


Additional info:
Note that this only happens if the disk is on an iSCSI domain.

[root@storage-ge6-vdsm1 ~]# sanlock client status
daemon 10444fa8-2326-4413-b41c-e48dff34a379.storage-ge
p -1 helper
p -1 listener
p 28479 vm_0_TestCase17665_REST_IS_0611461298
p -1 status
s 29a99923-1bcf-4ad2-886e-4e464f027d59:1:/dev/29a99923-1bcf-4ad2-886e-4e464f027d59/ids:0
s 16b18aff-c13a-4018-99b8-4695354eb827:1:/dev/16b18aff-c13a-4018-99b8-4695354eb827/ids:0
s 4415cb08-f3db-4a3d-9f82-67e12c7f334a:1:/dev/4415cb08-f3db-4a3d-9f82-67e12c7f334a/ids:0
s c32d63c0-6292-427c-8825-9e0bf03b2c41:1:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__0/c32d63c0-6292-427c-8825-9e0bf03b2c41/dom_md/ids:0
s 6e29d8de-f7a9-4953-aaea-fe9347145cec:1:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__1/6e29d8de-f7a9-4953-aaea-fe9347145cec/dom_md/ids:0
s 5c67c7db-3ad7-4852-bb3e-a2459c0c897d:1:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__2/5c67c7db-3ad7-4852-bb3e-a2459c0c897d/dom_md/ids:0
s 8f421e75-99b9-424b-8f73-1593415a29b2:1:/rhev/data-center/mnt/glusterSD/gluster-server01.qa.lab.tlv.redhat.com\:_storage__local__ge6__volume__1/8f421e75-99b9-424b-8f73-1593415a29b2/dom_md/ids:0
s 9358bd2c-ed87-4d8e-b1bd-5a3af17a48d4:1:/rhev/data-center/mnt/glusterSD/gluster-server01.qa.lab.tlv.redhat.com\:_storage__local__ge6__volume__2/9358bd2c-ed87-4d8e-b1bd-5a3af17a48d4/dom_md/ids:0
s b15e9ed8-f597-484f-a158-57518541b9cd:1:/rhev/data-center/mnt/glusterSD/gluster-server01.qa.lab.tlv.redhat.com\:_storage__local__ge6__volume__0/b15e9ed8-f597-484f-a158-57518541b9cd/dom_md/ids:0
r c32d63c0-6292-427c-8825-9e0bf03b2c41:7bddbea0-d717-4ea9-b547-f1ea69a34973:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__0/c32d63c0-6292-427c-8825-9e0bf03b2c41/dom_md/xleases:3145728:1 p 28479

After blocking the connection to the iSCSI domains, the lease is gone:

[root@storage-ge6-vdsm1 ~]# sanlock client status
daemon 10444fa8-2326-4413-b41c-e48dff34a379.storage-ge
p -1 helper
p -1 listener
p 28479 vm_0_TestCase17665_REST_IS_0611461298
p -1 status
s c32d63c0-6292-427c-8825-9e0bf03b2c41:1:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__0/c32d63c0-6292-427c-8825-9e0bf03b2c41/dom_md/ids:0
s 6e29d8de-f7a9-4953-aaea-fe9347145cec:1:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__1/6e29d8de-f7a9-4953-aaea-fe9347145cec/dom_md/ids:0
s 5c67c7db-3ad7-4852-bb3e-a2459c0c897d:1:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__2/5c67c7db-3ad7-4852-bb3e-a2459c0c897d/dom_md/ids:0
s 8f421e75-99b9-424b-8f73-1593415a29b2:1:/rhev/data-center/mnt/glusterSD/gluster-server01.qa.lab.tlv.redhat.com\:_storage__local__ge6__volume__1/8f421e75-99b9-424b-8f73-1593415a29b2/dom_md/ids:0
s 9358bd2c-ed87-4d8e-b1bd-5a3af17a48d4:1:/rhev/data-center/mnt/glusterSD/gluster-server01.qa.lab.tlv.redhat.com\:_storage__local__ge6__volume__2/9358bd2c-ed87-4d8e-b1bd-5a3af17a48d4/dom_md/ids:0
s b15e9ed8-f597-484f-a158-57518541b9cd:1:/rhev/data-center/mnt/glusterSD/gluster-server01.qa.lab.tlv.redhat.com\:_storage__local__ge6__volume__0/b15e9ed8-f597-484f-a158-57518541b9cd/dom_md/ids:0


The VM then starts on another host:

[root@storage-ge6-vdsm2 ~]# sanlock client status
daemon 32bbd0f7-886c-4d71-8a74-9b22c1406c11.storage-ge
p -1 helper
p -1 listener
p 8589 vm_0_TestCase17665_REST_IS_0611461298
p -1 status
s c32d63c0-6292-427c-8825-9e0bf03b2c41:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__0/c32d63c0-6292-427c-8825-9e0bf03b2c41/dom_md/ids:0
s 8f421e75-99b9-424b-8f73-1593415a29b2:2:/rhev/data-center/mnt/glusterSD/gluster-server01.qa.lab.tlv.redhat.com\:_storage__local__ge6__volume__1/8f421e75-99b9-424b-8f73-1593415a29b2/dom_md/ids:0
s 5c67c7db-3ad7-4852-bb3e-a2459c0c897d:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__2/5c67c7db-3ad7-4852-bb3e-a2459c0c897d/dom_md/ids:0
s 6e29d8de-f7a9-4953-aaea-fe9347145cec:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__1/6e29d8de-f7a9-4953-aaea-fe9347145cec/dom_md/ids:0
s 4415cb08-f3db-4a3d-9f82-67e12c7f334a:2:/dev/4415cb08-f3db-4a3d-9f82-67e12c7f334a/ids:0
s 29a99923-1bcf-4ad2-886e-4e464f027d59:2:/dev/29a99923-1bcf-4ad2-886e-4e464f027d59/ids:0
s 16b18aff-c13a-4018-99b8-4695354eb827:2:/dev/16b18aff-c13a-4018-99b8-4695354eb827/ids:0
s 9358bd2c-ed87-4d8e-b1bd-5a3af17a48d4:2:/rhev/data-center/mnt/glusterSD/gluster-server01.qa.lab.tlv.redhat.com\:_storage__local__ge6__volume__2/9358bd2c-ed87-4d8e-b1bd-5a3af17a48d4/dom_md/ids:0
s b15e9ed8-f597-484f-a158-57518541b9cd:2:/rhev/data-center/mnt/glusterSD/gluster-server01.qa.lab.tlv.redhat.com\:_storage__local__ge6__volume__0/b15e9ed8-f597-484f-a158-57518541b9cd/dom_md/ids:0
r c32d63c0-6292-427c-8825-9e0bf03b2c41:7bddbea0-d717-4ea9-b547-f1ea69a34973:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Storage__NFS_storage__local__ge6__nfs__0/c32d63c0-6292-427c-8825-9e0bf03b2c41/dom_md/xleases:3145728:2 p 8589


host_mixed_1 (vdsm_host1_17665.log) is the original host the VM was running on; host_mixed_2 (vdsm_host2_17665.log) is the one the VM moves to.

Comment 2 Nir Soffer 2017-09-04 14:44:48 UTC
This looks like a result of libvirt releasing the lease when a VM is paused:

1. Access to storage is blocked
2. The VM tries to write to storage
3. QEMU gets an I/O error and pauses the VM
4. libvirt releases the lease (expected behavior)
5. The engine starts the VM on another host (expected behavior)
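The lease release described above is visible in the `sanlock client status` dumps: the VM's lease shows up as an `r <sd_uuid>:<lease_uuid>:...xleases...` resource line, which disappears after storage is blocked. A minimal sketch of checking this, using the UUIDs from this report (the xleases/ids paths are abbreviated here for readability):

```shell
# The VM lease UUID from the "r ..." resource line in this report.
LEASE_ID='7bddbea0-d717-4ea9-b547-f1ea69a34973'

# sanlock output captured on the original host before blocking storage
# (resource line present; path abbreviated):
before='r c32d63c0-6292-427c-8825-9e0bf03b2c41:7bddbea0-d717-4ea9-b547-f1ea69a34973:/rhev/.../xleases:3145728:1 p 28479'

# Output on the same host after blocking (only lockspace lines remain):
after='s c32d63c0-6292-427c-8825-9e0bf03b2c41:1:/rhev/.../ids:0'

# A held lease appears as a line starting with "r " containing the lease UUID.
printf '%s\n' "$before" | grep -q "^r .*$LEASE_ID" && echo 'lease held'
printf '%s\n' "$after"  | grep -q "^r .*$LEASE_ID" || echo 'lease released'
```

On a live host the same check would grep the output of `sanlock client status` directly instead of the captured strings.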

This is basically a duplicate of bug 1467893.

*** This bug has been marked as a duplicate of bug 1467893 ***