Bug 1502768

Summary: HA VM (lease) power off fails while the VM is Paused
Product: [oVirt] vdsm
Reporter: Elad <ebenahar>
Component: Core
Assignee: Dan Kenigsberg <danken>
Status: CLOSED DUPLICATE
QA Contact: Raz Tamir <ratamir>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.20.3
CC: amureini, bugs, fromani, tnisan
Target Milestone: ovirt-4.2.2
Keywords: Automation
Target Release: ---
Flags: rule-engine: ovirt-4.2+
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-08 10:10:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: logs from engine and hypervisor (flags: none)

Description Elad 2017-10-16 16:04:01 UTC
Created attachment 1339322
logs from engine and hypervisor

Description of problem:
Powering off an HA VM that has a storage lease on NFS fails with the following exception. Before the power off, the VM had been paused when the connection between the host and the storage holding both the lease and the VM disk was lost and then restored.

2017-10-16 14:44:31,471+0300 ERROR (jsonrpc/7) [api] FINISH destroy error=VM '12f923ff-7a3e-480a-b67c-dbc559770ddb' was not defined yet or was undefined (api:127)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 117, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 312, in destroy
    res = self.vm.destroy(gracefulAttempts)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4944, in destroy
    self._deleteVm()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4933, in _deleteVm
    self._undefine_domain()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2259, in _undefine_domain
    self._dom.undefine()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
    % self.vmid)
NotConnectedError: VM '12f923ff-7a3e-480a-b67c-dbc559770ddb' was not defined yet or was undefined
2017-10-16 14:44:31,478+0300 INFO  (jsonrpc/7) [api.virt] FINISH destroy return={'status': {'message': 'General Exception: ("VM \'12f923ff-7a3e-480a-b67c-dbc559770ddb\' was not defined yet or was undefined",)', 'code': 100}} from=::ffff:10.35.161.182,35806, flow_id=cf15a7af-c06d-4d7b-a007-3e68cd57b4a6 (api:52)
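The traceback is a short call chain: destroy calls _deleteVm, which calls _undefine_domain, which calls self._dom.undefine(). Judging from the __getattr__ frame in virdomain.py, the domain handle at that point appears to be a placeholder that raises NotConnectedError on any attribute access, and the API layer then reports the error as "General Exception" with code 100. The Python sketch below models that chain under simplified assumptions: the class bodies and the tolerate_disconnected flag are hypothetical, and the actual fix (the commit referenced in Comment 2) may work differently.

```python
# Hypothetical, simplified model of the call chain in the traceback above.
# Names mirror the traceback; the bodies are illustrative assumptions,
# not vdsm's real implementation.


class NotConnectedError(Exception):
    pass


class DisconnectedDomain:
    """Placeholder for a VM whose libvirt domain is not (or no longer) defined."""

    def __init__(self, vmid):
        self.vmid = vmid

    def __getattr__(self, name):
        # Called only for attributes not found normally (e.g. .undefine),
        # so every libvirt call on the placeholder fails.
        raise NotConnectedError(
            "VM %r was not defined yet or was undefined" % self.vmid)


class Vm:
    def __init__(self, vmid, tolerate_disconnected=False):
        self._dom = DisconnectedDomain(vmid)
        # Hypothetical switch: ignore a missing domain during undefine.
        self._tolerate = tolerate_disconnected

    def _undefine_domain(self):
        try:
            self._dom.undefine()
        except NotConnectedError:
            if not self._tolerate:
                raise
            # Hypothetical mitigation: nothing is left to undefine.

    def destroy(self):
        self._undefine_domain()
        return {"status": {"code": 0}}  # success shape assumed


def api_destroy(vm):
    """Mimic the API layer turning any exception into error code 100."""
    try:
        return vm.destroy()
    except Exception as e:
        return {"status": {"code": 100,
                           "message": "General Exception: %r" % (e.args,)}}
```

With the default flag, api_destroy(Vm("12f923ff-7a3e-480a-b67c-dbc559770ddb")) yields the same code-100 message as the log; with tolerate_disconnected=True the destroy completes.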



Version-Release number of selected component (if applicable):
vdsm-4.20.3-175.git76c0aff.el7.centos.x86_64
libvirt-daemon-3.2.0-14.el7_4.3.x86_64
sanlock-3.5.0-1.el7.x86_64
ovirt-engine-4.2.0-0.0.master.20171013142622.git15e767c.el7.centos.noarch

How reproducible:
Always, in RHV automation test case RHEVM-17621: https://polarion.engineering.redhat.com/polarion/redirect/project/RHEVM3/workitem?id=RHEVM-17621

Steps to Reproduce:
1. Create a new HA VM with a storage lease.
2. Start the VM.
3. Block the connection from all hosts in the DC to the storage domain.
4. Block the connection from the engine to the host. The VM becomes UNKNOWN and does not fail over to another host.
5. Power off the VM.
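Steps 3 and 4 amount to cutting two network paths. The report does not say how the automation blocks them, so the Python sketch below assumes iptables DROP rules on the sending side; the helper names and all addresses are hypothetical placeholders. It only builds the commands, it does not run them.

```python
# Hypothetical sketch of steps 3 and 4 as an ordered plan of commands.
# iptables DROP rules and every address here are assumptions for
# illustration; the real RHV automation may block traffic differently.


def drop_outgoing(dest):
    """Return an iptables command that drops packets sent to dest."""
    return ["iptables", "-I", "OUTPUT", "-d", dest, "-j", "DROP"]


def reproduction_plan(dc_hosts, storage_addr, engine, vm_host):
    """Return (machine, command) pairs for steps 3 and 4, in order."""
    plan = []
    # Step 3: every host in the DC loses access to the storage domain.
    for host in dc_hosts:
        plan.append((host, drop_outgoing(storage_addr)))
    # Step 4: the engine loses access to the host running the VM, so the
    # VM is reported as UNKNOWN instead of being failed over.
    plan.append((engine, drop_outgoing(vm_host)))
    # Step 5 (power off) is issued from the engine and is not shown here.
    return plan


if __name__ == "__main__":
    for machine, cmd in reproduction_plan(
            ["host1.example", "host2.example"], "192.0.2.10",
            "engine.example", "host2.example"):
        print(machine, " ".join(cmd))
```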

Actual results:
Powering off the VM fails with the exception shown above.

Expected results:
Powering off the VM succeeds.

Additional info:

engine.log:

b4a6] Command 'DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{hostId='3ebf2639-0e23-4d8f-85da-cdb4d427d30d', vmId='12f923ff-7a3e-480a-b67c-dbc559770ddb', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'})' execution failed: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = General Exception: ("VM '12f923ff-7a3e-480a-b67c-dbc559770ddb' was not defined yet or was undefined",), code = 100


libvirtd.log from around the same time:

2017-10-16 11:44:22.789+0000: 1476: error : qemuOpenFileAs:3176 : Failed to open file '/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__2/e5525ed6-0970-49da-b022-5d9c342d928b/images/97bfb9f6-850a-4bc7-97b6-2ed78e88a54d/c5c20431-d79d-4b95-b8b2-6685211716bb': No such file or directory
2017-10-16 11:44:22.789+0000: 1476: error : qemuDomainStorageOpenStat:11478 : cannot stat file '/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__2/e5525ed6-0970-49da-b022-5d9c342d928b/images/97bfb9f6-850a-4bc7-97b6-2ed78e88a54d/c5c20431-d79d-4b95-b8b2-6685211716bb': Bad file descriptor

Comment 1 Nir Soffer 2018-01-07 17:10:44 UTC
Powering off VMs is not a storage flow; moving to virt.

Francesco, can you take a look?

Comment 2 Francesco Romani 2018-01-08 10:10:01 UTC
Should be fixed by https://bugzilla.redhat.com/show_bug.cgi?id=1524119, more specifically by commit 764aa7a15d576652388c0d98639b3d8b8ec9005c

*** This bug has been marked as a duplicate of bug 1524119 ***