Bug 893955

Summary: 3.2 - vdsm: cannot remove a vm after failed live migration
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: vdsmAssignee: Yeela Kaplan <ykaplan>
Status: CLOSED DUPLICATE QA Contact: Haim <hateya>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: unspecifiedCC: abaron, amureini, bazulay, cpelland, hateya, iheim, lpeer, oourfali, scohen, yeylon, ykaul
Target Milestone: ---Keywords: Regression
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-19 17:53:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dafna Ron 2013-01-10 11:02:13 UTC
Created attachment 676240 [details]
logs

Description of problem:

I failed live migration disk because of a libvirt bug: 
https://bugzilla.redhat.com/show_bug.cgi?id=888426

After I shut down the vm and tried to delete it and we are still failing the delete in vdsm. 

Version-Release number of selected component (if applicable):

libvirt-0.10.2-13.el6.x86_64
vdsm-4.10.2-3.0.el6ev.x86_64

How reproducible:

100%

Steps to Reproduce:
1. run a vm 
2. move the disk
3. shut down the vm and try to remove it 
  
Actual results:

we cannot remove the vm

Expected results:

we should be able to remove the vm regardless of any failure. 

Additional info: logs


move fails because of libvirt bug: 

Thread-361::ERROR::2013-01-10 12:50:08,204::libvirtvm::2333::vm.Vm::(diskReplicateFinish) vmId=`aa56468e-2b64-49bc-8236-c84674b1be15`::Unable to stop the replication for the drive: vda
Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 2330, in diskReplicateFinish
    self._dom.blockJobAbort(srcDrive.name, blockJobFlags)
  File "/usr/share/vdsm/libvirtvm.py", line 524, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 512, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: Requested operation is not valid: pivot of disk 'vda' requires an active copy job
Thread-361::DEBUG::2013-01-10 12:50:08,210::task::568::TaskManager.Task::(_updateState) Task=`c22ea79b-55d5-45ee-8f68-ad33febd50cc`::moving from state init -> state preparing
Thread-361::INFO::2013-01-10 12:50:08,211::logUtils::37::dispatcher::(wrapper) Run and protect: teardownImag

vdsm fails to deactivate logical volume: 

Thread-361::ERROR::2013-01-10 12:50:13,787::task::833::TaskManager.Task::(_setError) Task=`c22ea79b-55d5-45ee-8f68-ad33febd50cc`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3068, in teardownImage
    img.teardown(sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/image.py", line 383, in teardown
    volUUIDs=[vol.volUUID for vol in chain])
  File "/usr/share/vdsm/storage/blockSD.py", line 979, in deactivateVolumes
    lvm.deactivateLVs(self.sdUUID, volUUIDs)
  File "/usr/share/vdsm/storage/lvm.py", line 1106, in deactivateLVs
    _setLVAvailability(vgName, toDeactivate, "n")
  File "/usr/share/vdsm/storage/lvm.py", line 769, in _setLVAvailability
    raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\'  /dev/mapper/1Dafna_test1357720: read failed after 0 of 4096 at 32212189184: Input/output error\', \'  /dev/mapper/1Dafna_test13577

vdsm fails to remove logical volume: 

Thread-370::ERROR::2013-01-10 12:50:24,910::task::833::TaskManager.Task::(_setError) Task=`ded414f4-26ce-45df-8762-b3eadc2173bd`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1455, in deleteImage
    dom.deleteImage(sdUUID, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/blockSD.py", line 951, in deleteImage
    deleteVolumes(sdUUID, toDel)
  File "/usr/share/vdsm/storage/blockSD.py", line 177, in deleteVolumes
    lvm.removeLVs(sdUUID, vols)
  File "/usr/share/vdsm/storage/lvm.py", line 1058, in removeLVs
    raise se.CannotRemoveLogicalVolume(vgName, str(lvNames))
CannotRemoveLogicalVolume: Cannot remove Logical Volume: ('4230e713-9cc4-40ba-9c96-876ccca30a9d', "('c216c340-7497-414b-a648-0b2f5905a710', '2d79705d-94a1-41fb-aa1c-558b3d7eb72d')")


delete imgage fails with: 

Thread-380::ERROR::2013-01-10 12:50:26,121::task::833::TaskManager.Task::(_setError) Task=`2618c7ee-781c-4458-ae4b-cedfe464f2e1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1424, in deleteImage
    allVols = dom.getAllVolumes()
  File "/usr/share/vdsm/storage/blockSD.py", line 966, in getAllVolumes
    return getAllVolumes(self.sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 169, in getAllVolumes
    and vImg not in res[vPar]['imgs']:
KeyError: 'dd1dce99-8f4a-4763-835a-7d37720295a9'
Thread-380::DEBUG::2013-01-10 12:50:26,123::task::852::TaskManager.Task::(_run) Task=`2618c7ee-781c-4458-ae4b-cedfe464f2e1`::Task._run: 2618c7ee-781c-4458-ae4b-cedfe464f2e1 ('a45fadeb-2db0-4201-9635-0c7ed6c1fd22', 'afcde1c5-6022-4077-ab06-2beed7e5e404', '1425e5a4-df84-4bd3-a79a-19d656b5e63d', 'false', 'false') {} failed - stopping task

after vm shutdown we are still failing on delete image: 

Thread-407::ERROR::2013-01-10 12:50:54,493::task::833::TaskManager.Task::(_setError) Task=`34c0417f-9453-4786-9f21-74f7c8db618e`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1424, in deleteImage
    allVols = dom.getAllVolumes()
  File "/usr/share/vdsm/storage/blockSD.py", line 966, in getAllVolumes
    return getAllVolumes(self.sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 169, in getAllVolumes
    and vImg not in res[vPar]['imgs']:
KeyError: 'dd1dce99-8f4a-4763-835a-7d37720295a9'
Thread-407::DEBUG::2013-01-10 12:50:54,495::task::852::TaskManager.Task::(_run) Task=`34c0417f-9453-4786-9f21-74f7c8db618e`::Task._run: 34c0417f-9453-4786-9f21-74f7c8db618e ('a45fadeb-2db0-4201-9635-0c7ed6c1fd22', 'afcde1c5-6022-4077-ab06-2beed7e5e404', '1425e5a4-df84-4bd3-a79a-19d656b5e63d', 'false', 'false') {} failed - stopping task


Thread-407::ERROR::2013-01-10 12:50:54,501::dispatcher::69::Storage.Dispatcher.Protect::(run) 'dd1dce99-8f4a-4763-835a-7d37720295a9'
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 1142, in prepare
    raise self.error
KeyError: 'dd1dce99-8f4a-4763-835a-7d37720295a9'
Thread-409::DEBUG::2013-01-10 12:50:55,844::BindingXMLRPC::161::vds::(wrapper) [10.35.97.65]

Comment 2 Dafna Ron 2013-01-10 17:18:16 UTC
I upgraded libvirt to libvirt-0.10.2-14.el6.x86_64 so I can continue with the testing. 

after the migration finished I tried to delete the snapshot and I am getting the same error: 

50c9c655-7a90-4704-8854-32e2b2828d25::ERROR::2013-01-10 19:08:48,976::task::833::TaskManager.Task::(_setError) Task=`50c9c655-7a90-4704-8854-32e2b2828d25`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 307, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1849, in mergeSnapshots
    image.Image(repoPath).merge(sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
  File "/usr/share/vdsm/storage/image.py", line 1133, in merge
    allVols = sdDom.getAllVolumes()
  File "/usr/share/vdsm/storage/blockSD.py", line 966, in getAllVolumes
    return getAllVolumes(self.sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 169, in getAllVolumes
    and vImg not in res[vPar]['imgs']:
KeyError: 'dd1dce99-8f4a-4763-835a-7d37720295a9'

Comment 3 Dafna Ron 2013-01-20 11:19:37 UTC
I seem to be hitting this bug for different scenarios. 
moving to urgent since one live migrate failed I cannot remove the vm. 
I have low disk space on one of my domains and I cannot run any of the vm or clear space on domain -> only action is to force remove the domain.

Comment 6 Yeela Kaplan 2013-02-19 17:53:39 UTC
The first scenario is eliminated after upgrading libvirt to ver 14.
The scenario described in comment #2 is a dup of: 
https://bugzilla.redhat.com/show_bug.cgi?id=910013

Closing as dup.

*** This bug has been marked as a duplicate of bug 910013 ***