Created attachment 790099 [details]
logs

Description of problem:
After vdsm crashed during the syncImageData phase of live storage migration (LSM), vdsm tries to remove the volume from the destination SD. It fails to do so due to a failure in lvremove; this happens because the LV remains open.

Version-Release number of selected component (if applicable):
vdsm-4.12.0-72.git287bb7e.el6ev.x86_64
sanlock-2.8-1.el6.x86_64
libvirt-0.10.2-18.el6_4.9.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
rhevm-3.3.0-0.16.master.el6ev.noarch

How reproducible:
Unknown

Steps to Reproduce:
On a block pool with more than 1 SD:
- create and run a VM
- live migrate the VM's disk
- restart vdsm during the syncImageData task on the SPM
- after vdsm comes up, it tries to perform a rollback and remove the leftover volumes from the destination SD

Actual results:
vdsm fails in lvremove:

Thread-138::ERROR::2013-08-21 15:30:51,464::task::850::TaskManager.Task::(_setError) Task=`40fe8a7f-0e57-4bd3-a180-3543ec5bb79f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1546, in deleteImage
    dom.deleteImage(sdUUID, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/blockSD.py", line 978, in deleteImage
    deleteVolumes(sdUUID, toDel)
  File "/usr/share/vdsm/storage/blockSD.py", line 189, in deleteVolumes
    lvm.removeLVs(sdUUID, vols)
  File "/usr/share/vdsm/storage/lvm.py", line 1073, in removeLVs
    raise se.CannotRemoveLogicalVolume(vgName, str(lvNames))
CannotRemoveLogicalVolume: Cannot remove Logical Volume: ('4700d87c-0616-4051-880e-6f3d15357bc4', "{'114f06ed-8288-4955-955f-d4708b55c885': ImgsPar(imgs=['f74dd0fe-150f-4037-8992-71d7ea0ac982'], parent='a3befb04-1a36-4f16-a9da-3a1d691b6833'), 'a3befb04-1a36-4f16-a9da-3a1d691b6833': ImgsPar(imgs=['f74dd0fe-150f-4037-8992-71d7ea0ac982'], parent='00000000-0000-0000-0000-000000000000'), '7ca6a8c5-16d1-4ea9-b184-4a0003865929': ImgsPar(imgs=['f74dd0fe-150f-4037-8992-71d7ea0ac982'], parent='114f06ed-8288-4955-955f-d4708b55c885')}")

Expected results:
vdsm should close the open LV after the failure in LSM.

Additional info:
logs
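For context on why lvremove fails here: LVM refuses to remove a logical volume whose device-mapper node is still open (here, presumably by a leftover process from the interrupted sync). A minimal sketch of the kind of rollback logic involved, with the command runner injected so it can be exercised without a real LVM setup; this is an illustration of the deactivate-then-retry pattern, not vdsm's actual implementation:

```python
import subprocess

def remove_lv(vg_name, lv_name, run=subprocess.run):
    """Try to remove an LV; if it is in use, deactivate it and retry.

    `run` is injected so the control flow can be tested without LVM;
    by default it shells out to the real lvm tools.
    """
    lv_path = "%s/%s" % (vg_name, lv_name)
    if run(["lvremove", "-f", lv_path]).returncode == 0:
        return True
    # lvremove fails while the LV is open (e.g. a leftover process
    # still holds the device). Deactivating the LV closes the
    # device-mapper node, after which removal can succeed.
    if run(["lvchange", "-an", lv_path]).returncode != 0:
        return False
    return run(["lvremove", "-f", lv_path]).returncode == 0
```

With a fake runner, the expected call sequence on a busy LV is lvremove, lvchange -an, lvremove.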
(In reply to Elad from comment #0)
> Actual results:
> vdsm fails in lvremove:
>
> Thread-138::ERROR::2013-08-21 15:30:51,464::task::850::TaskManager.Task::(_setError) Task=`40fe8a7f-0e57-4bd3-a180-3543ec5bb79f`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 1546, in deleteImage
>     dom.deleteImage(sdUUID, imgUUID, volsByImg)
>   File "/usr/share/vdsm/storage/blockSD.py", line 978, in deleteImage
>     deleteVolumes(sdUUID, toDel)
>   File "/usr/share/vdsm/storage/blockSD.py", line 189, in deleteVolumes
>     lvm.removeLVs(sdUUID, vols)
>   File "/usr/share/vdsm/storage/lvm.py", line 1073, in removeLVs
>     raise se.CannotRemoveLogicalVolume(vgName, str(lvNames))
> CannotRemoveLogicalVolume: Cannot remove Logical Volume: ('4700d87c-0616-4051-880e-6f3d15357bc4', "{'114f06ed-8288-4955-955f-d4708b55c885': ImgsPar(imgs=['f74dd0fe-150f-4037-8992-71d7ea0ac982'], parent='a3befb04-1a36-4f16-a9da-3a1d691b6833'), 'a3befb04-1a36-4f16-a9da-3a1d691b6833': ImgsPar(imgs=['f74dd0fe-150f-4037-8992-71d7ea0ac982'], parent='00000000-0000-0000-0000-000000000000'), '7ca6a8c5-16d1-4ea9-b184-4a0003865929': ImgsPar(imgs=['f74dd0fe-150f-4037-8992-71d7ea0ac982'], parent='114f06ed-8288-4955-955f-d4708b55c885')}")
>
> Expected results:
> vdsm should close the open LV after the failure in LSM

deleteImage is not the rollback for syncImageData; in fact, the request comes from the engine:

Thread-138::DEBUG::2013-08-21 15:30:41,500::BindingXMLRPC::177::vds::(wrapper) client [10.35.161.52] flowID [72a210fa]
Thread-138::DEBUG::2013-08-21 15:30:41,501::task::579::TaskManager.Task::(_updateState) Task=`40fe8a7f-0e57-4bd3-a180-3543ec5bb79f`::moving from state init -> state preparing
Thread-138::INFO::2013-08-21 15:30:41,501::logUtils::44::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='4700d87c-0616-4051-880e-6f3d15357bc4', spUUID='e639b114-5960-43e2-b85e-9c1444d3af63', imgUUID='f74dd0fe-150f-4037-8992-71d7ea0ac982', postZero='false', force='false')
...
Thread-138::ERROR::2013-08-21 15:30:51,464::task::850::TaskManager.Task::(_setError) Task=`40fe8a7f-0e57-4bd3-a180-3543ec5bb79f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1546, in deleteImage
    dom.deleteImage(sdUUID, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/blockSD.py", line 978, in deleteImage
    deleteVolumes(sdUUID, toDel)
  File "/usr/share/vdsm/storage/blockSD.py", line 189, in deleteVolumes
    lvm.removeLVs(sdUUID, vols)
  File "/usr/share/vdsm/storage/lvm.py", line 1073, in removeLVs
    raise se.CannotRemoveLogicalVolume(vgName, str(lvNames))
CannotRemoveLogicalVolume: Cannot remove Logical Volume: ('4700d87c-0616-4051-880e-6f3d15357bc4', "{'114f06ed-8288-4955-955f-d4708b55c885': ImgsPar(imgs=['f74dd0fe-150f-4037-8992-71d7ea0ac982'], ...

I have the feeling that the engine didn't send the vmReplicateDiskFinish command to the VM before trying to remove the new image. Daniel, can you take a look?

If the problem is instead related to a running dd process (syncData), then I think this might be a dup of:

https://bugzilla.redhat.com/show_bug.cgi?id=713434

Anyway, the fix would be the same:

http://gerrit.ovirt.org/#/c/18074/
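If the culprit is indeed a lingering dd (syncData) process holding the device open, it can be located on the host. A rough, Linux-only sketch of finding the PIDs that hold a given path open by scanning /proc (equivalent in spirit to `fuser`; the function name is illustrative, not a vdsm API):

```python
import os

def pids_holding_open(path):
    """Return PIDs of processes with `path` among their open fds (Linux only)."""
    target = os.path.realpath(path)
    pids = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = "/proc/%s/fd" % pid
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or no permission to inspect it
            continue
        for fd in fds:
            try:
                # Each fd is a symlink to the open file/device.
                if os.path.realpath(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(pid))
                    break
            except OSError:  # fd closed between listdir and readlink
                continue
    return pids
```

Running this against the LV's device node (e.g. /dev/<vg>/<lv>) would show whether a dd or qemu process is still keeping it open, which is exactly what prevents lvremove from succeeding.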
Looks like the engine log has been rotated out; I can't find any reference to the LSM commands.
(In reply to Federico Simoncelli from comment #1)
> I have the feeling that the engine didn't send the vmReplicateDiskFinish
> command to the VM before trying to remove the new image.
> Daniel, can you take a look?

According to the engine log, VmReplicateDiskFinish has been sent:

2013-08-21 15:30:42,085 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (pool-5-thread-47) [72a210fa] FINISH, VmReplicateDiskFinishVDSCommand, log id: 645a2707

> If the problem is instead related to a running dd process (syncData), then I
> think this might be a dup of:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=713434
>
> Anyway, the fix would be the same:
>
> http://gerrit.ovirt.org/#/c/18074/
Please try to reproduce once http://gerrit.ovirt.org/#/c/18074/ has reached the build you're testing.
Let's keep in mind that this could be a duplicate of bug 961400
Removal of volumes from the destination domain completes successfully after a failure in live storage migration during syncImageData.

Verified on is15:
vdsm-4.12.0-138.gitab256be.el6ev.x86_64
rhevm-3.3.0-0.21.master.el6ev.noarch
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html