Bug 1108577

Summary: Live storage migration Fails and throws 'Vdc Bll exception' when live migrating disk while attempting to power off the vm
Product: Red Hat Enterprise Virtualization Manager Reporter: Ori Gofen <ogofen>
Component: ovirt-engine Assignee: Daniel Erez <derez>
Status: CLOSED DUPLICATE QA Contact: Aharon Canan <acanan>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.4.0 CC: acanan, acathrow, amureini, derez, gklein, iheim, lpeer, Rhev-m-bugs, scohen, yeylon
Target Milestone: --- Keywords: Triaged
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-17 14:01:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm+engine logs none

Description Ori Gofen 2014-06-12 09:27:08 UTC
Created attachment 908020
vdsm+engine logs

Description of problem:

Powering off the VM from the webadmin while a live disk migration is in progress causes an engine failure.

The webadmin first reports:
"Failed to complete snapshot 'Auto-generated for Live Storage Migration' creation for VM 'vm3'."

The engine then throws a long BLL exception, starting with:

2014-06-11 19:55:29,530 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-4-thread-20) [7f933e75] Command org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand throw Vdc Bll exception. With error message VdcBLLException: VM de11bca4-9ebc-40d2-901d-638801fd7102 is not running on any VDS (Failed with error down and code 6)
2014-06-11 19:55:29,531 ERROR [org.ovirt.engine.core.bll.lsm.VmReplicateDiskStartTaskHandler] (org.ovirt.thread.pool-4-thread-20) [7f933e75] VM de11bca4-9ebc-40d2-901d-638801fd7102 is not running on any VDS, skipping VmReplicateDiskFinish
2014-06-11 19:55:29,531 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-4-thread-20) [7f933e75] Reverting task deleteImage, handler: org.ovirt.engine.core.bll.lsm.CreateImagePlaceholderTaskHandler
2014-06-11 19:55:29,540 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (org.ovirt.thread.pool-4-thread-20) [7f933e75] START, DeleteImageGroupVDSCommand( storagePoolId = 9b465397-c43c-4533-bb9c-d7f36d2d4a8d, ignoreFailoverLimit = false, storageDomainId = a26e6a36-b4c0-445e-9233-10ebad5dc372, imageGroupId = dde86622-1f21-4e98-93f4-8527bd3688e7, postZeros = false, forceDelete = false), log id: 3c1895dc
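
To follow a single failed flow through engine.log, it can help to filter on the ID in square brackets that all of the lines above share (7f933e75). The following is only a minimal sketch: it assumes the line layout matches the excerpt above, and the regex and the helper name lines_for_flow are illustrative, not part of oVirt.

```python
import re

# Layout assumed from the excerpt above:
# <date> <time>,<ms> <LEVEL> [<logger>] (<thread>) [<flow id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<flow>[0-9a-f]+)\]\s+"
    r"(?P<msg>.*)$"
)


def lines_for_flow(lines, flow_id, level=None):
    """Return parsed log lines that belong to one flow ID (hypothetical helper)."""
    matches = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # line does not follow the assumed layout
        if m["flow"] != flow_id:
            continue
        if level is not None and m["level"] != level:
            continue
        matches.append(m.groupdict())
    return matches


# Three of the lines quoted in this report, copied verbatim.
sample = [
    "2014-06-11 19:55:29,531 ERROR [org.ovirt.engine.core.bll.lsm.VmReplicateDiskStartTaskHandler] "
    "(org.ovirt.thread.pool-4-thread-20) [7f933e75] VM de11bca4-9ebc-40d2-901d-638801fd7102 "
    "is not running on any VDS, skipping VmReplicateDiskFinish",
    "2014-06-11 19:55:29,531 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] "
    "(org.ovirt.thread.pool-4-thread-20) [7f933e75] Reverting task deleteImage, handler: "
    "org.ovirt.engine.core.bll.lsm.CreateImagePlaceholderTaskHandler",
    "2014-06-11 19:55:29,540 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] "
    "(org.ovirt.thread.pool-4-thread-20) [7f933e75] START, DeleteImageGroupVDSCommand( "
    "storagePoolId = 9b465397-c43c-4533-bb9c-d7f36d2d4a8d, ignoreFailoverLimit = false, "
    "storageDomainId = a26e6a36-b4c0-445e-9233-10ebad5dc372, "
    "imageGroupId = dde86622-1f21-4e98-93f4-8527bd3688e7, postZeros = false, "
    "forceDelete = false), log id: 3c1895dc",
]

if __name__ == "__main__":
    for entry in lines_for_flow(sample, "7f933e75", level="ERROR"):
        print(entry["ts"], entry["logger"].rsplit(".", 1)[-1], "-", entry["msg"])
```

In practice the sample list would be replaced by the lines of the attached engine.log.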

After the procedure, leftover image files/LVs remain on the host:

NFS -> after 4 sessions:

23c5dd17-223d-4495-865a-970059f4b037
23c5dd17-223d-4495-865a-970059f4b037.lease
23c5dd17-223d-4495-865a-970059f4b037.meta
3ca44069-e909-4c64-972b-3c43f722ea52
3ca44069-e909-4c64-972b-3c43f722ea52.lease
3ca44069-e909-4c64-972b-3c43f722ea52.meta
3df57134-3e8b-44ae-9176-92e9019ca900
3df57134-3e8b-44ae-9176-92e9019ca900.lease
3df57134-3e8b-44ae-9176-92e9019ca900.meta
4ab3fa1e-97fe-4e1d-ba6a-72d5ac989fdb
4ab3fa1e-97fe-4e1d-ba6a-72d5ac989fdb.lease
4ab3fa1e-97fe-4e1d-ba6a-72d5ac989fdb.meta
6169e36d-9675-41bf-a5ab-531d9daaf5e9
6169e36d-9675-41bf-a5ab-531d9daaf5e9.lease
6169e36d-9675-41bf-a5ab-531d9daaf5e9.meta

From the engine database (psql):

engine=# SELECT image_guid,creation_date,size,volume_format FROM images;
              image_guid              |     creation_date      |    size     | volume_format 
--------------------------------------+------------------------+-------------+---------------
 00000000-0000-0000-0000-000000000000 | 2008-04-01 00:00:00+03 | 85899345920 |             4
 9f0599f9-dc38-42ae-b6ef-bc75bfdc1696 | 2014-06-11 18:57:51+03 |  3221225472 |             5
 23c5dd17-223d-4495-865a-970059f4b037 | 2014-06-12 11:30:52+03 |  3221225472 |             4
 efbd0d33-c555-45da-9080-0212df0b59b0 | 2014-06-11 18:55:50+03 |  3221225472 |             5
 520bd075-b52b-454b-a9f7-e677bdc8c29d | 2014-06-11 19:44:29+03 |  3221225472 |             4
 3df57134-3e8b-44ae-9176-92e9019ca900 | 2014-06-11 19:53:48+03 |  3221225472 |             5
 4ab3fa1e-97fe-4e1d-ba6a-72d5ac989fdb | 2014-06-11 19:55:12+03 |  3221225472 |             4
 6169e36d-9675-41bf-a5ab-531d9daaf5e9 | 2014-06-12 11:43:13+03 |  3221225472 |             4
 3ca44069-e909-4c64-972b-3c43f722ea52 | 2014-06-12 11:45:21+03 |  3221225472 |             4
(9 rows)
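
The two listings above can be cross-checked mechanically. The following is a minimal sketch that uses only the values pasted in this report; the helper name volume_ids is illustrative, and it assumes that each volume on the NFS domain appears as a bare UUID file plus .lease and .meta companions, as in the listing.

```python
# File names from the NFS listing above.
nfs_listing = """
23c5dd17-223d-4495-865a-970059f4b037
23c5dd17-223d-4495-865a-970059f4b037.lease
23c5dd17-223d-4495-865a-970059f4b037.meta
3ca44069-e909-4c64-972b-3c43f722ea52
3ca44069-e909-4c64-972b-3c43f722ea52.lease
3ca44069-e909-4c64-972b-3c43f722ea52.meta
3df57134-3e8b-44ae-9176-92e9019ca900
3df57134-3e8b-44ae-9176-92e9019ca900.lease
3df57134-3e8b-44ae-9176-92e9019ca900.meta
4ab3fa1e-97fe-4e1d-ba6a-72d5ac989fdb
4ab3fa1e-97fe-4e1d-ba6a-72d5ac989fdb.lease
4ab3fa1e-97fe-4e1d-ba6a-72d5ac989fdb.meta
6169e36d-9675-41bf-a5ab-531d9daaf5e9
6169e36d-9675-41bf-a5ab-531d9daaf5e9.lease
6169e36d-9675-41bf-a5ab-531d9daaf5e9.meta
"""

# image_guid column from the images table above.
db_image_guids = {
    "00000000-0000-0000-0000-000000000000",
    "9f0599f9-dc38-42ae-b6ef-bc75bfdc1696",
    "23c5dd17-223d-4495-865a-970059f4b037",
    "efbd0d33-c555-45da-9080-0212df0b59b0",
    "520bd075-b52b-454b-a9f7-e677bdc8c29d",
    "3df57134-3e8b-44ae-9176-92e9019ca900",
    "4ab3fa1e-97fe-4e1d-ba6a-72d5ac989fdb",
    "6169e36d-9675-41bf-a5ab-531d9daaf5e9",
    "3ca44069-e909-4c64-972b-3c43f722ea52",
}


def volume_ids(listing):
    """Collapse volume, .lease and .meta file names into a set of volume UUIDs."""
    return {name.split(".", 1)[0] for name in listing.split()}


on_disk = volume_ids(nfs_listing)
in_both = on_disk & db_image_guids        # volumes on disk that also have a DB row
only_on_disk = on_disk - db_image_guids   # files with no matching DB row
only_in_db = db_image_guids - on_disk     # DB rows not present in this listing

if __name__ == "__main__":
    print("volumes on disk:", len(on_disk))
    print("on disk and in DB:", sorted(in_both))
    print("on disk only:", sorted(only_on_disk))
    print("in DB only:", sorted(only_in_db))
```

With the data above, all five volumes on disk also have a row in the images table. The rows that appear only in the database may belong to other domains, so this check alone does not say which volumes are leftovers.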


iSCSI -> after one session (the volume created during the failed live migration is not cleared):

82d24343-2f4c-4610-8254-25cfba98cce8 aa1c7b01-6c71-4702-8a24-d0d696d193ae -wi-------   1.00g
c717cf54-ba73-4b33-a4c5-453790f8644d aa1c7b01-6c71-4702-8a24-d0d696d193ae -wi-------   1.00g


 

Version-Release number of selected component (if applicable):
vdsm-4.14.7-3.el6ev.x86_64
rhevm-3.4.0-0.21.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
Setup: have 4 domains (2 of each type: iSCSI and NFS)
1. Create a VM with a disk
2. Live migrate the disk
3. Power off the VM during the operation
4. Repeat steps 1-3 with the second domain type

Actual results:
The operation fails with a BLL exception, leaving redundant files on NFS and unused or unneeded LVs on iSCSI.

Expected results:
Powering off the VM should not be possible during the migration, or the operation should succeed.

Additional info:

Comment 1 Allon Mureinik 2014-06-12 12:33:02 UTC
Daniel, didn't we solve this once?

Comment 2 Daniel Erez 2014-06-17 14:01:25 UTC
(In reply to Allon Mureinik from comment #1)
> Daniel, didn't we solve this once?

Similar to the issue in bug 1034856 (see comment #8).

*** This bug has been marked as a duplicate of bug 1034856 ***