Created attachment 872424
logs from engine and vdsm and screenshot

Description of problem:
If the engine crashes during DownloadImage while importing a glance image (with 'Import as template' = true), the engine doesn't roll back the action and doesn't order vdsm to delete the leftover image. Instead, after vdsm successfully finishes the downloadImage task, the disk remains 'LOCKED' in the DB.

Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.3.master.el6ev.noarch
vdsm-4.14.2-0.2.el6ev.x86_64

How reproducible:
Requires the engine to crash during DownloadImage

Steps to Reproduce:
On a shared DC with storage domains attached and an integrated glance repository with images:
1. Import an image from the glance repository with 'import as template' = true
2. Restart the ovirt-engine service during DownloadImage

Actual results:
DownloadImage starts on the engine and the task is sent to vdsm. Right after that, I restarted the engine.

2014-03-09 17:10:51,965 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DownloadImageVDSCommand] (org.ovirt.thread.pool-4-thread-48) START, DownloadImageVDSCommand( storagePoolId = d3d5c88e-075a-47cf-afc2-549964721c55, ignoreFailoverLimit = false, storageDomainId = 746e7ff7-dc76-4e15-b006-0b6ef42e8317, imageGroupId = dce62458-3790-4703-a873-985ecfaa8ea0, imageId = 00000000-0000-0000-0000-000000000000), log id: 3c3a384c
2014-03-09 17:10:51,965 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DownloadImageVDSCommand] (org.ovirt.thread.pool-4-thread-48) -- executeIrsBrokerCommand: calling 'downloadImage'
2014-03-09 17:10:51,966 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DownloadImageVDSCommand] (org.ovirt.thread.pool-4-thread-48) -- downloadImage parameters: dstSpUUID=d3d5c88e-075a-47cf-afc2-549964721c55 dstSdUUID=746e7ff7-dc76-4e15-b006-0b6ef42e8317 dstImageGUID=dce62458-3790-4703-a873-985ecfaa8ea0 dstVolUUID=00000000-0000-0000-0000-000000000000

When the engine recovers from the restart, the task is not reverted, the disk remains stuck in 'LOCKED' state, and the disk is unattached (although it was imported as a template):

 imagestatus |            image_group_id            | vm_names
-------------+--------------------------------------+------------------------
           2 | dce62458-3790-4703-a873-985ecfaa8ea0 |

Image exists on storage:

[root@green-vdsc images]# tree dce62458-3790-4703-a873-985ecfaa8ea0
dce62458-3790-4703-a873-985ecfaa8ea0
|-- 8bed2ae7-a2f8-4036-b02b-b3c6d855ca22
|-- 8bed2ae7-a2f8-4036-b02b-b3c6d855ca22.lease
`-- 8bed2ae7-a2f8-4036-b02b-b3c6d855ca22.meta

Not sure if it's related to the fact that the image was imported as a template.

Expected results:
If importing an image from glance fails due to an engine crash, the engine should revert the DownloadImage task and delete the leftovers in storage.

Additional info: logs from engine and vdsm and screenshot
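To match the image group in the engine log against the DB row and the directory on storage, the UUIDs can be pulled out of the 'downloadImage parameters' line. The following is only a minimal sketch: the helper name `parse_download_params` is hypothetical, and it assumes the log line keeps the `name=uuid` layout shown in the excerpt above.

```python
import re

# Log line copied from the engine log excerpt in this report.
LOG_LINE = (
    "2014-03-09 17:10:51,966 INFO [org.ovirt.engine.core.vdsbroker.irsbroker."
    "DownloadImageVDSCommand] (org.ovirt.thread.pool-4-thread-48) -- downloadImage "
    "parameters: dstSpUUID=d3d5c88e-075a-47cf-afc2-549964721c55 "
    "dstSdUUID=746e7ff7-dc76-4e15-b006-0b6ef42e8317 "
    "dstImageGUID=dce62458-3790-4703-a873-985ecfaa8ea0 "
    "dstVolUUID=00000000-0000-0000-0000-000000000000"
)

# Lowercase hex UUID, as printed in the log excerpt.
UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
PARAM_RE = re.compile(r"(\w+)=(" + UUID + r")")


def parse_download_params(line):
    """Return {name: uuid} for a 'downloadImage parameters:' line, else {}.

    Hypothetical helper for reading logs; not part of the engine or vdsm.
    """
    marker = "downloadImage parameters:"
    if marker not in line:
        return {}
    tail = line.split(marker, 1)[1]
    return dict(PARAM_RE.findall(tail))


params = parse_download_params(LOG_LINE)
print(params["dstImageGUID"])
```

The dstImageGUID value is the one to compare with image_group_id in the query output and with the directory name under images.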
Does it also happen when importing not as a template? It should, iiuc, as this part of the flow is identical.
I checked the scenario also with 'import as template = false', disk is removed from the system, so it doesn't reproduce in that case. Is it possible that the disk remains LOCKED because it was supposed to be wrapped with the template configuration and it didn't happen?
(In reply to Elad from comment #2)
> I checked the scenario also with 'import as template = false', disk is
> removed from the system, so it doesn't reproduce in that case.
>
> Is it possible that the disk remains LOCKED because it was supposed to be
> wrapped with the template configuration and it didn't happen?

Looking at the code, both scenarios should behave the same.

Tried to reproduce it, and saw that once the engine was restarted, looking at the tasks you can see that it is still running (the tasks count was 0 for some reason, but expanding it showed the import task).

Is it possible that you reported it as locked but it is still running?
Also, when you check the host running the downloadImage command, do you see that the download is still working? (ps -ef | grep -i curl-img-wrap).

Adding Federico to the CC list as well, as he might have some insight about that.
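The process check above can also be scripted. This is a rough sketch, assuming it runs on the host that executes the download and that the process name contains curl-img-wrap as in the command above; both function names are hypothetical. Unlike the grep pipeline, it does not match its own search process.

```python
import subprocess


def find_download_processes(ps_output, needle="curl-img-wrap"):
    """Return the lines of `ps -ef` output that mention needle (case-insensitive)."""
    needle = needle.lower()
    return [line for line in ps_output.splitlines() if needle in line.lower()]


def download_still_running():
    """Run `ps -ef` locally and report whether a matching process exists."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return bool(find_download_processes(result.stdout))


if __name__ == "__main__":
    print("download still running:", download_still_running())
```

If this reports no matching process while the disk is still LOCKED, the download has ended and the lock is a leftover rather than a sign of work in progress.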
Also verified that in my environment it happens both when importing the image as a disk and when importing as a template. Allon - it seems like neither endSuccessfully nor endWithFailure is being called in such a case, although the task finishes successfully. Can it be related to the S.E.A.T. mechanism?
Did some more tests and indeed, the disk remains LOCKED when importing the image as a disk, just as it does when importing it as a template. It seems that restarting the engine doesn't affect the task on vdsm (createVolume). The task keeps running until the import ends.
The task /should/ continue if the engine restarts. This is by design. What should not happen is the disk remaining locked in case of a failure. Daniel, can you please take a look and help out with the SEAT mechanism?
(In reply to Allon Mureinik from comment #6)
> The task /should/ continue if the engine restarts. This is by design.
> What should not happen is the disk remaining locked in case of a failure.
>
> Daniel, can you please take a look and help out with the SEAT mechanism?

There is no failure. The task finishes correctly, but neither endSuccessfully nor endWithFailure is being called.
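To make the expected behavior concrete, here is a toy model of the flow discussed in this thread. It is not engine code: the class, the function, and the status names are stand-ins, and the only things taken from this report are that the disk is LOCKED while the task runs and that one of the two end callbacks should fire when the task finishes.

```python
from enum import Enum


class DiskStatus(Enum):
    OK = "OK"
    LOCKED = "LOCKED"


class ImportCommand:
    """Toy stand-in for the engine-side import command (hypothetical)."""

    def __init__(self):
        self.disk_status = DiskStatus.LOCKED  # disk is locked while the task runs
        self.disk_present = True

    def end_successfully(self):
        # Success path: release the lock.
        self.disk_status = DiskStatus.OK

    def end_with_failure(self):
        # Failure path: roll back by dropping the leftover disk.
        self.disk_present = False
        self.disk_status = None


def on_task_finished(command, task_succeeded):
    """Dispatch exactly one end callback once the storage task reports a result."""
    if task_succeeded:
        command.end_successfully()
    else:
        command.end_with_failure()


# Expected flow: the task finishes and a callback fires, so the lock is released.
healthy = ImportCommand()
on_task_finished(healthy, task_succeeded=True)

# Flow described in this bug: the task finishes but nothing dispatches the callback.
stuck = ImportCommand()

print(healthy.disk_status, stuck.disk_status)
```

In this model the stuck case is simply the one where on_task_finished is never invoked after the restart, which matches the symptom of a finished task with a disk that stays LOCKED.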
Since there is an easy workaround for this corner case, this can be pushed to 3.5. If the issue reproduces, the unlocker.sh script can unlock the template entity after the engine restart.
Tested the scenario described in comment #0. Disk stuck in 'LOCKED' state. Bug is not fixed, re-opening.
Ignore the previous comment. After an engine failure during the downloadImage task in vdsm, the disk moves from LOCKED to OK. Verified using ovirt-engine-3.5.0-0.0.master.20140519181229.gitc6324d4.el6.noarch, compared against RHEV3.4 av9.2, where the issue reproduces. There is another issue with importing an image from glance: the disk gets stuck in LOCKED state in case of an engine failure during the createVolume phase, as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1101541
RHEV-M 3.5.0 has been released, closing this bug.