Bug 1074311 - [engine-backend] [external-provider] failure to import a glance image (as a template) leaves image in LOCKED state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Daniel Erez
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks: rhev3.5beta 1156165
Reported: 2014-03-09 16:10 UTC by Elad
Modified: 2016-02-10 17:03 UTC (History)
CC List: 12 users

Fixed In Version: ovirt-engine-3.5.0_alpha1.1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:
amureini: Triaged+


Attachments
logs from engine and vdsm and screenshot (2.22 MB, application/x-gzip)
2014-03-09 16:10 UTC, Elad


Links
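System       | ID    | Private | Priority | Status | Summary                                        | Last Updated
-------------+-------+---------+----------+--------+------------------------------------------------+-------------
oVirt gerrit | 27495 | 0       | master   | MERGED | core: AsyncTaskType enum - added downloadImage | Never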

Description Elad 2014-03-09 16:10:20 UTC
Created attachment 872424
logs from engine and vdsm and screenshot

Description of problem:
When the engine crashes during DownloadImage while importing a glance image (with 'Import as template' = true), the engine doesn't roll back the action and doesn't order vdsm to delete the leftover image. Instead, after vdsm successfully finishes the downloadImage task, the disk remains 'LOCKED' in the DB.

Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.3.master.el6ev.noarch
vdsm-4.14.2-0.2.el6ev.x86_64

How reproducible:
Need engine to crash during DownloadImage

Steps to Reproduce:
On a shared DC with storage domains attached and integrated glance repository with images:
1. Import an image from glance repository with 'import as template' = true
2. Restart ovirt-engine service during DownloadImage


Actual results:
DownloadImage starts on the engine and the task is sent to vdsm. Right after that, I restarted the engine.

2014-03-09 17:10:51,965 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DownloadImageVDSCommand] (org.ovirt.thread.pool-4-thread-48) START, DownloadImageVDSCommand( storagePoolId = d3d5c88e-075a-47cf-afc2-549964721c55, ignoreFailoverLimit = false, storageDomainId = 746e7ff7-dc76-4e15-b006-0b6ef42e8317, imageGroupId = dce62458-3790-4703-a873-985ecfaa8ea0, imageId = 00000000-0000-0000-0000-000000000000), log id: 3c3a384c
2014-03-09 17:10:51,965 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DownloadImageVDSCommand] (org.ovirt.thread.pool-4-thread-48) -- executeIrsBrokerCommand: calling 'downloadImage'
2014-03-09 17:10:51,966 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DownloadImageVDSCommand] (org.ovirt.thread.pool-4-thread-48) -- downloadImage parameters:
                dstSpUUID=d3d5c88e-075a-47cf-afc2-549964721c55
                dstSdUUID=746e7ff7-dc76-4e15-b006-0b6ef42e8317
                dstImageGUID=dce62458-3790-4703-a873-985ecfaa8ea0
                dstVolUUID=00000000-0000-0000-0000-000000000000


When the engine recovers from the restart, the task is not reverted. The disk remains stuck in the 'LOCKED' state and is unattached (although it was imported as a template):


 imagestatus |            image_group_id            |        vm_names
-------------+--------------------------------------+------------------------
           2 | dce62458-3790-4703-a873-985ecfaa8ea0
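
The imagestatus value 2 in the output above is what the report calls 'LOCKED'. A minimal sketch of decoding such rows follows. Only the mapping 2 = LOCKED is supported by this report; the other numeric values and the helper name describe_status are assumptions for illustration and should be checked against the engine's own status enum.

```python
from enum import IntEnum


class ImageStatus(IntEnum):
    # Only LOCKED = 2 is confirmed by the query output in this report.
    # OK and ILLEGAL values are assumptions for illustration.
    OK = 1
    LOCKED = 2
    ILLEGAL = 4


def describe_status(code: int) -> str:
    """Return the symbolic name for an imagestatus code, or a fallback string."""
    try:
        return ImageStatus(code).name
    except ValueError:
        return f"UNKNOWN({code})"


# Row copied from the report: (imagestatus, image_group_id, vm_names)
rows = [(2, "dce62458-3790-4703-a873-985ecfaa8ea0", "")]

for status, image_group_id, vm_names in rows:
    attached = vm_names or "nothing"
    print(f"{image_group_id}: {describe_status(status)}, attached to {attached}")
```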





Image exists on storage:

[root@green-vdsc images]# tree  dce62458-3790-4703-a873-985ecfaa8ea0
dce62458-3790-4703-a873-985ecfaa8ea0
|-- 8bed2ae7-a2f8-4036-b02b-b3c6d855ca22
|-- 8bed2ae7-a2f8-4036-b02b-b3c6d855ca22.lease
`-- 8bed2ae7-a2f8-4036-b02b-b3c6d855ca22.meta


Not sure if it's related to the fact that the image was imported as template.

Expected results:
If importing an image from glance fails due to an engine crash, the engine should revert the DownloadImage task and delete the leftovers in storage.

Additional info:
logs from engine and vdsm and screenshot

Comment 1 Oved Ourfali 2014-03-12 11:11:16 UTC
Does it also happen when importing not as a template?
It should, iiuc, as this part of the flow is identical.

Comment 2 Elad 2014-03-12 16:19:35 UTC
I checked the scenario also with 'import as template = false', disk is removed from the system, so it doesn't reproduce in that case.

Is it possible that the disk remains LOCKED because it was supposed to be wrapped with the template configuration and it didn't happen?

Comment 3 Oved Ourfali 2014-03-12 19:44:48 UTC
(In reply to Elad from comment #2)
> I checked the scenario also with 'import as template = false', disk is
> removed from the system, so it doesn't reproduce in that case.
> 
> Is it possible that the disk remains LOCKED because it was supposed to be
> wrapped with the template configuration and it didn't happen?

Looking at the code, both scenarios should behave the same.
Tried to reproduce it, and saw that once the engine was restarted, looking at the tasks you can see that it is still running (the tasks count was 0 for some reason, but expanding it showed the import task).
Is it possible that you reported it as locked but it is still running?
Also, when you check the host running the downloadImage command, do you see that the download is still working? (ps -ef | grep -i curl-img-wrap).

Adding Federico to the CC list as well, as he might have some insight about that.
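
The host-side check suggested in this comment (ps -ef | grep -i curl-img-wrap) can be sketched in Python as below. The function names and the sample ps lines are made up for illustration; the only detail taken from the thread is the process name curl-img-wrap. Unlike the plain grep pipeline, this sketch skips the grep process itself.

```python
import subprocess
from typing import List


def find_processes(ps_output: str, needle: str = "curl-img-wrap") -> List[str]:
    """Return ps lines that mention `needle` (case-insensitive), skipping grep lines."""
    matches = []
    for line in ps_output.splitlines():
        lowered = line.lower()
        if needle.lower() in lowered and "grep" not in lowered:
            matches.append(line.strip())
    return matches


def download_still_running() -> bool:
    """Run `ps -ef` on the local host and report whether a download helper is alive."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return bool(find_processes(result.stdout))


# Made-up sample output, for illustration only.
sample = """UID   PID  PPID C STIME TTY   TIME     CMD
vdsm  4242    1 0 17:10 ?     00:00:03 /path/to/curl-img-wrap
root  4300 4100 0 17:12 pts/0 00:00:00 grep -i curl-img-wrap
"""

print(find_processes(sample))
```

If the list is non-empty, the download is still in progress on the host, and a disk in LOCKED state is expected rather than stuck.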

Comment 4 Oved Ourfali 2014-03-12 20:30:04 UTC
Also verified that in my environment it happens both when importing the image as a disk and when importing as a template.

Allon - it seems like neither endSuccessfully nor endWithFailure are being called in such a case, although the task finishes successfully. Can it be related to the S.E.A.T. mechanism?

Comment 5 Elad 2014-03-13 09:06:25 UTC
Did some more tests and indeed, the disk remains LOCKED when importing the image as a disk, just as it does when importing it as a template. It seems that restarting the engine doesn't affect the task on vdsm (createVolume). The task keeps running until the import ends.

Comment 6 Allon Mureinik 2014-03-13 11:22:10 UTC
The task /should/ continue if the engine restarts. This is by design.
What should not happen is the disk remaining locked in case of a failure.

Daniel, can you please take a look and help out with the SEAT mechanism?

Comment 7 Oved Ourfali 2014-03-13 11:30:52 UTC
(In reply to Allon Mureinik from comment #6)
> The task /should/ continue if the engine restarts. This is by design.
> What should not happen is the disk remaining locked in case of a failure.
> 
> Daniel, can you please take a look and help out with the SEAT mechanism?

There is no failure. The task finishes correctly, but neither endSuccessfully nor endWithFailure is being called.
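
The symptom described here (the task finishes, but neither endSuccessfully nor endWithFailure runs) can be illustrated with a toy model. This is not the engine's code: the function recover_task, the Disk class, the status strings other than LOCKED and OK, and the sets of known task types are all hypothetical. The model only assumes, based on the title of the gerrit change listed under Links ("AsyncTaskType enum - added downloadImage"), that a task type the engine does not recognize after a restart cannot be routed to its end-of-command callback.

```python
class Disk:
    """Hypothetical stand-in for a disk record; starts locked during the import."""

    def __init__(self) -> None:
        self.status = "LOCKED"


def recover_task(verb: str, succeeded: bool, disk: Disk, known_types: set) -> str:
    """Route a finished task to its end callback after a simulated engine restart."""
    if verb not in known_types:
        # Unrecognized task type: no callback runs, so the lock is never released.
        return "no callback"
    if succeeded:
        disk.status = "OK"  # what endSuccessfully would do in this model
        return "endSuccessfully"
    disk.status = "REMOVED"  # what endWithFailure would do in this model
    return "endWithFailure"


# Assumed sets of recognized task types, for illustration only.
BEFORE_FIX = {"createVolume", "copyImage"}
AFTER_FIX = BEFORE_FIX | {"downloadImage"}

disk = Disk()
print(recover_task("downloadImage", True, disk, BEFORE_FIX), disk.status)

disk = Disk()
print(recover_task("downloadImage", True, disk, AFTER_FIX), disk.status)
```

In this model the first call leaves the disk LOCKED even though the task succeeded, which matches the behavior reported in this thread; the second call releases the lock.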

Comment 8 Tal Nisan 2014-04-06 14:03:43 UTC
Since there is an easy workaround for this corner-case scenario, this can be pushed to 3.5. If the issue reproduces, the unlocker.sh script can unlock the template entity after the engine restart.

Comment 9 Elad 2014-05-27 10:27:14 UTC
Tested the scenario described in comment #0. Disk stuck in 'LOCKED' state.



Bug is not fixed, re-opening.

Comment 10 Elad 2014-05-27 14:04:01 UTC
Ignore the previous comment. After an engine failure during the downloadImage task in vdsm, the disk is moved from LOCKED to OK.

Verified using ovirt-engine-3.5.0-0.0.master.20140519181229.gitc6324d4.el6.noarch
Against RHEV3.4 av9.2 where the issue is reproduced.

There is another issue with importing an image from glance: the disk gets stuck in the LOCKED state if the engine fails during the createVolume phase, as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1101541

Comment 11 Allon Mureinik 2015-02-16 19:12:51 UTC
RHEV-M 3.5.0 has been released, closing this bug.

Comment 12 Allon Mureinik 2015-02-16 19:12:51 UTC
RHEV-M 3.5.0 has been released, closing this bug.

