Bug 1021230

Summary: [engine-backend] [external-provider] after connectivity to storage is lost during the import of an image from glance, the disk enters status 'Illegal'
Product: Red Hat Enterprise Virtualization Manager Reporter: Elad <ebenahar>
Component: ovirt-engine Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED WONTFIX QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.3.0 CC: acanan, acathrow, amureini, iheim, lpeer, Rhev-m-bugs, yeylon
Target Milestone: --- Keywords: Triaged
Target Release: 3.3.0 Flags: amureini: Triaged+
Hardware: x86_64   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-10-30 16:01:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
logs none

Description Elad 2013-10-20 16:02:00 UTC
Created attachment 814237
logs

Description of problem:
When the SPM regains its connection to the master domain after losing it during the import of an image from glance, the disk enters status 'Illegal'.

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.27.beta1.el6ev.noarch
vdsm-4.13.0-0.3.beta1.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add a glance image external provider to RHEVM (with images on it)
2. Import an image to RHEVM. During the import, block connectivity from all hosts to the master domain


Actual results:
After reconstruct, the disk status changes from 'locked' to 'Illegal' (in my case, reconstruct didn't take place because of https://bugzilla.redhat.com/show_bug.cgi?id=1017177)

The image exists under /rhev/data-center:

[root@nott-vds1 images]# ll
total 12
drwxr-xr-x. 2 vdsm kvm 4096 Oct 20 17:55 9d34cb46-21cc-42be-8b64-574858c796ee


On DB:
su - postgres -c "psql -U postgres engine -c  'select storage_name , image_group_id , imagestatus from all_disks;'"  | less -S

   storage_name   |            image_group_id            | imagestatus
------------------+--------------------------------------+-------------
 iscsi1-1-xtremio | 9d7bcd53-7ce4-440a-9e14-7973d197d177 |           1
 iscsi1-1-xtremio | 9d34cb46-21cc-42be-8b64-574858c796ee |           4
 iscsi1-1-xtremio | 15c2f67e-1eb3-4cc5-9f3a-f40abed7fbc4 |           1
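For reading the query output above, here is a small sketch that decodes the imagestatus column. The code-to-name mapping is an assumption (1 = OK, 2 = LOCKED, 4 = ILLEGAL); the only value this report itself supports is 4, which appears on the disk shown as 'Illegal'. The helper name parse_rows is hypothetical.

```python
# Assumed mapping of imagestatus codes to names; only 4 -> ILLEGAL is
# backed by this report, the other values are an assumption.
IMAGE_STATUS = {1: "OK", 2: "LOCKED", 4: "ILLEGAL"}


def parse_rows(psql_output):
    """Return (storage_name, image_group_id, status_name) per data row."""
    rows = []
    for line in psql_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the header, the separator line, and blank lines.
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        storage, image_group_id, code = parts
        name = IMAGE_STATUS.get(int(code), "UNKNOWN(%s)" % code)
        rows.append((storage, image_group_id, name))
    return rows


SAMPLE = """
   storage_name   |            image_group_id            | imagestatus
------------------+--------------------------------------+-------------
 iscsi1-1-xtremio | 9d7bcd53-7ce4-440a-9e14-7973d197d177 |           1
 iscsi1-1-xtremio | 9d34cb46-21cc-42be-8b64-574858c796ee |           4
 iscsi1-1-xtremio | 15c2f67e-1eb3-4cc5-9f3a-f40abed7fbc4 |           1
"""

for row in parse_rows(SAMPLE):
    print(row)
```

Under that assumed mapping, the imported disk (9d34cb46-...) is the only one not in status OK.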




Expected results:
The disk should be removed from storage and from the DB when the connection to storage resumes

Additional info:
logs

Comment 1 Federico Simoncelli 2013-10-30 13:29:20 UTC
2013-10-20 18:01:26,976 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-62) SPMAsyncTask::PollTask: Polling task 299f45ce-15de-4a09-abe1-b39709ed2914 (Parent Command ImportRepoImage, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2013-10-20 18:01:26,980 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-62) BaseAsyncTask::LogEndTaskFailure: Task 299f45ce-15de-4a09-abe1-b39709ed2914 (Parent Command ImportRepoImage, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSTaskResultNotSuccessException: TaskState contained successful return code, but a non-success result ('cleanSuccess').,
-- Exception: VDSGenericException: VDSTaskResultNotSuccessException: TaskState contained successful return code, but a non-success result ('cleanSuccess').
...
2013-10-20 18:01:27,015 WARN  [org.ovirt.engine.core.bll.RemoveDiskCommand] (pool-5-thread-47) CanDoAction of action RemoveDisk failed. Reasons:VAR__ACTION__REMOVE,VAR__TYPE__VM_DISK,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL
...

At the moment we try only once to remove the image from the storage (and db) but in your case the SPM was unreachable at that time so the operation failed.

Comment 2 Allon Mureinik 2013-10-30 16:01:30 UTC
(In reply to Federico Simoncelli from comment #1)
> At the moment we try only once to remove the image from the storage (and db)
> but in your case the SPM was unreachable at that time so the operation
> failed.

This is the same behavior as for a failed ImportVm, i.e.:
1. attempt to import
2. If unsuccessful - attempt to remove the disks
3. If removal failed, change state to ILLEGAL - admin can manually delete them later.
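
The three steps above can be sketched roughly as follows. This is an illustration only, not the engine's code: the names Disk, StorageError, and import_with_cleanup are hypothetical, and the import and removal operations are passed in as plain callables.

```python
from enum import Enum


class DiskStatus(Enum):
    OK = "OK"
    LOCKED = "LOCKED"
    ILLEGAL = "ILLEGAL"


class StorageError(Exception):
    """Hypothetical error raised when a storage operation fails."""


class Disk:
    def __init__(self, disk_id):
        self.disk_id = disk_id
        self.status = DiskStatus.LOCKED  # locked while the import runs
        self.removed = False


def import_with_cleanup(disk, do_import, do_remove):
    """Import once; on failure try one removal; if that fails, mark ILLEGAL."""
    try:
        do_import(disk)           # step 1: attempt to import
        disk.status = DiskStatus.OK
        return disk
    except StorageError:
        pass
    try:
        do_remove(disk)           # step 2: single attempt to remove the disk
        disk.removed = True
    except StorageError:
        # step 3: removal failed, leave the disk for manual deletion
        disk.status = DiskStatus.ILLEGAL
    return disk


def unreachable(_disk):
    raise StorageError("storage unreachable")


disk = import_with_cleanup(Disk("example-disk"), unreachable, unreachable)
print(disk.status, disk.removed)
```

In the scenario from this report both the import and the single removal attempt fail, so the disk is left in ILLEGAL rather than being retried.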

Closing based on the above explanation.