Bug 1021230 - [engine-backend] [external-provider] after a connectivity lost with storage, which started during the importing of image from glance, disk enters to status 'Illegal'
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64 Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Federico Simoncelli
QA Contact: Elad
Whiteboard: storage
Keywords: Triaged
Depends On:
Blocks:
Reported: 2013-10-20 12:02 EDT by Elad
Modified: 2016-02-10 14:39 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-30 12:01:30 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
amureini: Triaged+


Attachments
logs (1.11 MB, application/x-gzip)
2013-10-20 12:02 EDT, Elad
Description Elad 2013-10-20 12:02:00 EDT
Created attachment 814237 [details]
logs

Description of problem:
When the SPM regains its connection to the master domain after losing it during an image import from Glance, the disk enters the 'Illegal' status.

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.27.beta1.el6ev.noarch
vdsm-4.13.0-0.3.beta1.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add a Glance image external provider to RHEVM (with images on it).
2. Import an image into RHEVM. During the import, block connectivity from all hosts to the master domain.


Actual results:
After reconstruct, the disk status changes from 'locked' to 'Illegal' (in my case, reconstruct didn't take place because of https://bugzilla.redhat.com/show_bug.cgi?id=1017177)

The image exists under /rhev/data-center:

[root@nott-vds1 images]# ll
total 12
drwxr-xr-x. 2 vdsm kvm 4096 Oct 20 17:55 9d34cb46-21cc-42be-8b64-574858c796ee


On DB:
su - postgres -c "psql -U postgres engine -c  'select storage_name , image_group_id , imagestatus from all_disks;'"  | less -S

   storage_name   |            image_group_id            | imagestatus
------------------+--------------------------------------+-------------
 iscsi1-1-xtremio | 9d7bcd53-7ce4-440a-9e14-7973d197d177 |           1
 iscsi1-1-xtremio | 9d34cb46-21cc-42be-8b64-574858c796ee |           4
 iscsi1-1-xtremio | 15c2f67e-1eb3-4cc5-9f3a-f40abed7fbc4 |           1
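For readers unfamiliar with the numeric imagestatus column, here is a minimal sketch that labels the rows of the query output above. The code-to-name mapping (1 = OK, 2 = LOCKED, 4 = ILLEGAL) is an assumption about the engine's image status values and is not stated in this report; the report only shows that the disk reported as 'Illegal' has imagestatus 4. The names `IMAGE_STATUS` and `label_rows` are hypothetical helpers.

```python
# Assumed mapping of imagestatus codes to names; not confirmed by this report.
IMAGE_STATUS = {1: "OK", 2: "LOCKED", 4: "ILLEGAL"}


def label_rows(psql_output):
    """Return (image_group_id, status_name) pairs parsed from psql table output."""
    rows = []
    for line in psql_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the header, the separator line, and anything that is not a data row.
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        code = int(parts[2])
        rows.append((parts[1], IMAGE_STATUS.get(code, f"unknown ({code})")))
    return rows


# Sample text copied from the query output in this report.
SAMPLE = """\
   storage_name   |            image_group_id            | imagestatus
------------------+--------------------------------------+-------------
 iscsi1-1-xtremio | 9d7bcd53-7ce4-440a-9e14-7973d197d177 |           1
 iscsi1-1-xtremio | 9d34cb46-21cc-42be-8b64-574858c796ee |           4
 iscsi1-1-xtremio | 15c2f67e-1eb3-4cc5-9f3a-f40abed7fbc4 |           1
"""

if __name__ == "__main__":
    for image_group_id, status in label_rows(SAMPLE):
        print(image_group_id, status)
```

Under that assumed mapping, the image found under /rhev/data-center (9d34cb46-21cc-42be-8b64-574858c796ee) is the only row labeled ILLEGAL.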




Expected results:
The disk should be removed from storage and from the DB when the connection to storage resumes.

Additional info:
logs
Comment 1 Federico Simoncelli 2013-10-30 09:29:20 EDT
2013-10-20 18:01:26,976 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-62) SPMAsyncTask::PollTask: Polling task 299f45ce-15de-4a09-abe1-b39709ed2914 (Parent Command ImportRepoImage, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2013-10-20 18:01:26,980 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-62) BaseAsyncTask::LogEndTaskFailure: Task 299f45ce-15de-4a09-abe1-b39709ed2914 (Parent Command ImportRepoImage, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSTaskResultNotSuccessException: TaskState contained successful return code, but a non-success result ('cleanSuccess').,
-- Exception: VDSGenericException: VDSTaskResultNotSuccessException: TaskState contained successful return code, but a non-success result ('cleanSuccess').
...
2013-10-20 18:01:27,015 WARN  [org.ovirt.engine.core.bll.RemoveDiskCommand] (pool-5-thread-47) CanDoAction of action RemoveDisk failed. Reasons:VAR__ACTION__REMOVE,VAR__TYPE__VM_DISK,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL
...

At the moment we try only once to remove the image from the storage (and db) but in your case the SPM was unreachable at that time so the operation failed.
Comment 2 Allon Mureinik 2013-10-30 12:01:30 EDT
(In reply to Federico Simoncelli from comment #1)
> At the moment we try only once to remove the image from the storage (and db)
> but in your case the SPM was unreachable at that time so the operation
> failed.

This is the same behavior as for a failed ImportVm, i.e.:
1. Attempt to import.
2. If unsuccessful, attempt to remove the disks.
3. If removal fails, change the state to ILLEGAL; the admin can manually delete them later.
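The three steps above can be sketched roughly as follows. This is an illustrative model only, not the engine's actual code: `DiskStatus`, `StorageUnreachable`, and `import_disk` are hypothetical names, and the two callables stand in for the real import and remove operations. Cleanup is attempted exactly once, matching the explanation in comment 1.

```python
from enum import Enum


class DiskStatus(Enum):
    """Hypothetical subset of disk states used in this sketch."""
    OK = "OK"
    LOCKED = "LOCKED"
    ILLEGAL = "ILLEGAL"


class StorageUnreachable(Exception):
    """Hypothetical error raised when the storage/SPM cannot be reached."""


def import_disk(do_import, do_remove):
    """Model the import flow.

    Returns the final DiskStatus, or None if the disk was cleaned up
    (removed from storage and DB) after a failed import.
    """
    # Step 1: attempt the import (the disk is LOCKED while this runs).
    try:
        do_import()
        return DiskStatus.OK
    except StorageUnreachable:
        pass

    # Step 2: the import failed, so try once to remove the disk.
    try:
        do_remove()
        return None
    except StorageUnreachable:
        # Step 3: removal failed too; leave the disk ILLEGAL so that
        # an admin can delete it manually later.
        return DiskStatus.ILLEGAL


if __name__ == "__main__":
    def unreachable():
        raise StorageUnreachable("SPM unreachable")

    # Both the import and the single cleanup attempt fail, as in this bug.
    print(import_disk(unreachable, unreachable))
```

In the scenario reported here, both the import and the single removal attempt hit unreachable storage, which is the path that ends in ILLEGAL.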

Closing based on the above explanation.
