Bug 884635

Summary: engine: I tried to remove an image marked as illegal -> image was not removed but the illegal status changed back to OK
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engine Assignee: Liron Aravot <laravot>
Status: CLOSED CURRENTRELEASE QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.1.0 CC: abaron, amureini, bazulay, dcaroest, dyasny, eedri, gwatson, hateya, iheim, lpeer, mkenneth, Rhev-m-bugs, scohen, sgrinber, spandura, ybronhei, yeylon, ykaul, yzaslavs
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: sf16 Doc Type: Bug Fix
Doc Text:
Cause: Removing an image marked as illegal caused the image status to change back to OK. Consequence: An image that was marked as illegal and failed to be removed changed to status OK after the delete operation failed. Fix: The image status is saved as illegal for rollback, rather than OK, before the VDSM call to delete the image is executed. Result: The image status no longer changes from illegal to OK.
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-06-11 09:15:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 890922, 948448, 949283    
Attachments:
Description Flags
all logs and db dump none

Description Dafna Ron 2012-12-06 13:08:17 UTC
Created attachment 658724 [details]
all logs and db dump

Description of problem:

1. I had 2 disks created as floating disks and attached to a vm in a posix domain 
2. I ran the vm and moved the disks
3. the disk moves failed because of insufficient disk space
4. after a day, I removed all my vms, including this vm
5. this vm failed to be deleted 
6. the disks were detached from the vm and their images changed to illegal state. 
7. I tried removing them again
8. the disks failed to be deleted (image does not exist error from vdsm) and the image status changed back to OK. 


looking at the engine logs, I see that the image id that was marked as illegal is different from the image id I am trying to delete after the disk changed back to OK. 
looks like it might have been the snapshot that was changed to illegal. 

also, the images that were marked as illegal still appear in the db. 

Version-Release number of selected component (if applicable):

si24.5

Actual results:

cannot remove the disk; vdsm returns an "image does not exist" error. 

Expected results:

we should be able to remove the disks from the db
we should change both the snapshot and the disk to illegal 

Additional info: all logs and db dump.

Comment 1 Maor 2013-01-16 20:00:49 UTC
What should be fixed here?
Should we let disks be removed when we get a VDSM exception/failure on removeImage,
or should disks stay in ILLEGAL state and not turn back to OK,
or both?

Comment 2 Simon Grinberg 2013-01-24 12:37:55 UTC
(In reply to comment #1)
> What should be fixed here?
> Should we let disks be removed when we get a VDSM exception/failure on
> removeImage,
> or should disks stay in ILLEGAL state and not turn back to OK,
> or both?

- If VDSM reports success - delete from DB
- If VDSM reports image does not exist - delete from DB
- Any other response set to illegal 

Dafna, in any case make sure that the VDSM response was correct, or clone this bug against VDSM to report a different error if the image actually exists.

Comment 3 Ayal Baron 2013-01-27 10:30:09 UTC
As Simon said:
- If VDSM reports success - delete from DB
- If VDSM reports image does not exist - delete from DB
- Any other response set to illegal

In any event, disk should not go back to "OK" from illegal.
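
A minimal engine-side sketch of the handling agreed in comments 2 and 3, using hypothetical class and method names (not the actual ovirt-engine code): on success, or when vdsm reports that the image does not exist, the disk is removed from the DB; any other failure leaves the image marked ILLEGAL instead of flipping it back to OK.

    // Hypothetical sketch; names are illustrative and do not match the real
    // ovirt-engine implementation.
    enum VdsmResult { SUCCESS, IMAGE_DOES_NOT_EXIST, OTHER_FAILURE }

    class RemoveImageHandler {

        void onRemoveImageCompleted(String imageId, VdsmResult result) {
            switch (result) {
                case SUCCESS:
                case IMAGE_DOES_NOT_EXIST:
                    // vdsm removed the image, or it was never there:
                    // either way the engine drops it from the DB.
                    removeImageFromDb(imageId);
                    break;
                default:
                    // Any other failure: keep the image ILLEGAL;
                    // it must never go back to OK.
                    setImageStatusIllegal(imageId);
                    break;
            }
        }

        private void removeImageFromDb(String imageId) {
            // placeholder for the DB removal
        }

        private void setImageStatusIllegal(String imageId) {
            // placeholder for the status update
        }
    }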

Comment 4 Ayal Baron 2013-01-31 15:08:50 UTC
*** Bug 891052 has been marked as a duplicate of this bug. ***

Comment 6 Haim 2013-03-11 15:27:20 UTC
dealt with pre-integration ticket, removing need-info.

Comment 7 Yaniv Bronhaim 2013-03-12 22:45:39 UTC
I don't think this change in the engine's flow is reasonable; I think it's the vdsm side's responsibility. 

If a removeImage command is sent to vdsm and the image does not exist, vdsm should still start a deleteImage task, set a taskId, and report success or failure with the reason that the image does not exist. That way everything stays the same and behaves like all other vds async operations. I don't understand why vdsm acts differently in this case.
 
The flow needs to be:
 create a deleteImage task
 start the task
 try to delete the image, which either succeeds or fails (e.g. image does not exist)
 mark the task with a success or fail status
 end the task

Then the engine's asyncTasks flow stays the same, as expected, on both sides.

The suggested patch is too hacky; it touches an established flow that we don't want to change.

Comment 8 Maor 2013-03-13 14:05:22 UTC
After Alon, Maor, Yaniv and Yair talked.
The fix should be in the infra tasks structure.
If the task could not be created, the command should be aware of that and create a new task associated with a new state.
This behaviour should be consistent for all kinds of tasks.
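
A rough sketch of the direction described above, assuming hypothetical names (the real ovirt-engine task infrastructure differs): the command checks whether the vdsm task was actually created and, if not, ends with a dedicated state instead of falling into the generic rollback that resets the image status to OK.

    // Hypothetical sketch only; the names below are illustrative.
    class DeleteImageCommand {

        enum EndState { SUCCEEDED, FAILED, TASK_CREATION_FAILED }

        void execute() {
            String taskId = tryCreateVdsmDeleteTask();
            if (taskId == null) {
                // The task never reached vdsm: end with a dedicated state so the
                // command handles this case explicitly rather than through a
                // rollback that would reset the image status to OK.
                end(EndState.TASK_CREATION_FAILED);
                return;
            }
            // Normal async path: the task's end handler decides between
            // removing the image from the DB and marking it ILLEGAL.
            trackAsyncTask(taskId);
        }

        private String tryCreateVdsmDeleteTask() { return null; /* placeholder */ }
        private void trackAsyncTask(String taskId) { /* task polling placeholder */ }
        private void end(EndState state) { /* end-state handling placeholder */ }
    }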

Comment 9 Maor 2013-03-14 15:57:34 UTC
*** Bug 916554 has been marked as a duplicate of this bug. ***

Comment 11 Allon Mureinik 2013-04-25 09:05:13 UTC
Patch was reverted as it breaks QE automation tests.
Need to revisit once automations are fixed.

Comment 13 Eyal Edri 2013-04-28 07:24:06 UTC
moving back to POST since patch was reverted on sf14

Comment 14 Eyal Edri 2013-04-28 07:33:35 UTC
changing back to ON_DEV per request from development.

Comment 16 Allon Mureinik 2013-05-08 11:21:44 UTC
Liron - please document the behavior change.

Comment 17 Dafna Ron 2013-05-13 16:17:47 UTC
after speaking to Liron I tested the following: 

1. removing a vm with no disk on the storage (image does not exist error from vdsm)
2. removing a vm for which the remove fails in the vds. 

moving to verified on sf16

however, I am adding a release notes request because users might be confused to see the domain reporting used space for objects that are no longer in the DB.

Comment 18 Daniel Erez 2013-06-03 18:04:58 UTC
*** Bug 928902 has been marked as a duplicate of this bug. ***

Comment 19 Itamar Heim 2013-06-11 09:15:43 UTC
3.2 has been released

Comment 20 Itamar Heim 2013-06-11 09:41:14 UTC
3.2 has been released