Created attachment 658724 [details]
all logs and db dump

Description of problem:
1. I had 2 disks created as floating disks and attached to a VM in a POSIX domain.
2. I ran the VM and moved the disks.
3. The move failed because of insufficient disk space.
4. After a day, I removed all my VMs, including this VM.
5. This VM failed to be deleted.
6. The disks were detached from the VM and their images changed to ILLEGAL state.
7. I tried removing them again.
8. The disks fail to be deleted (image does not exist error from VDSM) and the image status changed back to OK.

Looking at the engine logs, the image id marked as ILLEGAL is different from the image id I am trying to delete after the disk changed back to OK; it looks like it might have been the snapshot that was changed to ILLEGAL. Also, the images that were marked as ILLEGAL still appear in the DB.

Version-Release number of selected component (if applicable):
si24.5

Actual results:
Cannot remove the disks; VDSM returns an "image does not exist" error.

Expected results:
We should be able to remove the disks from the DB.
We should change both snapshot and disk to ILLEGAL.

Additional info:
All logs and db dump attached.
What should be fixed here?
Should we allow disks to be removed when VDSM returns an exception/failure on removeImage,
or should disks stay in ILLEGAL state and not turn back to OK,
or both?
(In reply to comment #1)
> What should be fixed here?
> Should we allow disks to be removed when VDSM returns an exception/failure on removeImage,
> or should disks stay in ILLEGAL state and not turn back to OK,
> or both?

- If VDSM reports success - delete from the DB.
- If VDSM reports that the image does not exist - delete from the DB.
- Any other response - set to ILLEGAL.

Dafna, in any case make sure that the VDSM response was correct, or clone this bug to VDSM to report a different error if the image actually exists.
As Simon said:
- If VDSM reports success - delete from the DB.
- If VDSM reports that the image does not exist - delete from the DB.
- Any other response - set to ILLEGAL.

In any event, the disk should not go back to OK from ILLEGAL.
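For illustration only, a minimal sketch of that decision table in Python. All names here (handle_remove_disk, VdsmImageDoesNotExist, vdsm, db) are hypothetical; the actual engine code is structured differently.

    # Sketch of the proposed handling of the removeImage response.
    # Hypothetical names only - this is not engine code.

    class VdsmImageDoesNotExist(Exception):
        """Raised when VDSM reports that the image does not exist."""

    def handle_remove_disk(disk, vdsm, db):
        try:
            vdsm.remove_image(disk.image_id)
        except VdsmImageDoesNotExist:
            # The image is already gone on storage, so the DB record is stale.
            db.delete_disk(disk)
        except Exception:
            # Any other failure: keep the record and mark it ILLEGAL.
            # It must never flip back to OK on its own.
            db.set_disk_status(disk, "ILLEGAL")
        else:
            # VDSM reported success - remove the disk from the DB.
            db.delete_disk(disk)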
*** Bug 891052 has been marked as a duplicate of this bug. ***
Dealt with in the pre-integration ticket; removing need-info.
I don't think this change in the engine's flow is reasonable; I think this is VDSM's responsibility. If a removeImage command is sent to VDSM and the image does not exist, VDSM should still start a deleteImage task, set a task id, and report success or fail with the reason that the image does not exist. That way everything stays the same and behaves like all other VDS async operations. I don't understand why VDSM acts differently in this case.

The flow needs to be:
1. Create the deleteImage task.
2. Start the task.
3. Try to delete the image; either it succeeds, an exception is thrown, or the image does not exist.
4. Mark the task with a success or failure status.
5. End the task.

Then the engine asyncTasks flow stays the same, as expected, on both sides. The suggested patch is too hacky a change to a core flow that we don't want to modify.
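To make that suggested flow concrete, a rough sketch in Python with hypothetical names (task, storage, image_id); this is illustrative only, not actual VDSM code or API.

    # Rough sketch of the suggested deleteImage flow: the task is always
    # created and started, and "image does not exist" is reported as a
    # task failure rather than preventing the task from being created.

    def delete_image_task(task, storage, image_id):
        task.start()
        try:
            if not storage.image_exists(image_id):
                task.fail("image does not exist")
            else:
                storage.delete_image(image_id)
                task.succeed()
        except Exception as e:
            task.fail(str(e))
        finally:
            # The engine polls the task result through the usual
            # asyncTasks mechanism on both sides.
            task.end()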
After Alon, Maor, Yaniv and Yair talked: the fix should be in the infra tasks structure. If the task could not be created, the command should be aware of that and create a new task associated with a new state. This behaviour should be the same for all kinds of tasks.
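As a loose illustration of that agreement (hypothetical names such as task_manager and TASK_CREATION_FAILED; the real infra code differs), the command is told when task creation fails and the entity moves to a distinct state instead of reverting to OK.

    # Loose sketch of the agreed infra behaviour: if the async task
    # cannot be created, the command is made aware of it and records a
    # dedicated state, uniformly for all kinds of tasks.

    def run_async_command(command, task_manager, db):
        try:
            task_id = task_manager.create_task(command)
        except Exception:
            # The command knows the task was not created and records a
            # distinct state instead of silently going back to OK.
            db.set_entity_status(command.entity, "TASK_CREATION_FAILED")
            return None
        return task_id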
*** Bug 916554 has been marked as a duplicate of this bug. ***
The patch was reverted as it broke QE automation tests. This needs to be revisited once the automation tests are fixed.
Moving back to POST since the patch was reverted on sf14.
changing back to ON_DEV per request from development.
Liron - please document the behavior change.
After speaking to Liron, I tested the following:
1. Removing a VM with no disk on the storage (image does not exist from VDSM).
2. Removing a VM for which the remove fails on the VDS.

Moving to VERIFIED on sf16. However, I am adding a release notes request because users might be confused by the domain reporting used space for objects which are no longer in the DB.
*** Bug 928902 has been marked as a duplicate of this bug. ***
3.2 has been released