Description of problem:

When a stateless VM is shut down, if its disk has 'wipe-after-delete' set, the volume will be zeroed out and then deleted. If the SPM goes non-responsive before the 'dd' completes, the volume does not get deleted. When the SPM switches, the task does get passed to the new SPM, but it does not delete the volume.

The result is that the volume will still exist in the storage domain with its metadata containing:

VOLTYPE=LEAF
LEGALITY=ILLEGAL

The base volume will have "VOLTYPE=INTERNAL". However, in the database, the image will have been deleted. It appears that the snapshot will also remain locked. Therefore, when someone tries to use this VM again (selected from a VM pool, for example), it will fail to start up.

Version-Release number of selected component (if applicable):

RHV 4.1; rhevm-4.1.1.8-0.1.el7

How reproducible:

100% if the timing is right.

Steps to Reproduce:
1. Create a VM (just a shell, no OS needed) as stateless and with a 1 GB disk in a block-based SD.
2. Set 'wipe-after-delete' on the disk.
3. Start the VM.
4. Power it off.
5. On the SPM, watch for the 'dd if=/dev/zero ....' to start.
6. Then block port 54321 on the SPM, e.g. 'iptables -I INPUT -p tcp --dport 54321 -j REJECT'.
7. The SPM will go non-responsive and the SPM role will switch to another host.
8. Check the database, the SD and the volume metadata.

Actual results:

The volume is not deleted.

Expected results:

When the new SPM is selected and the task is passed to it, it should delete the volume.

Additional info:
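For reference, a minimal sketch of how to inspect the leftover state from the SPM and the engine host; the SD, volume and disk UUIDs are placeholders, the 'images' column names are assumptions, and the exact psql invocation depends on the RHV version:

# Block vdsm traffic on the SPM while the wipe is running (step 6 above):
iptables -I INPUT -p tcp --dport 54321 -j REJECT

# After the SPM switch, check whether the LV backing the volume is still on storage:
lvs -o lv_name,lv_size,lv_tags <sd_uuid> | grep <volume_uuid>

# ...and whether the image row is gone from the engine database:
sudo -u postgres psql engine -c "select image_guid, imagestatus from images where image_group_id = '<disk_id>';"

# Remove the block when done:
iptables -D INPUT -p tcp --dport 54321 -j REJECT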
I wonder if in such a case we can resend the wipe request the next time the VM is started.
This can be fixed when we move volume zeroing off the SPM, similar to how disks can be copied on any host.

The engine should manage the state of volumes until they are deleted. When a volume should be wiped before deleting it, the engine should retry the wipe operation and/or display the volume so the user can retry the delete operation.

Looks like a 4.2 RFE to me.
(In reply to Nir Soffer from comment #9)
> This can be fixed when we move volume zeroing off the SPM, similar to how
> disks can be copied on any host.
>
> The engine should manage the state of volumes until they are deleted. When a
> volume should be wiped before deleting it, the engine should retry the wipe
> operation and/or display the volume so the user can retry the delete
> operation.
>
> Looks like a 4.2 RFE to me.

Putting aside the bug vs. RFE debate, we have had this problem since 3.1. It should be handled, but I agree that it probably isn't zstream material. Pushing out.
blkdiscard is now the default wipe method, and is about 100 times faster, so the chance of failing in the middle of the discard operation is smaller. But it can still happen; I have not tested it.
Elad, do you want to test if this is reproducible with 4.2?
Kevin is on it
Shir, please give it a try
The bug was reproduced with the following steps:

1. Create a VM (just a shell, no OS needed) as stateless and with a 1 GB disk in a block-based SD.
2. Set 'wipe-after-delete' on the disk.
3. Start the VM.
4. Power it off.
5. On the SPM, watch for the 'dd if=/dev/zero ....' to start.
6. Then block port 54321 on the SPM, e.g. 'iptables -I INPUT -p tcp --dport 54321 -j REJECT'.
7. The SPM will go non-responsive.
8. Check in the database whether the volume of the vm_snapshot is there (select * from images).
9. Click on "Host has been Rebooted" for the SPM.
10. Reboot the host.
11. The SPM switches to another host.
12. Check in the database whether the volume of the vm_snapshot is there (select * from images).

Actual results:

The volume is not deleted from the DB.

Expected results:

When the new SPM is selected and the task is passed to it, it should delete the volume.
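To also check whether the stateless snapshot stays locked, as described in the original report, a query along these lines can be used; the 'snapshots' table and its column names are assumptions about the engine DB schema, and <vm_id> is a placeholder:

sudo -u postgres psql engine -c "select snapshot_id, snapshot_type, status from snapshots where vm_id = '<vm_id>';"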
(In reply to Shir Fishbain from comment #26)
> The bug was reproduced with the following steps:
...
> 9. Click on "Host has been Rebooted" for the SPM.
> 10. Reboot the host.

This is the wrong order, and also hard to test. It should be:
- power off the host
- wait until the host is powered off
- click on "Host has been Rebooted" for the SPM

> Expected results:
>
> When the new SPM is selected and the task is passed to it, it should delete
> the volume.

Makes sense, but I don't know if this was implemented. Looks like an RFE.

Can we delete the disk manually after switching the SPM? This should be good enough for recovery:

1. The user deletes a VM.
2. The operation fails because the SPM becomes non-responsive.
3. The user switches the SPM to another host.
4. The user retries the operation.

We should not promise automatic recovery from fatal errors like the SPM becoming non-responsive.
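For the manual recovery path, a sketch of removing the leftover disk through the REST API once the new SPM is up; the engine FQDN, credentials and <disk_id> are placeholders, and this has not been tested as a recovery procedure:

curl -k -u admin@internal:<password> -X DELETE https://<engine_fqdn>/ovirt-engine/api/disks/<disk_id>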
(In reply to Shir Fishbain from comment #26)
> The bug was reproduced with the following steps:
...
> 5. On the SPM, watch for the 'dd if=/dev/zero ....' to start.

We now use blkdiscard -z to wipe volumes. How did you see a dd process? Please provide a vdsm log showing this flow.
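On current versions the wipe should therefore show up as a blkdiscard process rather than dd, so this is what to watch for on the SPM; the device path layout is an assumption (on block SDs the VG is named after the SD UUID and the LV after the volume UUID):

ps -ef | grep '[b]lkdiscard'
# expected to show something like: blkdiscard -z /dev/<sd_uuid>/<volume_uuid>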
Created attachment 1524024 [details] Logs
(In reply to Shir Fishbain from comment #29)
> Created attachment 1524024 [details]

Shir, the engine logs seem to contain 4 VM remove operations, and the old SPM vdsm log contains 4 deleteImage calls.

Please add the missing info about the run reproducing the issue:
- the id of the deleted VM
- the id of the deleted disk (image id in vdsm terms)
- the time the test was started
- the time SPM access was blocked

Also, when you say "select * from images", we want to see the result of the query before and after the VM was removed. This would show the missing info.
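One way to capture that is to dump the relevant rows before and after the run and diff them; this is only a sketch, and the column names are assumptions about the engine DB schema:

sudo -u postgres psql engine -c "select image_guid, image_group_id, imagestatus from images;" > images_before.txt
# ... remove the VM, block the SPM, switch the SPM ...
sudo -u postgres psql engine -c "select image_guid, image_group_id, imagestatus from images;" > images_after.txt
diff images_before.txt images_after.txt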
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
(In reply to Nir Soffer from comment #30)

Hi Nir, I attached the logs and the results from the DB:
- vdsm3 - the old SPM
- vdsm2 - the new SPM
- vdsm1
- engine log
- DB (step_8) - before I rebooted vdsm3
- DB (step_12) - after I rebooted vdsm3
- Disk - details about the disk

Important times:
15:49:47 Create a VM (just a shell, no OS needed) as stateless and with a 1 GB disk in a block-based SD
15:52:33 Start the VM
15:59:36 Power off the VM
16:01:40 Non-responsive (old SPM - vdsm3)
16:15:33 The SPM switches to another host (vdsm2)

The engine still exists; I can give you all the details.
Created attachment 1524661 [details] New_Logs
Nir, can you please estimate the time required to fix this bug?
(In reply to Tal Nisan from comment #35)

Not sure yet what the reproduced issue is; we need to check the logs in attachment 1524661 [details].

If this is the known issue of doing discard/zeroing on the SPM, fixing it requires changing the flow to:

1. Engine marks the volume for deletion on the SPM. Engine must keep the removed volume in the db at this point (marked as removed?).
2. Engine runs a wipe storage job on any available host.
3. Engine deletes the disk on the SPM. If the operation was successful, engine should delete the volume from the db.

On the vdsm side we need new APIs:
- remove_volume - remove the volume from the namespace so it cannot be used, but keep the backing storage (SPM only).
- wipe_volume - perform zero/discard on a deleted volume (run on any host). If the operation fails, engine can retry it on another host.
- delete_volume - remove the volume backing from storage (SPM only).

With this, if the SPM is not available, engine can retry the remove volume operation when the next SPM is available. If the SPM is not available when doing the discard/zero operation, the system will not be affected.

Daniel, what do you think?
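Purely to illustrate the proposed split; none of these verbs exist in vdsm today, and the vdsm-client namespace and arguments shown are hypothetical:

# hypothetical, SPM only: hide the volume so it cannot be used, keep the backing storage
vdsm-client Volume remove storagedomainID=<sd_id> imageID=<img_id> volumeID=<vol_id>
# hypothetical, any host: zero/discard the removed volume; retried on another host on failure
vdsm-client Volume wipe storagedomainID=<sd_id> imageID=<img_id> volumeID=<vol_id>
# hypothetical, SPM only: remove the backing LV from storage
vdsm-client Volume delete storagedomainID=<sd_id> imageID=<img_id> volumeID=<vol_id>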
It is an old bug that we haven't seen recently. The attached KCS has 8 cases attached to it, but they all look older. So maybe this is something that was fixed in the current release?

https://access.redhat.com/solutions/3034601
This bug/RFE is more than 2 years old and it didn't get enough attention so far and is now flagged as pending close. Please review if it is still relevant and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.
(In reply to Shir Fishbain from comment #26)
> The bug was reproduced with the following steps:
>
> 1. Create a VM (just a shell, no OS needed) as stateless and with a 1 GB
> disk in a block-based SD.
> 2. Set 'wipe-after-delete' on the disk.
> 3. Start the VM.
> 4. Power it off.
> 5. On the SPM, watch for the 'dd if=/dev/zero ....' to start.

Here I used:

while true; do ps -ef | grep "blkdiscard" | grep -v grep; sleep 0.1; done

> 6. Then block port 54321 on the SPM, e.g. 'iptables -I INPUT -p tcp --dport
> 54321 -j REJECT'.
> 7. The SPM will go non-responsive.
> 8. Check in the database if the volume of the vm_snapshot is there (select *
> from images).

When the VM was down, I saw in the web admin that the stateless snapshot was gone, and the vm_snapshot_id of the stateless snapshot also no longer appeared in the DB. Is that good enough for the verification?

> 9. Click on "Host has been Rebooted" for the SPM.
> 10. Reboot the host.
> 11. The SPM switches to another host.
> 12. Check in the database if the volume of the vm_snapshot is there (select
> * from images).

Versions:
engine-4.4.8.5-0.4.el8ev
vdsm-4.40.80.5-1.el8ev