Red Hat Bugzilla – Bug 1467928
Shutdown of a vm during snapshot deletion renders the disk invalid
Last modified: 2017-08-10 08:09:05 EDT
Created attachment 1294645 [details]
engine.log from snapshot deletion
Description of problem:
Taking a snapshot of a vm containing more than one disk and shutting down that vm during live-remove of that snapshot renders at least one disk as invalid.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a VM with at least two disks attached.
2. Create an offline snapshot (may not be relevant).
3. Write changes to both disks, large enough that the deletion takes some time.
4. Delete the snapshot and shut down the VM during that activity (issue "init 0" / "halt" on the VM).
5. The engine processes the snapshot deletion for some time but finally aborts the tasks, without having deleted the snapshot of both disks.
Actual results:
One or several disks are marked as invalid, depending on the number of additional disks.
Expected results:
The snapshot is removed and the disks are not marked as invalid.
Additional info:
Starting the VM with an invalid disk is rejected by the qemu process. This renders the VM useless, as the only way to get rid of that status is to remove the disk.
Created attachment 1294646 [details]
vdsm.log from SPM-host
I am trying to reproduce this issue.
I did the following:
1. created a VM with 4 disks
2. created a snapshot
3. copied data to each disk
4. deleted the snapshot
5. powered-off the VM while deleting the snapshot.
The delete operation failed, the snapshot was marked as OK, and the status of each disk was illegal. All as expected, and I am able to start the VM again and to delete the snapshot.
Can you please elaborate on what you mean by disks marked as invalid? Where/how do you see that?
After my deletion of the snapshot was marked as failed, the disk within the snapshot was also marked as failed.
See Virtual Machines -> [vm] -> Snapshots -> [snapshot] -> disks.
Currently I have not been able to remove *any* of my snapshots from that test machine. I also have not been able to start the VM, as one of the disks has been reported as invalid (Bad volume specification).
It seems that one of the delete operations succeeded, hence 33b202bd-55e7-4a0f-b6a1-b9057aee8099 doesn't exist.
The attached Vdsm log is partial. Can you please upload the full Vdsm log?
Created attachment 1297436 [details]
vdsm.log part 1
Created attachment 1297437 [details]
vdsm.log part 2
Created attachment 1297438 [details]
vdsm.log part 3
Can you please upload the SPM log as well?
It seems that after the VM was shut down, there was an attempt to delete the snapshot while the VM was down (aka cold merge). Is this correct? If so, I'd like to ask you to try the flow again, but this time without doing a cold merge, and see whether it is possible to start the VM after it was shut down during live merge.
The engine log seems partial. Can you please send the full log?
Created attachment 1303777 [details]
If you have the SPM log of the last failure and can upload it, that would be helpful for further analysis.
Those logs have been rotated into nirvana, unfortunately. I will attach a complete set of logs after I have had time to reproduce the issue.
Do you need any logs besides engine.log and vdsm.log?
Please remember to upload the logs of the SPM and the host running the VM, in addition to the engine.
It would be very helpful if you could document every step you perform: the number of disks created, the number of snapshots created, and the time at which you perform the shutdown. Is it a specific time or some random time?
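One way to keep such a record is a small timestamped step log. The following is only a sketch: the `StepLog` class and its method names are hypothetical helpers made up for illustration, not part of oVirt, Vdsm, or the engine.

```python
from datetime import datetime, timezone


class StepLog:
    """Hypothetical helper: collects timestamped notes for each reproduction step."""

    def __init__(self, clock=None):
        # The clock is injectable so the log is testable; it defaults to "now" in UTC.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self.entries = []

    def record(self, step, **details):
        # Store the timestamp, a free-text step name, and optional details
        # such as the number of disks or snapshots.
        self.entries.append((self._clock().isoformat(), step, details))

    def render(self):
        # One line per step, suitable for pasting into a bug comment.
        lines = []
        for stamp, step, details in self.entries:
            extra = ", ".join(f"{k}={v}" for k, v in sorted(details.items()))
            lines.append(f"{stamp}  {step}" + (f"  ({extra})" if extra else ""))
        return "\n".join(lines)
```

Calling `log.record("created VM", disks=4)` after each manual action and pasting `log.render()` into the bug would give the timeline of disk creation, snapshot creation, and shutdown in one place.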
Also, the chain info (vdsm and qemu) would be useful - before the merge and after the shutdown.
After the live merge and the shutdown, do you perform a cold merge?
If yes, please try to run the VM *before* and after the cold merge, and send the chain info after the cold merge.
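For the qemu side of the chain info requested above, a minimal sketch follows. It assumes `qemu-img` is available on the host and that the volume path is readable there; the function names are hypothetical, and Vdsm's own view of the chain is not covered. It relies on `qemu-img info --backing-chain --output=json`, which prints a JSON array with one entry per image in the chain, starting at the top layer.

```python
import json
import subprocess


def parse_backing_chain(info_json):
    """Summarize qemu-img JSON output, top (active) layer first."""
    chain = []
    for layer in json.loads(info_json):
        chain.append({
            "filename": layer.get("filename"),
            "format": layer.get("format"),
            # The base image has no backing file, so this is None for it.
            "backing": layer.get("backing-filename"),
        })
    return chain


def read_backing_chain(image_path):
    # Assumption: qemu-img is installed and image_path points at the top volume.
    result = subprocess.run(
        ["qemu-img", "info", "--backing-chain", "--output=json", image_path],
        check=True, capture_output=True, text=True,
    )
    return parse_backing_chain(result.stdout)
```

Running `read_backing_chain` on the same disk before the merge and after the shutdown, and attaching both results, would show whether a volume dropped out of the chain in between.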
Any news on this?
Pushing to 4.1.6 until we're able to reproduce.