Bug 1451423 - Shutting down a stateless VM volume with 'wipe-after-delete' set can fail if the SPM goes non-responsive
Summary: Shutting down a stateless VM volume with 'wipe-after-delete' set can fail if ...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.1
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.5.0
Target Release: 4.4.0
Assignee: Nobody
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks: 1520566
 
Reported: 2017-05-16 15:35 UTC by Gordon Watson
Modified: 2021-09-30 06:57 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-30 06:57:39 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Logs (3.28 MB, application/zip), 2019-01-27 14:57 UTC, Shir Fishbain
New_Logs (3.92 MB, application/zip), 2019-01-29 14:50 UTC, Shir Fishbain


Links
Red Hat Knowledge Base (Solution) 3034601, last updated 2018-08-13 17:48:31 UTC

Description Gordon Watson 2017-05-16 15:35:04 UTC
Description of problem:

When a stateless VM is shut down, if its disk has 'wipe-after-delete' set, the volume is zeroed out and then deleted. If the SPM goes non-responsive before the 'dd' completes, the volume does not get deleted. When the SPM role switches to another host, the task is passed to it, but it does not delete the volume.

The result is that the volume will still exist in the storage domain, with its metadata containing:

   VOLTYPE=LEAF
   LEGALITY=ILLEGAL

The base volume will have "VOLTYPE=INTERNAL".

However, in the database, the image will have been deleted. It appears that the snapshot will also remain locked.

Therefore, when someone tries to use this VM again (selected from a VM pool, for example), it will fail to start up.
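
For reference, one way to confirm the leftover volume on a block domain (a sketch; the VG name equals the storage domain UUID, and the tag names assume oVirt's usual block-volume LV tagging) is to list the LVs with their tags and look for an image id that no longer exists in the engine database:

   # List LVs and their tags in the storage domain VG (VG name = storage domain UUID).
   # Block-domain volumes carry IU_<image-id>, PU_<parent-id> and MD_<slot> tags, so a
   # leaked volume shows up as an LV whose IU_ tag has no matching image in the DB.
   lvs -o lv_name,lv_tags <storage-domain-uuid>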



Version-Release number of selected component (if applicable):

RHV 4.1;
  rhevm-4.1.1.8-0.1.el7


How reproducible:

100% if the timing is right.


Steps to Reproduce:

1. Create a VM (just a shell, no o/s needed) as stateless and with a 1gb disk in a block-based SD.
2. Set 'wipe-after-delete' on the disk.
3. Start the VM.
4. Power it off.
5. On the SPM, watch for the 'dd if=/dev/zero ....' to start.
6. Then block port 54321 on the SPM, e.g. 'iptables -I INPUT -p tcp --dport 54321 -j REJECT' (steps 5-6 are sketched below).
7. The SPM host will go non-responsive and the SPM role will switch to another host.
8. Check the database, the SD and volume metadata.
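
One way to automate steps 5-6 on the SPM host (a sketch; the wipe process is 'dd if=/dev/zero' on 4.1 and 'blkdiscard' on 4.2 and later, so adjust the pattern to your version):

   # Wait for the wipe process to appear, then cut off vdsm traffic so the SPM goes non-responsive.
   while ! pgrep -af 'dd if=/dev/zero|blkdiscard'; do sleep 0.1; done
   iptables -I INPUT -p tcp --dport 54321 -j REJECT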



Actual results:

The volume is not deleted.


Expected results:

When the new SPM is selected and the task is passed to it, it should delete the volume.


Additional info:

Comment 6 Allon Mureinik 2017-05-17 07:00:11 UTC
I wonder if in such a case we can resend the wipe request the next time the VM is started.

Comment 9 Nir Soffer 2017-05-28 09:27:52 UTC
This can be fixed when moving volume zeroing out of the spm, similar to copying
disks on any host.

Engine should manage the state of volumes until they are deleted. When a volume 
should be wiped before deleting it, engine should retry a wipe operation and/or
display the volume so the user can retry the delete operation.

Looks like 4.2 RFE to me.

Comment 10 Allon Mureinik 2017-06-13 08:57:08 UTC
(In reply to Nir Soffer from comment #9)
> This can be fixed when moving volume zeroing out of the spm, similar to
> copying
> disks on any host.
> 
> Engine should manage the state of volumes until they are deleted. When a
> volume 
> should be wiped before deleting it, engine should retry a wipe operation
> and/or
> display the volume so the user can retry the delete operation.
> 
> Looks like 4.2 RFE to me.

Putting aside the bug vs RFE debate, we have had this problem since 3.1.
It should be handled, but I agree that it probably isn't zstream material.

Pushing out.

Comment 18 Nir Soffer 2018-08-14 16:31:03 UTC
blkdiscard is now the default wipe method and is about 100 times faster, so the
chance of failing in the middle of the discard operation is smaller. But it can still
happen. I did not test it.
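
For reference, zero-wiping a block volume with blkdiscard looks roughly like this (the device path is a placeholder):

   # Zero-fill the whole logical volume instead of streaming /dev/zero over it with dd.
   blkdiscard -z /dev/<vg-name>/<lv-name>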

Comment 20 Nir Soffer 2018-08-14 18:44:14 UTC
Elad, do you want to test if this is reproducible with 4.2?

Comment 21 Elad 2018-08-15 14:55:06 UTC
Kevin is on it

Comment 25 Elad 2019-01-27 09:26:16 UTC
Shir, please give it a try

Comment 26 Shir Fishbain 2019-01-27 12:53:29 UTC
The bug was reproduced by the following steps:

1. Create a VM (just a shell, no o/s needed) as stateless and with a 1gb disk in a block-based SD.
2. Set 'wipe-after-delete' on the disk.
3. Start the VM.
4. Power it off.
5. On the SPM, watch for the 'dd if=/dev/zero ....' to start.
6. Then block port 54321 on the SPM, e.g. 'iptables -I INPUT -p tcp --dport 54321 -j REJECT'.
7. The SPM will go non-responsive 
8. Check in the database if the volume of the vm_snapshot is there (select * from images)
9. Click on "Host has been Rebooted" for the SPM 
10. Reboot the host
11. The SPM switches to another host 
12. Check in the database if the volume of the vm_snapshot is there (select * from images)


Actual results:

The volume is not deleted from the DB.

Expected results:

When the new SPM is selected and the task is passed, it should delete the volume.

Comment 27 Nir Soffer 2019-01-27 13:28:23 UTC
(In reply to Shir Fishbain from comment #26)
> The bug was reproduced by the following steps:
...
> 9. Click on "Host has been Rebooted" for the SPM 
> 10. Reboot the host

This is the wrong order, and also hard to test.
It should be:

- poweroff the host
- wait until host is powered off
- click on "Host has been Rebooted" for the SPM

> Expected results:
> 
> When the new SPM is selected and the task is passed, it should delete the
> volume.

Makes sense, but I don't know if this was implemented. Looks like RFE.

Can we delete the disk manually after switching the SPM?

This should be good enough for recovery:

1. The user deletes a VM.
2. The operation fails because the SPM becomes non-responsive.
3. The user switches the SPM to another host.
4. The user retries the operation (see the sketch below).

We should not promise automatic recovery from fatal errors like SPM becoming
non-responsive.
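
For step 4, a sketch of retrying the disk removal through the REST API once a new SPM is elected (the engine FQDN, credentials and disk id are placeholders):

   # Retry removal of the leftover disk; the engine runs the delete on the new SPM.
   curl -k -u 'admin@internal:password' -X DELETE \
        "https://engine.example.com/ovirt-engine/api/disks/<disk-id>"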

Comment 28 Nir Soffer 2019-01-27 13:41:57 UTC
(In reply to Shir Fishbain from comment #26)
> The bug was reproduced by the following steps:
...
> 5. On the SPM, watch for the 'dd if=/dev/zero ....' to start.

We now use blkdiscard -z to wipe volumes. How did you see a dd process?

Please provide vdsm log showing this flow.

Comment 29 Shir Fishbain 2019-01-27 14:57:21 UTC
Created attachment 1524024 [details]
Logs

Comment 30 Nir Soffer 2019-01-27 19:17:01 UTC
(In reply to Shir Fishbain from comment #29)
> Created attachment 1524024 [details]

Shir, the engine logs seem to contain 4 VM remove operations, and the old SPM vdsm log
contains 4 deleteImage calls.

Please add the missing info about the run reproducing the issue:

- vm id deleted
- disk id deleted (image id in vdsm terms)
- time test was started
- time SPM access was blocked

Also, when you say "select * from images", we want to see the result of the query
before and after the VM was removed. That would show the missing info.
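
For example, the before/after capture could look like this (a sketch; it assumes the default 'engine' database name and local postgres access, adjust for your setup):

   # Dump the images table before removing the VM and again after the SPM switch,
   # then diff the two files to see whether the volume row was removed.
   su - postgres -c "psql engine -c 'select * from images;'" > images_before.txt
   # ... remove the VM, block the SPM, wait for the SPM switch ...
   su - postgres -c "psql engine -c 'select * from images;'" > images_after.txt
   diff images_before.txt images_after.txt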

Comment 31 Sandro Bonazzola 2019-01-28 09:41:37 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 33 Shir Fishbain 2019-01-29 14:49:21 UTC
(In reply to Nir Soffer from comment #30)

Hi Nir,

I attached the logs and the results from DB:
vdsm3 -  the old SPM
vdsm2 - the new SPM
vdsm1
engine log
DB (step_8) - before I rebooted vdsm3
DB (step_12) - after I rebooted vdsm3
Disk - details about the disk

Important times :
15:49:47 Create a VM (just a shell, no o/s needed) as stateless and with a 1gb disk in a block-based SD.
15:52:33 Start the VM
15:59:36 Power off VM
16:01:40 The old SPM (vdsm3) went non-responsive
16:15:33 The SPM switched to another host (vdsm2)

The engine still exists; I can give you all the details.

Comment 34 Shir Fishbain 2019-01-29 14:50:44 UTC
Created attachment 1524661 [details]
New_Logs

Comment 35 Tal Nisan 2019-02-18 11:26:35 UTC
Nir, can you please estimate the time required to fix this bug?

Comment 36 Nir Soffer 2019-08-02 14:52:50 UTC
(In reply to Tal Nisan from comment #35)

Not sure yet which issue was reproduced; we need to check the logs in
attachment 1524661 [details].

If this is the known issue of doing discard/zeroing on the SPM, this requires
changing the flow to:

1. Engine marks the volume for deletion on the SPM. Engine must keep the removed
   volume in the db at this point (marked as removed?)

2. Engine runs a wipe storage job on any available host.

3. Engine deletes the disk on the SPM. If the operation was successful, engine
   should delete the volume from the db.

On the vdsm side we need new APIs:

- remove_volume - remove volume from the namespace so it cannot be used,
  but keep the backing storage (SPM only).

- wipe_volume - perform zero/discard on a deleted volume (run on any host).
  If the operation fails engine can retry the operation on another host.

- delete_volume - remove volume backing from storage (SPM only).

With this, if the SPM is not available, engine can retry the remove-volume
operation when the next SPM is available. If the SPM becomes unavailable while
doing the discard/zero operation, the system will not be affected (see the sketch below).
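
A rough sketch of the proposed flow as illustrative vdsm-client calls (remove_volume, wipe_volume and delete_volume are only the verbs proposed above, they do not exist in the current vdsm API, and the argument names are placeholders):

   # 1. SPM only: hide the volume so it cannot be used, but keep the backing storage.
   vdsm-client Volume remove_volume storagepoolID=<sp> storagedomainID=<sd> imageID=<img> volumeID=<vol>

   # 2. Any available host: zero/discard the deleted volume; on failure engine retries on another host.
   vdsm-client Volume wipe_volume storagedomainID=<sd> imageID=<img> volumeID=<vol>

   # 3. SPM only: remove the backing storage; on success engine deletes the volume from its DB.
   vdsm-client Volume delete_volume storagepoolID=<sp> storagedomainID=<sd> imageID=<img> volumeID=<vol>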

Daniel, what do you think?

Comment 37 Marina Kalinin 2021-05-24 19:59:20 UTC
This is an old bug that we have not seen recently.
The attached KCS has 8 cases attached to it, but they all look older.
So maybe this is something that was fixed in the current release?
https://access.redhat.com/solutions/3034601

Comment 38 Eyal Shenitzky 2021-08-29 08:35:16 UTC
This bug/RFE is more than 2 years old, has not received enough attention so far, and is now flagged as pending close.
Please review whether it is still relevant and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.

Comment 41 sshmulev 2021-08-31 15:12:53 UTC
(In reply to Shir Fishbain from comment #26)
> The bug was reproduced by the following steps:
> 
> 1. Create a VM (just a shell, no o/s needed) as stateless and with a 1gb
> disk in a block-based SD.
> 2. Set 'wipe-after-delete' on the disk.
> 3. Start the VM.
> 4. Power it off.
> 5. On the SPM, watch for the 'dd if=/dev/zero ....' to start.
Here I used: while true; do ps -ef | grep "blkdiscard" | grep -v grep; sleep 0.1; done;

> 6. Then block port 54321 on the SPM, e.g. 'iptables -I INPUT -p tcp --dport
> 54321 -j REJECT'.
> 7. The SPM will go non-responsive 
> 8. Check in the database if the volume of the vm_snapshot is there (select *
> from images)
When the VM was down, I saw in the web admin that the stateless snapshot was gone, and the vm_snapshot_id of the stateless snapshot also did not appear in the DB.
Is that good enough for the verification?
> 9. Click on "Host has been Rebooted" for the SPM 
> 10. Reboot the host
> 11. The SPM switch to another host 
> 12. Check in the database if the volume of the vm_snapshot is there (select
> * from images)
> 
> 

Versions:
engine-4.4.8.5-0.4.el8ev
vdsm-4.40.80.5-1.el8ev

