1936185 – [CBT] Scratch disk not removed if a VM goes to 'paused' state during the backup process

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	BLL.Storage
Sub Component:
Version:	4.4.5.7
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	ovirt-4.4.6
Target Release:	4.4.6.4
Assignee:	Eyal Shenitzky
QA Contact:	Ilan Zuckerman
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-03-07 13:11 UTC by Ilan Zuckerman
Modified:	2021-05-05 05:35 UTC
CC List:	4 users
Fixed In Version:	ovirt-engine-4.4.6.4
Clone Of:
Environment:
Last Closed:	2021-05-05 05:35:51 UTC
oVirt Team:	Storage
Embargoed:
Dependent Products:

Attachments
logs (667.39 KB, application/zip) 2021-03-07 13:11 UTC, Ilan Zuckerman	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
oVirt gerrit	113796	0	master	MERGED	core: remove scratch disks if VM paused	2021-03-15 15:45:07 UTC

Description Ilan Zuckerman 2021-03-07 13:11:32 UTC

Created attachment 1761288
logs

Description of problem:

The scratch disk remains locked (it is not removed by the system) after the VM is paused due to lack of storage during the backup.
This can waste storage space for the end user whenever something unexpected happens during a backup attempt. Moreover, the scratch disk remains locked and cannot be removed.

Version-Release number of selected component (if applicable):
rhv-release-4.4.5-7

How reproducible:
100%

Steps to Reproduce:

I assume the VM can be paused in more than one way, but in this reproduction I 'choked' the SD (block storage) so that it ran low on storage space.
The point is to get the VM into the 'paused' state during the backup phase.

- Clone VM from template with thin OS disk (10G)
- Create Preallocated disk of 20G and add it to the VM + mount it:
  - device="/dev/"$(lsblk -o NAME,FSTYPE,TYPE -dsn | grep disk | awk '$3 == "" {print $1}')
  - parted $device mktable gpt -s
  - parted -a optimal $device mkpart primary 0% 100% -s
  - mkfs.ext4 $device"1"
  - mount -o discard,defaults $device"1" /mnt
  - echo UUID=$(blkid $device"1" -sUUID -ovalue) /mnt "ext4" "defaults" "0" "1" | tee -a /etc/fstab

- Create an additional 20G thin disk on the same SD, just to allocate some of the space on the SD
- At this point there is still some free space on the SD to start the backup
- Start a full backup of the 20G disk on the VM
- Start dd on the VM disk being backed up (open an SSH session to the VM and cd to the disk's mount point)
  - dd if=/dev/zero of=big2.raw bs=4k iflag=fullblock,count_bytes count=10G
- If needed, repeat the previous step, creating additional large files, until the VM is paused due to lack of storage on the SD:

2021-03-07 10:45:16,993+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [f59b33d] EVENT_ID: VM_PAUSED_ENOSPC(138), VM 26779 has been paused due to no Storage space error.

- Finalize the backup. At this point the VM changes its state from 'paused' to 'up'

2021-03-07 10:48:41,709+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-25) [f59b33d] VM 'a2337d6e-9e94-46fe-a5bf-c0ac08b1ee4f'(26779) moved from 'Paused' --> 'Up'

- Now go back to the VM terminal and stop the dd command
- Notice that the 'disks' tab in the engine UI shows a scratch disk that remains in the locked state although the backup was finalized. The LV is also still there:

[root@storage-ge13-vdsm1 ~]# lvs -o vg_name,lv_name,tags | grep 07d
  9db95765-0fb7-485e-91f2-381354a66d13 5561a136-4126-47dd-b722-b34c1a6277a7 IU_8b50b815-57b3-45b7-9348-698ec1a8a07d,MD_9,PU_00000000-0000-0000-0000-000000000000

- and its size is ~20G:

[root@storage-ge13-vdsm1 ~]# qemu-img measure /dev/9db95765-0fb7-485e-91f2-381354a66d13/5561a136-4126-47dd-b722-b34c1a6277a7 
required size: 21474836480
fully allocated size: 21474836480
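
The two checks above can be scripted. The following is a minimal sketch, not part of the fix: the function names are hypothetical, and it assumes that the IU_ tag in the lvs output carries the image (disk) UUID, which is how the tag list above reads but is not confirmed in this report. It works only on the values quoted above, so it can be tried without access to the host.

```shell
#!/usr/bin/env bash
# Sketch: read the image UUID out of an LV tag list and convert a byte count
# to GiB. Assumption: the IU_ prefix marks the image UUID.

# Print the value of the IU_ tag from a comma-separated tag string.
image_uuid_from_tags() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^IU_//p'
}

# Convert a size in bytes to whole GiB using integer arithmetic.
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Values copied from the lvs and qemu-img output in this report.
tags='IU_8b50b815-57b3-45b7-9348-698ec1a8a07d,MD_9,PU_00000000-0000-0000-0000-000000000000'
image_uuid_from_tags "$tags"   # prints 8b50b815-57b3-45b7-9348-698ec1a8a07d
bytes_to_gib 21474836480       # prints 20
```

The printed UUID ends in 07d, which is the string the grep in the lvs command above matches on, and the reported size works out to exactly 20 GiB.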


Actual results:
Scratch disk remains in 'locked' state on the SD

Expected results:
Scratch Disk should be removed when backup is done

Additional info:
Attaching the engine log, the vdsm log (from the host that is also the SPM), the VM XML dump, and an image of the 'disks' tab showing the locked scratch disk and the VM disks.
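
To find the pause event in an engine log, a small filter along these lines may help. It is a sketch under stated assumptions: the function name is hypothetical, and the pattern relies on the exact wording of the VM_PAUSED_ENOSPC message quoted in the reproduction steps, which may differ in other versions. The sample input is the two engine log lines from this report.

```shell
#!/usr/bin/env bash
# Sketch: list the VMs that the engine reported as paused for lack of storage.
# Reads engine log lines on stdin; relies on the message wording quoted in
# this report.
paused_enospc_vms() {
    sed -n 's/.*EVENT_ID: VM_PAUSED_ENOSPC([0-9]*), VM \(.*\) has been paused due to no Storage space error\..*/\1/p' | sort -u
}

# Sample input: the two engine log lines from the reproduction steps.
sample_log=$(cat <<'EOF'
2021-03-07 10:45:16,993+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [f59b33d] EVENT_ID: VM_PAUSED_ENOSPC(138), VM 26779 has been paused due to no Storage space error.
2021-03-07 10:48:41,709+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-25) [f59b33d] VM 'a2337d6e-9e94-46fe-a5bf-c0ac08b1ee4f'(26779) moved from 'Paused' --> 'Up'
EOF
)

printf '%s\n' "$sample_log" | paused_enospc_vms   # prints 26779
```

On a real setup the same function could read the engine log file directly, for example with `paused_enospc_vms < engine.log` run against the attached log.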

Comment 1 Ilan Zuckerman 2021-04-29 07:18:54 UTC

Verified on rhv-release-4.4.6-6

When the VM reaches the 'paused' state during the backup and the backup is then finalized, the scratch disk is removed as expected.

Comment 2 Sandro Bonazzola 2021-05-05 05:35:51 UTC

This bugzilla is included in oVirt 4.4.6 release, published on May 4th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
