
Bug 1936185

Summary: [CBT] Scratch disk not removed if a VM goes to 'paused' state during the backup process
Product: [oVirt] ovirt-engine Reporter: Ilan Zuckerman <izuckerm>
Component: BLL.Storage Assignee: Eyal Shenitzky <eshenitz>
Status: CLOSED CURRENTRELEASE QA Contact: Ilan Zuckerman <izuckerm>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.4.5.7 CC: bugs, dfodor, eshenitz, sfishbai
Target Milestone: ovirt-4.4.6   
Target Release: 4.4.6.4   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-engine-4.4.6.4 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-05 05:35:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Ilan Zuckerman 2021-03-07 13:11:32 UTC
Created attachment 1761288
logs

Description of problem:

The scratch disk remains locked (the system does not remove it) after the VM is paused due to lack of storage during the backup.
This wastes storage space for the end user whenever something unexpected happens during a backup attempt. Moreover, the scratch disk remains locked and cannot be removed.

Version-Release number of selected component (if applicable):
rhv-release-4.4.5-7

How reproducible:
100%

Steps to Reproduce:

I assume the VM can be paused in more than one way, but in this reproduction I 'choked' the SD, causing it to run low on storage space (block storage).
The point is to get the VM into the 'paused' state during the backup phase.

- Clone VM from template with thin OS disk (10G)
- Create Preallocated disk of 20G and add it to the VM + mount it:
  - device="/dev/"$(lsblk -o NAME,FSTYPE,TYPE -dsn | grep disk | awk '$3 == "" {print $1}')
  - parted $device mktable gpt -s
  - parted -a optimal $device mkpart primary 0% 100% -s
  - mkfs.ext4 $device"1"
  - mount -o discard,defaults $device"1" /mnt
  - echo UUID=$(blkid $device"1" -sUUID -ovalue) /mnt "ext4" "defaults" "0" "1" | tee -a /etc/fstab

- Create additional thin disk 20G on the same SD just to allocate some of the space on the SD
- At this point we still have some free space on SD to start the backup
- Start a full backup for a 20G disk on the VM
- Start DD on the backed up VM disk (open SSH for the VM and cd to the mount point of the disk)
  - dd if=/dev/zero of=big2.raw bs=4k iflag=fullblock,count_bytes count=10G
- If needed, repeat the above step, creating additional big files, until the VM is paused due to lack of storage on the SD:

2021-03-07 10:45:16,993+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [f59b33d] EVENT_ID: VM_PAUSED_ENOSPC(138), VM 26779 has been paused due to no Storage space error.

- Finalize the backup. At this point the VM will change its state from 'paused' to 'up'

2021-03-07 10:48:41,709+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-25) [f59b33d] VM 'a2337d6e-9e94-46fe-a5bf-c0ac08b1ee4f'(26779) moved from 'Paused' --> 'Up'

- Now get back to the VM terminal and stop the DD command
- Notice that in the 'disks' tab of the engine UI there is a scratch disk that remained in the locked state, although the backup was finalized. The LV is also there:

[root@storage-ge13-vdsm1 ~]# lvs -o vg_name,lv_name,tags | grep 07d
  9db95765-0fb7-485e-91f2-381354a66d13 5561a136-4126-47dd-b722-b34c1a6277a7 IU_8b50b815-57b3-45b7-9348-698ec1a8a07d,MD_9,PU_00000000-0000-0000-0000-000000000000

- and its size is ~20G :

[root@storage-ge13-vdsm1 ~]# qemu-img measure /dev/9db95765-0fb7-485e-91f2-381354a66d13/5561a136-4126-47dd-b722-b34c1a6277a7 
required size: 21474836480
fully allocated size: 21474836480
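
The check above can be scripted when repeating the reproduction. This is only a rough sketch: lv_image_tags and bytes_to_gib are hypothetical helper names, and it assumes the IU_ tag on the LV carries the image (disk) ID, as the lvs output above suggests. It reads lines in the `lvs -o vg_name,lv_name,tags` format from stdin, so it can be tried on the sample line without touching storage.

```shell
# Hypothetical helper: print "vg lv image_id" for every LV whose tag list
# contains an IU_<id> entry (assumed here to be the image/disk ID).
# Expects `lvs -o vg_name,lv_name,tags` style lines on stdin.
lv_image_tags() {
    awk '{
        n = split($3, tags, ",")
        for (i = 1; i <= n; i++)
            if (tags[i] ~ /^IU_/)
                print $1, $2, substr(tags[i], 4)
    }'
}

# Hypothetical helper: convert a byte count (as printed by
# `qemu-img measure`) to whole GiB using integer arithmetic.
bytes_to_gib() {
    echo $(( $1 / 1073741824 ))
}

# Sample line copied from the lvs output above.
sample='  9db95765-0fb7-485e-91f2-381354a66d13 5561a136-4126-47dd-b722-b34c1a6277a7 IU_8b50b815-57b3-45b7-9348-698ec1a8a07d,MD_9,PU_00000000-0000-0000-0000-000000000000'
printf '%s\n' "$sample" | lv_image_tags

# "required size" value copied from the qemu-img measure output above.
bytes_to_gib 21474836480
```

On a host this would be fed from the real command, for example `lvs --noheadings -o vg_name,lv_name,tags | lv_image_tags`, and the third column compared against the scratch disk ID shown in the engine UI.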


Actual results:
Scratch disk remains in 'locked' state on the SD

Expected results:
Scratch Disk should be removed when backup is done

Additional info:
Attaching engine log + vdsm (which is also the SPM) + VM xml dump + image of the 'disks' tab where you will find the locked scratch disk and the VM disks.

Comment 1 Ilan Zuckerman 2021-04-29 07:18:54 UTC
Verified on rhv-release-4.4.6-6

When the VM reaches the 'paused' state during the backup and the backup is then finalized, the scratch disk is removed as expected.
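
For anyone repeating this verification, a quick way to confirm that the VM really did pause for lack of space during the run is to scan the engine log for the VM_PAUSED_ENOSPC event quoted in the description. A minimal sketch, assuming the message format shown in that log line; paused_enospc_vms is a hypothetical helper name and the log path in the usage note is the usual default, which may differ per setup.

```shell
# Hypothetical helper: print the name of every VM reported as paused with
# VM_PAUSED_ENOSPC. Reads engine log lines on stdin and relies on the
# message wording seen in the log excerpt in the description.
paused_enospc_vms() {
    sed -n 's/.*VM_PAUSED_ENOSPC([0-9]*), VM \(.*\) has been paused due to no Storage space error.*/\1/p'
}

# Sample line copied from the engine log excerpt in the description.
log_line='2021-03-07 10:45:16,993+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [f59b33d] EVENT_ID: VM_PAUSED_ENOSPC(138), VM 26779 has been paused due to no Storage space error.'
printf '%s\n' "$log_line" | paused_enospc_vms
```

Against a live engine this would typically be run as `paused_enospc_vms < /var/log/ovirt-engine/engine.log`.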

Comment 2 Sandro Bonazzola 2021-05-05 05:35:51 UTC
This bugzilla is included in oVirt 4.4.6 release, published on May 4th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.