Created attachment 1761288: logs

Description of problem:
The scratch disk remains locked, and the system never removes it, when the VM is paused due to lack of storage space during a backup. This wastes storage space whenever something unexpected happens during a backup attempt. Moreover, because the scratch disk stays locked, it cannot be removed.

Version-Release number of selected component (if applicable):
rhv-release-4.4.5-7

How reproducible:
100%

Steps to Reproduce:
The VM can probably be paused in more than one way. In this reproduction I 'choked' the SD (block storage) so that it ran low on free space. The goal is to get the VM into the 'paused' state during the backup phase.
- Clone a VM from a template with a thin-provisioned OS disk (10G)
- Create a preallocated 20G disk, attach it to the VM and mount it:
  - device="/dev/"$(lsblk -o NAME,FSTYPE,TYPE -dsn | grep disk | awk '$3 == "" {print $1}')
  - parted $device mktable gpt -s
  - parted -a optimal $device mkpart primary 0% 100% -s
  - mkfs.ext4 $device"1"
  - mount -o discard,defaults $device"1" /mnt
  - echo UUID=$(blkid $device"1" -sUUID -ovalue) /mnt "ext4" "defaults" "0" "1" | tee -a /etc/fstab
- Create an additional 20G thin disk on the same SD, just to allocate some of the space on the SD
- At this point the SD should still have enough free space to start the backup
- Start a full backup of the 20G disk on the VM
- Start dd on the backed-up VM disk (open an SSH session to the VM and cd to the disk's mount point):
  - dd if=/dev/zero of=big2.raw bs=4k iflag=fullblock,count_bytes count=10G
- If needed, repeat the previous step with additional large files until the VM is paused due to lack of storage on the SD:
  2021-03-07 10:45:16,993+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [f59b33d] EVENT_ID: VM_PAUSED_ENOSPC(138), VM 26779 has been paused due to no Storage space error.
- Finalize the backup.
  At this point the VM changes its state from 'paused' to 'up':
  2021-03-07 10:48:41,709+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-25) [f59b33d] VM 'a2337d6e-9e94-46fe-a5bf-c0ac08b1ee4f'(26779) moved from 'Paused' --> 'Up'
- Go back to the VM terminal and stop the dd command
- Notice that in the 'Disks' tab of the engine UI, the scratch disk remains in the locked state although the backup was finalized. The LV is also still there:
  [root@storage-ge13-vdsm1 ~]# lvs -o vg_name,lv_name,tags | grep 07d
  9db95765-0fb7-485e-91f2-381354a66d13 5561a136-4126-47dd-b722-b34c1a6277a7 IU_8b50b815-57b3-45b7-9348-698ec1a8a07d,MD_9,PU_00000000-0000-0000-0000-000000000000
- Its size is ~20G:
  [root@storage-ge13-vdsm1 ~]# qemu-img measure /dev/9db95765-0fb7-485e-91f2-381354a66d13/5561a136-4126-47dd-b722-b34c1a6277a7
  required size: 21474836480
  fully allocated size: 21474836480

Actual results:
The scratch disk remains in the 'locked' state on the SD.

Expected results:
The scratch disk should be removed when the backup is done.

Additional info:
Attached: engine log, vdsm log (this host is also the SPM), VM XML dump, and a screenshot of the 'Disks' tab showing the locked scratch disk and the VM disks.
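To check a host for a leftover scratch-disk LV more precisely than `grep 07d`, the LVs can be filtered by their image tag. This is a minimal sketch, assuming (as the lvs output above suggests) that the `IU_` tag carries the disk's image ID. `find_image_lvs` is a hypothetical helper name, not part of vdsm or LVM:

```shell
#!/bin/bash
# Hypothetical helper (not part of vdsm/LVM): print "vg/lv" for every LV
# whose tag list contains exactly IU_<image-id>.
# Assumption: the IU_ tag holds the image ID, as in the lvs output above.
# Reads lines of "vg_name lv_name tags" on stdin, so saved output works too.
find_image_lvs() {
    local image_id="$1"
    awk -v tag="IU_${image_id}" '
        {
            # Field 3 is the comma-separated tag list (IU_..., MD_..., PU_...).
            n = split($3, tags, ",")
            for (i = 1; i <= n; i++) {
                # Exact match, so a PU_ (parent) tag with the same ID is ignored.
                if (tags[i] == tag) {
                    print $1 "/" $2
                    break
                }
            }
        }'
}

# Usage on the host (empty output means no LV with that image tag remains):
#   lvs -o vg_name,lv_name,tags --noheadings \
#       | find_image_lvs 8b50b815-57b3-45b7-9348-698ec1a8a07d
```

Matching the whole tag rather than a substring avoids false hits on other LVs whose UUIDs happen to contain the same characters.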
Verified on rhv-release-4.4.6-6.
When the VM reaches the 'paused' state during the backup and the backup is then finalized, the scratch disk is removed as expected.
This bugzilla is included in the oVirt 4.4.6 release, published on May 4th, 2021. Since the problem described in this bug report should be resolved in the oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.