Description of problem:
After starting the backup, a COW snapshot is created in /var/run/vdsm/backup/ on the hypervisor's local filesystem. If the VM is write-intensive, the disk may become full and the VM will be paused. In addition, VM disk performance degrades.

Version-Release number of selected component (if applicable):
rhvh 4.4.1

Steps to Reproduce:
1. Start writing on the VM disk
2. Start a VM backup: python3 ./backup_vm.py full --engine-url https://engine-url --username admin@internal --cafile ca.pem --backup-dir /backup vm-id
3. The file /var/run/vdsm/backup/<id> is created and growing
4. /var becomes full

Actual results:
The VM is paused due to an I/O error

Expected results:
The COW snapshot is created on the VM's storage
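One way to observe step 4 in practice is to watch the free space on the filesystem holding the scratch file while the backup runs. This is a minimal illustrative sketch, not part of the reproducer; the path and the 5 GiB threshold are assumptions.

```python
import shutil

def scratch_space_ok(path="/var", min_free_bytes=5 * 1024**3):
    """Return True while the filesystem holding the scratch disk
    still has at least min_free_bytes available."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes

# Example call with a tiny threshold so it holds on any filesystem:
print(scratch_space_ok("/", min_free_bytes=1))
```

Polling this in a loop during a backup would show the free space shrinking as the guest writes data.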
The documentation text flag should only be set after the 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.
(In reply to Alexander Vasilev from comment #0)
> Description of problem:
> After starting the backup, a COW snapshot is created in
> /var/run/vdsm/backup/ on the hypervisor's local filesystem. If the VM is
> writing intensive, the disk may become full and VM will be paused.
> In addition, VM disk performance degrades.

VM performance is expected to degrade while copying modified data to the scratch disk. Maybe in this case the local disk performance is especially bad?

> Version-Release number of selected component (if applicable):
> rhvh 4.4.1
>
> Steps to Reproduce:
> 1. Start writing on VM disk
> 2. Start VM backup by python3 ./backup_vm.py full --engine-url
>    https://engine-url --username admin@internal --cafile ca.pem
>    --backup-dir /backup vm-id
> 3. file /var/run/vdsm/backup/<id> created and growning

This is a unix socket, and it is not growing; maybe you mean the transient disk under:

    /var/lib/vdsm/storage/transient_disks/

> 4. /var become full
>
> Actual results:
> VM is in pause during IO error
>
> Expected results:
> COW snapshot is created on the VM storage

This is not implemented yet, and it is not easy to implement with block storage, since we need a way to track writes to the scratch disk in order to extend it.

Does this happen in a real backup? What was the throughput seen during the backup? Is this a real VM used normally, or some kind of extreme test?
> VM performance is expected to degrade while copying modified
> data to the scratch disk. Maybe in this case the local disk performance
> is specially bad?

Yes. The VM disks are on enterprise-level SSD and NVMe storage, because the service requires it. The hypervisors have small 10K SAS local disks, for the system only. I do not expect the local hypervisor disks to be a bottleneck for my virtual machines.

> This is a unix socket and it is not growing, but maybe you
> mean the transient disk under:
>
> /var/lib/vdsm/storage/transient_disks/

Yes, maybe this is my mistake in the description.

> This is not implemented yet, and not easy to implement with block
> storage, since we need a way to track writes to the scratch disk
> to extend it.

Perhaps a dedicated high-performance shared storage should be mounted on the hypervisors for this purpose?

> Does it happen in real backup? What the the throughput seen during
> the backup? Is this real VM used normally or some kind of extreme
> test?

I described a simple extreme example in the topic, but this implementation exposes me to a real risk, plus VM performance degradation.
Alexander, I need more info to understand the severity and priority of this issue.

Was this a normal backup of a real VM, or a stress test simulating an extreme case?

I need more info on the VM and the hypervisor. Please provide:
- VM XML (sudo virsh -r dumpxml vm-name)
- vdsm log (/var/log/vdsm/vdsm.log) showing the backup in which the VM was paused
- ovirt-imageio daemon log (/var/log/ovirt-imageio/daemon.log) showing the transfer
- Complete output of the backup_vm.py command
- Output of df -h
- Output of "lvs --readonly storage-domain-uuid" if the disk was on block storage;
  you may need to add --config 'devices { filter = [ "a|.*|"] }' if you have an lvm filter configured.
- Description of the I/O done in the VM during the backup
- Output of iostat (or a similar tool) showing the I/O inside the guest during the backup

Regarding a future solution: if we create the scratch disk on the same storage as the original disk, would that work for your use case?

I assume that if you have a disk on high-end storage, you want the scratch disk to be on the same storage to limit the performance degradation during backup. If you keep another disk on lower-end storage, you probably don't want to use your best storage for the scratch disk of that other disk.
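Gathering the items above can be scripted. This is a hypothetical helper, not a vdsm tool; the command list is an assumption, and commands not installed on the current host (e.g. virsh on a non-hypervisor box) are skipped rather than failing.

```python
import shutil
import subprocess

def collect(commands):
    """Run each diagnostic command and return its stdout, or None if
    the tool is not available on this host."""
    results = {}
    for name, argv in commands.items():
        if shutil.which(argv[0]) is None:
            results[name] = None  # tool not installed here
            continue
        proc = subprocess.run(argv, capture_output=True, text=True)
        results[name] = proc.stdout
    return results

# Example invocation covering two of the requested items:
info = collect({
    "vm-xml": ["virsh", "-r", "dumpxml", "vm-name"],
    "disk-usage": ["df", "-h"],
})
```

Each value in `info` can then be attached to the bug as a separate file.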
(In reply to Nir Soffer from comment #4)
> Alexander, I need more info to understand the severity and priority
> of this issue.
>
> Was this a normal backup of a real VM, or a stress test simulating
> extreme case?

A stress test simulation in this case.

> I need more info on the VM and the hypervisor
>
> Please provide:
> - VM XML (sudo virsh -r dumpxml vm-name)
> - vdsm log (/var/log/vdsm/vdsm.log) showing the backup in which the VM was
>   pasued
> - ovirt-imageio daemon log (/var/log/ovirt-imageio/daemon.log) showing the
>   transfer
> - Complete output of backup_vm.py command
> - output of df -h
> - output of "lvs --readonly storage-domain-uuid" if the disk was on block
>   storage
>   you may need to add --config 'devices { filter = [ "a|.*|"] }' if you
>   have lvm filter configured.
> - description of I/O done in the VM during the backup
> - output of iostat (or similar tool) showing the I/O inside the guest during
>   the backup.

I can provide these a bit later, but is it still necessary, given my answers below?

> Regarding future solution, if we create the scratch disk on the same storage
> of the original disk, would it work for your use case?

Yes.

> I assume that if you have a disk on high end storage, you want the scratch
> disk to be on the same storage to limit the performance degradation during
> backup. If you keep another disk on lower end storage, you probably don't
> want to use your best storage for the scratch disk for the other disk.

You're absolutely right!
(In reply to Alexander Vasilev from comment #5)
> (In reply to Nir Soffer from comment #4)
> > I need more info on the VM and the hypervisor
> > ...
>
> A bit later, but is it necessary because the text is below?

It will help us understand how likely this issue is in a real environment.

Also, when you say that you experience performance degradation, how do you measure it, and what are the results?

For example, if you measure performance by running fio in the guest, writing data to the entire disk, your test will cause all disk contents to be copied to the scratch disk. This is not a real-world use case. A more relevant case is to measure guest performance using the typical load expected during the backup window.
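A simple way to measure in-guest write performance under a bounded load, rather than rewriting the whole disk as a full-disk fio run would, is to time a fixed amount of synced writes. This is an illustrative sketch; the 16 MiB size and 64 KiB block size are arbitrary assumptions, and a real measurement would use the workload's actual I/O pattern.

```python
import os
import tempfile
import time

def write_throughput(total_bytes=16 * 1024**2, block=64 * 1024):
    """Write total_bytes to a temp file in fixed-size blocks, fsync,
    and return the observed throughput in bytes per second."""
    buf = b"\0" * block
    start = time.monotonic()
    with tempfile.NamedTemporaryFile() as f:
        written = 0
        while written < total_bytes:
            f.write(buf)
            written += block
        f.flush()
        os.fsync(f.fileno())  # include the flush-to-disk cost
        elapsed = time.monotonic() - start
    return written / elapsed

rate = write_throughput()
print(f"{rate / 1024**2:.1f} MiB/s")
```

Running this before and during a backup gives a rough before/after comparison without forcing the entire disk into the scratch file.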
Does this `./backup_vm.py` refer to https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.py?
(In reply to Yaning Wang from comment #7)
> is this `./backup_vm.py` refer to
> https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.
> py

Yes, but this is just an example of using the API. The issue is the implementation of scratch disks in vdsm.
(In reply to Nir Soffer from comment #8)
> Yes, but this is just an example of using the API. The issue is the
> implementation of scratch disks in vdsm.

Thanks for the info. Just to be clear: the `backup_vm.py` in my comment (#7) is the right one to use to reproduce the bug?
(In reply to Yaning Wang from comment #9)
> Thanks for the info. Just to be clear: the `backup_vm.py` in my comment
> (#7) is the right one to use to reproduce the bug?

Yes, this is a good way to test the backup APIs.
Adding more info about this issue from backup partners, on why this bug is important to fix:

1. Unclear capacity planning for the infrastructure. In addition to providing enough space and IOPS for VMs on the storage side, the customer must plan a proper size for the /var directory, and that value is not easy to calculate.

2. The hypervisor system disk usually uses slow storage. That creates a potential bottleneck: while a VM is being backed up, its delta I/O goes to this slow disk, and the customer won't understand why his very fast storage is idle while the VMs run slowly.

3. A simple way to hit out-of-space in real life: back up a large file server or database server. A backup of such a VM can take hours, so I am quite sure such VMs generate more than 15GB in that period, and we again get a paused state. Also remember that for a database server generating heavy random I/O, using the system volume for scratch becomes critical for performance.

4. A more complex case: imagine we back up 15 VMs simultaneously. We generate 15 times more data in the /var directory and 15 times more I/O in this slow location.

5. Finally, the simple fact that a backup can stop production is very risky. For example, I don't want my domain controller to be stopped because I am backing it up.

Raising priority.
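Points 3 and 4 above can be turned into a back-of-the-envelope capacity estimate: in the worst case every guest write during the backup lands in the scratch disk, so scratch growth is roughly write rate × backup duration × number of concurrent backups. The numbers below are illustrative assumptions, not measurements.

```python
def scratch_estimate_gib(write_rate_mib_s, backup_hours, concurrent_vms=1):
    """Worst-case scratch space needed: every guest write during the
    backup window is copied to the COW scratch disk."""
    mib = write_rate_mib_s * backup_hours * 3600 * concurrent_vms
    return mib / 1024

# A database VM writing 5 MiB/s through a 2-hour backup window:
print(f"{scratch_estimate_gib(5, 2):.1f} GiB")      # ~35.2 GiB
# Point 4: 15 such VMs backed up simultaneously:
print(f"{scratch_estimate_gib(5, 2, 15):.1f} GiB")  # ~527.3 GiB
```

Even these modest assumptions already exceed a typical /var partition, which supports the argument that the scratch disk belongs on shared storage.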
While I was verifying this TC: RHEVM-27586, I encountered the following problem:

The scratch disk remains locked (not removed by the system) after the VM is paused due to lack of storage during the backup.

Initial storage state of the block SD on my env:
Total space: 74G
Free space: 48G
Guaranteed: 48G

Steps to reproduce:
- Clone a VM from a template with a thin OS disk (10G)
- Create a preallocated disk of 20G, add it to the VM, and mount it:
  - device="/dev/"$(lsblk -o NAME,FSTYPE,TYPE -dsn | grep disk | awk '$3 == "" {print $1}')
  - parted $device mktable gpt -s
  - parted -a optimal $device mkpart primary 0% 100% -s
  - mkfs.ext4 $device"1"
  - mount -o discard,defaults $device"1" /mnt
  - echo UUID=$(blkid $device"1" -sUUID -ovalue) /mnt "ext4" "defaults" "0" "1" | tee -a /etc/fstab
- Create an additional thin disk of 20G on the same SD, just to allocate some of the space on the SD
- At this point we still have some free space on the SD to start the backup
- Start a full backup for a 20G disk on the VM
- Start dd on the backed-up VM disk (open SSH to the VM and cd to the mount point of the disk):
  - dd if=/dev/zero of=big2.raw bs=4k iflag=fullblock,count_bytes count=10G
- If needed, repeat the above step by creating an additional big file, until the VM is paused due to lack of storage on the SD:

  2021-03-07 10:45:16,993+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [f59b33d] EVENT_ID: VM_PAUSED_ENOSPC(138), VM 26779 has been paused due to no Storage space error.

- Finalize the backup.
At this point the VM will change its state from 'paused' to 'up':

  2021-03-07 10:48:41,709+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-25) [f59b33d] VM 'a2337d6e-9e94-46fe-a5bf-c0ac08b1ee4f'(26779) moved from 'Paused' --> 'Up'

- Now get back to the VM terminal and stop the dd command
- Notice that in the 'disks' tab on the engine UI there is a scratch disk which remained in the locked state, although the backup was finalized. The LV is also there:

  [root@storage-ge13-vdsm1 ~]# lvs -o vg_name,lv_name,tags | grep 07d
  9db95765-0fb7-485e-91f2-381354a66d13 5561a136-4126-47dd-b722-b34c1a6277a7 IU_8b50b815-57b3-45b7-9348-698ec1a8a07d,MD_9,PU_00000000-0000-0000-0000-000000000000

- and its size is ~20G:

  [root@storage-ge13-vdsm1 ~]# qemu-img measure /dev/9db95765-0fb7-485e-91f2-381354a66d13/5561a136-4126-47dd-b722-b34c1a6277a7
  required size: 21474836480
  fully allocated size: 21474836480

Attaching engine log + vdsm log (which is also the SPM) + VM XML dump + an image of the 'disks' tab where you will find the locked scratch disk and the VM disks.
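For checking leftover scratch volumes in bulk, the `qemu-img measure` output shown above is easy to parse. This is a hypothetical helper and the sample text below just mirrors the output format from this comment; it is not a vdsm API.

```python
def parse_required_size(measure_output):
    """Extract the 'required size' value (bytes) from qemu-img measure
    output, or return None if the line is missing."""
    for line in measure_output.splitlines():
        if line.startswith("required size:"):
            return int(line.split(":", 1)[1].strip())
    return None

sample = "required size: 21474836480\nfully allocated size: 21474836480"
print(parse_required_size(sample) / 1024**3)  # 20.0 (GiB)
```

Combined with `lvs` output, this would let a cleanup script flag any leftover scratch LV whose size matches the backed-up disk.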
Created attachment 1761236 [details] Validation logs
Verified on rhv-release-4.4.5-7 according to the Polarion test plan:
RHEVM-27577 Passed
RHEVM-27609 Passed
RHEVM-27615 Passed
RHEVM-27614 Partially passed. A BZ was opened to track this issue - "Scratch disk not removed if a VM goes to 'paused' state during the backup process" https://bugzilla.redhat.com/1936185
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.