Bug 1874483 - [CBT][RFE] During the VM backup, the local storage of the hypervisor is used
Summary: [CBT][RFE] During the VM backup, the local storage of the hypervisor is used
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backup-Restore.VMs
Version: 4.4.1.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.4.5
: ---
Assignee: Eyal Shenitzky
QA Contact: Ilan Zuckerman
bugs@ovirt.org
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-01 13:13 UTC by Alexander Vasilev
Modified: 2021-11-04 19:28 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Scratch disks for a backup operation are now created on the same shared storage as the disks that participate in the backup.
Reason: During a backup, scratch disks are created to keep the 'old' disk state and persist that data until the backup is over. Previously, these disks were created on the host's local storage, which is limited and does not scale.
Result: For each disk that participates in the backup, a scratch disk is created on the same storage domain that the disk resides on. This is now done by the engine rather than by the host: before the backup is taken, the scratch disks are created and prepared for the backup, and when the backup is done they are torn down and removed.
Clone Of:
Environment:
Last Closed: 2021-03-18 15:13:09 UTC
oVirt Team: Storage
Embargoed:
arvas: needinfo+
pm-rhel: ovirt-4.4+
pchavva: planning_ack?
pchavva: devel_ack?
pchavva: testing_ack?


Attachments
Validation logs (667.39 KB, application/zip), 2021-03-07 09:19 UTC, Ilan Zuckerman (Details)


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 113021 0 master MERGED constants.py: add backup scratch disk type to content type 2021-02-08 13:21:22 UTC
oVirt gerrit 113022 0 master MERGED backup.py: allow using scratch disks on shared storage 2021-02-09 22:19:20 UTC
oVirt gerrit 113112 0 master MERGED core: add backup scratch disk type to content type 2021-02-17 15:15:18 UTC
oVirt gerrit 113113 0 master MERGED core: create VM backup scratch disks on shared storage 2021-02-17 15:15:24 UTC
oVirt gerrit 113114 0 master MERGED core: remove VM backup scratch disks when the backup finalized 2021-02-17 15:15:28 UTC
oVirt gerrit 113116 0 master MERGED core: introduce CreateScratchDiskCommand 2021-02-17 15:15:20 UTC
oVirt gerrit 113117 0 master MERGED core: introduce CreateScratchDisksCommand 2021-02-17 15:15:22 UTC
oVirt gerrit 113512 0 master MERGED core: introduce RemoveScratchDisksCommand 2021-02-17 15:15:26 UTC
oVirt gerrit 113586 0 master MERGED core: cleanup created scratch disks in case of failure 2021-02-18 11:15:59 UTC

Description Alexander Vasilev 2020-09-01 13:13:25 UTC
Description of problem:
After starting the backup, a COW snapshot is created in /var/run/vdsm/backup/ on the hypervisor's local filesystem. If the VM is write-intensive, this disk may fill up and the VM will be paused.
In addition, VM disk performance degrades.

Version-Release number of selected component (if applicable):
rhvh 4.4.1

Steps to Reproduce:
1. Start writing to the VM disk
2. Start a VM backup with: python3 ./backup_vm.py full --engine-url https://engine-url --username admin@internal --cafile ca.pem --backup-dir /backup vm-id
3. A file under /var/run/vdsm/backup/<id> is created and keeps growing
4. /var becomes full

Actual results:
The VM is paused due to an I/O error

Expected results:
The COW snapshot is created on the VM's storage domain
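
While reproducing, the growth of the local scratch disk can be observed with something like the following (an illustrative sketch; /var/lib/vdsm/storage/transient_disks/ is where vdsm keeps the transient scratch disks on the host):

  # watch free space on the filesystem holding the scratch disks
  watch -n 5 df -h /var

  # watch the size of the transient scratch disks themselves
  watch -n 5 du -sh /var/lib/vdsm/storage/transient_disks/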

Comment 1 RHEL Program Management 2020-09-03 06:35:44 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 2 Nir Soffer 2020-09-07 14:35:43 UTC
(In reply to Alexander Vasilev from comment #0)
> Description of problem:
> After starting the backup, a COW snapshot is created in
> /var/run/vdsm/backup/ on the hypervisor's local filesystem. If the VM is
> writing intensive, the disk may become full and VM will be paused.
> In addition, VM disk performance degrades.

VM performance is expected to degrade while copying modified
data to the scratch disk. Maybe in this case the local disk performance
is especially bad?

> 
> Version-Release number of selected component (if applicable):
> rhvh 4.4.1
> 
> Steps to Reproduce:
> 1. Start writing on VM disk
> 2. Start VM backup by python3 ./backup_vm.py full   --engine-url
> https://engine-url   --username admin@internal   --cafile ca.pem  
> --backup-dir /backup   vm-id
> 3. file /var/run/vdsm/backup/<id> created and growning

This is a unix socket and it is not growing, but maybe you
mean the transient disk under:

  /var/lib/vdsm/storage/transient_disks/

> 4. /var become full
> 
> Actual results:
> VM is in pause during IO error
> 
> Expected results:
> COW snapshot is created on the VM storage

This is not implemented yet, and not easy to implement with block
storage, since we need a way to track writes to the scratch disk 
to extend it.

Does this happen in a real backup? What is the throughput seen during
the backup? Is this a real VM used normally, or some kind of extreme
test?

Comment 3 Alexander Vasilev 2020-09-08 03:55:18 UTC
> VM performance is expected to degrade while copying modified 
> data to the scratch disk. Maybe in this case the local disk performance
> is specially bad?
Yes.
The VM disks are on enterprise-grade SSD and NVMe storage, because the service requires it. The hypervisors have only small 10K SAS local disks for the system. I do not expect the local hypervisor disks to be a bottleneck for my virtual machines.

> This is a unix socket and it is not growing, but maybe you
> mean the transient disk under:
> 
>   /var/lib/vdsm/storage/transient_disks/
Yes, that was probably my mistake in the description.


> This is not implemented yet, and not easy to implement with block
> storage, since we need a way to track writes to the scratch disk 
> to extend it.
> 
Perhaps a dedicated high-performance shared storage domain should be mounted on the hypervisors for this purpose?

> Does it happen in real backup? What the the throughput seen during
> the backup? Is this real VM used normally or some kind of extreme
> test?
I described a simple extreme example in the report, but this implementation poses a real risk for me, along with VM performance degradation.

Comment 4 Nir Soffer 2020-09-08 09:15:04 UTC
Alexander, I need more info to understand the severity and priority
of this issue.

Was this a normal backup of a real VM, or a stress test simulating
an extreme case?

I need more info on the VM and the hypervisor.

Please provide (a consolidated collection sketch follows this list):
- VM XML (sudo virsh -r dumpxml vm-name)
- vdsm log (/var/log/vdsm/vdsm.log) showing the backup during which the VM was paused
- ovirt-imageio daemon log (/var/log/ovirt-imageio/daemon.log) showing the transfer
- Complete output of the backup_vm.py command
- Output of df -h
- Output of "lvs --readonly storage-domain-uuid" if the disk was on block storage;
  you may need to add --config 'devices { filter = [ "a|.*|"] }' if you have an lvm
  filter configured
- Description of the I/O done in the VM during the backup
- Output of iostat (or a similar tool) showing the I/O inside the guest during the backup
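
The requested data can be collected along these lines (an illustrative sketch only; vm-name and storage-domain-uuid are placeholders from the list above, and the lvm filter override is only needed if an lvm filter is configured):

  sudo virsh -r dumpxml vm-name > vm.xml
  df -h > df.txt
  sudo lvs --readonly --config 'devices { filter = [ "a|.*|" ] }' storage-domain-uuid > lvs.txt
  cp /var/log/vdsm/vdsm.log /var/log/ovirt-imageio/daemon.log .
  # inside the guest, while the backup is running:
  iostat -xm 5 60 > iostat.txt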

Regarding a future solution: if we create the scratch disk on the same storage
as the original disk, would that work for your use case?

I assume that if you have a disk on high-end storage, you want the scratch disk
to be on the same storage to limit the performance degradation during the backup.
If you keep another disk on lower-end storage, you probably don't want to use your
best storage for that disk's scratch disk.

Comment 5 Alexander Vasilev 2020-09-08 11:03:32 UTC
(In reply to Nir Soffer from comment #4)
> Alexander, I need more info to understand the severity and priority
> of this issue.
> 
> Was this a normal backup of a real VM, or a stress test simulating
> extreme case?
> 
A stress test simulation in this case.

> I need more info on the VM and the hypervisor
> 
> Please provide:
> - VM XML (sudo virsh -r dumpxml vm-name)
> - vdsm log (/var/log/vdsm/vdsm.log) showing the backup in which the VM was
> pasued
> - ovirt-imageio daemon log (/var/log/ovirt-imageio/daemon.log) showing the
> transfer
> - Complete output of backup_vm.py command
> - output of df -h
> - output of "lvs --readonly storage-domain-uuid" if the disk was on block
> storage
>   you may need to add --config 'devices { filter = [ "a|.*|"] }'  if you
> have lvm
>   filter configured.
> - description of I/O done in the VM during the backup
> - output of iostat (or similar tool) showing the I/O inside the guest during
> the backup.
> 
I can provide this a bit later, but is it still necessary, given my answers below?

> Regarding future solution, if we create the scratch disk on the same storage
> of
> the original disk, would it work for your use case?
Yes

> 
> I assume that if you have a disk on high end storage, you want the scratch
> disk to
> be on the same storage to limit the performance degradation during backup.
> If you 
> keep another disk on lower end storage, you probably don't want to use your
> best
> storage for the scratch disk for the other disk.
You're absolutely right!

Comment 6 Nir Soffer 2020-09-08 11:19:43 UTC
(In reply to Alexander Vasilev from comment #5)
> (In reply to Nir Soffer from comment #4)
> > I need more info on the VM and the hypervisor
> > 
> > Please provide:
> > - VM XML (sudo virsh -r dumpxml vm-name)
> > - vdsm log (/var/log/vdsm/vdsm.log) showing the backup in which the VM was
> > pasued
> > - ovirt-imageio daemon log (/var/log/ovirt-imageio/daemon.log) showing the
> > transfer
> > - Complete output of backup_vm.py command
> > - output of df -h
> > - output of "lvs --readonly storage-domain-uuid" if the disk was on block
> > storage
> >   you may need to add --config 'devices { filter = [ "a|.*|"] }'  if you
> > have lvm
> >   filter configured.
> > - description of I/O done in the VM during the backup
> > - output of iostat (or similar tool) showing the I/O inside the guest during
> > the backup.
> > 
> A bit later, but is it necessary because the text is below?

It will help to understand how likely this issue is in a real environment.

Also, when you say that you experience performance degradation, how do you
measure it, and what are the results?

For example, if you measure performance by running fio in the guest and
writing data to the entire disk, your test will cause all disk contents
to be copied to the scratch disk. This is not a real-world use case.

A more relevant test is to measure guest performance using the typical
load expected during the backup window.
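
To make that concrete, a throttled fio job approximating a modest backup-window load could look like this (a sketch only; the target file, rate, and runtime are arbitrary assumptions and should be adjusted to the expected workload):

  # run inside the guest, against a file on the backed-up disk
  fio --name=backup-window-load \
      --filename=/mnt/fio-test.dat \
      --size=4g \
      --rw=randwrite --bs=4k \
      --ioengine=libaio --direct=1 \
      --rate=10m \
      --time_based --runtime=600

Unlike an unthrottled full-disk write, a job like this dirties only a bounded working set, so only that data has to be copied to the scratch disk.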

Comment 7 Yaning Wang 2020-09-14 08:28:26 UTC
Does this `./backup_vm.py` refer to https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.py?

Comment 8 Nir Soffer 2020-09-14 09:20:55 UTC
(In reply to Yaning Wang from comment #7)
> is this `./backup_vm.py` refer to
> https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.
> py

Yes, but this is just an example of using the API. The issue is the
implementation of scratch disks in vdsm.

Comment 9 Yaning Wang 2020-09-14 12:20:30 UTC
(In reply to Nir Soffer from comment #8)
> (In reply to Yaning Wang from comment #7)
> > is this `./backup_vm.py` refer to
> > https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.
> > py
> 
> Yes, but this is just an example of using the API. The issue is the
> implementation of scratch disks in vdsm.

Thanks for the info.
So, just to be clear, the `backup_vm.py` in my comment #7 is the right one to use
to reproduce the bug?

Comment 10 Nir Soffer 2020-10-19 10:24:43 UTC
(In reply to Yaning Wang from comment #9)
> thanks for info
> so just to be clear, the `backup_vm.py` in my comment(#7) is the right one
> to use
> to reproduce the bug

Yes, this is a good way to test the backup APIs.
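
For a quick smoke test of the API flow, the example script can be driven from the shell along these lines (a sketch only; the incremental sub-command and its --from-checkpoint-uuid option are assumptions based on the example script's usage and may differ between SDK versions, and checkpoint-uuid stands for the checkpoint reported by the previous full backup):

  # full backup of the VM disks
  python3 ./backup_vm.py full \
      --engine-url https://engine-url \
      --username admin@internal \
      --cafile ca.pem \
      --backup-dir /backup \
      vm-id

  # later, back up only the blocks changed since that checkpoint
  python3 ./backup_vm.py incremental \
      --engine-url https://engine-url \
      --username admin@internal \
      --cafile ca.pem \
      --backup-dir /backup \
      --from-checkpoint-uuid checkpoint-uuid \
      vm-id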

Comment 11 Nir Soffer 2020-10-27 14:26:33 UTC
Adding more information from backup partners about why this bug is important to fix:

1. Capacity planning for the infrastructure becomes unclear. In addition to providing
   enough space and IOPS for the VMs on the storage side, the customer has to plan a
   proper size for the /var directory, and this value is not easy to calculate.

2. The hypervisor system disk usually sits on slow storage. That creates a potential
   bottleneck: while a VM is being backed up, the delta I/O goes to this slow disk, and
   the customer won't understand why their very fast storage is idle while the VMs are slow.

3. A simple way to run out of space in real life: back up a large file server or database
   server. A backup of such VMs can take hours, and I am quite sure such VMs generate more
   than 15 GB of changes in that period, so we again end up in the paused state. We should
   also remember that for a database server generating heavy random I/O, using the system
   volume for the scratch disk is critical from a performance point of view.

4. A more complex case: imagine we back up 15 VMs simultaneously. We generate 15 times
   more data in the /var directory and 15 times more I/O in this slow location.

5. The mere fact that we can stop production workloads because we are backing them up is
   very risky. For example, I don't want my domain controller to be paused just because
   I am backing it up.

Raising priority.

Comment 13 Ilan Zuckerman 2021-03-07 09:16:14 UTC
While I was verifying TC RHEVM-27586, I encountered the following problem:

The scratch disk remains locked (it is not removed by the system) after the VM is paused due to lack of storage during the backup.

Initial storage state of the block SD on my environment:
Total space: 74G
Free Space: 48G
Guaranteed: 48G

Steps to reproduce:
- Clone a VM from a template with a thin OS disk (10G)
- Create a preallocated 20G disk, add it to the VM, and mount it:
  - device="/dev/"$(lsblk -o NAME,FSTYPE,TYPE -dsn | grep disk | awk '$3 == "" {print $1}')
  - parted $device mktable gpt -s
  - parted -a optimal $device mkpart primary 0% 100% -s
  - mkfs.ext4 $device"1"
  - mount -o discard,defaults $device"1" /mnt
  - echo UUID=$(blkid $device"1" -sUUID -ovalue) /mnt "ext4" "defaults" "0" "1" | tee -a /etc/fstab

- Create an additional 20G thin disk on the same SD, just to allocate some of the space on the SD
- At this point we still have some free space on the SD to start the backup
- Start a full backup for a 20G disk on the VM
- Start dd on the backed-up VM disk (SSH into the VM and cd to the mount point of the disk):
  - dd if=/dev/zero of=big2.raw bs=4k iflag=fullblock,count_bytes count=10G
- If needed, repeat the above step, creating additional big files, until the VM is paused due to lack of storage on the SD:

2021-03-07 10:45:16,993+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [f59b33d] EVENT_ID: VM_PAUSED_ENOSPC(138), VM 26779 has been paused due to no Storage space error.

- Finalize the backup. At this point the VM will change its state from 'paused' to 'up':

2021-03-07 10:48:41,709+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-25) [f59b33d] VM 'a2337d6e-9e94-46fe-a5bf-c0ac08b1ee4f'(26779) moved from 'Paused' --> 'Up'

- Now go back to the VM terminal and stop the dd command
- Notice that in the 'Disks' tab of the engine UI, there is a scratch disk that remains in the locked state even though the backup was finalized. The LV is also still there:

[root@storage-ge13-vdsm1 ~]# lvs -o vg_name,lv_name,tags | grep 07d
  9db95765-0fb7-485e-91f2-381354a66d13 5561a136-4126-47dd-b722-b34c1a6277a7 IU_8b50b815-57b3-45b7-9348-698ec1a8a07d,MD_9,PU_00000000-0000-0000-0000-000000000000

- and its size is ~20G:

[root@storage-ge13-vdsm1 ~]# qemu-img measure /dev/9db95765-0fb7-485e-91f2-381354a66d13/5561a136-4126-47dd-b722-b34c1a6277a7 
required size: 21474836480
fully allocated size: 21474836480


Attaching the engine log, the vdsm log (the host is also the SPM), the VM XML dump, and an image of the 'Disks' tab where you will find the locked scratch disk and the VM disks.
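
For anyone hitting the same leftover disk, the engine's view can be checked over the REST API with something like this (a sketch only; the credentials and engine URL are placeholders, and the element names follow the oVirt v4 REST API XML mapping as I understand it):

  # list disks the engine still reports as locked, with some surrounding context
  curl -s -k -u admin@internal:password \
      -H 'Accept: application/xml' \
      'https://engine-url/ovirt-engine/api/disks' \
      | grep -B 5 -A 5 '<status>locked</status>'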

Comment 14 Ilan Zuckerman 2021-03-07 09:19:06 UTC
Created attachment 1761236 [details]
Validation logs

Comment 15 Ilan Zuckerman 2021-03-08 06:40:45 UTC
Verified on rhv-release-4.4.5-7 according to the Polarion test plan:
RHEVM-27577 Passed
RHEVM-27609 Passed
RHEVM-27615 Passed
RHEVM-27614 Partially passed. A BZ was opened to track the remaining issue: "Scratch disk not removed if a VM goes to 'paused' state during the backup process"
https://bugzilla.redhat.com/1936185

Comment 16 Sandro Bonazzola 2021-03-18 15:13:09 UTC
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

