Description:
[incremental_backup] RFE: the backup job could provide more detailed information when a backup fails, or provide --wait --verbose options as blockcopy does.

@Peter, please help to confirm whether this is worth implementing.

Here we used a backup target with too little free space to trigger a backup failure. Since a push-mode backup is an asynchronous command, virsh reports no error. And neither 'virsh event' nor 'virsh domjobinfo --completed' gives a clear message about why the backup failed, even though QMP actually returns a clear error message. Could we add something like blockcopy's --wait --verbose options to the backup command? Or could we export the QMP error message to the user via 'virsh event' or 'domjobinfo --completed'?

Versions:
libvirt-6.0.0-9.module+el8.2.0+5957+7ae8988e.x86_64

How reproducible:
100%

Steps:
1. Prepare a small block device; here I used a 100M iscsi disk. The VM's vda holds around 1.6G of data.

[root@hp-dl320eg8-05 images]# lsblk | grep sde
sde      8:64   0  100M  0 disk /mnt

2. Use a file on that device as the backup target.

[root@hp-dl320eg8-05 images]# mount | grep mnt
/dev/sde on /mnt type ext4 (rw,relatime,seclabel,errors=continue,stripe=8192)

3. Prepare the checkpoint and backup XML.

[root@hp-dl320eg8-05 push]# cat no_space_backup_full.xml
<domainbackup mode='push'>
  <disks>
    <disk name='vda' type='file'>
      <target file='/mnt/vda.full.backup'/>
      <driver type='qcow2'/>
    </disk>
    <disk name='vdb' backup='no'/>
  </disks>
</domainbackup>

[root@hp-dl320eg8-05 push]# cat checkpoint_push_full.xml
<domaincheckpoint>
  <name>check_full</name>
</domaincheckpoint>

4. Do the backup.

[root@hp-dl320eg8-05 push]# virsh backup-begin vm1 no_space_backup_full.xml checkpoint_push_full.xml
Backup started    <==== This is an async job, so virsh returned immediately

[root@hp-dl320eg8-05 push]# virsh domjobinfo vm1 --completed
Job type:         Failed
Operation:        Backup

5.
Monitor the VM's events in another terminal; they show that the backup failed:

[root@hp-dl320eg8-05 push]# virsh event vm1 --all --loop
event 'job-completed' for domain vm1:
        operation: 9
        time_elapsed: 2074
        disk_total: 10737418240
        disk_processed: 104529920
        disk_remaining: 10632888320
        success: 0
event 'block-job' for domain vm1: Backup for /var/lib/libvirt/images/base_os.active failed
event 'block-job-2' for domain vm1: Backup for vda failed

6. We actually received a clear error from QMP, but the error was not exported to any user interface:

[root@hp-dl320eg8-05 push]# cat /var/log/libvirtd-debug.log | grep -i "no space"
2020-03-12 09:15:01.686+0000: 50756: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"timestamp": {"seconds": 1584004501, "microseconds": 685667}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "backup-vda-libvirt-2-format", "len": 10737418240, "offset": 104529920, "speed": 0, "type": "backup", "error": "No space left on device"}}]
2020-03-12 09:15:01.686+0000: 50756: debug : virJSONValueFromString:1844 : string={"timestamp": {"seconds": 1584004501, "microseconds": 685667}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "backup-vda-libvirt-2-format", "len": 10737418240, "offset": 104529920, "speed": 0, "type": "backup", "error": "No space left on device"}}
2020-03-12 09:15:01.686+0000: 50756: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7f808c048af0 event={"timestamp": {"seconds": 1584004501, "microseconds": 685667}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "backup-vda-libvirt-2-format", "len": 10737418240, "offset": 104529920, "speed": 0, "type": "backup", "error": "No space left on device"}}
2020-03-12 09:15:01.687+0000: 50756: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": [{"current-progress": 104529920, "status": "concluded", "total-progress": 10737418240, "type": "backup", "id": "backup-vda-libvirt-2-format", "error": "No space left on device"}], "id": "libvirt-434"}]
2020-03-12 09:15:01.687+0000: 50756: debug : virJSONValueFromString:1844 : string={"return": [{"current-progress": 104529920, "status": "concluded", "total-progress": 10737418240, "type": "backup", "id": "backup-vda-libvirt-2-format", "error": "No space left on device"}], "id": "libvirt-434"}
2020-03-12 09:15:01.687+0000: 50756: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7f808c048af0 reply={"return": [{"current-progress": 104529920, "status": "concluded", "total-progress": 10737418240, "type": "backup", "id": "backup-vda-libvirt-2-format", "error": "No space left on device"}], "id": "libvirt-434"}
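The QMP event shown in the debug log already carries everything a user needs. As a minimal sketch (the function name is mine, not libvirt's), this is how a client sitting on the monitor could extract the error string from such a BLOCK_JOB_COMPLETED event:

```python
import json

# The BLOCK_JOB_COMPLETED event exactly as it appears in the libvirtd debug
# log above; per the QMP schema, the "error" key is present only when the
# block job failed.
event_line = '''{"timestamp": {"seconds": 1584004501, "microseconds": 685667},
 "event": "BLOCK_JOB_COMPLETED",
 "data": {"device": "backup-vda-libvirt-2-format", "len": 10737418240,
          "offset": 104529920, "speed": 0, "type": "backup",
          "error": "No space left on device"}}'''

def block_job_error(line: str):
    """Return the error string from a BLOCK_JOB_COMPLETED event, or None."""
    msg = json.loads(line)
    if msg.get("event") != "BLOCK_JOB_COMPLETED":
        return None
    return msg["data"].get("error")  # absent when the job succeeded

print(block_job_error(event_line))  # -> No space left on device
```

This is exactly the piece of information the RFE asks libvirt to surface through 'virsh event' / 'domjobinfo --completed' instead of keeping it in the debug log.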
Fixed upstream:

commit b37fdfb9d4ead0a01377837e02ecedee48337c97
Author: Peter Krempa <pkrempa>
Date:   Thu Apr 16 11:23:07 2020 +0200

    backup: Store error message for failed backups

    If a backup job fails midway it's hard to figure out what happened as
    it's running asynchronous. Use the VIR_DOMAIN_JOB_ERRMSG job statistics
    field to pass through the error from the first failed backup-blockjob
    so that both the consumer of the virDomainGetJobStats and the
    corresponding event can see the error.

        event 'job-completed' for domain backup-test:
                operation: 9
                time_elapsed: 46
                disk_total: 104857600
                disk_processed: 10158080
                disk_remaining: 94699520
                success: 0
                errmsg: No space left on device

        virsh domjobinfo backup-test --completed --anystats
        Job type:         Failed
        Operation:        Backup
        Time elapsed:     46           ms
        File processed:   9.688 MiB
        File remaining:   90.312 MiB
        File total:       100.000 MiB
        Error message:    No space left on device

    https://bugzilla.redhat.com/show_bug.cgi?id=1812827

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Eric Blake <eblake>

commit 75a66f1076548c240e675976d690835aadb6e79c
Author: Peter Krempa <pkrempa>
Date:   Thu Apr 16 11:21:51 2020 +0200

    qemu: domain: Add 'errmsg' field to qemuDomainJobInfo

    The field can be used by jobs to add an optional error message to a
    completed (failed) job.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Eric Blake <eblake>

commit 4f39774610cac5e8de019b0be2aec2548b1acfd0
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 15 12:27:53 2020 +0200

    API: Add VIR_DOMAIN_JOB_ERRMSG domain job statistics field

    In some cases it's useful to report the error which caused the domain
    job to fail. Add an optional field for holding the error message so
    that it can be later retrieved from statistics of a completed job.

    Add the field name macro and code for extracting it in virsh.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Eric Blake <eblake>

commit 72186f9c8c63983c55adb5194dda739ddc0f4043
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 26 17:55:00 2020 +0100

    qemu: Add free and copy function for qemuDomainJobInfo and use it

    In order to add a string to qemuDomainJobInfo we must ensure that it's
    freed and copied properly. Add helpers to copy and free the structure
    and adjust the code to use them properly for the new semantics.

    Additionally also allocation is changed to g_new0 as it includes the
    type and thus it's very easy to grep for all the allocations of a
    given type.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Eric Blake <eblake>

commit e77a0941419554f203febe43ba8cc1c5ac05753e
Author: Peter Krempa <pkrempa>
Date:   Thu Apr 16 11:14:00 2020 +0200

    remote: remoteDispatchDomainGetJobStats: Encode typed parameter strings

    String typed parameter values were introduced in v0.9.7-30-g40624d32fb.
    virDomainGetJobStats was introduced in v1.0.2-239-g4dd00f4238 so all
    clients already support typed parameter stings at that time thus we
    can enable it unconditionally.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Eric Blake <eblake>
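With the fix, the error travels as the VIR_DOMAIN_JOB_ERRMSG typed parameter ("errmsg") in the completed-job statistics. A sketch of consuming it: with the libvirt-python bindings this dict would come from dom.jobStats(libvirt.VIR_DOMAIN_JOB_STATS_COMPLETED); here it is filled in by hand with the values from the commit message above, and the helper function is mine:

```python
# Completed-job statistics as returned for the failed backup; the "errmsg"
# key is the new VIR_DOMAIN_JOB_ERRMSG field added by the commits above.
completed_stats = {
    "operation": 9,            # VIR_DOMAIN_JOB_OPERATION_BACKUP
    "time_elapsed": 46,
    "disk_total": 104857600,
    "disk_processed": 10158080,
    "disk_remaining": 94699520,
    "success": 0,
    "errmsg": "No space left on device",
}

def backup_failure_reason(stats: dict):
    """Return the error message for a failed job, or None if it succeeded."""
    if stats.get("success"):
        return None
    # errmsg is optional, so jobs predating the fix may not report it
    return stats.get("errmsg", "unknown (no errmsg reported)")

print(backup_failure_reason(completed_stats))  # -> No space left on device
```

Because the field is optional, callers should tolerate its absence when talking to older libvirt.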
Test with libvirt-6.6.0-2.module+el8.3.0+7567+dc41c0a9.x86_64
Result: PASS

1. Prepare a 100M block device and mount it to /mnt/ (I used iscsi for convenience):

yum install targetcli -y
mkdir -p /var/tmp/avocado_glngTO/
dd if=/dev/zero of=/var/tmp/avocado_glngTO/logical-pool count=100 bs=1024K
setenforce permissive
targetcli /backstores/fileio/ create device.logical-pool /var/tmp/avocado_glngTO/logical-pool
targetcli /iscsi/ create iqn.2016-03.com.virttest:logical-pool.target
targetcli /iscsi/iqn.2016-03.com.virttest:logical-pool.target/tpg1/portals ls
targetcli /iscsi/iqn.2016-03.com.virttest:logical-pool.target/tpg1/luns/ create /backstores/fileio/device.logical-pool
setenforce enforcing
targetcli /iscsi/iqn.2016-03.com.virttest:logical-pool.target/tpg1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1
targetcli / saveconfig
#systemctl restart iscsid.service
iscsiadm -m discovery -t sendtargets -p 127.0.0.1
iscsiadm --mode node --login --targetname iqn.2016-03.com.virttest:logical-pool.target --portal 127.0.0.1
iscsiadm --mode session
iscsiadm -m session -P 3

[root@dell-per730-67 ~]# lsscsi
...
[11:0:0:0]   disk    LIO-ORG  device.logical-  4.0   /dev/sdb

[root@dell-per730-67 ~]# lsblk
NAME  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
...
sdb     8:16   0  100M  0 disk

[root@dell-per730-67 ~]# mkfs.ext4 /dev/sdb -F
mke2fs 1.45.6 (20-Mar-2020)
Creating filesystem with 102400 1k blocks and 25688 inodes
Filesystem UUID: 8afda60d-40b7-4a67-8e3b-fc5e7ebd14d0
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

[root@dell-per730-67 ~]# mount /dev/sdb /mnt/

2.
Use a VM whose disk image holds more than 100M of data:

[root@dell-per730-67 ~]# virsh domblklist vm1
 Target   Source
--------------------------------------------------------
 vda      /var/lib/libvirt/images/jeos-27-x86_64.qcow2

[root@dell-per730-67 ~]# qemu-img info /var/lib/libvirt/images/jeos-27-x86_64.qcow2 -U
image: /var/lib/libvirt/images/jeos-27-x86_64.qcow2
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 900 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

3. Prepare the checkpoint and backup XML:

[root@dell-per730-67 ~]# cat backup.xml
<domainbackup mode='push'>
  <disks>
    <disk name='vda' type='file'>
      <target file='/mnt/vda.full.backup'/>
      <driver type='qcow2'/>
    </disk>
  </disks>
</domainbackup>

[root@dell-per730-67 ~]# cat checkpoint.xml
<domaincheckpoint>
  <name>check_full</name>
</domaincheckpoint>

4. Start an event monitor in another terminal:

[root@dell-per730-67 ~]# virsh event vm1 --all --loop

5. Do the backup:

[root@dell-per730-67 ~]# virsh backup-begin vm1 backup.xml checkpoint.xml
Backup started

6. Check the event monitor and domjobinfo output:

[root@dell-per730-67 ~]# virsh event vm1 --all --loop
event 'job-completed' for domain vm1:
        operation: 9
        time_elapsed: 444
        disk_total: 10737418240
        disk_processed: 89325568
        disk_remaining: 10648092672
        success: 0
        errmsg: No space left on device    <=== Err msg is clear
event 'block-job' for domain vm1: Backup for /var/lib/libvirt/images/jeos-27-x86_64.qcow2 failed
event 'block-job-2' for domain vm1: Backup for vda failed

[root@dell-per730-67 ~]# virsh domjobinfo vm1 --completed --anystats
Job type:         Failed
Operation:        Backup
Time elapsed:     411          ms
File processed:   85.188 MiB
File remaining:   9.917 GiB
File total:       10.000 GiB
Error message:    No space left on device    <=== Err msg is clear
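The verification deliberately pairs a ~900 MiB allocated image with a 100M target filesystem. A hypothetical pre-flight check (my own sketch, not part of libvirt) that a test or management application could run before backup-begin is to compare the target's free space against the source image's allocated size ("disk size" from qemu-img info):

```python
import shutil

def enough_space(target_dir: str, needed_bytes: int) -> bool:
    """True if target_dir's filesystem has at least needed_bytes free."""
    return shutil.disk_usage(target_dir).free >= needed_bytes

# "disk size: 900 MiB" reported by qemu-img info above; checking this
# against the 100M /mnt used in the test predicts the ENOSPC failure
# that the backup job now reports via errmsg.
image_disk_size = 900 * 1024 * 1024
print(enough_space("/mnt", image_disk_size))
```

A push-mode full backup can need up to the image's allocated size at the target, so this is only a lower-bound sanity check, not a guarantee the backup will fit.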
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137