Description:
[incremental_backup] After restarting libvirtd, a pull-mode backup with TLS enabled causes qemu to crash

Versions:
qemu-kvm-5.0.0-2.module+el8.3.0+7379+0505d6ca.x86_64
libvirt-6.6.0-2.module+el8.3.0+7567+dc41c0a9.x86_64

How reproducible:
100%

Steps:
0. A CA-signed server-side certificate is required for a TLS-enabled backup. Details on preparing the CA/server/client keys and certificates can be found at:
   http://pastebin.test.redhat.com/895075

1. The VM has 2 disks; vdb is used to reproduce this issue
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# virsh domblklist vm1
 Target   Source
--------------------------------------------------------
 vda      /var/lib/libvirt/images/jeos-27-x86_64.qcow2
 vdb      /var/lib/libvirt/images/vdb.qcow2

2. No special settings for backup TLS:
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# cat /etc/libvirt/qemu.conf | grep "#.*backup_tls.*="
# backup_tls_x509_cert_dir = "/etc/pki/libvirt-backup"
# backup_tls_x509_verify = 1
# backup_tls_x509_secret_uuid = "00000000-0000-0000-0000-000000000000"

3. Prepare the backup XML for vdb with 'tls="yes"'
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# cat backup_full.xml
<domainbackup mode="pull">
  <server name="dell-per730-67.lab.eng.pek2.redhat.com" port="10809" tls="yes"/>
  <disks>
    <disk backup="no" name="vda" />
    <disk backup="yes" name="vdb" type="file">
      <scratch file="/tmp/scratch_file_0" />
    </disk>
  </disks>
</domainbackup>

4. Clear the libvirtd log
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# echo "" > /var/log/libvirtd-debug.log

5. Start the first backup; it succeeds
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# virsh backup-begin vm1 backup_full.xml
Backup started

6. Restart the libvirtd daemon
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# systemctl restart libvirtd

7. Abort the backup job from step 5
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# virsh domjobabort vm1

8. Start the backup job again
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# virsh backup-begin vm1 backup_full.xml
error: internal error: unexpected async job 7 type expected 0    <=== qemu process crashed

9. The gdb backtrace of `pidof /usr/libexec/qemu-kvm` is in the attachment named "gdb-qemu-kvm-vm1.txt".
   The libvirtd log is in the attachment named "libvirtd-debug.log".

Additional info:
1. If TLS is not enabled, nothing goes wrong:
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# cat backup.xml
<domainbackup mode="pull">
  <server name="dell-per730-67.lab.eng.pek2.redhat.com" port="10809"/>
  <disks>
    <disk backup="no" name="vda" />
    <disk backup="yes" name="vdb" type="file">
      <scratch file="/tmp/scratch_file_0" />
    </disk>
  </disks>
</domainbackup>
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# virsh backup-begin vm1 backup.xml; systemctl restart libvirtd; virsh domjobabort vm1; virsh backup-begin vm1 backup.xml
Backup started
Backup started

2. If libvirtd is not restarted, nothing goes wrong:
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# cat backup_full.xml
<domainbackup mode="pull">
  <server name="dell-per730-67.lab.eng.pek2.redhat.com" port="10809" tls="yes"/>
  <disks>
    <disk backup="no" name="vda" />
    <disk backup="yes" name="vdb" type="file">
      <scratch file="/tmp/scratch_file_0" />
    </disk>
  </disks>
</domainbackup>
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# virsh backup-begin vm1 backup_full.xml; virsh domjobabort vm1; virsh backup-begin vm1 backup_full.xml; virsh domjobabort vm1; virsh backup-begin vm1 backup_full.xml
Backup started
Backup started
Backup started

3. After the qemu crash, the full backup always provides wrong data, even if the VM and libvirtd are restarted

3.1 vdb has a disk size of 123 MiB
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# qemu-img info /var/lib/libvirt/images/vdb.qcow2 -U
image: /var/lib/libvirt/images/vdb.qcow2
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 123 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

3.2 Start the backup job again after the qemu crash
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# virsh backup-begin vm1 backup_full.xml
Backup started

3.3 Dump the backup data from the NBD export to a local image
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# qemu-img convert -O qcow2 --object tls-creds-x509,id=tls0,endpoint=client,dir=/etc/pki/libvirt-backup 'json:{"file":{"driver":"nbd", "server":{"host":"dell-per730-67.lab.eng.pek2.redhat.com", "port":10809, "type":"inet"}, "export":"vdb", "tls-creds":"tls0"}}' test.qcow2

3.4 test.qcow2 has a disk size of only 196 KiB instead of 123 MiB
(.libvirt-ci-venv-ci-runtest-z1MFOW) [root@dell-per730-67 ~]# qemu-img info test.qcow2
image: test.qcow2
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
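The size comparison between the source disk and the dumped image can be scripted. The sketch below is not part of the original report: `disk_size_bytes` and `backup_looks_truncated` are hypothetical helper names, and the 1% threshold is an arbitrary assumption, since a converted image can legitimately be somewhat smaller than its source.

```shell
#!/bin/sh
# Hypothetical helpers for comparing `qemu-img info` output; not part of qemu.

# Read human-readable `qemu-img info` output on stdin and print the
# "disk size" value as an approximate byte count.
disk_size_bytes() {
    awk -F': ' '
        /^disk size:/ {
            split($2, parts, " ")
            mult = 1
            if (parts[2] == "KiB") mult = 1024
            else if (parts[2] == "MiB") mult = 1024 ^ 2
            else if (parts[2] == "GiB") mult = 1024 ^ 3
            printf "%.0f\n", parts[1] * mult
            exit
        }'
}

# Succeed when the backup is under 1% of the source size.
# The 1% cutoff is an assumption chosen for illustration only.
backup_looks_truncated() {   # $1 = source bytes, $2 = backup bytes
    [ $(( $2 * 100 )) -lt "$1" ]
}

# Usage (paths taken from the report):
#   src=$(qemu-img info -U /var/lib/libvirt/images/vdb.qcow2 | disk_size_bytes)
#   dst=$(qemu-img info test.qcow2 | disk_size_bytes)
#   backup_looks_truncated "$src" "$dst" && echo "backup looks truncated"
```

With the values from the report (123 MiB versus 196 KiB), the check flags the dumped image as suspiciously small.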
Created attachment 1711978: gdb-qemu-kvm-vm1.txt
Created attachment 1711979: libvirtd-debug.log
This is a qemu crash, but it involves a libvirtd restart, so I have set the component to 'libvirt' for now. If debugging shows that it is a qemu issue, please move it to the qemu team. Thanks.
The qemu process abort()s because libvirt did not delete the TLS_x509 and secret objects when the backup job was aborted after the libvirtd restart: their aliases had not been written out to the status XML, so they were missing from the state reloaded after the restart. Note that upstream qemu now reports an error rather than abort()-ing.
Fixed upstream:

commit 1a5f35dbd2c4d83f7629579bcd8b23929a492b29
Author: Peter Krempa <pkrempa>
Date:   Mon Sep 14 17:59:07 2020 +0200

    qemu: backup: Write TLS cert and secret object aliases into status XML

    We've put the aliases into the backup job definition after the status
    XML was already written so they didn't appear in the on-disk state.

    Move the code putting them into the private definition earlier, so that
    the status XML update done by saving blockjobs already writes them out.
    Also add a note notifying that the block job status update writes the
    status XML.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1870488
    Fixes: 423576679a5
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

commit 5058062b5daa6d841154eda7f6a53c39d64e765e
Author: Peter Krempa <pkrempa>
Date:   Mon Sep 14 17:58:09 2020 +0200

    qemu: backup: Remove note that TLS should be implemented

    Commit 423576679a5 implementing TLS forgot to remove the comment.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

commit 6c2d91118dc99426a79bf48c8d795e243c522dbd
Author: Peter Krempa <pkrempa>
Date:   Mon Sep 14 17:46:42 2020 +0200

    qemustatusxml2xml: backup-pull: Test private data formatting/parsing

    Modify the test case to enable TLS and add private data containing
    aliases of objects corresponding to a TLS setup.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>
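The ordering problem described in the first commit message can be illustrated with a toy model. This is only a sketch, not libvirt code: the "status XML" is a plain text file, a "restart" is a reload of that file, and every function name and alias string below is hypothetical.

```shell
#!/bin/sh
# Toy model only: none of these names are libvirt functions.
# In-memory state is the ALIASES variable; on-disk state is a text file.

# Persist the in-memory alias list to the on-disk status file.
save_status() {            # $1 = status file
    printf '%s\n' "$ALIASES" > "$1"
}

# Simulate a daemon restart: in-memory state is rebuilt from disk only.
reload_status() {          # $1 = status file
    ALIASES=$(cat "$1")
}

# Buggy order: the status is written first, aliases are recorded afterwards.
begin_backup_buggy() {     # $1 = status file
    ALIASES=""
    save_status "$1"
    ALIASES="tls-cert-alias tls-secret-alias"   # hypothetical alias names
}

# Fixed order: aliases are recorded before the status is written.
begin_backup_fixed() {     # $1 = status file
    ALIASES="tls-cert-alias tls-secret-alias"   # hypothetical alias names
    save_status "$1"
}

# Print which objects an abort would know to delete after a restart.
objects_known_after_restart() {   # $1 = "buggy" or "fixed"
    _status=$(mktemp)
    "begin_backup_$1" "$_status"
    reload_status "$_status"
    rm -f "$_status"
    printf '%s\n' "${ALIASES:-<none>}"
}
```

In this model, `objects_known_after_restart buggy` prints `<none>`, because the aliases never reached the file, while `objects_known_after_restart fixed` prints both aliases, so a later abort has something to clean up.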
Verified with:
libvirt-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64

Result: PASS

[root@dell-per740xd-10 ~]# cat backup_full.xml
<domainbackup mode="pull">
  <server name="dell-per740xd-10.lab.eng.pek2.redhat.com" port="10809" tls="yes"/>
  <disks>
    <disk backup="no" name="vda" />
    <disk backup="yes" name="vdb" type="file">
      <scratch file="/tmp/scratch_file_0" />
    </disk>
  </disks>
</domainbackup>

[root@dell-per740xd-10 ~]# virsh backup-begin vm1 backup_full.xml
Backup started

[root@dell-per740xd-10 ~]# systemctl restart libvirtd

[root@dell-per740xd-10 ~]# virsh domjobinfo vm1
Job type:                    Unbounded
Operation:                   Backup
Time elapsed:                7529 ms
Temporary disk space use:    0.000 B
Temporary disk space total:  5.000 GiB

[root@dell-per740xd-10 ~]# virsh domjobabort vm1

[root@dell-per740xd-10 ~]# virsh list
 Id   Name   State
----------------------
 ...
 5    vm1    running

[root@dell-per740xd-10 ~]# virsh backup-begin vm1 backup_full.xml
Backup started

[root@dell-per740xd-10 ~]# virsh domjobinfo vm1
Job type:                    Unbounded
Operation:                   Backup
Time elapsed:                16057 ms
Temporary disk space use:    0.000 B
Temporary disk space total:  5.000 GiB

[root@dell-per740xd-10 ~]# virsh domjobabort vm1

[root@dell-per740xd-10 ~]# virsh list
 Id   Name   State
----------------------
 ...
 5    vm1    running
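The verification sequence above could be wrapped in a small function for repeated runs. This is a sketch under assumptions: `verify_backup_across_restart`, `VIRSH`, and `RESTART_CMD` are hypothetical names, and the final check uses `virsh domstate` in place of reading the `virsh list` table.

```shell
#!/bin/sh
# Hypothetical helper, not part of libvirt. VIRSH and RESTART_CMD can be
# overridden, e.g. to dry-run the sequence with stub commands.
VIRSH=${VIRSH:-virsh}
RESTART_CMD=${RESTART_CMD:-systemctl restart libvirtd}

# Run: backup-begin, restart libvirtd, abort, backup-begin again, abort,
# then confirm the domain is still running.
# Returns non-zero at the first step that fails.
verify_backup_across_restart() {   # $1 = domain, $2 = backup XML file
    $VIRSH backup-begin "$1" "$2" || return 1
    $RESTART_CMD                  || return 1
    $VIRSH domjobabort "$1"       || return 1
    $VIRSH backup-begin "$1" "$2" || return 1
    $VIRSH domjobabort "$1"       || return 1
    [ "$($VIRSH domstate "$1")" = "running" ]
}

# Usage (names taken from the transcript above):
#   verify_backup_across_restart vm1 backup_full.xml && echo PASS
```

On an affected build, the second `backup-begin` would be expected to fail as in the original report, so the function returns non-zero before reaching the state check.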
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137