Bug 1741456
| Summary: | Image cannot be used after blockcommit snapshots to base image and destroy/start vm | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | yisun | |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> | |
| Status: | CLOSED ERRATA | QA Contact: | yisun | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 8.1 | CC: | dzheng, fjin, jdenemar, jsuchane, kchamart, lcheng, lmen, mprivozn, mtessun, toneata, xuzhang, yafu, yisun | |
| Target Milestone: | rc | Keywords: | Automation, Regression, Upstream, ZStream | |
| Target Release: | 8.0 | Flags: | knoel: mirror+ | |
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | libvirt-5.6.0-9.el8 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1771501 | Environment: | ||
| Last Closed: | 2020-02-04 18:28:48 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1652078, 1771501 | |||
Patches posted upstream:
https://www.redhat.com/archives/libvir-list/2019-August/msg01418.html

Fixed upstream as:

    16fb3c8b83 qemu_blockjob: Remove secdriver metadata more frequently
    7f99d8a739 qemu_blockjob: Print image path on failed security metadata move too
    143a0f8b05 qemu_blockjob: Move active commit failed state handling into a function

v5.7.0-rc2

Scenario 2 from comment 0 is still failing, and it leaves the base image /var/lib/libvirt/images/jeos-27-x86_64.qcow2 dirty, which will cause more failures in the cases that follow. Changed back to ASSIGNED for now.

    (.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# virsh start avocado-vt-vm1
    Domain avocado-vt-vm1 started
    (.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# for i in {snap_1,snap_2}; do virsh snapshot-create-as avocado-vt-vm1 $i --disk-only; done
    Domain snapshot snap_1 created
    Domain snapshot snap_2 created
    (.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# virsh blockcommit avocado-vt-vm1 vda --wait --verbose --top vda[1]
    Block commit: [100 %]
    Commit complete
    (.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# virsh destroy avocado-vt-vm1; virsh start avocado-vt-vm1
    Domain avocado-vt-vm1 destroyed
    Domain avocado-vt-vm1 started
    (.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
    error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/jeos-27-x86_64.qcow2 which is already in use

Patch proposed upstream:
https://www.redhat.com/archives/libvir-list/2019-September/msg00600.html

Another approach implemented (as requested in review):
https://www.redhat.com/archives/libvir-list/2019-September/msg00621.html

I've just pushed the fix upstream and backported it:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2019-September/msg01083.html

There's also a scratch build with this patch applied:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=23705370
It also contains patches I've proposed for bug 1740024 (but those are not reviewed upstream yet).

Hi Michal,
Scenario 2 in comment 0 is still reproducible, please take a look. The following is a simpler way to reproduce it:

1. Have a running VM
    (.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# virsh domblklist avocado-vt-vm1
    Target   Source
    ------------------------------------------------------------------------
    vda      /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
    (.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# virsh domstate avocado-vt-vm1
    running
    (.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# getfattr -m trusted.libvirt.security -d /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2    <==== nothing now

2. Create some external snapshots for it
    (.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# for i in {1..2}; do virsh snapshot-create-as avocado-vt-vm1 snap_$i snap1-desc --disk-only; done
    Domain snapshot snap_1 created
    Domain snapshot snap_2 created

3. Do a blockcommit WITHOUT --pivot
    (.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
    Block commit: [100 %]
    Now in synchronized phase

4. The image file now has the following extended attributes:
    (.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# getfattr -m trusted.libvirt.security -d /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
    getfattr: Removing leading '/' from absolute path names
    # file: var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
    trusted.libvirt.security.dac="+107:+107"
    trusted.libvirt.security.ref_dac="1"
    trusted.libvirt.security.ref_selinux="1"
    trusted.libvirt.security.selinux="system_u:object_r:svirt_image_t:s0:c229,c326"
    trusted.libvirt.security.timestamp_dac="1573791864"
    trusted.libvirt.security.timestamp_selinux="1573791864"

5. Destroy the VM
    (.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# virsh destroy avocado-vt-vm1
    Domain avocado-vt-vm1 destroyed

6. Even though the VM is stopped, the file's xattrs still exist, and if we "virsh edit $VM" to use the original image again, the VM cannot be started.
    (.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# getfattr -m trusted.libvirt.security -d /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
    getfattr: Removing leading '/' from absolute path names
    # file: var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
    trusted.libvirt.security.dac="+107:+107"
    trusted.libvirt.security.ref_dac="1"
    trusted.libvirt.security.ref_selinux="1"
    trusted.libvirt.security.selinux="system_u:object_r:svirt_image_t:s0:c229,c326"
    trusted.libvirt.security.timestamp_dac="1573791864"
    trusted.libvirt.security.timestamp_selinux="1573791864"

Due to the above comment, I'll set this back to ASSIGNED for now. For the automation scripts, I've submitted a PR so that other cases are not blocked when this failure happens:
https://github.com/autotest/tp-libvirt/pull/2430

Patches proposed upstream for the issue mentioned in comment 19:
https://www.redhat.com/archives/libvir-list/2019-November/msg00851.html

Pushed upstream:

    8fa0374c5b qemuProcessStop: Remove image metadata for running mirror jobs
    1c12b86185 qemu: Separate image metadata removal into a function

Verified.

Reproduced with auto case on libvirt-5.6.0-8.virtcov.el8.x86_64:
https://libvirt-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/libvirt/view/RHEL-8.1%20x86_64/job/libvirt-RHEL-8.1-runtest-x86_64-function-block_job_commit_pull/52/testReport/rhel.virsh/blockcommit/normal_test_single_chain_file_disk_local_no_ga_notimeout_nobase_top_active_without_pivot/

Fixed with auto case on libvirt-5.6.0-9.module+el8.1.1+4955+f0b25565.x86_64:
https://libvirt-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/libvirt/view/RHEL-8.1%20x86_64/job/libvirt-RHEL-8.1-runtest-x86_64-function-block_job_commit_pull/53/testReport/rhel.virsh/blockcommit/normal_test_single_chain_file_disk_local_no_ga_notimeout_nobase_top_active_without_pivot/

The whole test job has no regression failures (the failed cases are not related to the current bz):
https://libvirt-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/libvirt/view/RHEL-8.1%20x86_64/job/libvirt-RHEL-8.1-runtest-x86_64-function-block_job_commit_pull/53/testReport/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0404
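The xattr check from the simplified reproducer above could be scripted roughly as follows. This is only a sketch: the function names `has_libvirt_security_xattrs` and `libvirt_xattr_refcount` are hypothetical helpers, not part of libvirt or tp-libvirt, and they only parse the text that `getfattr -m trusted.libvirt.security -d` prints. Treating any remaining attribute as leftover metadata is an assumption that holds only when no domain is using the image.

```shell
#!/bin/sh
# Hypothetical helper (not part of libvirt): reads a `getfattr -d` dump on
# stdin and succeeds if at least one trusted.libvirt.security.* attribute
# is listed in it.
has_libvirt_security_xattrs() {
    grep -q '^trusted\.libvirt\.security\.'
}

# Hypothetical helper: reads the same dump on stdin and prints the reference
# counter recorded for the given security driver ("dac" or "selinux").
# Prints nothing if the counter is absent.
libvirt_xattr_refcount() {
    sed -n "s/^trusted\.libvirt\.security\.ref_$1=\"\([0-9][0-9]*\)\"\$/\1/p"
}
```

After `virsh destroy`, piping the output of `getfattr -m trusted.libvirt.security -d <image>` into `has_libvirt_security_xattrs` would succeed on the affected builds and fail once the metadata is removed correctly.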
Image cannot be used after blockcommit snapshots to base image and destroy/start vm

Versions:
libvirt-5.6.0-1.module+el8.1.0+3890+4d3d259c.x86_64
qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.x86_64

How reproducible:
100%

Scenario 1: Blockcommit from top to base with --active and without --pivot, then restart the VM

1. Have a running VM with a virtual disk vda
    $ virsh start avocado-vt-vm1
    Domain avocado-vt-vm1 started
    $ virsh domblklist avocado-vt-vm1
    Target   Source
    ----------------------------------------------------------------
    vda      /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2

2. Create a snapshot for the VM
    $ virsh snapshot-create-as avocado-vt-vm1 snap_1 --disk-only
    Domain snapshot snap_1 created

3. Do a blockcommit to merge the snapshot into the base image
    $ virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
    Block commit: [100 %]
    Now in synchronized phase
    $ virsh blockjob avocado-vt-vm1 vda
    Active Block Commit: [100 %]

4. Abort the block job (this step is optional; if it is skipped, the VM will be restarted with an active block job)
    $ virsh blockjob avocado-vt-vm1 vda --abort
    $ virsh blockjob avocado-vt-vm1 vda
    No current block job for vda

5. Restart the VM
    $ virsh destroy avocado-vt-vm1; virsh start avocado-vt-vm1
    Domain avocado-vt-vm1 destroyed
    Domain avocado-vt-vm1 started

6. Now the base image cannot be used again. We cannot do another blockcommit or use it directly in this VM.
    $ virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
    error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2 which is already in use

Scenario 2: Blockcommit from middle to base, then restart the VM
The following is the scenario used in our auto case.

1. Have a running VM with a virtual disk vda
    $ virsh start avocado-vt-vm1
    Domain avocado-vt-vm1 started
    $ virsh domblklist avocado-vt-vm1
    Target   Source
    ----------------------------------------------------------------
    vda      /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2

2. Create 2 disk-only snapshots
    $ for i in {snap_1,snap_2}; do virsh snapshot-create-as avocado-vt-vm1 $i --disk-only; done
    Domain snapshot snap_1 created
    Domain snapshot snap_2 created

3. Do a blockcommit from the middle image to the base image
    $ virsh blockcommit avocado-vt-vm1 vda --wait --verbose --top vda[1]
    Block commit: [100 %]
    Commit complete

4. Destroy and start the VM
    $ virsh destroy avocado-vt-vm1; virsh start avocado-vt-vm1
    Domain avocado-vt-vm1 destroyed
    Domain avocado-vt-vm1 started

5. Try to use the base image again
5.1 Try to do a blockcommit to merge everything into the base image.
    $ virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
    error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2 which is already in use

5.2 Use virsh edit to make /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2 the source file of vda, then start the VM
    $ virsh dumpxml avocado-vt-vm1 | awk '/<disk/,/<\/disk/'
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    $ virsh start avocado-vt-vm1
    error: Failed to start domain avocado-vt-vm1
    error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2 which is already in use

Expected result:
The original image should be usable without problems in scenario 1 step 6 and scenario 2 step 5.
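Both scenarios fail with the same error text, so an automated case could recognize this bug by its message. The sketch below assumes the error wording quoted in this report; `relabel_conflict_path` is a hypothetical helper name, not a virsh or libvirt command.

```shell
#!/bin/sh
# Hypothetical helper: reads virsh output on stdin and prints the image path
# named in the "Setting different SELinux label ... already in use" error.
# Prints nothing when the output does not contain that error.
relabel_conflict_path() {
    sed -n 's/.*Setting different SELinux label on \(.*\) which is already in use.*/\1/p'
}
```

For example, the output of `virsh start avocado-vt-vm1 2>&1` could be piped into `relabel_conflict_path`; a non-empty result would name the image that still carries the old label metadata.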