Bug 2161557
| Summary: | Restore with --reset-nvram from a corrupt nvram failed for vm with vtpm after first restore without flag | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Meina Li <meili> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| libvirt sub component: | General | QA Contact: | Meina Li <meili> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | jdenemar, jsuchane, lcheng, lmen, mprivozn, virt-maint, yafu, yalzhang, yanqzhan |
| Version: | 9.2 | Keywords: | Automation, Regression, Triaged, Upstream |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-9.0.0-4.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-09 07:27:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 9.1.0 |
| Embargoed: | | | |
Description
Meina Li
2023-01-17 09:16:48 UTC
Patches posted on the list: https://listman.redhat.com/archives/libvir-list/2023-January/237400.html

Another scenario, migrating a vm and then migrating it back to the source host, can also reproduce this issue, and the scratch build in comment 2 fixes it.

Test on libvirt-9.0.0-2.el9.x86_64 to reproduce it:

1. Start a vm with a tpm device:

```
# virsh dumpxml rhel --xpath //tpm
<tpm model="tpm-crb">
  <backend type="emulator" version="2.0"/>
  <alias name="tpm0"/>
</tpm>
# virsh start rhel
Domain 'rhel' started
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 8.0K
drwx------. 2 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932   42 Jan 28 03:09 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0            18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932    0 Jan 28 03:09 .lock
-rw-------. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932 6.0K Jan 28 03:09 tpm2-00.permall
```

2. Migrate the vm, then check the tpm .lock file on the source host; the label is not restored:

```
# virsh migrate rhel --live --verbose qemu+ssh://xxx/system
root@xxx's password:
Migration: [100 %]
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 12K
drwx------. 2 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932   42 Jan 28 03:10 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0            18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932    0 Jan 28 03:09 .lock
-rw-------. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932 9.1K Jan 28 03:10 tpm2-00.permall
```

3. On the target host, try to migrate the vm back to the source host; it fails:

```
# virsh migrate rhel --live --verbose qemu+ssh://yyy/system
root@yyy's password:
error: Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/.lock which is already in use
```

Check on the source host; the label is now restored:

```
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 12K
drwx------. 2 tss  tss  system_u:object_r:virt_var_lib_t:s0   42 Jan 28 03:25 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0   18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:virt_var_lib_t:s0    0 Jan 28 03:24 .lock
-rw-------. 1 tss  tss  system_u:object_r:virt_var_lib_t:s0 9.1K Jan 28 03:25 tpm2-00.permall
```

Trying to migrate again succeeds.

Update libvirt to the scratch build libvirt-9.0.0-3.el9_rc.dd8fc4c135.x86_64 and test again; the result is as expected: after migration, the label is restored to virt_var_lib_t, and migrating back to the source host succeeds.

```
# virsh start rhel
Domain 'rhel' started
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 8.0K
drwx------. 2 tss  tss  system_u:object_r:svirt_image_t:s0:c479,c1016   42 Jan 28 03:31 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0             18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c479,c1016    0 Jan 28 03:31 .lock
-rw-------. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c479,c1016 6.0K Jan 28 03:31 tpm2-00.permall
# virsh migrate rhel qemu+ssh://xxx/system --live --verbose
root@xxx's password:
Migration: [100 %]
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 12K
drwx------. 2 tss  tss  system_u:object_r:virt_var_lib_t:s0   42 Jan 28 03:33 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0   18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:virt_var_lib_t:s0    0 Jan 28 03:31 .lock
-rw-------. 1 tss  tss  system_u:object_r:virt_var_lib_t:s0 9.1K Jan 28 03:33 tpm2-00.permall
```

Then migrate it back to the source host; it succeeds.

(In reply to yalzhang from comment #3)
> Another scenario to migrate a vm then migrate it back to source host can
> also reproduce this issue.
> And the scratch build in comment 2 fix it.

Perfect! Thank you for preliminary testing!

Merged upstream as:

```
5c4007ddc6 qemuProcessLaunch: Tighten rules for external devices wrt incoming migration
794fddf866 qemuExtTPMStop: Restore TPM state label more often
88f0fbf638 qemuProcessStop: Fix detection of outgoing migration for external devices
```

v9.0.0-181-g5c4007ddc6

Test Version: libvirt-9.0.0-4.el9.x86_64, qemu-kvm-7.2.0-8.el9.x86_64

Test Steps: test the scenarios in the Description and comment 3.

Test Result: PASS

Test Version: libvirt-9.0.0-6.el9.x86_64, qemu-kvm-7.2.0-9.el9.x86_64

Test Steps:

S1: Reset the NVRAM state when starting a managedsave guest with a tpm device.

1. Prepare a running guest and then managedsave it:

```
# virsh start rhel
Domain 'rhel' started
# virsh managedsave rhel
Domain 'rhel' state saved by libvirt
```

2. Modify the nvram file to make it invalid:

```
# echo > /var/lib/libvirt/qemu/nvram/rhel.fd
```

3. Starting the guest now fails, as expected:

```
# virsh start rhel
error: Failed to start domain 'rhel'
error: internal error: process exited while connecting to monitor: 2023-02-20T03:36:02.209956Z qemu-kvm: system firmware block device has invalid size 512
2023-02-20T03:36:02.209991Z qemu-kvm: info: its size must be a non-zero multiple of 0x1000
```

4. Start the guest with --reset-nvram:

```
# virsh start rhel --reset-nvram
Domain 'rhel' started
```

S2: Reset the NVRAM state when restoring a guest with a tpm device.

1. Prepare a running guest and then save it:

```
# virsh start rhel
Domain 'rhel' started
# virsh save rhel rhel.save
```

2. Modify the nvram file to make it invalid:

```
# echo > /var/lib/libvirt/qemu/nvram/rhel.fd
```

3. Restore the guest:
```
# virsh restore rhel.save
error: Failed to restore domain from rhel.save
error: internal error: qemu unexpectedly closed the monitor: 2023-02-20T03:39:43.454613Z qemu-kvm: system firmware block device has invalid size 512
2023-02-20T03:39:43.454636Z qemu-kvm: info: its size must be a non-zero multiple of 0x1000
```

This failure is as expected.

4. Restore the guest with --reset-nvram:

```
# virsh restore rhel.save --reset-nvram
Domain restored from rhel.save
```

S3: Migrate the guest with tpm and migrate back, following the steps in comment 3.

All the test scenarios passed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171
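The qemu errors in S1 step 3 and S2 step 3 follow from the constraint qemu itself states: the NVRAM image must have a size that is a non-zero multiple of 0x1000 bytes, and `echo >` truncates it below that. A minimal Python sketch of that size check (a hypothetical helper illustrating the reported rule, not qemu's actual implementation):

```python
# Sketch of the firmware-image size rule that qemu's error message states:
# "its size must be a non-zero multiple of 0x1000". Hypothetical helper,
# not qemu's actual code.

def nvram_size_ok(size: int) -> bool:
    """Return True if `size` is a non-zero multiple of 0x1000 (4 KiB)."""
    return size > 0 and size % 0x1000 == 0

if __name__ == "__main__":
    # A typical OVMF VARS image of 528 KiB satisfies the rule.
    print(nvram_size_ok(540672))  # True
    # The size qemu reported for the corrupted file does not, so the guest
    # refuses to start until --reset-nvram re-creates the image.
    print(nvram_size_ok(512))     # False
```

This is why `virsh start --reset-nvram` / `virsh restore --reset-nvram` recover the guest: libvirt discards the invalid NVRAM file and regenerates it, rather than handing the truncated image to qemu.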