Bug 2161557
| Summary: | Restore with --reset-nvram from a corrupt nvram failed for vm with vtpm after first restore without flag | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Meina Li <meili> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| libvirt sub component: | General | QA Contact: | Meina Li <meili> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | jdenemar, jsuchane, lcheng, lmen, mprivozn, virt-maint, yafu, yalzhang, yanqzhan |
| Version: | 9.2 | Keywords: | Automation, Regression, Triaged, Upstream |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-9.0.0-4.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-09 07:27:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 9.1.0 |
| Embargoed: | | | |
Description
Meina Li
2023-01-17 09:16:48 UTC
Patches posted on the list: https://listman.redhat.com/archives/libvir-list/2023-January/237400.html

Another scenario, migrating a vm and then migrating it back to the source host, can also reproduce this issue, and the scratch build in comment 2 fixes it.

Test on libvirt-9.0.0-2.el9.x86_64 to reproduce it:

1. Start a vm with a tpm device:

```
# virsh dumpxml rhel --xpath //tpm
<tpm model="tpm-crb">
  <backend type="emulator" version="2.0"/>
  <alias name="tpm0"/>
</tpm>
# virsh start rhel
Domain 'rhel' started
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 8.0K
drwx------. 2 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932   42 Jan 28 03:09 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0            18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932    0 Jan 28 03:09 .lock
-rw-------. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932 6.0K Jan 28 03:09 tpm2-00.permall
```

2. Migrate the vm, then check the tpm .lock file on the source host; the label is not restored:

```
# virsh migrate rhel --live --verbose qemu+ssh://xxx/system
root@xxx's password:
Migration: [100 %]
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 12K
drwx------. 2 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932   42 Jan 28 03:10 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0            18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932    0 Jan 28 03:09 .lock
-rw-------. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c131,c932 9.1K Jan 28 03:10 tpm2-00.permall
```

3. On the target host, try to migrate the vm back to the source host; it fails:

```
# virsh migrate rhel --live --verbose qemu+ssh://yyy/system
root@yyy's password:
error: Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/.lock which is already in use
```

Check on the source host; the label is now restored:

```
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 12K
drwx------. 2 tss  tss  system_u:object_r:virt_var_lib_t:s0   42 Jan 28 03:25 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0   18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:virt_var_lib_t:s0    0 Jan 28 03:24 .lock
-rw-------. 1 tss  tss  system_u:object_r:virt_var_lib_t:s0 9.1K Jan 28 03:25 tpm2-00.permall
```

Trying to migrate again succeeds.

Update libvirt to the scratch build libvirt-9.0.0-3.el9_rc.dd8fc4c135.x86_64 and test again; the result is as expected: after migration, the label is restored to virt_var_lib_t, and migrating back to the source host succeeds.

```
# virsh start rhel
Domain 'rhel' started
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 8.0K
drwx------. 2 tss  tss  system_u:object_r:svirt_image_t:s0:c479,c1016   42 Jan 28 03:31 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0             18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c479,c1016    0 Jan 28 03:31 .lock
-rw-------. 1 tss  tss  system_u:object_r:svirt_image_t:s0:c479,c1016 6.0K Jan 28 03:31 tpm2-00.permall
# virsh migrate rhel qemu+ssh://xxx/system --live --verbose
root@xxx's password:
Migration: [100 %]
# ll -Zha /var/lib/libvirt/swtpm/aebe7136-bf38-4056-87e4-2e9ca65c3dd1/tpm2/
total 12K
drwx------. 2 tss  tss  system_u:object_r:virt_var_lib_t:s0   42 Jan 28 03:33 .
drwx--x--x. 3 root root system_u:object_r:virt_var_lib_t:s0   18 Jan 28 01:21 ..
-rw-r-----. 1 tss  tss  system_u:object_r:virt_var_lib_t:s0    0 Jan 28 03:31 .lock
-rw-------. 1 tss  tss  system_u:object_r:virt_var_lib_t:s0 9.1K Jan 28 03:33 tpm2-00.permall
```

Then migrate it back to the source host; it succeeds.

(In reply to yalzhang from comment #3)
> Another scenario to migrate a vm then migrate it back to source host can
> also reproduce this issue.
> And the scratch build in comment 2 fix it.

Perfect! Thank you for preliminary testing!

Merged upstream as:

```
5c4007ddc6 qemuProcessLaunch: Tighten rules for external devices wrt incoming migration
794fddf866 qemuExtTPMStop: Restore TPM state label more often
88f0fbf638 qemuProcessStop: Fix detection of outgoing migration for external devices
```

v9.0.0-181-g5c4007ddc6

Test Version: libvirt-9.0.0-4.el9.x86_64, qemu-kvm-7.2.0-8.el9.x86_64

Test Steps: test the scenarios in the Description and comment 3.

Test Result: PASS

Test Version: libvirt-9.0.0-6.el9.x86_64, qemu-kvm-7.2.0-9.el9.x86_64

Test Steps:

S1: Reset the NVRAM state when starting a managedsave guest with a tpm device.

1. Prepare a running guest and then managedsave it:

```
# virsh start rhel
Domain 'rhel' started
# virsh managedsave rhel
Domain 'rhel' state saved by libvirt
```

2. Modify the nvram file to make it invalid:

```
# echo > /var/lib/libvirt/qemu/nvram/rhel.fd
```

3. Starting the guest now fails, as expected:

```
# virsh start rhel
error: Failed to start domain 'rhel'
error: internal error: process exited while connecting to monitor: 2023-02-20T03:36:02.209956Z qemu-kvm: system firmware block device has invalid size 512
2023-02-20T03:36:02.209991Z qemu-kvm: info: its size must be a non-zero multiple of 0x1000
```

4. Start the guest with --reset-nvram:

```
# virsh start rhel --reset-nvram
Domain 'rhel' started
```

S2: Reset the NVRAM state when restoring a guest with a tpm device.

1. Prepare a running guest and then save it:

```
# virsh start rhel
Domain 'rhel' started
# virsh save rhel rhel.save
```

2. Modify the nvram file to make it invalid:

```
# echo > /var/lib/libvirt/qemu/nvram/rhel.fd
```

3. Restore the guest:
```
# virsh restore rhel.save
error: Failed to restore domain from rhel.save
error: internal error: qemu unexpectedly closed the monitor: 2023-02-20T03:39:43.454613Z qemu-kvm: system firmware block device has invalid size 512
2023-02-20T03:39:43.454636Z qemu-kvm: info: its size must be a non-zero multiple of 0x1000
```

This failure is as expected.

4. Restore the guest with --reset-nvram:

```
# virsh restore rhel.save --reset-nvram
Domain restored from rhel.save
```

S3: Migrate the guest with tpm and migrate back, following the steps in comment 3.

All the test scenarios passed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171
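The qemu errors in S1 step 3 and S2 step 3 follow from the constraint qemu itself states: the NVRAM image must have a size that is a non-zero multiple of 0x1000 bytes, and `echo >` truncates it below that. A minimal Python sketch of that size check (a hypothetical helper illustrating the reported rule, not qemu's actual implementation):

```python
# Sketch of the firmware-image size rule that qemu's error message states:
# "its size must be a non-zero multiple of 0x1000". Hypothetical helper,
# not qemu's actual code.

def nvram_size_ok(size: int) -> bool:
    """Return True if `size` is a non-zero multiple of 0x1000 (4 KiB)."""
    return size > 0 and size % 0x1000 == 0

if __name__ == "__main__":
    # A typical OVMF VARS image of 528 KiB satisfies the rule.
    print(nvram_size_ok(540672))  # True
    # The size qemu reported for the corrupted file does not, so the guest
    # refuses to start until --reset-nvram re-creates the image.
    print(nvram_size_ok(512))     # False
```

This is why `virsh start --reset-nvram` / `virsh restore --reset-nvram` recover the guest: libvirt discards the invalid NVRAM file and regenerates it, rather than handing the truncated image to qemu.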