Bug 2111433

Summary: Failed to restore vm after creating a snapshot for a booting vm with vtpm device
Product: Red Hat Enterprise Linux 8
Reporter: Lili Zhu <lizhu>
Component: libtpms
Assignee: Marc-Andre Lureau <marcandre.lureau>
Status: CLOSED ERRATA
QA Contact: Qinghua Cheng <qcheng>
Severity: medium
Priority: medium
Version: 8.7
CC: coli, fjin, jinzhao, lmen, marcandre.lureau, qcheng, smitterl, stefanb, tzheng, virt-maint, xiaohli, xuzhang, yanqzhan, yfu
Target Milestone: rc
Keywords: AutomationTriaged, Triaged
Target Release: 8.7
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libtpms-0.9.1-1.20211126git1ff6fe1f43.module+el8.7.0+16161+74369000
Doc Type: If docs needed, set a value
Clone Of: 2035731
Last Closed: 2022-11-08 09:20:55 UTC
Type: Bug
Bug Depends On: 2035731

Description Lili Zhu 2022-07-27 10:12:07 UTC
Tested with:
libvirt-daemon-8.0.0-9.module+el8.7.0+15830+85788ab7.x86_64
qemu-kvm-6.2.0-16.module+el8.7.0+15743+c774064d.x86_64
libtpms-0.9.1-0.20211126git1ff6fe1f43.module+el8.6.0+13725+61ae1949.x86_64.rpm

1. Start a guest with a vtpm device, and create an external snapshot immediately:
# virsh start rhel8.7-q35; virsh snapshot-create-as rhel8.7-q35 s2 --memspec file=/tmp/rhel8.7-q35.mem --diskspec vda,file=/tmp/rhel8.7-q35.s2
Domain 'rhel8.7-q35' started

Domain snapshot s2 created


2. Upgrade the host to RHEL 9.


3. Restore the guest from the snapshot:
# virsh restore /tmp/rhel8.7-q35.mem rhel8.7-q35.xml 
error: Failed to restore domain from /tmp/rhel8.7-q35.mem
error: internal error: qemu unexpectedly closed the monitor: 2022-07-27T09:48:32.577737Z qemu-kvm: Machine type 'pc-q35-rhel8.6.0' is deprecated: machine types for previous major
2022-07-27T09:48:37.613513Z qemu-kvm: -device cirrus-vga,id=video0,bus=pcie.0,addr=0x1: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
2022-07-27T09:48:37.640415Z qemu-kvm: warning: netdev channel1 has no peer
2022-07-27T09:48:38.663843Z qemu-kvm: tpm-emulator: Setting the stateblob (type 2) failed with a TPM error 0x3 a parameter is bad
2022-07-27T09:48:38.663869Z qemu-kvm: error while loading state for instance 0x0 of device 'tpm-emulator'
2022-07-27T09:48:38.848779Z qemu-kvm: load of migration failed: Input/output error

Actual results:
Failed to restore vm

Additional info:
1. This bug can be reproduced on RHEL 8.7 with the steps in Bug #2035731.
2. The guest also cannot be restored from the snapshot before upgrading.
3. As it is suggested to create snapshots of the guests before upgrading:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html-single/upgrading_from_rhel_8_to_rhel_9/index
I think this bug has a bigger impact in the RHEL 8 to RHEL 9 upgrade scenario.


+++ This bug was initially created as a clone of Bug #2035731 +++

Description of problem:
Start a vm with a vtpm device, then save the vm immediately. Restoring the vm will fail.

Version-Release number of selected component (if applicable):
# rpm -q libvirt qemu-kvm libtpms swtpm
libvirt-7.10.0-1.el9.x86_64
qemu-kvm-6.2.0-1.el9.x86_64
libtpms-0.9.1-0.20211126git1ff6fe1f43.el9.x86_64
swtpm-0.7.0-1.20211109gitb79fd91.el9.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Start a vm with a vtpm device, and save it immediately:
# virsh start vm1; virsh managedsave vm1

The vtpm device in the vm definition:
    <tpm model='tpm-crb'>
      <backend type='emulator' version='2.0'/>
    </tpm>

2. Restore the vm:
# virsh start vm1
error: Failed to start domain 'vm1'
error: internal error: qemu unexpectedly closed the monitor: 2021-12-27T08:52:14.261131Z qemu-kvm: tpm-emulator: Setting the stateblob (type 2) failed with a TPM error 0x3 a parameter is bad
2021-12-27T08:52:14.261145Z qemu-kvm: error while loading state for instance 0x0 of device 'tpm-emulator'
2021-12-27T08:52:14.261235Z qemu-kvm: load of migration failed: Input/output error


3. Remove the state file, and try to restore the vm again:
# rm /var/lib/libvirt/swtpm/505ee98d-e9af-4597-9eff-168e66f5f6ce/tpm2/tpm2-00.permall
rm: remove regular file '/var/lib/libvirt/swtpm/505ee98d-e9af-4597-9eff-168e66f5f6ce/tpm2/tpm2-00.permall'? y

# virsh start vm1
error: Failed to start domain 'vm1'
error: internal error: qemu unexpectedly closed the monitor: 2021-12-27T09:05:28.928729Z qemu-kvm: tpm-emulator: Setting the stateblob (type 2) failed with a TPM error 0x3 a parameter is bad
2021-12-27T09:05:28.928743Z qemu-kvm: error while loading state for instance 0x0 of device 'tpm-emulator'
2021-12-27T09:05:28.928832Z qemu-kvm: load of migration failed: Input/output error


Actual results:
Failed to restore vm


Expected results:
Restore vm successfully


Additional info:
1. Cannot reproduce if the vm is saved after it has fully booted up.
2. swtpm log:
# cat /var/log/swtpm/libvirt/qemu/vm1-swtpm.log

libtpms/tpm2: STATE_RESET_DATA: s_ContextSlotMask has bad value: 0x0000
Data client disconnected
libtpms/tpm2: STATE_RESET_DATA: s_ContextSlotMask has bad value: 0x0000
Data client disconnected

--- Additional comment from Qinghua Cheng on 2022-01-04 10:21:03 UTC ---

We did not reproduce it with Win11 and Win2022 guests.

Environment:
libtpms-0.9.1-0.20211126git1ff6fe1f43.el9.x86_64
swtpm-0.7.0-1.20211109gitb79fd91.el9.x86_64
qemu-kvm-6.2.0-1.el9.x86_64
libvirt-7.10.0-1.el9.x86_64
edk2-ovmf-20210527gite1999b264f1f-7.el9.noarch

--- Additional comment from Stefan Berger on 2022-01-04 20:10:32 UTC ---

The fix for this is in a PR here now: https://github.com/stefanberger/libtpms/pull/287

--- Additional comment from Marc-Andre Lureau on 2022-01-06 14:17:07 UTC ---

Unfortunately, the series from Stefan B. is stuck at this point. The last iteration I could find is "[PATCH v3 1/3] selftests: tpm2: Probe for available PCR bank".

--- Additional comment from Stefan Berger on 2022-01-06 14:19:49 UTC ---

Patches haven't received a verdict from the kernel test maintainer yet: https://lkml.org/lkml/2021/12/23/681

--- Additional comment from Yanan Fu on 2022-01-14 03:28:09 UTC ---

Hi there,

When I test migration with Win11 + a tpm device, if the src and dst vm use the same tpm daemon by mistake, I can hit the same error info:

18:23:27 INFO | [qemu output] qemu-kvm: tpm-emulator: Setting the stateblob (type 2) failed with a TPM error 0x84
18:23:27 INFO | [qemu output] qemu-kvm: error while loading state for instance 0x0 of device 'tpm-emulator'
18:23:27 INFO | [qemu output] qemu-kvm: load of migration failed: Input/output error
18:23:27 INFO | [qemu output] (Process terminated with status 1)


Just for a reference, thanks!

--- Additional comment from Stefan Berger on 2022-01-14 15:06:37 UTC ---

(In reply to Yanan Fu from comment #5)
> Hi there,
> 
> We i test migration with Win11 + tpm device,  if src and dst vm use same tpm
> daemon by mistake, can hit the same error info:

What is 'same tpm daemon by mistake'? 

> 
> 18:23:27 INFO | [qemu output] qemu-kvm: tpm-emulator: Setting the stateblob
> (type 2) failed with a TPM error 0x84
> 18:23:27 INFO | [qemu output] qemu-kvm: error while loading state for
> instance 0x0 of device 'tpm-emulator'
> 18:23:27 INFO | [qemu output] qemu-kvm: load of migration failed:
> Input/output error
> 18:23:27 INFO | [qemu output] (Process terminated with status 1)
> 
> 
> Just for a reference, thanks!

Is this with the fix applied to libtpms on both machines or without?

--- Additional comment from Yanan Fu on 2022-01-14 15:35:03 UTC ---

(In reply to Stefan Berger from comment #6)
> (In reply to Yanan Fu from comment #5)
> > Hi there,
> > 
> > We i test migration with Win11 + tpm device,  if src and dst vm use same tpm
> > daemon by mistake, can hit the same error info:
> 
> What is 'same tpm daemon by mistake'? 


It is for local migration test:
1. src vm, qemu cli:

    -chardev socket,id=char_vtpm_tpm0,path=/tmp/avocado_5l5j2ys4/avocado-vt-vm1_tpm0_swtpm.sock \    <--- here
    -tpmdev emulator,chardev=char_vtpm_tpm0,id=emulator_vtpm_tpm0 \
    -device tpm-crb,id=tpm-crb_vtpm_tpm0,tpmdev=emulator_vtpm_tpm0 \


2. dst vm qemu cli:
    -chardev socket,id=char_vtpm_tpm0,path=/tmp/avocado_5l5j2ys4/avocado-vt-vm1_tpm0_swtpm.sock \    <--- here
    -tpmdev emulator,chardev=char_vtpm_tpm0,id=emulator_vtpm_tpm0 \
    -device tpm-crb,id=tpm-crb_vtpm_tpm0,tpmdev=emulator_vtpm_tpm0 \

I hit the same error when using the same 'path' for both the src and dst vm.
This is incorrect usage, so I said 'by mistake'.

The correct usage is a unique path for each one; that way, it works normally.
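For illustration, a working local-migration setup would give each end its own swtpm instance and socket; the concrete paths below are hypothetical, not taken from the test logs:

```
# src vm: its own swtpm socket
-chardev socket,id=char_vtpm_tpm0,path=/tmp/swtpm-src/vm1_tpm0.sock \
-tpmdev emulator,chardev=char_vtpm_tpm0,id=emulator_vtpm_tpm0 \
-device tpm-crb,id=tpm-crb_vtpm_tpm0,tpmdev=emulator_vtpm_tpm0 \

# dst vm: a different socket path, backed by a second swtpm process
-chardev socket,id=char_vtpm_tpm0,path=/tmp/swtpm-dst/vm1_tpm0.sock \
-tpmdev emulator,chardev=char_vtpm_tpm0,id=emulator_vtpm_tpm0 \
-device tpm-crb,id=tpm-crb_vtpm_tpm0,tpmdev=emulator_vtpm_tpm0 \
```

With a shared path, both QEMU processes talk to the same swtpm state, which is what triggered the stateblob error above.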

> 
> > 
> > 18:23:27 INFO | [qemu output] qemu-kvm: tpm-emulator: Setting the stateblob
> > (type 2) failed with a TPM error 0x84
> > 18:23:27 INFO | [qemu output] qemu-kvm: error while loading state for
> > instance 0x0 of device 'tpm-emulator'
> > 18:23:27 INFO | [qemu output] qemu-kvm: load of migration failed:
> > Input/output error
> > 18:23:27 INFO | [qemu output] (Process terminated with status 1)
> > 
> > 
> > Just for a reference, thanks!
> 
> Is this with the fix applied to libptms on both machines or without?

Without any fix.

--- Additional comment from Stefan Berger on 2022-01-14 20:14:21 UTC ---

Anyway, thanks for finding the bug!

--- Additional comment from RHEL Program Management on 2022-06-20 13:13:11 UTC ---

The release+ flag was dropped due to a missing devel_ack+ and/or qa_ack+

--- Additional comment from Marc-Andre Lureau on 2022-06-20 13:14:12 UTC ---

please qa ack, thanks

--- Additional comment from errata-xmlrpc on 2022-06-21 12:03:35 UTC ---

This bug has been added to advisory RHBA-2022:96997 by Marc-Andre Lureau (mlureau)

--- Additional comment from errata-xmlrpc on 2022-06-21 12:03:52 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2022:96997-01
https://errata.devel.redhat.com/advisory/96997

--- Additional comment from Yanan Fu on 2022-06-24 02:48:02 UTC ---

QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

--- Additional comment from Qinghua Cheng on 2022-06-28 03:20:26 UTC ---

Verified on RHEL 9.1

kernel: 5.14.0-114.el9.x86_64
qemu: qemu-kvm-7.0.0-6.el9.x86_64
libtpms-0.9.1-2.20211126git1ff6fe1f43.el9.x86_64
swtpm-0.7.0-3.20211109gitb79fd91.el9.x86_64
edk2-ovmf-20220526git16779ede2d36-1.el9.noarch

guest: win11

Looped the following actions 40 times; the guest vm works normally:

virsh start <vm>
virsh managedsave <vm>   (save it immediately)
virsh start <vm>

Set bug status to verified.
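The verification loop above can be sketched as a small shell script. The VM name, the round count, and the DRY_RUN guard are assumptions for illustration; with DRY_RUN=1 (the default here) the commands are only printed, not executed:

```shell
# Sketch of the start/managedsave/start verification loop.
# VM, ROUNDS, and DRY_RUN are illustrative assumptions.
VM=${VM:-vm1}
ROUNDS=${ROUNDS:-40}

run() {
    # Print the command in dry-run mode; otherwise invoke virsh for real.
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "virsh $*"
    else
        virsh "$@"
    fi
}

i=1
while [ "$i" -le "$ROUNDS" ]; do
    run start "$VM"        # boot the guest
    run managedsave "$VM"  # save it immediately, while still booting
    run start "$VM"        # restore from the managed save image
    i=$((i + 1))
done
```

Setting DRY_RUN=0 on a host with libvirt would run the loop for real against the named guest.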

Comment 7 Yanan Fu 2022-08-01 02:35:03 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 10 Qinghua Cheng 2022-08-08 02:28:11 UTC
Verified on rhel 8.7

kernel: 4.18.0-414.el8.x86_64
qemu: qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64
libtpms: libtpms-0.9.1-1.20211126git1ff6fe1f43.module+el8.7.0+16161+74369000.x86_64
swtpm-0.7.0-4.20211109gitb79fd91.module+el8.7.0+15999+d24f860e.x86_64

Verified by virsh commands:

virsh start vm1; virsh managedsave vm1

Guest: Win10

Error message is not reproduced.

Comment 12 errata-xmlrpc 2022-11-08 09:20:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7472