2087538 – [TPM] Guest will hang on "Please enter passphrase for disk QEMU_HARDDISK" for more than 10 mins after repeat "system_reset" several times

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2087538 - [TPM] Guest will hang on "Please enter passphrase for disk QEMU_HARDDISK" for more than 10 mins after repeat "system_reset" several times

Summary: [TPM] Guest will hang on "Please enter passphrase for disk QEMU_HARDDISK" for...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	swtpm
Sub Component:
Version:	9.1
Hardware:	Unspecified
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	9.2
Assignee:	Marc-Andre Lureau
QA Contact:	Qinghua Cheng
Docs Contact:
URL:
Whiteboard:
Depends On:	2092944
Blocks:
TreeView+	depends on / blocked

Reported:	2022-05-18 01:50 UTC by Yihuang Yu
Modified:	2023-05-09 09:05 UTC (History)
CC List:	10 users (show)
Fixed In Version:	swtpm-0.8.0-1.el9
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-05-09 07:47:50 UTC
Type:	---
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
vTPM log (9.54 MB, application/zip) 2022-05-26 14:13 UTC, Yihuang Yu	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-122401	0	None	None	None	2022-05-18 01:56:07 UTC
Red Hat Product Errata	RHBA-2023:2348	0	None	None	None	2023-05-09 07:48:08 UTC

Description Yihuang Yu 2022-05-18 01:50:08 UTC

Description of problem:
Install a guest with LUKs system and decrypt it via vTPM policy. If boot up the guest and repeat "system_reset" several times, the guest will hang on "Please enter passphrase for disk QEMU_HARDDISK" for more than 10 mins and enter the system.

Version-Release number of selected component (if applicable):
Host kernel: kernel-5.14.0-90.el9.aarch64
Guest kernel: kernel-5.14.0-86.el9.aarch64
qemu version: qemu-kvm-7.0.0-3.el9.aarch64
edk2 version: edk2-aarch64-20220221gitb24306f15d-1.el9.noarch
swtpm version: swtpm-0.7.0-2.20211109gitb79fd91.el9.aarch64
libtpms version: libtpms-0.9.1-0.20211126git1ff6fe1f43.el9.aarch64
openssl version: openssl-3.0.1-27.el9_0.aarch64

How reproducible:
Always

Steps to Reproduce:
1. Setup the vTPM socket
swtpm_setup --tpm2 --tpm-state /root/avocado/data/avocado-vt/swtpm/tpm0_state --createek --create-ek-cert --create-platform-cert --lock-nvram --not-overwrite

swtpm socket --ctrl type=unixio,path=/root/avocado/data/avocado-vt/swtpm/tpm0_swtpm.sock,mode=0600 --tpmstate dir=/root/avocado/data/avocado-vt/swtpm/tpm0_state,mode=0600 --terminate --tpm2

2. Install a guest using kickstart which contains the following lines:
autopart --encrypted --type=lvm --passphrase=kvmautotest
......
......
%packages --ignoremissing
clevis-dracut
clevis-luks
clevis-systemd
%end

%post
luks_dev_list=`blkid -t TYPE=crypto_LUKS -o device`
for luks_dev in ${luks_dev_list}
do
    clevis luks bind -f -y -k - -d ${luks_dev} tpm2 '{"pcr_bank":"sha256","pcr_ids":"7"}' <<< "kvmautotest"
done
......
......
%end

3. After the guest is installed, shut down the guest and re-launch the qemu instance with the same vTPM socket
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
    -blockdev node-name=file_aavmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel910-aarch64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
    -machine virt,gic-version=host,memory-backend=mem-machine_mem,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0 \
    -m 8192 \
    -object memory-backend-ram,size=8192M,id=mem-machine_mem  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'host' \
    -serial unix:'/tmp/serial-serial0',server=on,wait=off \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel910-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
    -device virtio-net-pci,mac=9a:45:54:ba:c4:5c,rombar=0,id=idAEQLJb,netdev=idPxZveC,bus=pcie-root-port-4,addr=0x0  \
    -netdev tap,id=idPxZveC,vhost=on  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew \
    -chardev socket,id=char_vtpm_tpm0,path=/root/avocado/data/avocado-vt/swtpm/tpm0_swtpm.sock \
    -tpmdev emulator,chardev=char_vtpm_tpm0,id=emulator_vtpm_tpm0 \
    -device tpm-tis-device,id=tpm-tis-device_vtpm_tpm0,tpmdev=emulator_vtpm_tpm0 \
    -enable-kvm \
    -monitor stdio

4. Issue "system_reset" from QMP/HMP and then wait for the guest to start up, repeat this step several times
(qemu) system_reset
dhcp156-126 login: UEFI firmware starting.
SyncPcrAllocationsAndPcrMask!
Tpm2GetCapabilityPcrs - 00000004

(qemu) system_reset
dhcp156-126 login: UEFI firmware starting.
SyncPcrAllocationsAndPcrMask!
Tpm2GetCapabilityPcrs - 00000004

(qemu) system_reset
dhcp156-126 login: UEFI firmware starting.
SyncPcrAllocationsAndPcrMask!
Tpm2GetCapabilityPcrs - 00000004

Actual results:
The guest will hang on "Please enter passphrase for disk QEMU_HARDDISK" for more than 10 mins and enter the system.

[  OK  ] Started Device-Mapper Multipath Device Controller.

Please enter passphrase for disk QEMU_HARDDISK (luks-9e62b180-74c6-421c-b21e-3cdcd296947a)::

Expected results:
The guest can enter the system after short decryption.

Additional info:
I suppose the problem is that "system_reset" will restart the guest directly without stopping some system services, so "tpm2_shutdown" is also missed.

Comment 7 Stefan Berger 2022-05-25 22:23:43 UTC

I haven't worked much with aarch64 and don't have hardware for it.

Would it be possible to repeat the test on x86_64 to create a baseline for us to see whether this is an aarch64-only problem or also exists on x86_64?

Comment 8 Eric Auger 2022-05-26 10:06:59 UTC

(In reply to Stefan Berger from comment #7)
> I haven't worked much with aarch64 and don't have hardware for it.
> 
> Would it be possible to repeat the test on x86_64 to create a baseline for
> us to see whether this is an aarch64-only problem or also exists on x86_64?

Hi Stefan, our QE showed that the problem also exists on x86_64. 
After system_reset, the guest vm is blocked for ~20 min at:
Please enter passphrase for disk QEMU_HARDDISK....

Comment 9 Stefan Berger 2022-05-26 12:06:22 UTC

Can we share the disk you have installed somehow so I can repeat this myself here. With --log [file=/tmp/tpm/log,]level=20 you could display what is going on with the communication with swtpm, what the last commands were and what responses were sent back. There's nothing in swtpm that would take 20 minutes.

Is this problem only repeatable via monitor command system_reset or does it also happen if a clean start of the VM is done?

Comment 10 Yihuang Yu 2022-05-26 14:13:55 UTC

Created attachment 1883853 [details]
vTPM log

(In reply to Stefan Berger from comment #9)
> Can we share the disk you have installed somehow so I can repeat this myself
> here. With --log [file=/tmp/tpm/log,]level=20 you could display what is
> going on with the communication with swtpm, what the last commands were and
> what responses were sent back. There's nothing in swtpm that would take 20
> minutes.
I just use a kickstart which contains some content as I mentioned in comment0, and I am sorry I cannot provide a public IP to connect the ENV. Also, the disk is more than 4G, so I cannot find net storage to share it and the swtpm state file.
> 
> Is this problem only repeatable via monitor command system_reset or does it
> also happen if a clean start of the VM is done?
I retested this scenario and here is some info.

swtpm_setup --tpm2 --tpm-state /root/avocado/data/avocado-vt/swtpm/avocado-vt-vm1_tpm0_state --createek --create-ek-cert --create-platform-cert --lock-nvram --not-overwrite

swtpm socket --ctrl type=unixio,path=/root/avocado/data/avocado-vt/swtpm/avocado-vt-vm1_tpm0_swtpm.sock,mode=0600 --tpmstate dir=/root/avocado/data/avocado-vt/swtpm/avocado-vt-vm1_tpm0_state,mode=0600 --terminate --tpm2 --log file=/tmp/vtpm.log,level=20

During the guest hangs on "Please enter passphrase for disk QEMU_HARDDISK", I can see similar logs like the following. It's my fault that forgot to upload the vtpm log for debugging, attaching here.
 SWTPM_IO_Read: length 468
 80 01 00 00 01 D4 00 00 01 61 00 00 00 00 00 00
 1D C3 80 00 00 00 40 00 00 01 01 B8 00 40 FE D0
 70 A3 44 EB EE D6 94 48 55 F2 0E BF 01 1D 2A BD
 5D 06 5C 05 DA ED CC 45 EE 5F 03 FD C4 C1 10 C3
 CE 71 B6 DD 72 AE 34 E5 FD 86 36 A9 6C 31 A2 B5
 F3 27 AD 81 09 62 33 6C A1 60 F6 91 2C 49 B8 E6
 3E 31 4B 0E 64 03 C1 E3 0A 38 9D D6 19 B3 6A 09
 33 7E 1D 78 EE 3C 31 C9 99 84 05 D4 73 C8 49 A3
 47 6C E2 62 A1 ED 7B F4 17 2B 22 A8 42 4C 19 98
 A3 C9 B7 40 CA 69 32 17 5B AE 0E DF 10 E4 61 6C
 F9 CE ED 67 74 DA 6E CA 03 DE BC AF 4A AA 01 34
 27 3B 9E 84 21 42 47 20 B6 BE 59 A4 9C FF CF 5E
 61 31 E6 81 FE 79 BD 10 6A F7 AE 2C EB FE 85 76
 3A A4 3B BC 8F 14 41 E9 7A 20 49 10 86 7E 68 F3
 BC 97 96 15 B8 B6 E8 57 E9 CF 08 E9 B5 50 8D CB
 6B D5 12 84 4A DB 37 BD 18 72 F4 88 E3 62 A7 94
 E3 23 3D B0 89 3C 73 87 FA EA 63 37 70 D8 9C EF
 B3 1E 6E 2A BC 94 52 6E 97 E7 5B EB 27 79 CA 36
 FF 3D E8 64 35 CD FA 9C F1 D0 80 4C 1E CC 5D 87
 05 6C E9 4F 4E DB 13 07 ED 19 BF 83 21 69 91 E3
 C2 7D B3 D0 D1 BC 2B 5C C3 EC A1 C1 0C B6 0E 27
 B7 95 D0 17 BC 8B 92 D0 38 3E A2 BE CD 1A F3 B4
 EC 9C A2 7D 3E 02 5C 35 B1 66 68 30 06 9F D7 A7
 18 5D B2 E7 B8 81 2F 37 53 18 A4 45 E0 0F D4 90
 17 47 62 43 51 69 15 60 B6 E9 BB 6C 6F D1 55 F9
 BA 8C A5 AE 90 40 0B 28 9E 08 49 02 D6 B6 2B BF
 10 33 DA 91 5D 2C AF F9 4F 8B 1B 97 8D 61 FA 7B
 4F 3A 50 63 21 F6 92 83 4B D7 6D 8A BE 09 8D 19
 FF 41 DE 6C E7 7B 16 31 E1 7F CC 82 DB AF F8 D5
 7F BF 44 01
 SWTPM_IO_Write: length 14
 80 01 00 00 00 0E 00 00 00 00 80 00 00 00
 SWTPM_IO_Read: length 414
 80 01 00 00 01 9E 00 00 01 61 00 00 00 00 00 00
 19 86 02 00 00 00 40 00 00 07 01 82 00 40 C3 BF
 50 71 E2 1B D4 04 E8 8A 5F 56 1C 5F C2 2E CB 49
 76 CD 24 51 27 6A D7 CF C6 00 92 9F 86 B9 AF 35
 ED 48 94 4A D0 85 14 B3 05 D8 25 4A 8C E3 FE 17
 33 DA D2 CA E0 CA 32 C3 17 BA 73 6F 62 A3 F4 27
 4D AF DB A2 F7 CB BF 27 6E 98 37 E4 60 25 68 DA
 96 8B F3 BE 7E F6 DC C4 58 91 A9 F2 D5 36 E6 B0
 C4 BF 56 3E 39 23 34 DD 25 BA 8B 5F 32 1A 8D 99
 A9 55 03 2A 35 CB CB 9F 89 31 E0 85 94 B4 39 C0
 E1 82 4D AF 1E FD B4 66 95 BE CA B1 80 86 BC 55
 9C 13 D1 63 61 84 50 21 A3 64 2A 56 AE 51 73 48
 52 F1 54 EC 59 63 65 23 FA CE D6 44 AD A3 EB AD
 79 82 EB 2F 18 C6 40 CF D2 EC 2D 92 BC 87 6E 18
 8F B5 3B B9 21 B3 19 3B 93 4E 47 B9 65 F4 E8 14
 87 F1 EE AC D0 66 D3 7E 28 17 A8 10 2C 7E 8B D8
 5E 48 52 E5 E3 D2 30 C8 AA D5 6C A4 4B ED 10 E0
 82 54 3D A2 A2 73 49 4C 51 FC 70 79 F1 97 DE 62
 D6 E9 11 62 DF 22 C2 96 0B C0 D9 C3 7B 59 40 4C
 52 D4 9A 2C 58 D9 2B 8A 30 A4 82 6D D5 6B A2 43
 DA 2C 1E 20 DF 36 BB E4 D5 25 F9 B3 75 D5 BC 3A
 A5 38 22 39 04 D4 29 09 65 A1 1F 3E D7 43 D4 77
 7A 09 CB A9 DB 2F 85 84 76 76 8A C1 9D 8F 30 01
 ED 54 74 9D 9F 3D E0 A4 4C 5E BC 2C 06 3C 34 40
 2E 91 D3 BE D2 A7 27 63 62 A7 93 25 8E 0A 76 C0
 A2 5B 20 68 93 F0 7D 9C 16 40 E5 1C 46 EC
 SWTPM_IO_Write: length 14
 80 01 00 00 00 0E 00 00 00 00 02 00 00 00
 SWTPM_IO_Read: length 14
 80 01 00 00 00 0E 00 00 01 65 02 00 00 00
 SWTPM_IO_Write: length 10
 80 01 00 00 00 0A 00 00 00 00
 SWTPM_IO_Read: length 14
 80 01 00 00 00 0E 00 00 01 62 80 00 00 00
 SWTPM_IO_Write: length 468

Comment 11 Stefan Berger 2022-05-26 17:59:27 UTC

Thanks for the trace. What I would be curious is what the log shows at the time when the VM gets stuck for 20 minutes.


I am trying to reproduce this on one of my existing systems. Once I run QEMU 7.0 and try to reset it (hmp system_reset) sooner or later QEMU gets stuck completely and doesn't execute anymore at all. This may be due to my old hardware and I am currently running Fedora 36 to reproduce this issue. I see this fault here on my system:

[ 2028.129517] BUG: kernel NULL pointer dereference, address: 000000000000000b
[ 2028.129524] #PF: supervisor write access in kernel mode
[ 2028.129527] #PF: error_code(0x0002) - not-present page
[ 2028.129529] PGD 0 P4D 0
[ 2028.129532] Oops: 0002 [#1] PREEMPT SMP PTI
[ 2028.129535] CPU: 10 PID: 37305 Comm: qemu-system-x86 Tainted: G          I       5.17.9-300.fc36.x86_64 #1
[ 2028.129538] Hardware name: IBM BladeCenter HS22 -[7870AC1]-/59Y5682, BIOS -[P9E165BUS-1.29]- 06/07/2018
[ 2028.129540] RIP: 0010:kvm_replace_memslot+0xc0/0x380 [kvm]
[ 2028.129611] Code: 04 00 00 48 85 c0 0f 84 3b 02 00 00 48 89 d9 48 c1 e1 04 48 01 c1 48 8b 71 08 48 85 f6 74 1e 48 8b 39 48 89 3e 48 85 ff 74 04 <48> 89 77 08 48 c7 01 00 00 00 00 48 c7 41 08 00 00 00 00 48 8d 0c
[ 2028.129614] RSP: 0018:ffffb1dc87387d90 EFLAGS: 00010206
[ 2028.129617] RAX: ffff910842074a00 RBX: 0000000000000000 RCX: ffff910842074a00
[ 2028.129618] RDX: 0000000000000001 RSI: ffffb1dc88ecd970 RDI: 0000000000000003
[ 2028.129620] RBP: 0000000000000000 R08: 0000000008000000 R09: 0000000000000000
[ 2028.129622] R10: 0000000000000008 R11: 000000000000000a R12: 0000000000000000
[ 2028.129623] R13: 0000000000000001 R14: 0000000000000860 R15: ffffb1dc88ecd000
[ 2028.129625] FS:  00007f58bbfe4fc0(0000) GS:ffff910727d80000(0000) knlGS:0000000000000000
[ 2028.129628] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2028.129629] CR2: 000000000000000b CR3: 0000000180842000 CR4: 00000000000026e0
[ 2028.129632] Call Trace:
[ 2028.129635]  <TASK>
[ 2028.129636]  ? _raw_read_unlock+0x18/0x30
[ 2028.129636]  kvm_set_memslot+0x335/0x490 [kvm]
[ 2028.129636]  kvm_vm_ioctl+0x2a7/0xd40 [kvm]
[ 2028.129636]  ? __rseq_handle_notify_resume+0x93/0x440
[ 2028.129636]  ? __seccomp_filter+0x29b/0x4c0
[ 2028.129636]  __x64_sys_ioctl+0x8d/0xc0
[ 2028.129636]  do_syscall_64+0x3a/0x80
[ 2028.129636]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Comment 12 Stefan Berger 2022-05-26 20:22:34 UTC

On another machine with an older Fedora install I have been able to reproduce the following with:

- QEMU 7.0
- swtpm 0.7.0
- libtpms 0.9.1

After several resets (via virt-manager 'force reset') at various points during the boot the VM does not get past the point of prompting for the LUKS password anymore that was previously automated via Clevis/Tang and TPM 2 and required a few seconds wait time normally.

I then did the following as root after entering the password and booting to the prompt:

# clevis luks unbind -d /dev/vda2 -s 1
# clevis luks bind -d /dev/vda2 tpm2 '{}'
Enter existing LUKS password:
Warning: Value 512 is outside of the allowed entropy range, adjusting it.
WARNING:esys:src/tss2-esys/api/Esys_Create.c:375:Esys_Create_Finish() Received TPM Error
ERROR:esys:src/tss2-esys/api/Esys_Create.c:120:Esys_Create() Esys Finish ErrorCode (0x00000921)
Error executing command: TPM error: authorizations for objects subject to DA protection are not allowed at this time because the TPM is in DA lockout mode
Error reading from standard input
Error saving metadata to LUKSMeta slot 1 from /dev/vda2
Unable to update metadata; operation cancelled
Error adding new binding to /dev/vda2


In my case the TPM 2 has been put into DA lockout mode that is visible by this error message. 

CAN YOU please reproduce this in your case as well so that we see your issue is the same? 


In this case the solution is to run either one of the following commands from IBM or Intel TSS2:

# tssdictionaryattacklockreset
# tpm2_dictionarylockout --clear-lockout

After that the `luks bind` above works again.
The 'unbind' / 'bind' sequence however is not necessary when the VM doesn't boot and the password has to be entered. Only either one of the 'da lockout mode reset tools' should be used and the VM will again boot without user interaction upon reboot.


To show what the current DA lockout counter is one can enter the following command:

# tssgetcapability -cap 6 -pr 20e -pc 2 

moreData: 1
2 properties
TPM_PT 0000020e value 00000002 TPM_PT_LOCKOUT_COUNTER - the current value of the lockout counter (failedTries)
TPM_PT 0000020f value 00000003 TPM_PT_MAX_AUTH_FAIL - the number of authorization failures before DA lockout is invoked

This shows that the current lockout counter is at 2 and once 3 has been reached the TPM 2 will be in DA lockout mode. So it takes a few well-timed resets to trigger the DA lockout mode.

The easiest way to trigger an increase of the lockout counter *seems to be* to reset the VM while the password prompt is shown and while Clevis & IntelTSS2 interact with the TPM 2. I am not sure how real hardware would behave if one was to reset the machine (remove battery) at this point.

Comment 13 Stefan Berger 2022-05-26 21:38:22 UTC

The increase of the lockout counter happens in TPM2_Startup:  https://github.com/stefanberger/libtpms/blob/stable-0.9/src/tpm2/StartupCommands.c#L125

It calls DAStartup(startup): https://github.com/stefanberger/libtpms/blob/stable-0.9/src/tpm2/StartupCommands.c#L222

In DA Startup the following code evaluates to true: https://github.com/stefanberger/libtpms/blob/stable-0.9/src/tpm2/DA.c#L138-L140

    if(gp.recoveryTime != 0
       && gp.failedTries < gp.maxTries
       && !IS_ORDERLY(g_prevOrderlyState))
	{
#if USE_DA_USED
	    gp.failedTries += g_daUsed;
	    g_daUsed = FALSE;
#else
	    gp.failedTries++;
#endif
	    // Record the change to NV
	    NV_SYNC_PERSISTENT(failedTries);
	}

This increase the failedTries counter.

The likely reason is that the TPM emulator hasn't received a TPM2_Shutdown() since the power was suddenly removed. Typically the OS driver has to send this command.

Comment 14 Yihuang Yu 2022-05-27 02:29:22 UTC

(In reply to Stefan Berger from comment #12)
> On another machine with an older Fedora install I have been able to
> reproduce the following with:
> 
> - QEMU 7.0
> - swtpm 0.7.0
> - libtpms 0.9.1
> 
> After several resets (via virt-manager 'force reset') at various points
> during the boot the VM does not get past the point of prompting for the LUKS
> password anymore that was previously automated via Clevis/Tang and TPM 2 and
> required a few seconds wait time normally.
> 
> I then did the following as root after entering the password and booting to
> the prompt:
> 
> # clevis luks unbind -d /dev/vda2 -s 1
> # clevis luks bind -d /dev/vda2 tpm2 '{}'
> Enter existing LUKS password:
> Warning: Value 512 is outside of the allowed entropy range, adjusting it.
> WARNING:esys:src/tss2-esys/api/Esys_Create.c:375:Esys_Create_Finish()
> Received TPM Error
> ERROR:esys:src/tss2-esys/api/Esys_Create.c:120:Esys_Create() Esys Finish
> ErrorCode (0x00000921)
> Error executing command: TPM error: authorizations for objects subject to DA
> protection are not allowed at this time because the TPM is in DA lockout mode
> Error reading from standard input
> Error saving metadata to LUKSMeta slot 1 from /dev/vda2
> Unable to update metadata; operation cancelled
> Error adding new binding to /dev/vda2
Yes, I had this problem before!!! If I remember correctly, at that time I try to re-bind the luks disk again but got this DA lockout error.
Here is the test result I just tested.

# clevis luks unbind -d /dev/sda3 -s 1
The unbind operation will wipe a slot. This operation is unrecoverable.
Do you wish to erase LUKS slot 1 on /dev/sda3? [ynYN] y
Enter any remaining passphrase:
[root@dhcp156-100 ~]# clevis luks bind -f -y -k - -d /dev/sda3 tpm2 '{"pcr_bank":"sha256","pcr_ids":"7"}' <<< "kvmautotest"
Warning: Value 512 is outside of the allowed entropy range, adjusting it.
WARNING:esys:src/tss2-esys/api/Esys_Create.c:375:Esys_Create_Finish() Received TPM Error
ERROR:esys:src/tss2-esys/api/Esys_Create.c:120:Esys_Create() Esys Finish ErrorCode (0x00000921)
ERROR: Esys_Create(0x921) - tpm:warn(2.0): authorizations for objects subject to DA protection are not allowed at this time because the TPM is in DA lockout mode
ERROR: Unable to run tpm2_create
Creating TPM2 object for jwk failed!
Invalid input!
Usage: jose jwe fmt -i JWE [-I CT] [-o JWE] [-O CT] [-c]
......
......
Failed to import token from file.
Error saving metadata to LUKS2 header in device /dev/sda3
Unable to update metadata; operation cancelled
Error adding new binding to /dev/sda3
> 
> 
> In my case the TPM 2 has been put into DA lockout mode that is visible by
> this error message. 
> 
> CAN YOU please reproduce this in your case as well so that we see your issue
> is the same? 
> 
> 
> In this case the solution is to run either one of the following commands
> from IBM or Intel TSS2:
> 
> # tssdictionaryattacklockreset
> # tpm2_dictionarylockout --clear-lockout
> 
After this, it works.
# tpm2_dictionarylockout --clear-lockout
# clevis luks bind -f -y -k - -d /dev/sda3 tpm2 '{"pcr_bank":"sha256","pcr_ids":"7"}' <<< "kvmautotest"
Warning: Value 512 is outside of the allowed entropy range, adjusting it.

> After that the `luks bind` above works again.
> The 'unbind' / 'bind' sequence however is not necessary when the VM doesn't
> boot and the password has to be entered. Only either one of the 'da lockout
> mode reset tools' should be used and the VM will again boot without user
> interaction upon reboot.
> 
> 
> To show what the current DA lockout counter is one can enter the following
> command:
> 
> # tssgetcapability -cap 6 -pr 20e -pc 2 
Sorry, I cannot find this command on my system, so I don't get the result of it.
> 
> moreData: 1
> 2 properties
> TPM_PT 0000020e value 00000002 TPM_PT_LOCKOUT_COUNTER - the current value of
> the lockout counter (failedTries)
> TPM_PT 0000020f value 00000003 TPM_PT_MAX_AUTH_FAIL - the number of
> authorization failures before DA lockout is invoked
> 
> This shows that the current lockout counter is at 2 and once 3 has been
> reached the TPM 2 will be in DA lockout mode. So it takes a few well-timed
> resets to trigger the DA lockout mode.
> 
> The easiest way to trigger an increase of the lockout counter *seems to be*
> to reset the VM while the password prompt is shown and while Clevis &
> IntelTSS2 interact with the TPM 2. I am not sure how real hardware would
> behave if one was to reset the machine (remove battery) at this point.

Comment 15 Stefan Berger 2022-05-27 17:52:40 UTC

Posted a patch resolving this issue to qemu-devel mailing list: https://lists.nongnu.org/archive/html/qemu-devel/2022-05/msg05533.html

Comment 16 Stefan Berger 2022-05-31 18:20:34 UTC

Instead of implementing it in QEMU, this PR here for swtpm resolves it via swtpm: https://github.com/stefanberger/swtpm/pull/701

If we could get this tested before it's being merged, that would be helpful.

Comment 17 John Ferlan 2022-06-02 15:16:26 UTC

I see Stefan pushed this upstream yesterday as part of https://github.com/stefanberger/swtpm/pull/701

I created a swtpm rebase bug, altered this one to depend on it, and set release flags.

Comment 18 Stefan Berger 2022-06-02 19:32:00 UTC

(In reply to John Ferlan from comment #17)
> I see Stefan pushed this upstream yesterday as part of
> https://github.com/stefanberger/swtpm/pull/701
> 
> I created a swtpm rebase bug, altered this one to depend on it, and set
> release flags.

It's not upstream, yet. We're still internally discussing how to best approach this issue, on which level, swtpm or libtpms or both.

Comment 19 Stefan Berger 2022-06-17 16:40:58 UTC

We'll leave TPM 2 reference code untouched and go with the swtpm solution: https://github.com/stefanberger/swtpm/pull/701

Comment 20 John Ferlan 2022-06-21 11:46:52 UTC

removing the rebase as a blocker - will look to backport separately if/when patches accepted/merged upstream

Comment 27 John Ferlan 2022-11-22 15:50:59 UTC

Upstream MR https://github.com/stefanberger/swtpm/pull/701 was merged prior to swtpm-0.8.0, so I'll move this to POST restoring the rebase bug 2092944 as the dependent.

Comment 33 Yihuang Yu 2023-01-16 12:36:21 UTC

I tested with the following package on aarch64:

# rpm -qa | grep -E "qemu-kvm|tpm"
tpm2-tss-3.0.3-8.el9.aarch64
tpm2-tools-5.2-2.el9_1.aarch64
qemu-kvm-common-7.2.0-3.el9.aarch64
qemu-kvm-device-display-virtio-gpu-7.2.0-3.el9.aarch64
libtpms-0.9.1-2.20211126git1ff6fe1f43.el9.aarch64
qemu-kvm-device-display-virtio-gpu-pci-7.2.0-3.el9.aarch64
qemu-kvm-audio-pa-7.2.0-3.el9.aarch64
qemu-kvm-device-usb-host-7.2.0-3.el9.aarch64
qemu-kvm-core-7.2.0-3.el9.aarch64
qemu-kvm-block-rbd-7.2.0-3.el9.aarch64
qemu-kvm-tools-7.2.0-3.el9.aarch64
qemu-kvm-docs-7.2.0-3.el9.aarch64
qemu-kvm-7.2.0-3.el9.aarch64
swtpm-libs-0.8.0-1.el9.aarch64
swtpm-0.8.0-1.el9.aarch64
swtpm-tools-0.8.0-1.el9.aarch64

system_reset 20 times, the guest's disk can still be decrypted via the tpm.
debug log:
[stdlog] 2023-01-16 07:22:37,238 avocado.test INFO | Context: Reset guest system for 20 times
[stdlog] 2023-01-16 07:22:37,239 avocado.virttest.qemu_monitor DEBUG| (monitor avocado-vt-vm1.qmpmonitor1) Sending command 'system_reset'
[stdlog] 2023-01-16 07:22:37,240 avocado.virttest.qemu_monitor DEBUG| Send command: {'execute': 'system_reset', 'id': 'NT07d1Qw'}
[stdlog] 2023-01-16 07:22:37,241 avocado.test DEBUG| Reset the system by monitor cmd after 54.538secs
[stdlog] 2023-01-16 07:23:31,834 avocado.test INFO | Context: Try to login guest after reset
[stdlog] 2023-01-16 07:23:31,835 avocado.virttest.virt_vm DEBUG| Attempting to log into 'avocado-vt-vm1' (timeout 360s)
[stdlog] 2023-01-16 07:23:31,836 avocado.virttest.virt_vm DEBUG| Found/Verified IP 192.168.3.68 for VM avocado-vt-vm1 NIC 0
[stdlog] 2023-01-16 07:23:41,823 avocado.virttest.qemu_vm DEBUG| Destroying VM avocado-vt-vm1 (PID 22245)
[stdlog] 2023-01-16 07:23:41,824 avocado.virttest.qemu_vm DEBUG| Shutting down VM avocado-vt-vm1 (shell)
[stdlog] 2023-01-16 07:23:50,449 avocado.virttest.qemu_vm INFO | [qemu output] (Process terminated with status 0)

serial log:
2023-01-16 07:22:28: [^[[0;32m  OK  ^[[0m] Found device ^[[0;1;39mQEMU_HARDDISK 3^[[0m.
2023-01-16 07:22:28:          Starting ^[[0;1;39mCryptography Setu5d7f-4b39-aa9a-fdc72bf62aa0^[[0m...
2023-01-16 07:22:28: [^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mWait for udev To Complete Device Initialization^[[0m.
2023-01-16 07:22:28:
2023-01-16 07:22:28: Please enter passphrase for disk QEMU_HARDDISK (luks-f45cc118-5d7f-4b39-aa9a-fdc72bf62aa0)::
2023-01-16 07:22:31: [   22.152139] Key type trusted registered

Comment 34 Qinghua Cheng 2023-01-17 02:26:39 UTC

Verified on rhel 9.2 

qemu-kvm-7.2.0-1.el9.x86_64
libtpms-0.9.1-2.20211126git1ff6fe1f43.el9.x86_64
swtpm-0.8.0-1.el9.x86_64

System_reset test PASS.

The guest disk can be decrypted. 

debug log:
[stdlog] 2023-01-16 21:19:16,057 avocado.test INFO | Context: Reset guest system for 20 times
[stdlog] 2023-01-16 21:19:16,058 avocado.virttest.qemu_monitor DEBUG| (monitor avocado-vt-vm1.qmpmonitor1) Sending command 'system_reset'
[stdlog] 2023-01-16 21:19:16,059 avocado.virttest.qemu_monitor DEBUG| Send command: {'execute': 'system_reset', 'id': 'NbKWuJxL'}
[stdlog] 2023-01-16 21:19:16,059 avocado.test DEBUG| Reset the system by monitor cmd after 13.596secs
[stdlog] 2023-01-16 21:19:29,660 avocado.test INFO | Context: Try to login guest after reset
[stdlog] 2023-01-16 21:19:29,660 avocado.virttest.virt_vm DEBUG| Attempting to log into 'avocado-vt-vm1' (timeout 360s)
[stdlog] 2023-01-16 21:19:53,803 avocado.virttest.virt_vm DEBUG| Found/Verified IP 10.73.115.87 for VM avocado-vt-vm1 NIC 0
[stdlog] 2023-01-16 21:19:55,356 avocado.virttest.qemu_vm DEBUG| Destroying VM avocado-vt-vm1 (PID 856140)
[stdlog] 2023-01-16 21:19:55,356 avocado.virttest.qemu_vm DEBUG| Shutting down VM avocado-vt-vm1 (shell)

serial log:
2023-01-16 21:11:10: [^[[0;32m  OK  ^[[0m] Found device ^[[0;1;39mQEMU_HARDDISK 3^[[0m.
2023-01-16 21:11:10:          Starting ^[[0;1;39mCryptography Setu27bc-4d8f-aed9-1721bd673cee^[[0m...
2023-01-16 21:11:10: [^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mWait for udev To Complete Device Initialization^[[0m.
2023-01-16 21:11:10:
2023-01-16 21:11:10: Please enter passphrase for disk QEMU_HARDDISK (luks-1581ddb9-27bc-4d8f-aed9-1721bd673cee)::
2023-01-16 21:11:13: [^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mCryptography Setu9-27bc-4d8f-aed9-1721bd673cee^[[0m.

Comment 36 errata-xmlrpc 2023-05-09 07:47:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (swtpm bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2348

Note You need to log in before you can comment on or make changes to this bug.