Bug 2211060 - SEV-ES guest randomly stuck at boot to hard drive screen after powerdown and boot again
Summary: SEV-ES guest randomly stuck at boot to hard drive screen after powerdown and boot again
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: edk2
Version: 9.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Gerd Hoffmann
QA Contact: zixchen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-30 12:04 UTC by zixchen
Modified: 2023-11-07 09:04 UTC
CC List: 12 users

Fixed In Version: edk2-20230524-3.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:24:30 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
OVMF log (15.81 MB, text/plain), 2023-05-30 12:04 UTC, zixchen


Links
Gitlab redhat/centos-stream/src edk2 merge_requests 45: "OvmfPkg/IoMmuDxe: add locking to IoMmuAllocateBounceBuffer" (opened, last updated 2023-08-23 11:17:53 UTC)
Red Hat Issue Tracker RHELPLAN-158534 (last updated 2023-05-30 12:08:01 UTC)
Red Hat Product Errata RHSA-2023:6330 (last updated 2023-11-07 08:24:45 UTC)

Description zixchen 2023-05-30 12:04:51 UTC
Created attachment 1967857 [details]
OVMF log

Description of problem:
SEV-ES guest randomly gets stuck at the boot to hard drive screen after the guest is powered down and booted again.

Version-Release number of selected component (if applicable):
kernel-5.14.0-316.el9.x86_64
edk2-ovmf-20230301gitf80f052277c8-5.el9.noarch
qemu-kvm-8.0.0-3.el9.x86_64

How reproducible:
<20%

Steps to Reproduce:
1. Boot an SEV-ES guest with multiple disks
/usr/libexec/qemu-kvm \
     -S  \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -object sev-guest,id=lsec0,policy=7,cbitpos=51,reduced-phys-bits=1 \
     -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/edk2/ovmf/OVMF_CODE.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
     -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-64-virtio-scsi-ovmf_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
     -machine q35,memory-backend=mem-machine_mem,confidential-guest-support=lsec0,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie-pci-bridge-0", "addr": "0x1"}' \
     -device '{"driver": "i6300esb", "bus": "pcie-pci-bridge-0", "addr": "0x2"}' \
     -watchdog-action reset \ 
     -m 4096 \
     -object '{"size": 4294967296, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
     -smp 48,maxcpus=48,cores=24,threads=1,dies=1,sockets=2  \
     -cpu 'host',+kvm_pv_unhalt \
     -device '{"driver": "intel-hda", "bus": "pcie-pci-bridge-0", "addr": "0x3"}' \
     -device '{"driver": "hda-duplex"}' \
     -chardev socket,path=/var/tmp/avocado_bawr24p4/monitor-qmpmonitor1-20230525-040741-cCKZcimN,id=qmp_id_qmpmonitor1,server=on,wait=off  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,path=/var/tmp/avocado_bawr24p4/monitor-catch_monitor-20230525-040741-cCKZcimN,id=qmp_id_catch_monitor,server=on,wait=off  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idxwhk7t"}' \
     -chardev socket,path=/var/tmp/avocado_bawr24p4/serial-serial0-20230525-040741-cCKZcimN,id=chardev_serial0,server=on,wait=off \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}' \
     -object '{"qom-type": "rng-random", "filename": "/dev/urandom", "id": "passthrough-Jti99k41"}' \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "virtio-rng-pci", "id": "virtio-rng-iNEnLoli", "rng": "passthrough-Jti99k41", "bus": "pcie-root-port-1", "addr": "0x0"}'  \
     -chardev socket,id=seabioslog_id_20230525-040741-cCKZcimN,path=/var/tmp/avocado_bawr24p4/seabios-20230525-040741-cCKZcimN,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20230525-040741-cCKZcimN,iobase=0x402 \
     -device '{"driver": "ich9-usb-ehci1", "id": "usb1", "addr": "0x1d.0x7", "multifunction": true, "bus": "pcie.0"}' \
     -device '{"driver": "ich9-usb-uhci1", "id": "usb1.0", "multifunction": true, "masterbus": "usb1.0", "addr": "0x1d.0x0", "firstport": 0, "bus": "pcie.0"}' \
     -device '{"driver": "ich9-usb-uhci2", "id": "usb1.1", "multifunction": true, "masterbus": "usb1.0", "addr": "0x1d.0x2", "firstport": 2, "bus": "pcie.0"}' \
     -device '{"driver": "ich9-usb-uhci3", "id": "usb1.2", "multifunction": true, "masterbus": "usb1.0", "addr": "0x1d.0x4", "firstport": 4, "bus": "pcie.0"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"driver": "qemu-xhci", "id": "usb2", "bus": "pcie-root-port-2", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb2.0", "port": "1"}' \ 
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-3", "addr": "0x0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel930-64-virtio-scsi-ovmf.qcow2", "cache": {"direct": true, "no-flush": false}}' \
-blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1", "discard": "unmap"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -blockdev '{"node-name": "host_device_scsi_debug", "driver": "host_device", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/dev/sdb", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_scsi_debug", "driver": "raw", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "host_device_scsi_debug", "discard": "unmap"}' \
     -device '{"driver": "scsi-block", "id": "scsi_debug", "drive": "drive_scsi_debug"}' \
     -device '{"id": "pcie-root-port-4", "port": 4, "driver": "pcie-root-port", "addr": "0x1.0x4", "bus": "pcie.0", "chassis": 5}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:90:6a:05:03:ea", "id": "idoBoi5W", "netdev": "idoIkyQ3", "bus": "pcie-root-port-4", "addr": "0x0"}'  \
     -netdev tap,id=idoIkyQ3,vhost=on,vhostfd=16,fd=13  \
     -vnc :0  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off  \
     -no-hpet \
     -chardev socket,id=char_vtpm_avocado-vt-vm1_tpm0,path=/root/avocado/data/avocado-vt/swtpm/avocado-vt-vm1_tpm0_swtpm.sock \
     -tpmdev emulator,chardev=char_vtpm_avocado-vt-vm1_tpm0,id=emulator_vtpm_avocado-vt-vm1_tpm0 \
     -device '{"id": "tpm-crb_vtpm_avocado-vt-vm1_tpm0", "tpmdev": "emulator_vtpm_avocado-vt-vm1_tpm0", "driver": "tpm-crb"}' \
     -enable-kvm \

2. Do some tests/operations on the disks, power down the guest and boot it again
e.g. Context: Create partition on '/dev/sdb' in guest
 Sending command: parted -s /dev/sdb mklabel msdos && sleep 1s && parted -s /dev/sdb mkpart primary xfs 2048s 100%
 Sending command: echo $?
 Sending command: ls /sys/dev/block -l | grep "/pci"
 Sending command: echo $?
 Sending command: lsblk -o KNAME,SIZE | grep "sda "
 Sending command: echo $?
 Sending command: udevadm info -q all -n sda
 Sending command: echo $?
 Sending command: lsblk -o KNAME,SIZE | grep "sda1 "
 Sending command: echo $?
 Sending command: udevadm info -q all -n sda1
 Sending command: echo $?
 Sending command: lsblk -o KNAME,SIZE | grep "sdb "
 Sending command: echo $?
 Sending command: udevadm info -q all -n sdb
 Sending command: echo $?
 Sending command: lsblk -o KNAME,SIZE | grep "sdb1 "
 Sending command: echo $?
 Sending command: udevadm info -q all -n sdb1
 Sending command: echo $?
 Sending command: lsblk -o KNAME,SIZE | grep "sda2 "
 Sending command: echo $?
 Sending command: udevadm info -q all -n sda2
 Sending command: echo $?
 Sending command: lsblk -o KNAME,SIZE | grep "sda3 "
 Sending command: echo $?
 Sending command: udevadm info -q all -n sda3
 Sending command: echo $?

3. The guest has a chance to get stuck at the boot to hard drive screen

Actual results:
Guest boot fails.

Expected results:
The guest can always boot.

Additional info:
Attached the edk2 log and the guest console log; there is no error info in the host kernel log.

Comment 3 Gerd Hoffmann 2023-05-31 13:37:23 UTC
2023-05-25 04:10:03: ASSERT /builddir/build/BUILD/edk2-f80f052277c8/OvmfPkg/VirtioNetDxe/SnpSharedHelpers.c(177): ((BOOLEAN)(0==1))

  switch (Status) {
    case EFI_OUT_OF_RESOURCES:
      goto UnmapTxBuf;
    case EFI_ALREADY_STARTED:
      //
      // This should never happen: it implies
      //
      // - an identity-mapping VIRTIO_DEVICE_PROTOCOL.MapSharedBuffer()
      //   implementation -- which is fine,
      //
      // - and an SNP client that queues multiple instances of the exact same
      //   buffer address with SNP.Transmit() -- which is undefined behavior,
      //   based on the TxBuf language in UEFI-2.7,
      //   EFI_SIMPLE_NETWORK.GetStatus().
      //
      ASSERT (FALSE);                    <= THIS is triggered.
      Status = EFI_INVALID_PARAMETER;
      goto UnmapTxBuf;
    default:
      ASSERT_EFI_ERROR (Status);
      break;
  }

Comment 4 Gerd Hoffmann 2023-06-01 08:46:52 UTC
https://kojihub.stream.centos.org/koji/taskinfo?taskID=2275930
Test build with some debug logging added.
Can you try this and attach a log of a failed boot?

Comment 5 zixchen 2023-06-01 12:07:23 UTC
(In reply to Gerd Hoffmann from comment #4)
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2275930
> Test build with some debug logging added.
> Can you try this and attach a log of a failed boot?

Thanks, Gerd, I will have a try ASAP.

Comment 8 Gerd Hoffmann 2023-06-22 10:12:22 UTC
2023-06-01 10:12:05: IoMmuMap: Operation=Read Host=0x7F7BA61D Bytes=0x33
2023-06-01 10:12:05: InternalAllocateBuffer: range-size: 1000, start-address=0x7E06C000, pages=0x1, bits=0x1, bitmap: BFF2 => BFF3

2023-06-01 10:12:05: VirtioNetTransmit/77: Buffer 7D1F5041, Size 78
2023-06-01 10:12:05: VirtioNetMapTxBuf/142: << Buffer 7D1F5041
2023-06-01 10:12:05: IoMmuMap: Operation=Read Host=0x7D1F5041 Bytes=0x4E
2023-06-01 10:12:05: InternalAllocateBuffer: range-size: 1000, start-address=0x7E06C000, pages=0x1, bits=0x1, bitmap: BFF2 => BFF3
2023-06-01 10:12:05: IoMmuMap: Mapping=0x7D1F6D98 Device(PlainText)=0x7E06C000 Crypted=0x7D1F5041 Pages=0x1, ReservedMemBitmap=0x1
2023-06-01 10:12:05: VirtioNetMapTxBuf/157: >> Address 7E06C000

2023-06-01 10:12:05: IoMmuMap: Mapping=0x7D1FC398 Device(PlainText)=0x7E06C000 Crypted=0x7F7BA61D Pages=0x1, ReservedMemBitmap=0x1

That looks fishy.  iommu driver hands out the same bounce buffer twice.
Probably blocks 1+3 are one function which gets interrupted by block 2.

Guess we need some locking:

--- a/OvmfPkg/IoMmuDxe/IoMmuBuffer.c
+++ b/OvmfPkg/IoMmuDxe/IoMmuBuffer.c
@@ -367,7 +367,9 @@ IoMmuAllocateBounceBuffer (
 {
   EFI_STATUS  Status;
   UINT32      ReservedMemBitmap;
+  EFI_TPL     OldTpl;
 
+  OldTpl = gBS->RaiseTPL (TPL_CALLBACK);
   ReservedMemBitmap = 0;
   Status            = InternalAllocateBuffer (
                         Type,
@@ -378,6 +380,7 @@ IoMmuAllocateBounceBuffer (
                         );
   MapInfo->ReservedMemBitmap = ReservedMemBitmap;
   mReservedMemBitmap        |= ReservedMemBitmap;
+  gBS->RestoreTPL (OldTpl);
 
   ASSERT (Status == EFI_SUCCESS);
 
New scratch build:
https://kojihub.stream.centos.org/koji/taskinfo?taskID=2385804

Any change in behavior?

Comment 9 zixchen 2023-06-28 08:26:24 UTC
(In reply to Gerd Hoffmann from comment #8)
> 2023-06-01 10:12:05: IoMmuMap: Operation=Read Host=0x7F7BA61D Bytes=0x33
> 2023-06-01 10:12:05: InternalAllocateBuffer: range-size: 1000,
> start-address=0x7E06C000, pages=0x1, bits=0x1, bitmap: BFF2 => BFF3
> 
> 2023-06-01 10:12:05: VirtioNetTransmit/77: Buffer 7D1F5041, Size 78
> 2023-06-01 10:12:05: VirtioNetMapTxBuf/142: << Buffer 7D1F5041
> 2023-06-01 10:12:05: IoMmuMap: Operation=Read Host=0x7D1F5041 Bytes=0x4E
> 2023-06-01 10:12:05: InternalAllocateBuffer: range-size: 1000,
> start-address=0x7E06C000, pages=0x1, bits=0x1, bitmap: BFF2 => BFF3
> 2023-06-01 10:12:05: IoMmuMap: Mapping=0x7D1F6D98
> Device(PlainText)=0x7E06C000 Crypted=0x7D1F5041 Pages=0x1,
> ReservedMemBitmap=0x1
> 2023-06-01 10:12:05: VirtioNetMapTxBuf/157: >> Address 7E06C000
> 
> 2023-06-01 10:12:05: IoMmuMap: Mapping=0x7D1FC398
> Device(PlainText)=0x7E06C000 Crypted=0x7F7BA61D Pages=0x1,
> ReservedMemBitmap=0x1
> 
> That looks fishy.  iommu driver hands out the same bounce buffer twice.
> Probably blocks 1+3 are one function which gets interrupted by block 2.
> 
> Guess we need some locking:
> 
> --- a/OvmfPkg/IoMmuDxe/IoMmuBuffer.c
> +++ b/OvmfPkg/IoMmuDxe/IoMmuBuffer.c
> @@ -367,7 +367,9 @@ IoMmuAllocateBounceBuffer (
>  {
>    EFI_STATUS  Status;
>    UINT32      ReservedMemBitmap;
> +  EFI_TPL     OldTpl;
>  
> +  OldTpl = gBS->RaiseTPL (TPL_CALLBACK);
>    ReservedMemBitmap = 0;
>    Status            = InternalAllocateBuffer (
>                          Type,
> @@ -378,6 +380,7 @@ IoMmuAllocateBounceBuffer (
>                          );
>    MapInfo->ReservedMemBitmap = ReservedMemBitmap;
>    mReservedMemBitmap        |= ReservedMemBitmap;
> +  gBS->RestoreTPL (OldTpl);
>  
>    ASSERT (Status == EFI_SUCCESS);
>  
> New scratch build:
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2385804
> 
> Any change in behavior?

edk2-ovmf-20230301gitf80f052277c8-4.el9.bz2211060.20230622.1203.noarch has other issues; it couldn't boot the SEV-ES guest. Checking the console, it gets stuck after TianoCore; please check the attachment for the firmware log.

Version:
edk2-ovmf-20230301gitf80f052277c8-4.el9.bz2211060.20230622.1203.noarch
qemu-kvm-8.0.0-5.el9.x86_64
kernel-5.14.0-331.el9.x86_64

Steps:
/usr/libexec/qemu-kvm \
     -S  \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/edk2/ovmf/OVMF_CODE.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
     -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel9-64-virtio-scsi-ovmf_qcow2_filesystem_VARS.raw", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
     -machine q35,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,confidential-guest-support=lsec0,memory-backend=mem-machine_mem \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}' \
     -object sev-guest,id=lsec0,policy=7,cbitpos=51,reduced-phys-bits=1  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
     -m 4096 \
     -object '{"size": 4294967296, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
     -smp 48,maxcpus=48,cores=24,threads=1,dies=1,sockets=2  \
     -cpu 'EPYC-Milan',x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,vaes=on,vpclmulqdq=on,spec-ctrl=on,stibp=on,arch-capabilities=on,ssbd=on,cmp-legacy=on,virt-ssbd=on,lbrv=on,tsc-scale=on,vmcb-clean=on,pause-filter=on,pfthreshold=on,v-vmsave-vmload=on,vgif=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,kvm_pv_unhalt=on \
     -chardev socket,server=on,wait=off,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_as4vdz88/monitor-qmpmonitor1-20230628-030129-xipuDste  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,server=on,wait=off,id=qmp_id_catch_monitor,path=/var/tmp/avocado_as4vdz88/monitor-catch_monitor-20230628-030129-xipuDste  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idr8y3qB"}' \
     -chardev socket,server=on,wait=off,id=chardev_serial0,path=/var/tmp/avocado_as4vdz88/serial-serial0-20230628-030129-xipuDste \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
     -chardev socket,id=seabioslog_id_20230628-030129-xipuDste,path=/var/tmp/avocado_as4vdz88/seabios-20230628-030129-xipuDste,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20230628-030129-xipuDste,iobase=0x402 \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel9-64-virtio-scsi-ovmf.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:8f:81:2a:7b:a8", "id": "idkuetK7", "netdev": "idHAEXFR", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
     -netdev tap,id=idHAEXFR,vhost=on  \
     -monitor stdio \
     -vnc :0  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \

Comment 11 Gerd Hoffmann 2023-06-29 13:12:36 UTC
2023-06-28 03:50:27: FATAL ERROR - RaiseTpl with OldTpl(0x10) > NewTpl(0x8)

Ok, picked wrong TPL level apparently, lets try s/TPL_CALLBACK/TPL_NOTIFY/.

New scratch build (also updated to 2023-05 rebase):
https://kojihub.stream.centos.org/koji/taskinfo?taskID=2426028

Comment 12 Laszlo Ersek 2023-07-07 08:53:05 UTC
(In reply to Gerd Hoffmann from comment #8)
> 2023-06-01 10:12:05: IoMmuMap: Operation=Read Host=0x7F7BA61D Bytes=0x33
> 2023-06-01 10:12:05: InternalAllocateBuffer: range-size: 1000,
> start-address=0x7E06C000, pages=0x1, bits=0x1, bitmap: BFF2 => BFF3
> 
> 2023-06-01 10:12:05: VirtioNetTransmit/77: Buffer 7D1F5041, Size 78
> 2023-06-01 10:12:05: VirtioNetMapTxBuf/142: << Buffer 7D1F5041
> 2023-06-01 10:12:05: IoMmuMap: Operation=Read Host=0x7D1F5041 Bytes=0x4E
> 2023-06-01 10:12:05: InternalAllocateBuffer: range-size: 1000,
> start-address=0x7E06C000, pages=0x1, bits=0x1, bitmap: BFF2 => BFF3
> 2023-06-01 10:12:05: IoMmuMap: Mapping=0x7D1F6D98
> Device(PlainText)=0x7E06C000 Crypted=0x7D1F5041 Pages=0x1,
> ReservedMemBitmap=0x1
> 2023-06-01 10:12:05: VirtioNetMapTxBuf/157: >> Address 7E06C000
> 
> 2023-06-01 10:12:05: IoMmuMap: Mapping=0x7D1FC398
> Device(PlainText)=0x7E06C000 Crypted=0x7F7BA61D Pages=0x1,
> ReservedMemBitmap=0x1
> 
> That looks fishy.  iommu driver hands out the same bounce buffer twice.
> Probably blocks 1+3 are one function which gets interrupted by block 2.

This is an extremely good observation / hunch.

I've been thinking a bit about TPL usage in protocol implementations:

In UEFI 2.10, "Table 7.3: TPL Restrictions", pick any line -- that line tells you the max TPL a protocol member function may be invoked at, which is also the max TPL that the protocol member function may raise the TPL to, internally. (If the TPL is already that high, then RaiseTPL will have no effect.)

However, there's a third meaning to that table, which is kind of occurring to me only now. If the max TPL for a particular protocol (or protocol interface) is TPL_CALLBACK or TPL_NOTIFY (i.e., it's not restricted to TPL_APPLICATION), then that protocol interface *MUST* raise the TPL internally to that particular level as well! So it's effectively a requirement too, on implementations. Here's why: assume the protocol member is already executing at a particular TPL that's strictly below its permitted maximum TPL.

(This means that,

- for a max=TPL_CALLBACK interface, an agent is calling the interface directly at TPL_APPLICATION,
- or for a max=TPL_NOTIFY interface, an agent is calling the interface directly at TPL_APPLICATION,
- or for a max=TPL_NOTIFY interface, the interface is being called from an event notification function (callback) that is executing at TPL_CALLBACK.

These are the three cases where the current TPL is strictly below the called interface's permitted max TPL.)

Assume further that a timer interrupt occurs. The timer interrupt handler will consult the registered events, and may enqueue event notification functions at *higher* TPLs than the current TPL. This means the currently executing protocol interface will be *interrupted*.

Assume yet further that this asynchronously invoked notification function was enqueued at a TPL that precisely equals the permitted max TPL of the interrupted protocol member function. That means the notification function is *permitted* to call the protocol member function. End to end, this means that *EVERY* protocol member function must EITHER be fully reentrant, OR must bracket itself with a RaiseTPL/RestoreTPL to its per-spec maximum permitted TPL! (So that, in the latter case, it couldn't be interrupted to begin with.)

So, it's either that my reading of the UEFI spec has not been thorough enough thus far, *or* that the UEFI spec is not *clear enough* on this issue -- but either way, the TPL restrictions table effectively means that whenever we implement a protocol, we *MUST* kick the TPL up to the permitted maximum in *each* member function of the protocol. Otherwise there are going to be obscure reentrancy issues.

My own treatment of this requirement has not been consistent over the years. In the VirtioNetDxe driver, I did protect every member function -- the table says for SNP: "must be <= TPL_CALLBACK", and I did raise the TPL to CALLBACK in every protocol member function. However, I didn't do the same (for example) in VirtioFsDxe, even though table 7.3 specifies <=TPL_CALLBACK for SimpleFS.

Now, if a particular protocol is not listed by name, then the default (last row) takes effect: <=TPL_NOTIFY.

With that, we can argue that, whenever we see a protocol member func implementation that lacks a RaiseTPL/RestoreTPL bracket, that's *in general* almost certainly a reentrancy bug!

What's a further complication here BTW is that EDKII_IOMMU_PROTOCOL is not a standard protocol, so table 7.3 has no chance of covering it specifically. I guess we can still apply the default restriction: <=TPL_NOTIFY. Therefore, all member functions of an EDKII_IOMMU_PROTOCOL implementation should raise the TPL to NOTIFY upon entry.
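
To make the bracketing requirement concrete, here is a minimal sketch of the pattern (illustrative only -- the function and helper names below are made up and are not actual edk2 code; gBS->RaiseTPL()/gBS->RestoreTPL() are the standard boot services):

  STATIC
  EFI_STATUS
  ProtectedMemberFunction (
    VOID
    )
  {
    EFI_TPL     OldTpl;
    EFI_STATUS  Status;

    //
    // Raise to the highest TPL the protocol may legally be invoked at
    // (TPL_NOTIFY for protocols covered only by the default clause of
    // Table 7.3).  While we run at that TPL, notification functions queued
    // at TPL_NOTIFY or below cannot preempt us, so the body cannot be
    // re-entered through them.
    //
    OldTpl = gBS->RaiseTPL (TPL_NOTIFY);

    Status = DoNonReentrantWork ();   // hypothetical; e.g. "search + reserve"

    gBS->RestoreTPL (OldTpl);
    return Status;
  }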

Comment 13 Laszlo Ersek 2023-07-07 09:05:37 UTC
(In reply to Gerd Hoffmann from comment #11)
> 2023-06-28 03:50:27: FATAL ERROR - RaiseTpl with OldTpl(0x10) > NewTpl(0x8)
> 
> Ok, picked wrong TPL level apparently, lets try s/TPL_CALLBACK/TPL_NOTIFY/.

Indeed. The following must have happened:

(1) Your original attempt with gBS->RaiseTPL (TPL_CALLBACK) must certainly have solved the original issue.

The original issue was that EDKII_IOMMU_PROTOCOL was being used at TPL_APPLICATION level, but then a notification function was queued (most likely due to a timer interrupt / timer event) at TPL_CALLBACK, and this notification function called into the VirtioNetDxe driver's SNP implementation (which is valid, per se, because SNP is usable up to TPL_CALLBACK). Then VirtioNetDxe indirectly called into EDKII_IOMMU_PROTOCOL as well (via VirtioMapAllBytesInSharedBuffer() / VirtioLib -> VirtioDeviceProtocol -> PciIoProtocol -> PciRootBridgeIoProtocol -> EdkiiIommuProtocol), so the iommu member function got re-entered.

Your original attempt with gBS->RaiseTPL (TPL_CALLBACK) solved this, because the initial IOMMU protocol activation made itself uninterruptible -- so the timer event handling would be *delayed*. OK.

(2) However, EDKII_IOMMU_PROTOCOL is almost definitely used via other avenues too (most likely some PCI device driver registers an event with a notification function to enqueue at TPL_NOTIFY, so we get PciIo -> PciRootBridgeIo -> EdkiiIommu at TPL_NOTIFY). Notably, Table 7.3 does not list PCI-related protocols by name, so they get the default <= TPL_NOTIFY restriction. And so when your gBS->RaiseTPL (TPL_CALLBACK) in the IOMMU driver is reached, the TPL is already NOTIFY, and so the behavior is undefined.

IOW, the idea to use TPL_NOTIFY seems good, and not just as a random idea ("shotgun debugging"), but it's fully justified by Table 7.3 -- Table 7.3 does not treat PCI-related protocols specifically, so they can be called at TPLs up to and including NOTIFY -- which in turn means the protocol *must* bracket itself with RaiseTPL (TPL_NOTIFY).


So, this is a long-winded way to say that we should probably audit *all* drivers in OvmfPkg (minimally all the virtio drivers), and add the missing locking :/

Comment 14 Laszlo Ersek 2023-07-07 09:12:27 UTC
Here's a factor that mitigates the symptom somewhat, though.

The virtio drivers bind individual devices. It is arguably unlikely that an application or driver uses a particular virtio device in *two ways* at the same time; namely both directly *and* via callbacks. Therefore reentrancy issues, at least at the individual protocol instance granularity, are unlikely (there could still be issues with static r/w data in the driver, of course).

The edk2 IOMMU protocol is more vulnerable / exposed, because it is a *service* protocol -- the whole system has just one instance of it, and it sees a bunch of (indirect) use from lots of places.

Comment 15 Gerd Hoffmann 2023-07-07 10:23:36 UTC
> With that, we can argue that, whenever we see a protocol member func
> implementation that lacks a RaiseTPL/RestoreTPL bracket, that's *in general*
> almost certainly a reentrancy bug!

Depends on what the function is actually doing ...

EDKII_IOMMU_PROTOCOL.Map() does first search for a bounce buffer, and then
reserves it (by setting the bit in mReservedMemBitmap).  That is not reentrant
so it needs locking (which the patch does by raising the TPL), to avoid the
function being interrupted between "search" and "reserve".

EDKII_IOMMU_PROTOCOL.Unmap() will just clear the bit in mReservedMemBitmap
to release the buffer, which is not critical, doing that without raising
the TPL should be fine.

Disclaimer: That is the bounce buffer allocation aspect of the protocol only,
didn't review the other bits in detail (yet).

Also note that just wrapping all callback code might not be possible because
TPL_NOTIFY level code is not allowed to block (see table 7-2 in uefi spec).

(In reply to Laszlo Ersek from comment #14)
> The edk2 IOMMU protocol is more vulnerable / exposed, because it is a
> *service* protocol -- the whole system has just one instance of it, and it
> sees a bunch of (indirect) use from lots of places.

Yes.  We see block devices and network devices calling the iommu protocol
in parallel, the latter probably calling from a (polling) timer function.
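
As a schematic of the race being described (illustrative names only, not the literal IoMmuDxe source): the allocation path reads the shared bitmap to find a free slot and only afterwards sets the bit, so a caller that interrupts between the two steps picks the same slot -- which matches the duplicate Device(PlainText)=0x7E06C000 mapping in the comment 8 log.

  STATIC UINT32  mBitmap;       // one bit per bounce-buffer slot, 1 = reserved

  STATIC
  EFI_STATUS
  UnlockedAllocateSlot (
    OUT UINT32  *SlotBit
    )
  {
    UINT32  Bit;

    for (Bit = 1; Bit != 0; Bit <<= 1) {
      if ((mBitmap & Bit) == 0) {   // (1) "search": slot looks free
        //
        // A caller preempting us right here (e.g. SNP.Transmit() driven by a
        // polling timer event) also sees the slot as free ...
        //
        mBitmap |= Bit;             // (2) "reserve": too late
        *SlotBit = Bit;
        return EFI_SUCCESS;         // ... and both callers end up using the
                                    // same bounce buffer.
      }
    }
    return EFI_OUT_OF_RESOURCES;
  }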

Comment 16 Laszlo Ersek 2023-07-07 15:25:25 UTC
(In reply to Gerd Hoffmann from comment #15)

> Also note that just wrapping all callback code might not be possible because
> TPL_NOTIFY level code is not allowed to block (see table 7-2 in uefi spec).

I've not verified it, but my gut is telling me that, in case table 7.3 permits TPL_NOTIFY for a particular protocol or protocol member, then any implementation for that protocol (or protocol member) will "inherently" *not* block internally. In other words, I expect that it would be exceptional that FoobarProtocol were allowed at TPL_NOTIFY, but you'd be "tempted" to write such a Foobar implementation that blocked. I feel that this is in fact how Table 7.3 was put together in the first place!

This raises an extra aspect: table 7.3 limits BlockIo and SimpleNetwork to TPL_CALLBACK, which makes sense. But the table doesn't discuss EFI_EXT_SCSI_PASS_THRU_PROTOCOL at all, as far as I can tell. Therefore ExtScsiPassThru *appears* to be permitted at TPL_NOTIFY. I think that must be wrong -- a bug in the spec. Here's why: EFI_EXT_SCSI_PASS_THRU_PROTOCOL.PassThru() *MAY* support nonblocking operation, but it is not required to. The spec says that if the protocol impl does not support non-blocking operation, then PassThru() is allowed to ignore the Event parameter altogether, and wait (block) until the request completes. That behavior however conflicts with TPL_NOTIFY. Therefore I think this must be a bug in Table 7.3 (i.e., omitting a restriction for ExtScsiPassThru to TPL_CALLBACK).

VirtioBlkDxe and VirtioNetDxe are / would be therefore fine with TPL_CALLBACK, whereas VirtioScsiDxe is a gray area -- it handles virtio requests in a blocking manner, and it is *entitled* to that, per EFI_EXT_SCSI_PASS_THRU_PROTOCOL spec. The "gray area" seems to apply to VirtioGpuDxe and VirtioRngDxe; their respective protocols are apparently only limited to TPL_NOTIFY by the default clause in the table, however they have no way of not blocking, even at the API level (so their internal blocking virtio behavior is also valid).

> (In reply to Laszlo Ersek from comment #14)
> > The edk2 IOMMU protocol is more vulnerable / exposed, because it is a
> > *service* protocol -- the whole system has just one instance of it, and it
> > sees a bunch of (indirect) use from lots of places.
> 
> Yes.  We see block devices and network devices calling the iommu protocol
> in parallel, the latter probably calling from a (polling) timer function.

Ah, that's enlightening!

It also raises an interesting, extra aspect. Once we modify VirtioBlkDxe to bump the TPL to TPL_CALLBACK, that will prevent a clash with VirtioNetDxe -- and this will *mask* the issue (= the lack of TPL_NOTIFY-level locking) in the IOMMU protocol implementation.

Anyway, let me take a look at the virtio drivers I've written, wrt. TPL bumping. I'm somewhat concerned about the "gray area" ones; I do think EFI_EXT_SCSI_PASS_THRU_PROTOCOL should be limited to TPL_CALLBACK as well, per spec.

Comment 17 Laszlo Ersek 2023-07-07 22:11:14 UTC
After investigating the virtio-blk driver, my opinion is that adding TPL raising is not worth the churn. We have no practical problem (bug) to solve in the virtio device drivers, and every code modification requires review and regression-testing. And pedantically conforming would require the following changes at least:

1. raise the TPL to CALLBACK in VirtioBlkReadBlocks, VirtioBlkWriteBlocks, VirtioBlkFlushBlocks. Not too bad thus far, but:

2. raise the TPL to NOTIFY in VirtioBlkGetDriverName (!) -- the driver exposes EFI_COMPONENT_NAME_PROTOCOL and EFI_COMPONENT_NAME2_PROTOCOL too, after all, and they are only covered by the default clause in Table 7.3

3. raise the TPL to NOTIFY in VirtioBlkDriverBindingSupported, VirtioBlkDriverBindingStart, VirtioBlkDriverBindingStop (!) -- the driver exposes EFI_DRIVER_BINDING_PROTOCOL, after all; also not covered specifically in Table 7.3

Now consider what it would require to trigger the *lack* of #1. It requires an application to open BlockIo provided by VirtioBlkDxe in EXCLUSIVE mode (if the app opens BlockIo without exclusive, the behavior is almost guaranteed to be undefined at once, because higher-level abstractions will have BlockIo open already, such as DiskIo and/or SimpleFilesystem, and such concurrent accesses are invalid inside TPL_APPLICATION context already). After that, the app would have to call ReadBlocks / WriteBlocks from normal context, and to call the same functions on the *same device* from callback (interrupt) context too. This is exorbitantly unlikely.

Triggering #2 and/or #3 -- I don't even know how to make a scenario up for re-entering those. Per UEFI driver model, there's zero *reason* to call these functions from callbacks or (otherwise) an interrupt context. Re #3, you'd use ConnectController() / DisconnectController() in the first place, and again there seems to be no reason to call those from callbacks.

So I'd rather suggest fixing only the service driver that has the actual bug -- the IOMMU driver.

Comment 18 Gerd Hoffmann 2023-07-11 08:16:57 UTC
(In reply to Gerd Hoffmann from comment #11)
> 2023-06-28 03:50:27: FATAL ERROR - RaiseTpl with OldTpl(0x10) > NewTpl(0x8)
> 
> Ok, picked wrong TPL level apparently, lets try s/TPL_CALLBACK/TPL_NOTIFY/.
> 
> New scratch build (also updated to 2023-05 rebase):
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2426028

Ping, how are things going with this build?

Comment 19 zixchen 2023-07-11 08:37:43 UTC
(In reply to Gerd Hoffmann from comment #18)
> (In reply to Gerd Hoffmann from comment #11)
> > 2023-06-28 03:50:27: FATAL ERROR - RaiseTpl with OldTpl(0x10) > NewTpl(0x8)
> > 
> > Ok, picked wrong TPL level apparently, lets try s/TPL_CALLBACK/TPL_NOTIFY/.
> > 
> > New scratch build (also updated to 2023-05 rebase):
> > https://kojihub.stream.centos.org/koji/taskinfo?taskID=2426028
> 
> Ping, how things are going with this build?

So far so good; I've run 20 tests now and all pass. Since the reproduction ratio is lower than 20%, I'm going to run more tests today; if there are no failed cases, this build should be OK.

Comment 20 zixchen 2023-07-11 14:15:34 UTC
Tested with edk2-ovmf-20230524-1.el9.bz2211060.20230629.1509.noarch.rpm, repeated the test 40 times, couldn't reproduce the issue now.

Version:
edk2-ovmf-20230524-1.el9.bz2211060.20230629.1509.noarch
kernel-5.14.0-334.el9.x86_64
qemu-kvm-8.0.0-6.el9.x86_64

Steps:
1. Run SEV-ES guests with multi-disk tests.

Result:
Ran 40 multi-disk cases, no failed tests.

Comment 22 Gerd Hoffmann 2023-07-19 11:35:48 UTC
posted upstream: https://edk2.groups.io/g/devel/message/107049

Comment 24 Michael Brown 2023-07-19 16:01:43 UTC
(In reply to Gerd Hoffmann from comment #15)
> > With that, we can argue that, whenever we see a protocol member func
> > implementation that lacks a RaiseTPL/RestoreTPL bracket, that's *in general*
> > almost certainly a reentrancy bug!
> 
> Depends on what the function is actually doing ...
> 
> EDKII_IOMMU_PROTOCOL.Map() does first search for a bounce buffer, and then
> reserves it (by setting the bit in mReservedMemBitmap).  That is not
> reentrant
> so it needs locking (which the patch does by raising the TPL), to avoid the
> function being interrupted between "search" and "reserve".
> 
> EDKII_IOMMU_PROTOCOL.Unmap() will just clear the bit in mReservedMemBitmap
> to release the buffer, which is not critical, doing that without raising
> the TPL should be fine.

The bit clearing operation in IoMmuFreeBounceBuffer() is not an atomic operation, so it seems as though even clearing the bit should require locking against anything else that might be modifying mReservedMemBitmap.

Michael
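
For illustration, the free path in question boils down to a plain read-modify-write on the shared bitmap, and the same TPL bracket used on the allocation side would close that window. A sketch of a locked variant (the field names follow the comment 8 patch, but the function shape is hypothetical, not the real IoMmuFreeBounceBuffer()):

  STATIC
  VOID
  LockedFreeBounceBuffer (
    IN OUT MAP_INFO  *MapInfo
    )
  {
    EFI_TPL  OldTpl;

    OldTpl = gBS->RaiseTPL (TPL_NOTIFY);
    //
    // "mReservedMemBitmap &= ~bits" is a load / AND / store sequence; without
    // the bracket, a Map() call arriving at a higher TPL between the load and
    // the store could set a bit that this store then wipes out again, losing
    // that reservation and handing out the same bounce buffer twice later.
    //
    mReservedMemBitmap        &= ~MapInfo->ReservedMemBitmap;
    MapInfo->ReservedMemBitmap = 0;
    gBS->RestoreTPL (OldTpl);
  }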

Comment 27 Gerd Hoffmann 2023-08-22 09:28:05 UTC
Better approach being discussed upstream:
https://edk2.groups.io/g/devel/message/107100
Test build:
https://kojihub.stream.centos.org/koji/taskinfo?taskID=2730173

Can you test this build please?

Comment 28 zixchen 2023-08-23 10:30:20 UTC
(In reply to Gerd Hoffmann from comment #27)
> Better approach being discussed upstream:
> https://edk2.groups.io/g/devel/message/107100
> Test build:
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2730173
> 
> Can you test this build please?

I ran 12 tests; the VM can boot, but it takes quite a long time. Sometimes it freezes on the "boot to hard drive" screen for 3 minutes and then continues to boot. And it looks like the boot time grows as I repeat more tests.
Uploaded a firmware log: edk2-ovmf-20230524-2.el9.bz2211060.20230822.1033.log.

Comment 30 Gerd Hoffmann 2023-08-23 11:01:13 UTC
(In reply to zixchen from comment #28)
> (In reply to Gerd Hoffmann from comment #27)
> > Better approach being discussed upstream:
> > https://edk2.groups.io/g/devel/message/107100
> > Test build:
> > https://kojihub.stream.centos.org/koji/taskinfo?taskID=2730173
> > 
> > Can you test this build please?
> 
> I run 12 tests, vm can boot but it takes quite a long time. sometimes it
> will freeze on the "boot to hard drive " screen for 3 mins then continue to
> boot. And it looks the boot time grows when I repeat more tests.

The builds log plenty of debug messages which takes time, so a somewhat
slow boot process is expected.  3 mins seems extremely long though.

How much time do the comment 11 test builds need?

Comment 32 zixchen 2023-08-24 01:44:47 UTC
(In reply to Gerd Hoffmann from comment #30)
> (In reply to zixchen from comment #28)
> > (In reply to Gerd Hoffmann from comment #27)
> > > Better approach being discussed upstream:
> > > https://edk2.groups.io/g/devel/message/107100
> > > Test build:
> > > https://kojihub.stream.centos.org/koji/taskinfo?taskID=2730173
> > > 
> > > Can you test this build please?
> > 
> > I run 12 tests, vm can boot but it takes quite a long time. sometimes it
> > will freeze on the "boot to hard drive " screen for 3 mins then continue to
> > boot. And it looks the boot time grows when I repeat more tests.
> 
> The builds log plenty of debug messages which takes time, so a somewhat
> slow boot process is expected.  3 mins seems extremely long though.
> 
> How much time need the comment 11 test builds?

The comment 11 build doesn't get stuck at the boot to hard drive screen; the whole boot time is around 20 to 30 seconds.

Comment 37 Gerd Hoffmann 2023-08-24 13:00:10 UTC
New scratch build, based on edk2-20230524-3.el9:
https://kojihub.stream.centos.org/koji/taskinfo?taskID=2749240

Can you double-check with both edk2-20230524-3.el9 and the new scratch build in an otherwise 100% identical environment?

It is somewhat puzzling where the boot time jump comes from.
Attempts to reproduce this with a slightly modified IoMmuDxe driver (activating it even without SEV) failed.
If the boot time difference persists I might have to borrow a SEV machine next week to try hunting it down.

Comment 38 zixchen 2023-08-25 07:40:19 UTC
(In reply to Gerd Hoffmann from comment #37)
> New scratch build, based on edk2-20230524-3.el9:
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2749240
> 
> Can you double-check with both edk2-20230524-3.el9 and the new scratch build
> in a otherwise 100% identical environment?
> 
> It is somewhat puzzling where the boot time jump comes from.
> Attempts to reproduce this with a slightly modified driver IoMmuDxe driver
> (activate it even without SEV) failed.
> If the boot time difference persist I might have to borrow a SEV machine
> next week to try hunting it down.

Tested edk2-ovmf-20230524-3.el9.bz2211060.20230824.1324.noarch.rpm, repeated the test 20 times, did not see the boot time increase issue.
Also tested edk2-20230524-3.el9 on the same host, with the same result as edk2-ovmf-20230524-3.el9.bz2211060.20230824.1324.noarch.

Are we going to get edk2-ovmf-20230524-3.el9.bz2211060.20230824.1324 in 9.3?

Comment 39 Yanan Fu 2023-08-25 09:00:01 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 42 Xueqiang Wei 2023-08-27 18:31:55 UTC
Tested edk2 and qemu gating test loops, no new bug was found.

Versions:
kernel-5.14.0-356.el9.x86_64
qemu-kvm-8.0.0-13.el9
edk2-ovmf-20230524-3.el9.noarch

1. Edk2_test_with_edk2-ovmf-20230524-3.el9
Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/edk2_test_with_edk2-ovmf-20230524-3.el9/results.html
2. Qemu_gating_test_rhel9_with_edk2_test_with_edk2-ovmf-20230524-3.el9
Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/qemu_gating_test_rhel9_with_edk2_test_with_edk2-ovmf-20230524-3.el9/results.html

Hit a system poweroff issue, tracked by Bug 2234782 - system failed to poweroff

Comment 43 zixchen 2023-08-28 06:05:41 UTC
Tested an SEV-ES guest with a vTPM device; the system boots up without resetting.

Versions:
kernel-5.14.0-360.el9.x86_64
qemu-kvm-8.0.0-13.el9
edk2-ovmf-20230524-3.el9.noarch

Steps:
1. /usr/bin/swtpm socket --ctrl type=unixio,path=/run/libvirt/qemu/swtpm/6-rhel9_sev-swtpm.sock,mode=0600 --tpmstate dir=/var/lib/libvirt/swtpm/e7f5187a-6e76-4f14-8085-69c95ad5e01f/tpm2,mode=0600 --log file=/var/log/swtpm/libvirt/qemu/rhel9_sev-swtpm.log --terminate --tpm2

2. Boot VM with OVMF.amdsev.fd and vtpm device. 
/usr/libexec/qemu-kvm -name guest=rhel9_sev,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-6-rhel9_sev/master-key.aes"} -blockdev {"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF.amdsev.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -machine pc-q35-rhel9.2.0,usb=off,dump-guest-core=off,memory-backend=pc.ram,confidential-guest-support=lsec0,pflash0=libvirt-pflash0-format,hpet=off,acpi=on -accel kvm -cpu EPYC-Milan,x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,avx512f=on,avx512dq=on,avx512ifma=on,avx512cd=on,avx512bw=on,avx512vl=on,avx512vbmi=on,avx512vbmi2=on,gfni=on,vaes=on,vpclmulqdq=on,avx512vnni=on,avx512bitalg=on,avx512-vpopcntdq=on,la57=on,spec-ctrl=on,stibp=on,flush-l1d=on,arch-capabilities=on,ssbd=on,avx512-bf16=on,cmp-legacy=on,stibp-always-on=on,virt-ssbd=on,amd-psfd=on,lbrv=on,tsc-scale=on,vmcb-clean=on,pause-filter=on,pfthreshold=on,v-vmsave-vmload=on,vgif=on,no-nested-data-bp=on,lfence-always-serializing=on,null-sel-clr-base=on,auto-ibrs=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on -m size=4194304k -object {"qom-type":"memory-backend-ram","id":"pc.ram","size":4294967296} -overcommit mem-lock=off -smp 4,sockets=1,dies=1,cores=4,threads=1 -uuid e7f5187a-6e76-4f14-8085-69c95ad5e01f -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=26,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device {"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"} -device {"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"} -device {"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"} -device {"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"} -device {"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"} -device {"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"} -device {"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"} -device {"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"} -device {"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"} -device {"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"} -device {"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"} -device {"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"} -device {"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"} -device {"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"} -device {"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.3","addr":"0x0"} -device {"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.2","addr":"0x0"} -blockdev 
{"driver":"file","filename":"/home/rhel9_sev.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device {"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"} -netdev {"type":"tap","fd":"44","vhost":true,"vhostfd":"45","id":"hostnet0"} -device {"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:56:00:00:00:07","bus":"pci.1","addr":"0x0"} -add-fd set=0,fd=22,opaque=serial0-source -chardev file,id=charserial0,path=/dev/fdset/0,append=on -device {"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0} -chardev pty,id=charserial1 -device {"driver":"isa-serial","chardev":"charserial1","id":"serial1","index":1} -chardev socket,id=charchannel0,fd=23,server=on,wait=off -device {"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"} -chardev socket,id=chrtpm,path=/run/libvirt/qemu/swtpm/6-rhel9_sev-swtpm.sock -tpmdev emulator,id=tpm-tpm0,chardev=chrtpm -device {"driver":"tpm-crb","tpmdev":"tpm-tpm0","id":"tpm0"} -device {"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"} -audiodev {"id":"audio1","driver":"none"} -vnc 0.0.0.0:1,audiodev=audio1 -device {"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"} -global ICH9-LPC.noreboot=off -watchdog-action reset -device {"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"} -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/random"} -device {"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"} -object {"qom-type":"sev-guest","id":"lsec0","cbitpos":51,"reduced-phys-bits":1,"policy":7} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

Result:
Guest boots up. 


Sanity sev test log:
https://beaker.engineering.redhat.com/jobs/8225293

Comment 44 Gerd Hoffmann 2023-08-28 07:43:04 UTC
> Tested edk2-ovmf-20230524-3.el9.bz2211060.20230824.1324.noarch.rpm, repeat
> test 20 times, not found booting time increase issue. 

Good, thanks.

Can you give it a few more test runs to be sure the fix works?
20 times isn't that much for a reproduce rate <20%.
(not urgent, it's for upstream and 9.4).

> Also test edk2-20230524-3.el9 in the same host, same result with
> edk2-ovmf-20230524-3.el9.bz2211060.20230824.1324.noarch
> 
> Are we going to get edk2-ovmf-20230524-3.el9.bz2211060.20230824.1324 in 9.3?

No, we stick to the patch we have in edk2-20230524-3.el9 for 9.3,
but upstream (and by rebase 9.4 too) will most likely switch over.

Comment 45 zixchen 2023-09-04 04:04:45 UTC
(In reply to Gerd Hoffmann from comment #44)
> > Tested edk2-ovmf-20230524-3.el9.bz2211060.20230824.1324.noarch.rpm, repeat
> > test 20 times, not found booting time increase issue. 
> 
> Good, thanks.
> 
> Can you give it a few more test runs to be sure the fix works?
> 20 times isn't that much for a reproduce rate <20%.
> (not urgent, it's for upstream and 9.4).


Tested with the latest compose and edk2-ovmf-20230524-3.el9.bz2211060.20230824.1324.noarch, ran 100 tests, no issue found; guest boot time does not increase either.

Version:
edk2-ovmf-20230524-3.el9.bz2211060.20230824.1324.noarch
qemu-kvm-8.0.0-13.el9.x86_64
kernel-5.14.0-361.el9.x86_64

Steps:
1. Boot an SEV-ES guest with data disks, do mirror or snapshot tests and disk checks inside the guest.
2. Repeat test 100 times

Result:
SEV-ES guests boot and the booting time doesn't increase.

Comment 47 errata-xmlrpc 2023-11-07 08:24:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: edk2 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6330

