Bug 2108391

Summary: Fail to boot a guest with 30GB sgx epc section when host is capable of 2*64G epc sections [rhel-9]
Product: Red Hat Enterprise Linux 9
Reporter: zixchen
Component: qemu-kvm
Assignee: Bandan Das <bdas>
qemu-kvm sub component: Devices
QA Contact: zixchen
Status: NEW
Severity: medium
Priority: medium
CC: bdas, coli, jinzhao, juzhang, nilal, virt-maint, ymankad
Version: 9.1
Keywords: CustomerScenariosInitiative, Reopened
Target Milestone: rc
Hardware: x86_64
OS: Linux
Last Closed: 2023-03-07 09:57:35 UTC
Bug Blocks: 2108392

Description zixchen 2022-07-19 01:15:55 UTC
Description of problem:
The Ice Lake host has 2 x 64 GB EPC sections, but booting a guest with a 30 GB SGX EPC section fails. The guest fails early in boot, at the TianoCore logo screen.

Version-Release number of selected component (if applicable):
kernel-5.14.0-130.el9.x86_64
qemu-kvm-7.0.0-8.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Check host epc sections:
./test-sgx
...
CPUID Leaf 12H, Sub-Leaf 2 of Intel SGX Capabilities (EAX=12H,ECX=2)
eax: 18000001 ebx: 10 ecx: e7800002 edx: f
size of EPC section in Processor Reserved Memory, 65144 M

CPUID Leaf 12H, Sub-Leaf 3 of Intel SGX Capabilities (EAX=12H,ECX=3)
eax: 18000001 ebx: 30 ecx: e8000002 edx: f
size of EPC section in Processor Reserved Memory, 65152 M
...
# dmesg|grep -i sgx
[   10.054125] sgx: EPC section 0x1018000000-0x1fff7fffff
[   10.164986] sgx: EPC section 0x3018000000-0x3fffffffff
2. Boot a guest with a 30 GB EPC section:
...
    -machine q35,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
    -cpu host,+kvm_pv_unhalt \
    -m 40960 \
    -object memory-backend-epc,size=22960M,prealloc=true,id=mem-epc0 \
    -object memory-backend-ram,size=40960M,id=mem-mem0 \
    -smp 16,maxcpus=16,cores=4,threads=2,dies=1,sockets=2  \
    -numa node,memdev=mem-mem0,cpus=0-15  \
...
3. Run info sgx in the HMP monitor:
(qemu) info sgx
SGX support: enabled
SGX1 support: enabled
SGX2 support: enabled
FLC support: enabled
size: 24075304960
NUMA node #0: size=24075304960

Actual results:
The guest fails early in boot, at the TianoCore logo screen.

Expected results:
The guest boots successfully.

Additional info:
A guest with a 20 GB EPC section boots successfully.
With the host BIOS configured for 2 x 2 GB EPC sections, a guest with a 4 GB EPC section also boots successfully.
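
For anyone cross-checking step 1: the CPUID leaf 12H register dumps and the dmesg ranges describe the same two host sections. The sketch below decodes the sub-leaf registers using the field layout documented in the Intel SDM (base address in EAX[31:12] and EBX[19:0], size in ECX[31:12] and EDX[19:0], EAX[3:0] = 1 for a valid section). The function and variable names are made up for illustration and are not part of test-sgx.

```python
# Sketch: decode a CPUID leaf 12H EPC sub-leaf (ECX >= 2) into base and size.
# Register values are copied from the test-sgx output in step 1.

def decode_epc_subleaf(eax, ebx, ecx, edx):
    """Return (base, size) in bytes for one EPC sub-leaf."""
    if eax & 0xF != 1:
        raise ValueError("sub-leaf does not describe a valid EPC section")
    base = ((ebx & 0xFFFFF) << 32) | (eax & 0xFFFFF000)
    size = ((edx & 0xFFFFF) << 32) | (ecx & 0xFFFFF000)
    return base, size

# Hypothetical table name; keys are the sub-leaf numbers from step 1.
SUBLEAVES = {
    2: (0x18000001, 0x10, 0xE7800002, 0xF),
    3: (0x18000001, 0x30, 0xE8000002, 0xF),
}

def describe(subleaf):
    """Format a section as an inclusive address range plus its size in MiB."""
    base, size = decode_epc_subleaf(*SUBLEAVES[subleaf])
    return f"0x{base:x}-0x{base + size - 1:x} ({size >> 20} M)"

if __name__ == "__main__":
    for n in sorted(SUBLEAVES):
        print(f"sub-leaf {n}: {describe(n)}")
```

The decoded ranges can be compared directly with the two "sgx: EPC section" lines from dmesg and with the sizes printed by test-sgx.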

Comment 1 zixchen 2022-07-19 08:21:40 UTC
Update: any guest EPC section size above 22 GB causes the same issue.
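
Note that the reproducer in the description passes size=22960M, and info sgx reports 24075304960 bytes for it. A small sketch of the arithmetic follows. It treats QEMU's M suffix as MiB and reads the "22G" above as 22 GiB; the second point is an assumption, since the exact boundary is not stated in this bug. The helper names are illustrative only.

```python
MIB = 1 << 20
GIB = 1 << 30

def mib_to_bytes(mib):
    """Convert a QEMU size given with the M suffix (MiB) to bytes."""
    return mib * MIB

def above_threshold(size_bytes, threshold_gib=22):
    """True if the size exceeds the 22G mark (read here as 22 GiB, an assumption)."""
    return size_bytes > threshold_gib * GIB

if __name__ == "__main__":
    cases = [("20G guest", 20 * 1024), ("reproducer", 22960), ("30G guest", 30720)]
    for label, mib in cases:
        size = mib_to_bytes(mib)
        print(f"{label}: {size} bytes, above 22G: {above_threshold(size)}")
```

Under these assumptions the 22960M reproducer already sits above the threshold, while the 20 GB case that boots does not.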

Comment 4 Bandan Das 2022-12-08 18:27:20 UTC
(In reply to zixchen from comment #0)

Sorry, I could never get to this one. I have a suggestion for QE.
Would it be possible to check whether Vladis' MR here https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1472/commits
changes anything? Thanks!

Comment 5 zixchen 2022-12-14 08:55:28 UTC
(In reply to Bandan Das from comment #4)
> Sorry, I could never get to this one. I have a suggestion for QE.
> Would it be possible to check if Vladis' MR here
> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1472/commits
> changes anything? Thanks!

Tested with kernel-5.14.0-212.el9.x86_64; booting a guest with a 30 GB EPC section still fails.

Version:
kernel-5.14.0-212.el9.x86_64
qemu-kvm-7.1.0-5.el9.x86_64

Steps:
VM mem=40G epc=30G host model cpu
...
     -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0 \
     -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
     -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
     -nodefaults \
     -device VGA,bus=pcie.0,addr=0x2 \
     -m 40960 \
     -object memory-backend-epc,size=30720M,prealloc=on,id=mem-epc0 \
     -object '{"qom-type": "memory-backend-ram", "size": 42949672960, "id": "mem-machine_mem"}'  \
     -smp 56,maxcpus=56,cores=28,threads=1,dies=1,sockets=2  \
     -cpu 'host',+kvm_pv_unhalt \
...

Result:
The VM fails to boot and is stuck at the TianoCore screen.

Comment 6 zixchen 2023-03-07 09:57:35 UTC
Retested on 9.2; the issue is fixed.

Version:
qemu-kvm-7.2.0-8.el9.x86_64
kernel-5.14.0-202.el9.x86_64

Steps:
1. Host EPC capability: 2 x 64 GB
sgx: EPC section 0x1018000000-0x1fff7fffff
sgx: EPC section 0x3018000000-0x3fffffffff
2. Boot a VM with mem=40G, epc=64G, host-model CPU
...
     -machine pc-q35-rhel9.2.0,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0 \
     -m 40960 \
     -object '{"size": 68308434943, "prealloc": true, "policy": "bind", "host-nodes": [0], "id": "mem-epc0", "qom-type": "memory-backend-epc"}' \
     -cpu 'host',+kvm_pv_unhalt \
...
3. Check the EPC section inside the guest
# dmesg|grep -i sgx
[    0.870117] sgx: EPC section 0xa80000000-0x1a677fefff
[    1.134082] sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
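
The numbers in steps 1 to 3 can be cross-checked against the requested backend size. The sketch below only does arithmetic on values quoted in this comment. That the guest-visible size equals the request rounded down to a 4 KiB page is an observation from these numbers, not a statement about how QEMU computes it; the names are illustrative.

```python
PAGE = 4096  # 4 KiB page size

def section_size(start, end):
    """Size in bytes of an inclusive address range as printed by the sgx driver."""
    return end - start + 1

requested = 68308434943  # "size" of mem-epc0 in step 2
host_section0 = section_size(0x1018000000, 0x1FFF7FFFFF)  # first host range in step 1
guest_section = section_size(0xA80000000, 0x1A677FEFFF)   # guest range in step 3

if __name__ == "__main__":
    print("host section 0:", host_section0)
    print("requested:     ", requested)
    print("guest section: ", guest_section)
    print("request rounded down to a page:", requested // PAGE * PAGE)
```

In other words, the request is one byte short of the first host section, and the guest reports a single section one page smaller than that host section.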

Comment 7 zixchen 2023-04-03 02:54:04 UTC
(In reply to zixchen from comment #6)
>      -machine pc-q35-rhel9.2.0,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0 \
>      -m 40960 \
>      -object '{"size": 68308434943, "prealloc": true, "policy": "bind", "host-nodes": [0], "id": "mem-epc0", "qom-type": "memory-backend-epc"}' \
>      -cpu 'host',+kvm_pv_unhalt \
> ...

I have an update on this bug: the issue may not be completely fixed. If the guest has a NUMA node, a large SGX enclave works well, but if the guest has no NUMA node enabled, the bug reproduces 100% of the time. Given this, I suggest reopening this bug but lowering its severity to medium, since it only reproduces without a guest NUMA node. Bandan, Nitesh, what do you think?

Version:
9.3: 
kernel-5.14.0-289.el9.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64
9.2:
qemu-kvm-7.2.0-11.el9_2.x86_64
kernel-5.14.0-284.2.1.el9_2.x86_64

Steps to reproduce:
/usr/libexec/qemu-kvm \
     -S  \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
     -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel920-64-virtio-scsi_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
     -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0 \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
     -m 93184 \
     -object '{"size": 68308434943, "prealloc": true, "id": "mem-epc0", "qom-type": "memory-backend-epc"}' \
     -object '{"size": 97710505984, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
     -smp 56,maxcpus=56,cores=28,threads=1,dies=1,sockets=2  \
     -cpu 'host',sgx-debug,sgx-exinfo,sgx-kss,sgx-mode64,sgx-provisionkey,sgx-tokenkey,+kvm_pv_unhalt \
     -chardev socket,id=qmp_id_qmpmonitor1,server=on,wait=off,path=/var/tmp/avocado_dzkdn6ce/monitor-qmpmonitor1-20230315-053634-BdqHoYJk  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,id=qmp_id_catch_monitor,server=on,wait=off,path=/var/tmp/avocado_dzkdn6ce/monitor-catch_monitor-20230315-053634-BdqHoYJk  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idTI5meW"}' \
     -chardev socket,id=chardev_serial0,server=on,wait=off,path=/var/tmp/avocado_dzkdn6ce/serial-serial0-20230315-053634-BdqHoYJk \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
     -chardev socket,id=seabioslog_id_20230315-053634-BdqHoYJk,path=/var/tmp/avocado_dzkdn6ce/seabios-20230315-053634-BdqHoYJk,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20230315-053634-BdqHoYJk,iobase=0x402 \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel920-64-virtio-scsi.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:28:ec:de:40:81", "id": "idSRu2nN", "netdev": "idAKJqwi", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
     -netdev tap,id=idAKJqwi,vhost=on  \
     -vnc :1  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -monitor stdio

Result:
The guest fails early in boot, at the TianoCore logo screen.
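
Since the failure now seems to depend on whether the guest has a NUMA node, a small triage helper can flag command lines that define an EPC backend without any -numa option. This is a rough sketch: summarize is a hypothetical helper, it only looks for a literal -numa argument and for -object entries of type memory-backend-epc, and it does not validate anything else about the command line.

```python
import json
import shlex

def summarize(cmdline):
    """Return (has_numa, epc_sizes) for a QEMU command line given as one string."""
    # Join shell line continuations before tokenizing.
    argv = shlex.split(cmdline.replace("\\\n", " "))
    has_numa = "-numa" in argv
    epc_sizes = {}
    for opt, val in zip(argv, argv[1:]):
        if opt != "-object":
            continue
        if val.startswith("{"):
            props = json.loads(val)  # JSON form, as used in this comment
        else:
            # Legacy form: type,key=value,...
            kind, *rest = val.split(",")
            props = dict(kv.split("=", 1) for kv in rest if "=" in kv)
            props["qom-type"] = kind
        if props.get("qom-type") == "memory-backend-epc":
            epc_sizes[props.get("id")] = props.get("size")
    return has_numa, epc_sizes
```

Applied to the command line in this comment it should report no -numa option together with the mem-epc0 backend, which is the combination that reproduces the failure here.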