Bug 2108391 - Fail to boot a guest with 30GB sgx epc section when host is capable of 2*64G epc sections [rhel-9]
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Bandan Das
QA Contact: zixchen
URL:
Whiteboard:
Depends On:
Blocks: 2108392
Reported: 2022-07-19 01:15 UTC by zixchen
Modified: 2023-06-29 05:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-07 09:57:35 UTC
Type: ---
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-128087 0 None None None 2022-07-19 01:18:31 UTC

Description zixchen 2022-07-19 01:15:55 UTC
Description of problem:
The Ice Lake host has 2*64G EPC sections, but booting a guest with a 30GB SGX EPC section fails. The guest hangs early in boot, at the TianoCore logo.

Version-Release number of selected component (if applicable):
kernel-5.14.0-130.el9.x86_64
qemu-kvm-7.0.0-8.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Check host epc sections:
./test-sgx
...
CPUID Leaf 12H, Sub-Leaf 2 of Intel SGX Capabilities (EAX=12H,ECX=2)
eax: 18000001 ebx: 10 ecx: e7800002 edx: f
size of EPC section in Processor Reserved Memory, 65144 M

CPUID Leaf 12H, Sub-Leaf 3 of Intel SGX Capabilities (EAX=12H,ECX=3)
eax: 18000001 ebx: 30 ecx: e8000002 edx: f
size of EPC section in Processor Reserved Memory, 65152 M
...
# dmesg|grep -i sgx
[   10.054125] sgx: EPC section 0x1018000000-0x1fff7fffff
[   10.164986] sgx: EPC section 0x3018000000-0x3fffffffff
2. Boot a guest with 30 GB epc sections
...
    -machine q35,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
    -cpu host,+kvm_pv_unhalt \
    -m 40960 \
    -object memory-backend-epc,size=22960M,prealloc=true,id=mem-epc0 \
    -object memory-backend-ram,size=40960M,id=mem-mem0 \
    -smp 16,maxcpus=16,cores=4,threads=2,dies=1,sockets=2  \
    -numa node,memdev=mem-mem0,cpus=0-15  \
...
3. hmp info sgx:
(qemu) info sgx
SGX support: enabled
SGX1 support: enabled
SGX2 support: enabled
FLC support: enabled
size: 24075304960
NUMA node #0: size=24075304960
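
For reference, the register values in step 1 can be decoded by hand. The sketch below assumes the CPUID.12H sub-leaf layout as I understand it from the Intel SDM (EAX[3:0] = 1 marks a valid EPC section, EAX[31:12] and EBX[19:0] hold the base address, ECX[31:12] and EDX[19:0] hold the size). The function name is mine and is not part of test-sgx.

```python
def decode_epc_subleaf(eax, ebx, ecx, edx):
    """Return (base, size) in bytes for one CPUID.12H EPC sub-leaf.

    Assumed layout: EAX[31:12] / EBX[19:0] = base bits 31:12 / 51:32,
    ECX[31:12] / EDX[19:0] = size bits 31:12 / 51:32.
    """
    if eax & 0xF != 1:
        raise ValueError("sub-leaf does not describe a valid EPC section")
    base = (eax & 0xFFFFF000) | ((ebx & 0xFFFFF) << 32)
    size = (ecx & 0xFFFFF000) | ((edx & 0xFFFFF) << 32)
    return base, size


# Register values copied from the test-sgx output in step 1.
subleaves = {
    2: (0x18000001, 0x10, 0xE7800002, 0xF),
    3: (0x18000001, 0x30, 0xE8000002, 0xF),
}

sections = {n: decode_epc_subleaf(*regs) for n, regs in subleaves.items()}
for n, (base, size) in sections.items():
    # Print the inclusive address range and the size in MiB.
    print(f"sub-leaf {n}: {base:#x}-{base + size - 1:#x} ({size >> 20} MiB)")
```

The decoded ranges line up with the two "sgx: EPC section" lines in the host dmesg output, and the sizes with the 65144 M and 65152 M reported by test-sgx.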

Actual results:
The guest fails to boot; it hangs at an early stage, at the TianoCore logo.

Expected results:
Guest boots successfully

Additional info:
A guest with a 20GB EPC section boots successfully.
If the BIOS is set to 2*2G EPC sections, a guest with a 4G EPC section boots successfully.

Comment 1 zixchen 2022-07-19 08:21:40 UTC
Update: any guest EPC section size above 22G causes the same issue.
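
A quick arithmetic check of the sizes involved, as a sketch only. It uses the host section sizes from test-sgx and the guest sizes mentioned in this report; the helper name is hypothetical. It shows that the 22960M backend matches the size printed by "info sgx" and that the requested sizes fit within a single host section, so host capacity alone does not seem to explain the failure. This is a reading of the numbers, not a root-cause statement.

```python
MIB = 1 << 20

# Capacity of each host EPC section in MiB, as reported by test-sgx.
host_sections_mib = [65144, 65152]

# Guest EPC sizes mentioned in this report, in MiB.
requested_mib = {
    "size=22960M (command line in comment 0)": 22960,
    "30G": 30 * 1024,
}


def fits_in_one_section(request_mib, sections_mib):
    """True if the request is no larger than at least one host section."""
    return any(request_mib <= s for s in sections_mib)


for label, mib in requested_mib.items():
    print(f"{label}: {mib * MIB} bytes, "
          f"fits in one host section: {fits_in_one_section(mib, host_sections_mib)}")
```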

Comment 4 Bandan Das 2022-12-08 18:27:20 UTC
(In reply to zixchen from comment #0)

Sorry, I never got to this one. I have a suggestion for QE.
Would it be possible to check whether Vladis' MR here https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1472/commits
changes anything? Thanks!

Comment 5 zixchen 2022-12-14 08:55:28 UTC
(In reply to Bandan Das from comment #4)
Tested with kernel-5.14.0-212.el9.x86_64; booting a guest with a 30G EPC section still fails.

Version:
kernel-5.14.0-212.el9.x86_64
qemu-kvm-7.1.0-5.el9.x86_64

Steps:
VM mem=40G epc=30G host model cpu
...
     -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0 \
     -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
     -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
     -nodefaults \
     -device VGA,bus=pcie.0,addr=0x2 \
     -m 40960 \
     -object memory-backend-epc,size=30720M,prealloc=on,id=mem-epc0 \
     -object '{"qom-type": "memory-backend-ram", "size": 42949672960, "id": "mem-machine_mem"}'  \
     -smp 56,maxcpus=56,cores=28,threads=1,dies=1,sockets=2  \
     -cpu 'host',+kvm_pv_unhalt \
...

Result:
The VM fails to boot; it is stuck at the TianoCore screen.

Comment 6 zixchen 2023-03-07 09:57:35 UTC
Retested on 9.2; the issue is fixed.

Version:
qemu-kvm-7.2.0-8.el9.x86_64
kernel-5.14.0-202.el9.x86_64

Steps:
1. Host EPC capability: 2*64G
sgx: EPC section 0x1018000000-0x1fff7fffff
sgx: EPC section 0x3018000000-0x3fffffffff
2. Boot VM mem=40G epc=64G host model cpu
...
     -machine pc-q35-rhel9.2.0,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0 \
     -m 40960 \
     -object '{"size": 68308434943, "prealloc": true, "policy": "bind", "host-nodes": [0], "id": "mem-epc0", "qom-type": "memory-backend-epc"}' \
     -cpu 'host',+kvm_pv_unhalt \
...
3. check epc section inside the guest
# dmesg|grep -i sgx
[    0.870117] sgx: EPC section 0xa80000000-0x1a677fefff
[    1.134082] sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
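
The guest section size can be worked out from the dmesg line above. The sketch below parses the "sgx: EPC section" range and compares it with the size passed to memory-backend-epc in step 2. The numbers are consistent with the requested size being rounded down to a 4 KiB page boundary; that is only an observation from these values, not a statement about how QEMU or the guest firmware computes it. The function name and regular expression are mine.

```python
import re

# Matches kernel lines of the form "sgx: EPC section 0x<start>-0x<end>".
EPC_RE = re.compile(r"sgx: EPC section (0x[0-9a-f]+)-(0x[0-9a-f]+)")


def epc_section_sizes(dmesg_text):
    """Return the size in bytes of every EPC section found in dmesg output."""
    sizes = []
    for match in EPC_RE.finditer(dmesg_text):
        start, end = int(match.group(1), 16), int(match.group(2), 16)
        sizes.append(end - start + 1)  # the printed range is inclusive
    return sizes


guest_dmesg = "[    0.870117] sgx: EPC section 0xa80000000-0x1a677fefff"
requested = 68308434943  # "size" given to memory-backend-epc in step 2
page = 4096              # assumed EPC page size

guest_size = epc_section_sizes(guest_dmesg)[0]
print(guest_size, requested // page * page)
```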

Comment 7 zixchen 2023-04-03 02:54:04 UTC
(In reply to zixchen from comment #6)
>      -machine
> pc-q35-rhel9.2.0,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,sgx-epc.0.
> memdev=mem-epc0,sgx-epc.0.node=0 \
>      -m 40960 \
>      -object '{"size": 68308434943, "prealloc": true, "policy": "bind",
> "host-nodes": [0], "id": "mem-epc0", "qom-type": "memory-backend-epc"}' \
>      -cpu 'host',+kvm_pv_unhalt \
> ...

I have an update on this bug: the issue may not be completely fixed. If the guest has a NUMA node, a large SGX enclave works well, but if the guest has no NUMA node enabled, the bug reproduces 100% of the time. I therefore suggest reopening this bug, with the severity lowered to medium, since it reproduces only without a guest NUMA node. Bandan, Nitesh, what do you think?

Version:
9.3: 
kernel-5.14.0-289.el9.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64
9.2:
qemu-kvm-7.2.0-11.el9_2.x86_64
kernel-5.14.0-284.2.1.el9_2.x86_64

Reproduced Steps:
/usr/libexec/qemu-kvm \
     -S  \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
     -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel920-64-virtio-scsi_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
     -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0 \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
     -m 93184 \
     -object '{"size": 68308434943, "prealloc": true, "id": "mem-epc0", "qom-type": "memory-backend-epc"}' \
     -object '{"size": 97710505984, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
     -smp 56,maxcpus=56,cores=28,threads=1,dies=1,sockets=2  \
     -cpu 'host',sgx-debug,sgx-exinfo,sgx-kss,sgx-mode64,sgx-provisionkey,sgx-tokenkey,+kvm_pv_unhalt \
     -chardev socket,id=qmp_id_qmpmonitor1,server=on,wait=off,path=/var/tmp/avocado_dzkdn6ce/monitor-qmpmonitor1-20230315-053634-BdqHoYJk  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,id=qmp_id_catch_monitor,server=on,wait=off,path=/var/tmp/avocado_dzkdn6ce/monitor-catch_monitor-20230315-053634-BdqHoYJk  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idTI5meW"}' \
     -chardev socket,id=chardev_serial0,server=on,wait=off,path=/var/tmp/avocado_dzkdn6ce/serial-serial0-20230315-053634-BdqHoYJk \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
     -chardev socket,id=seabioslog_id_20230315-053634-BdqHoYJk,path=/var/tmp/avocado_dzkdn6ce/seabios-20230315-053634-BdqHoYJk,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20230315-053634-BdqHoYJk,iobase=0x402 \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel920-64-virtio-scsi.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:28:ec:de:40:81", "id": "idSRu2nN", "netdev": "idAKJqwi", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
     -netdev tap,id=idAKJqwi,vhost=on  \
     -vnc :1  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -monitor stdio \

Result:
The guest fails to boot; it hangs at an early stage, at the TianoCore logo.
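
For triage, a small helper could flag command lines that match the condition described in this comment: an sgx-epc section configured with no -numa option. This is a hypothetical sketch, not an avocado-vt or QEMU feature. The "with_numa" argument list only recombines options that appear in comment 0 and is not a tested configuration. Note that comment 0 did pass -numa and still failed on the older builds, so the check only reflects the 9.2/9.3 observation here.

```python
def sgx_epc_without_numa(argv):
    """True if an sgx-epc section is configured but no -numa option is given."""
    has_epc = any("sgx-epc." in arg for arg in argv)
    has_numa = "-numa" in argv
    return has_epc and not has_numa


# Abbreviated from the reproducer in this comment (no -numa option).
failing = [
    "-machine",
    "q35,memory-backend=mem-machine_mem,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0",
    "-m", "93184",
]

# Illustrative only: options borrowed from comment 0, not a verified setup.
with_numa = [
    "-machine", "q35,sgx-epc.0.memdev=mem-epc0,sgx-epc.0.node=0",
    "-numa", "node,memdev=mem-mem0,cpus=0-15",
]

for name, argv in (("failing", failing), ("with_numa", with_numa)):
    print(name, sgx_epc_without_numa(argv))
```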

