Bug 2041823
Summary: [aarch64][numa] When there are at least 6 NUMA nodes, serial log shows 'arch topology borken'

Product: Red Hat Enterprise Linux 9
Component: qemu-kvm (sub component: General)
Reporter: Zhenyu Zhang <zhenyzha>
Assignee: Guowen Shan <gshan>
QA Contact: Zhenyu Zhang <zhenyzha>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: cohuck, eric.auger, gshan, jinzhao, juzhang, lijin, mrezanin, virt-maint, yihyu
Version: 9.1
Keywords: Triaged
Target Milestone: rc
Target Release: 9.1
Hardware: aarch64
OS: Linux
Fixed In Version: qemu-kvm-7.0.0-4.el9
Target Upstream Version: QEMU v7.1
Type: Bug
Last Closed: 2022-11-15 09:53:29 UTC
Bug Blocks: 1924294, 2042780
Description
Zhenyu Zhang
2022-01-18 10:45:41 UTC
Updated information:

- With -m 4096, 16 NUMA nodes, node size 256M: this issue is hit too.
- With -m 4096, 8 NUMA nodes, node size 512M: this issue is hit too.
- With -m 4096, 4 NUMA nodes, node size 1024M: the guest fails to boot.

The issue can still be reproduced reliably after switching to a new guest image.

Guest cmd:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
    -blockdev node-name=file_aavmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel900-aarch64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
    -machine virt,gic-version=host,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0 \
    -m 4096 \
    -object memory-backend-ram,size=1024M,prealloc=yes,policy=default,id=mem-mem0 \
    -object memory-backend-ram,size=1024M,prealloc=yes,policy=default,id=mem-mem1 \
    -object memory-backend-ram,size=1024M,prealloc=yes,policy=default,id=mem-mem2 \
    -object memory-backend-ram,size=1024M,prealloc=yes,policy=default,id=mem-mem3 \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
    -numa node,memdev=mem-mem0 \
    -numa node,memdev=mem-mem1 \
    -numa node,memdev=mem-mem2 \
    -numa node,memdev=mem-mem3 \
    -cpu 'host' \
    -chardev socket,path=/tmp/monitor-qmpmonitor1-20220118-074539-Gc1DtNjK,id=qmp_id_qmpmonitor1,server=on,wait=off \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,path=/tmp/monitor-catch_monitor-20220118-074539-Gc1DtNjK,id=qmp_id_catch_monitor,server=on,wait=off \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -serial unix:'/tmp/serial-serial0-20220118-074539-Gc1DtNjK',server=on,wait=off \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
    -device virtio-net-pci,mac=9a:0e:b5:fd:40:f5,rombar=0,id=idQDTQKP,netdev=idWRRjx5,bus=pcie-root-port-4,addr=0x0 \
    -netdev tap,id=idWRRjx5,vhost=on \
    -vnc :20 \
    -rtc base=utc,clock=host,driftfix=slew \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x2,chassis=6 \
    -device pcie-root-port,id=pcie_extra_root_port_1,addr=0x2.0x1,bus=pcie.0,chassis=7

Boot failed; the serial log:

2022-01-18 07:45:46: UEFI firmware starting.
2022-01-18 07:45:54: BdsDxe: loading Boot0001 "Red Hat Enterprise Linux" from HD(1,GPT,3232CC39-E802-424E-BC21-7F83D8EE32D4,0x800,0x12C000)/\EFI\redhat\shimaa64.efi
2022-01-18 07:45:54: BdsDxe: starting Boot0001 "Red Hat Enterprise Linux" from HD(1,GPT,3232CC39-E802-424E-BC21-7F83D8EE32D4,0x800,0x12C000)/\EFI\redhat\shimaa64.efi
2022-01-18 07:45:54: error: ../../grub-core/term/serial.c:217:serial port `com0' isn't found.
2022-01-18 07:45:54: error: ../../grub-core/commands/terminal.c:138:terminal `serial' isn't found.
2022-01-18 07:45:54: error: ../../grub-core/commands/terminal.c:138:terminal `serial' isn't found.

(GRUB menu follows; terminal escape sequences stripped. The menu offers
"Red Hat Enterprise Linux (5.14.0-39.el9.aarch64) 9.0 (Plow)",
"Red Hat Enterprise Linux (0-rescue-26908f3c220344fab03b4021ec7b9d0c) 9.0>" and
"UEFI Firmware Settings", then counts down from 5s to 0s and boots the default entry.)

2022-01-18 07:46:02: EFI stub: Booting Linux Kernel...
2022-01-18 07:46:02: EFI stub: EFI_RNG_PROTOCOL unavailable
2022-01-18 07:46:02: EFI stub: Using DTB from configuration table
2022-01-18 07:46:02: EFI stub: Exiting boot services and installing virtual address map...

Update test results: with the latest qemu version (qemu-kvm-6.2.0-4.el9), both issues can still be triggered.

1. When there are at least 6 NUMA nodes, the serial log shows 'arch topology borken'.

cmd:

-m 3072 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem0 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem1 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem2 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem3 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem4 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem5 \
-smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
-numa node,memdev=mem-mem0 \
-numa node,memdev=mem-mem1 \
-numa node,memdev=mem-mem2 \
-numa node,memdev=mem-mem3 \
-numa node,memdev=mem-mem4 \
-numa node,memdev=mem-mem5 \

serial log:

2022-01-19 22:25:19: [ 0.313488] smp: Brought up 6 nodes, 6 CPUs
2022-01-19 22:25:19: [ 0.313555] SMP: Total of 6 processors activated.
2022-01-19 22:25:19: [ 0.313562] CPU features: detected: CRC32 instructions
2022-01-19 22:25:19: [ 0.313566] CPU features: detected: LSE atomic instructions
2022-01-19 22:25:19: [ 0.313569] CPU features: detected: Privileged Access Never
2022-01-19 22:25:19: [ 0.313572] CPU features: detected: RAS Extension Support
2022-01-19 22:25:19: [ 0.344952] CPU: All CPU(s) started at EL1
2022-01-19 22:25:19: [ 0.345023] alternatives: patching kernel code
2022-01-19 22:25:19: [ 0.346008] BUG: arch topology borken
2022-01-19 22:25:19: [ 0.346016] the CLS domain not a subset of the MC domain
2022-01-19 22:25:19: [ 0.346021] BUG: arch topology borken
2022-01-19 22:25:19: [ 0.346023] the MC domain not a subset of the DIE domain
2022-01-19 22:25:19: [ 0.346028] BUG: arch topology borken
2022-01-19 22:25:19: [ 0.346030] the DIE domain not a subset of the NODE domain
(the same CLS/MC/DIE triplet of warnings repeats for each of the remaining CPUs)

2. With 3-5 NUMA nodes, the guest hangs and boot fails. For the specific cmd please see comment 1.

Hello Guowen,
Is the error in comment 1 related to this bug? Do we need a new bug to track it?

Hmm, Zhenyu, I think they're different issues. Please create another bug to track the issue reported in comment #1 and assign it to me. Let's use this one to track the broken CPU topology issue.

(In reply to Guowen Shan from comment #3)
> hmm, Zhenyu, I think they're different issues. Please create another bug
> to track the issue reported from comment#1 and assign to me. Lets use this
> one to track the broken CPU topology issue.

Got it, Guowen, thanks for the reminder. I created the following bug to track the comment #1 issue:
Bug 2042780 - [aarch64][numa] With 3-5 NUMA nodes, the guest hangs and boot fails

The issue can be reproduced with the following steps. Note that the issue exists on upstream qemu as well.

(1) Provision a machine with the last RHEL-9.0.0 build from beaker.
(2) Install qemu-kvm/libvirt/virt-install.
After that, install VM1:

host# virt-install -n vm1 --vcpus 24 --memory 8192 --disk size=36 --network bridge=br0 \
    -l http://10.19.43.4/rhel-9/nightly/RHEL-9/latest-RHEL-9.0.0/compose/BaseOS/aarch64/os/

(3) Shut down vm1 and start it via command lines. Note that we have 6 NUMA nodes.

host# virsh shutdown vm1
host# cd /home/gavin/sandbox/images; ln -s disk.qcow2 vm1.qcow2
host# /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
    -accel kvm -machine virt,gic-version=host \
    -cpu host -smp 6,sockets=2,cores=3,threads=1 \
    -m 1024M,slots=16,maxmem=64G \
    -object memory-backend-ram,id=mem0,size=128M \
    -object memory-backend-ram,id=mem1,size=128M \
    -object memory-backend-ram,id=mem2,size=128M \
    -object memory-backend-ram,id=mem3,size=128M \
    -object memory-backend-ram,id=mem4,size=128M \
    -object memory-backend-ram,id=mem5,size=384M \
    -numa node,nodeid=0,memdev=mem0 \
    -numa node,nodeid=1,memdev=mem1 \
    -numa node,nodeid=2,memdev=mem2 \
    -numa node,nodeid=3,memdev=mem3 \
    -numa node,nodeid=4,memdev=mem4 \
    -numa node,nodeid=5,memdev=mem5 \
    -L /home/gavin/sandbox/qemu.main/build/pc-bios \
    -monitor none -serial mon:stdio -nographic -gdb tcp::1234 \
    -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
    -boot c -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
    -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
    -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
    -device pcie-root-port,bus=pcie.0,chassis=4,id=pcie.4 \
    -device pcie-root-port,bus=pcie.0,chassis=5,id=pcie.5 \
    -device pcie-root-port,bus=pcie.0,chassis=6,id=pcie.6 \
    -device pcie-root-port,bus=pcie.0,chassis=7,id=pcie.7 \
    -device pcie-root-port,bus=pcie.0,chassis=8,id=pcie.8 \
    -device pcie-root-port,bus=pcie.0,chassis=9,id=pcie.9 \
    -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
    -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
    -device virtio-net-pci,bus=pcie.6,netdev=unet,mac=52:54:00:f1:26:a0 \
    -netdev user,id=unet,hostfwd=tcp::50959-:22 \
    -device virtio-balloon-pci,id=balloon0,bus=pcie.7,free-page-reporting=yes \
    -device pvpanic-pci,bus=pcie.8
:
EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
SetUefiImageMemoryAttributes - 0x0000000047500000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000044190000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000044140000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x00000000474C0000 - 0x0000000000030000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x00000000440F0000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000043FB0000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000043E00000 - 0x0000000000030000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000043DC0000 - 0x0000000000030000 (0x0000000000000008)
:
alternatives: patching kernel code
BUG: arch topology borken
the CLS domain not a subset of the MC domain
<the above error log repeats>
BUG: arch topology borken
the DIE domain not a subset of the NODE domain

The root cause is an invalid command line used to start the guest. The CPU scheduler in RHEL 9.0 sees several topology levels: SMT, CLUSTER, MULTI-CORE, DIE, NUMA-NODE. Each level's span may only grow as you move up this list (every lower domain must be a subset of the one above it), and the guest kernel spews the warning messages when that rule is violated. Take the command lines used to reproduce the issue in comment 5, as below: the configuration of 2 sockets conflicts with 6 NUMA nodes, because 6 NUMA nodes cannot be seated in two sockets.
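The subset rule behind the "arch topology borken" warnings can be illustrated with a small check. This is an illustrative Python sketch, not kernel or QEMU code; the per-CPU masks below are simplified to show only the CLS/MC violation from the broken 2-socket/6-node layout.

```python
def domains_nest(masks):
    """masks: per-level CPU sets for one CPU, ordered CLS, MC, DIE, NODE.

    The scheduler requires each domain to be a subset of the next
    level up; "BUG: arch topology borken" is printed when it is not.
    """
    return all(lower <= upper for lower, upper in zip(masks, masks[1:]))

# Broken layout from the report: CPU 0 shares a 3-core cluster {0,1,2}
# (2 sockets x 3 cores), but its multi-core/die/NUMA masks are just {0}
# because 6 NUMA nodes were forced onto 2 sockets.
broken = [{0, 1, 2}, {0}, {0}, {0}]   # CLS not a subset of MC
fixed  = [{0}, {0}, {0}, {0}]         # sockets == NUMA nodes: all nest

print(domains_nest(broken))  # False -> kernel would warn
print(domains_nest(fixed))   # True
```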
-cpu host -smp 6,sockets=2,cores=3,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5 \

With the above invalid command lines, the topology seen by the CPU scheduler is as below; it is broken at every level from CLUSTER up to NUMA-NODE:

CPU   SMT   CLUSTER   MULTI-CORE   DIE   NUMA-NODE
----------------------------------------------------------------------
0     0     0,1,2     0            0     0
1     1     0,1,2     1            1     1
2     2     0,1,2     2            2     2
3     3     3,4,5     3            3     3
4     4     3,4,5     4            4     4
5     5     3,4,5     5            5     5

With the corrected command line below, the CPU topology warnings disappear. The key point is that the number of sockets must match the number of NUMA nodes.

-cpu host -smp 6,sockets=6,cores=1,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5 \

CPU   SMT   CLUSTER   MULTI-CORE   DIE   NUMA-NODE
----------------------------------------------------------------------
0     0     0         0            0     0
1     1     1         1            1     1
2     2     2         2            2     2
3     3     3         3            3     3
4     4     4         4            4     4
5     5     5         5            5     5

(In reply to Zhenyu from comment #8)
Zhenyu, with the command lines you used, the CPU topology seen by the guest kernel should be as below. It's broken.
CPU   SMT   CLUSTER   MULTI-CORE   DIE   NUMA-NODE
----------------------------------------------------------------------
0     0     0         0,1          0     0,5
1     1     1         0,1          1     1,6
2     2     2         2,3          2     2,7
3     3     3         2,3          3     3,8
4     4     4         4,5          4     4,9
5     5     5         4,5          5     0,5
6     6     6         6,7          0     1,6
7     7     7         6,7          1     2,7
8     8     8         8,9          2     3,8
9     9     9         8,9          3     4,9

The reason is that a default NUMA node ID is assigned to each CPU whose association isn't provided explicitly. The default node ID is calculated simply as (cpu_index % numa_node_num), meaning the SOCKET/DIE/MULTI-CORE/CLUSTER levels aren't considered at all in qemu/hw/arm/virt.c::virt_get_default_cpu_node_id(). We need to take SOCKET/DIE/MULTI-CORE/CLUSTER into account in virt_get_default_cpu_node_id(). I will figure out the fix and post a patch to the upstream community for review.

With the attached patch applied, all the following command lines work well and no CPU topology warnings are seen any more. I will post the patch to the upstream community for review. Besides, I think this needs to be deferred to RHEL 9.1, as the RHEL 9.0 development cycle is going to be completed pretty soon.
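The default assignment described above, and its socket-aware replacement, can be sketched in a few lines. This is an illustrative Python model, not the QEMU source: the fixed formula below approximates the behaviour by deriving the node from the CPU's socket, under the assumption of evenly sized sockets.

```python
# Old default in hw/arm/virt.c::virt_get_default_cpu_node_id():
# the node is simply the CPU index modulo the node count, ignoring
# the socket/die/core/cluster layout entirely.
def node_id_old(cpu_index, num_nodes):
    return cpu_index % num_nodes

# Socket-aware assignment (approximating the fixed behaviour): CPUs
# that share a socket land in the same NUMA node.
def node_id_new(cpu_index, num_cpus, num_nodes, num_sockets):
    cpus_per_socket = num_cpus // num_sockets
    return (cpu_index // cpus_per_socket) % num_nodes

# 6 CPUs in 2 sockets (3 cores each) with 6 NUMA nodes configured:
print([node_id_old(i, 6) for i in range(6)])        # [0, 1, 2, 3, 4, 5]
print([node_id_new(i, 6, 6, 2) for i in range(6)])  # [0, 0, 0, 1, 1, 1]
```

The second line matches the SRAT output in the verification below: with the fix, the 6 CPUs are spread over 2 nodes (matching the 2 sockets) instead of over all 6 nodes.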
-smp 6,sockets=2,cores=3,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5
:
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x0 -> Node 0   <<< 6 CPUs are associated with 2 nodes evenly
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x3 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x4 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x5 -> Node 1

-smp 10,sockets=5,cores=2,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5
:
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x0 -> Node 0   <<< 10 CPUs are associated with 5 nodes evenly
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x2 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x3 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 2 -> MPIDR 0x4 -> Node 2
[ 0.000000] ACPI: NUMA: SRAT: PXM 2 -> MPIDR 0x5 -> Node 2
[ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x6 -> Node 3
[ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x7 -> Node 3
[ 0.000000] ACPI: NUMA: SRAT: PXM 4 -> MPIDR 0x8 -> Node 4
[ 0.000000] ACPI: NUMA: SRAT: PXM 4 -> MPIDR 0x9 -> Node 4

The attached patch also fixes bug 2042780.

The v1 patch was posted for review:
https://lists.nongnu.org/archive/html/qemu-arm/2022-01/msg00269.html

The last posted upstream series is v8, pending review or merge:
https://lists.nongnu.org/archive/html/qemu-arm/2022-04/msg00623.html

The series has been merged into upstream QEMU v7.1. Let's see if QEMU-RHEL9.1 has a chance to be rebased to upstream QEMU v7.1.

ae9141d4a3 hw/acpi/aml-build: Use existing CPU topology to build PPTT table
4c18bc1923 hw/arm/virt: Fix CPU's default NUMA node ID
e280ecb39b qtest/numa-test: Correct CPU and NUMA association in aarch64_numa_cpu()
c9ec4cb5e4 hw/arm/virt: Consider SMP configuration in CPU topology
ac7199a252 qtest/numa-test: Specify CPU topology in aarch64_numa_cpu()
1dcf7001d4 qapi/machine.json: Add cluster-id

Hi Gavin,
Congrats on getting the series merged! We won't rebase QEMU in 9.1 (i.e. we'll keep v7.0), so you'll need to backport the series.

(In reply to Luiz Capitulino from comment #15)
> Congrats on getting the series merged! We won't rebase QEMU in 9.1 (ie.
> we'll keep v7.0), so you'll need to backport the series.

Luiz, thanks. Let's create an MR to backport them into our downstream QEMU for RHEL 9.1.

Thanks, Gavin

Hello Gavin,
With the below cmd, 'arch topology borken' was resolved, but I hit another error:
kdump.service - Crash recovery kernel arming
kdump: No memory reserved for crash kernel
Is this as expected?
qemu-kvm-7.0.0-2.el9.gwshan202205111802
host kernel:  5.14.0-85.el9.aarch64
guest kernel: 5.14.0-86.el9.aarch64

-smp 6,sockets=2,cores=3,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5 \

(qemu) info numa
6 nodes
node 0 cpus: 0 1 2
node 0 size: 128 MB
node 0 plugged: 0 MB
node 1 cpus: 3 4 5
node 1 size: 128 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 128 MB
node 2 plugged: 0 MB
node 3 cpus:
node 3 size: 128 MB
node 3 plugged: 0 MB
node 4 cpus:
node 4 size: 128 MB
node 4 plugged: 0 MB
node 5 cpus:
node 5 size: 384 MB
node 5 plugged: 0 MB

[root@dhcp19-243-226 ~]# numactl --hardware
available: 6 nodes (0-5)
node 0 cpus: 0 1 2
node 0 size: 83 MB
node 0 free: 5 MB
node 1 cpus: 3 4 5
node 1 size: 125 MB
node 1 free: 10 MB
node 2 cpus:
node 2 size: 125 MB
node 2 free: 8 MB
node 3 cpus:
node 3 size: 125 MB
node 3 free: 8 MB
node 4 cpus:
node 4 size: 125 MB
node 4 free: 11 MB
node 5 cpus:
node 5 size: 367 MB
node 5 free: 29 MB
node distances:
node   0   1   2   3   4   5
  0:  10  20  20  20  20  20
  1:  20  10  20  20  20  20
  2:  20  20  10  20  20  20
  3:  20  20  20  10  20  20
  4:  20  20  20  20  10  20
  5:  20  20  20  20  20  10

[ 0.000000] smccc: KVM: hypervisor services detected (0x00000000 0x00000000 0x00000000 0x00000003)
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x0 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x3 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x4 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x5 -> Node 1
[ 0.000000] percpu: Embedded 31 pages/cpu s86680 r8192 d32104 u126976
[ 0.000000] pcpu-alloc: s86680 r8192 d32104 u126976 alloc=31*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [1] 3 [1] 4 [1] 5
[ 0.000000] Detected PIPT I-cache on CPU0
[ 0.000000] CPU features: detected: GIC system register CPU interface
[ 0.000000] CPU features: detected: Spectre-v2
[ 0.000000] CPU features: kernel page table isolation forced ON by KASLR
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] Built 6 zonelists, mobility grouping on. Total pages: 258048

[root@dhcp19-243-226 ~]# systemctl status kdump.service
× kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2022-05-12 13:43:27 CST; 4min 3s ago
    Process: 1106 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
   Main PID: 1106 (code=exited, status=1/FAILURE)
        CPU: 29ms
May 12 13:43:27 localhost.localdomain systemd[1]: Starting Crash recovery kernel arming...
May 12 13:43:27 localhost.localdomain kdumpctl[1110]: kdump: No memory reserved for crash kernel
May 12 13:43:27 localhost.localdomain kdumpctl[1110]: kdump: Starting kdump: [FAILED]
May 12 13:43:27 localhost.localdomain systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
May 12 13:43:27 localhost.localdomain systemd[1]: kdump.service: Failed with result 'exit-code'.
May 12 13:43:27 localhost.localdomain systemd[1]: Failed to start Crash recovery kernel arming.

[root@dhcp19-243-226 ~]# vim /var/log/messages
......
May 12 13:43:27 localhost kdumpctl[1110]: kdump: No memory reserved for crash kernel
May 12 13:43:27 localhost kdumpctl[1110]: kdump: Starting kdump: [FAILED]
May 12 13:43:27 localhost systemd[1]: Started Command Scheduler.
May 12 13:43:27 localhost systemd[1]: Starting GNOME Display Manager...
May 12 13:43:27 localhost systemd[1]: Starting Hold until boot process finishes up...
May 12 13:43:27 localhost systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
May 12 13:43:27 localhost systemd[1]: kdump.service: Failed with result 'exit-code'.
May 12 13:43:27 localhost systemd[1]: Failed to start Crash recovery kernel arming.
.......

(In reply to Zhenyu Zhang from comment #23)
> With below cmd, 'arch topology borken' was resolved, but hit another error.
> kdump.service - Crash recovery kernel arming
> kdump: No memory reserved for crash kernel
> Is this as expected?

It seems to be caused by a memory boundary value. I doubled the memory and no longer hit this problem, so I tested with:

-m 2048M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=256M \
-object memory-backend-ram,id=mem1,size=256M \
-object memory-backend-ram,id=mem2,size=256M \
-object memory-backend-ram,id=mem3,size=256M \
-object memory-backend-ram,id=mem4,size=256M \
-object memory-backend-ram,id=mem5,size=768M \

(In reply to Zhenyu Zhang from comment #24)
> Seems to be the cause of the memory boundary value.
> I doubled the memory and no longer had this problem [...]

The kexec/crash issue has nothing to do with this bug and its fixes. I think there is not enough memory in the NUMA nodes (128 MB capacity) for kexec/crash to reserve memory from.
Zhenyu, let's create another bug asking the kexec/crash developers to investigate.

Thanks, Gavin

With qemu-kvm-7.0.0-4.el9:

-m 2048M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=256M \
-object memory-backend-ram,id=mem1,size=256M \
-object memory-backend-ram,id=mem2,size=256M \
-object memory-backend-ram,id=mem3,size=256M \
-object memory-backend-ram,id=mem4,size=256M \
-object memory-backend-ram,id=mem5,size=768M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5 \

(qemu) info numa
6 nodes
node 0 cpus: 0 1 2
node 0 size: 256 MB
node 0 plugged: 0 MB
node 1 cpus: 3 4 5
node 1 size: 256 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 256 MB
node 2 plugged: 0 MB
node 3 cpus:
node 3 size: 256 MB
node 3 plugged: 0 MB
node 4 cpus:
node 4 size: 256 MB
node 4 plugged: 0 MB
node 5 cpus:
node 5 size: 768 MB
node 5 plugged: 0 MB

(qemu) info status
VM status: running

[ 0.000000] smccc: KVM: hypervisor services detected (0x00000000 0x00000000 0x00000000 0x00000003)
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x0 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x3 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x4 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x5 -> Node 1

/usr/libexec/qemu-kvm -cpu host \
    -smp 8,sockets=2,clusters=2,cores=2,threads=1 \
    -m 8192 \
    -object memory-backend-ram,size=1024M,id=mem-mem0 \
    -object memory-backend-ram,size=3072M,id=mem-mem1 \
    -object memory-backend-ram,size=1024M,id=mem-mem2 \
    -object memory-backend-ram,size=3072M,id=mem-mem3 \
    -numa node,memdev=mem-mem0,nodeid=0 \
    -numa node,memdev=mem-mem1,nodeid=1 \
    -numa node,memdev=mem-mem2,nodeid=2 \
    -numa node,memdev=mem-mem3,nodeid=3 \
    -numa cpu,node-id=0,socket-id=0,cluster-id=0,core-id=0,thread-id=0 \
    -numa cpu,node-id=0,socket-id=0,cluster-id=0,core-id=1,thread-id=0 \
    -numa cpu,node-id=1,socket-id=0,cluster-id=1,core-id=0,thread-id=0 \
    -numa cpu,node-id=1,socket-id=0,cluster-id=1,core-id=1,thread-id=0 \
    -numa cpu,node-id=2,socket-id=1,cluster-id=0,core-id=0,thread-id=0 \
    -numa cpu,node-id=2,socket-id=1,cluster-id=0,core-id=1,thread-id=0 \
    -numa cpu,node-id=3,socket-id=1,cluster-id=1,core-id=0,thread-id=0 \
    -numa cpu,node-id=3,socket-id=1,cluster-id=1,core-id=1,thread-id=0 \
    -monitor stdio

(qemu) info numa
4 nodes
node 0 cpus: 0 1
node 0 size: 1024 MB
node 0 plugged: 0 MB
node 1 cpus: 2 3
node 1 size: 3072 MB
node 1 plugged: 0 MB
node 2 cpus: 4 5
node 2 size: 1024 MB
node 2 plugged: 0 MB
node 3 cpus: 6 7
node 3 size: 3072 MB
node 3 plugged: 0 MB

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7967
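As a footnote on the explicit "-numa cpu" mapping used in the last verification above: with explicit mappings, each CPU's node comes from its (socket, cluster, core, thread) tuple rather than any default. The sketch below (illustrative Python, not QEMU code; the node-id = socket * clusters + cluster formula simply mirrors the mapping chosen in that command line) reconstructs the "info numa" CPU lists.

```python
from itertools import product

sockets, clusters, cores, threads = 2, 2, 2, 1

# QEMU enumerates CPUs in (socket, cluster, core, thread) order; the
# command line above assigned node-id = socket * clusters + cluster.
nodes = {}
for cpu, (s, cl, co, t) in enumerate(
        product(range(sockets), range(clusters), range(cores), range(threads))):
    nodes.setdefault(s * clusters + cl, []).append(cpu)

print(nodes)  # {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
```

This matches the verified "info numa" output: node 0 holds CPUs 0-1, node 1 holds 2-3, node 2 holds 4-5, node 3 holds 6-7.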