Bug 2041823
Summary: [aarch64][numa] When there are at least 6 NUMA nodes, serial log shows 'arch topology borken'

Product: Red Hat Enterprise Linux 9
Component: qemu-kvm (sub component: General)
Reporter: Zhenyu Zhang <zhenyzha>
Assignee: Guowen Shan <gshan>
QA Contact: Zhenyu Zhang <zhenyzha>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: cohuck, eric.auger, gshan, jinzhao, juzhang, lijin, mrezanin, virt-maint, yihyu
Version: 9.1
Keywords: Triaged
Target Milestone: rc
Target Release: 9.1
Hardware: aarch64
OS: Linux
Fixed In Version: qemu-kvm-7.0.0-4.el9
Target Upstream Version: QEMU v7.1
Type: Bug
Last Closed: 2022-11-15 09:53:29 UTC
Bug Blocks: 1924294, 2042780
Description
Zhenyu Zhang
2022-01-18 10:45:41 UTC
Updated information:

- With -m 4096, 16 NUMA nodes, node size 256M: this issue is hit too.
- With -m 4096, 8 NUMA nodes, node size 512M: this issue is hit too.
- With -m 4096, 4 NUMA nodes, node size 1024M: the guest fails to boot.

The issue can still be reproduced reliably after switching to a new guest image.

Guest cmd:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
    -blockdev node-name=file_aavmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel900-aarch64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
    -machine virt,gic-version=host,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0 \
    -m 4096 \
    -object memory-backend-ram,size=1024M,prealloc=yes,policy=default,id=mem-mem0 \
    -object memory-backend-ram,size=1024M,prealloc=yes,policy=default,id=mem-mem1 \
    -object memory-backend-ram,size=1024M,prealloc=yes,policy=default,id=mem-mem2 \
    -object memory-backend-ram,size=1024M,prealloc=yes,policy=default,id=mem-mem3 \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
    -numa node,memdev=mem-mem0 \
    -numa node,memdev=mem-mem1 \
    -numa node,memdev=mem-mem2 \
    -numa node,memdev=mem-mem3 \
    -cpu 'host' \
    -chardev socket,path=/tmp/monitor-qmpmonitor1-20220118-074539-Gc1DtNjK,id=qmp_id_qmpmonitor1,server=on,wait=off \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,path=/tmp/monitor-catch_monitor-20220118-074539-Gc1DtNjK,id=qmp_id_catch_monitor,server=on,wait=off \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -serial unix:'/tmp/serial-serial0-20220118-074539-Gc1DtNjK',server=on,wait=off \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
    -device virtio-net-pci,mac=9a:0e:b5:fd:40:f5,rombar=0,id=idQDTQKP,netdev=idWRRjx5,bus=pcie-root-port-4,addr=0x0 \
    -netdev tap,id=idWRRjx5,vhost=on \
    -vnc :20 \
    -rtc base=utc,clock=host,driftfix=slew \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x2,chassis=6 \
    -device pcie-root-port,id=pcie_extra_root_port_1,addr=0x2.0x1,bus=pcie.0,chassis=7

Boot failed; the serial log:

2022-01-18 07:45:46: UEFI firmware starting.
2022-01-18 07:45:54: BdsDxe: loading Boot0001 "Red Hat Enterprise Linux" from HD(1,GPT,3232CC39-E802-424E-BC21-7F83D8EE32D4,0x800,0x12C000)/\EFI\redhat\shimaa64.efi
2022-01-18 07:45:54: BdsDxe: starting Boot0001 "Red Hat Enterprise Linux" from HD(1,GPT,3232CC39-E802-424E-BC21-7F83D8EE32D4,0x800,0x12C000)/\EFI\redhat\shimaa64.efi
2022-01-18 07:45:54: error: ../../grub-core/term/serial.c:217:serial port `com0' isn't found.
2022-01-18 07:45:54: error: ../../grub-core/commands/terminal.c:138:terminal `serial' isn't found.
2022-01-18 07:45:54: error: ../../grub-core/commands/terminal.c:138:terminal `serial' isn't found.

(GRUB menu follows; terminal escape sequences stripped. The menu offers
"Red Hat Enterprise Linux (5.14.0-39.el9.aarch64) 9.0 (Plow)",
"Red Hat Enterprise Linux (0-rescue-26908f3c220344fab03b4021ec7b9d0c) 9.0>" and
"UEFI Firmware Settings", then counts down from 5s to 0s and boots the default entry.)

2022-01-18 07:46:02: EFI stub: Booting Linux Kernel...
2022-01-18 07:46:02: EFI stub: EFI_RNG_PROTOCOL unavailable
2022-01-18 07:46:02: EFI stub: Using DTB from configuration table
2022-01-18 07:46:02: EFI stub: Exiting boot services and installing virtual address map...

Update test results: with the latest qemu version (qemu-kvm-6.2.0-4.el9), both issues can still be triggered.

1. When there are at least 6 NUMA nodes, the serial log shows 'arch topology borken'.

cmd:

-m 3072 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem0 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem1 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem2 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem3 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem4 \
-object memory-backend-ram,policy=default,prealloc=yes,size=512M,id=mem-mem5 \
-smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
-numa node,memdev=mem-mem0 \
-numa node,memdev=mem-mem1 \
-numa node,memdev=mem-mem2 \
-numa node,memdev=mem-mem3 \
-numa node,memdev=mem-mem4 \
-numa node,memdev=mem-mem5 \

serial log:

2022-01-19 22:25:19: [ 0.313488] smp: Brought up 6 nodes, 6 CPUs
2022-01-19 22:25:19: [ 0.313555] SMP: Total of 6 processors activated.
2022-01-19 22:25:19: [ 0.313562] CPU features: detected: CRC32 instructions
2022-01-19 22:25:19: [ 0.313566] CPU features: detected: LSE atomic instructions
2022-01-19 22:25:19: [ 0.313569] CPU features: detected: Privileged Access Never
2022-01-19 22:25:19: [ 0.313572] CPU features: detected: RAS Extension Support
2022-01-19 22:25:19: [ 0.344952] CPU: All CPU(s) started at EL1
2022-01-19 22:25:19: [ 0.345023] alternatives: patching kernel code
2022-01-19 22:25:19: [ 0.346008] BUG: arch topology borken
2022-01-19 22:25:19: [ 0.346016] the CLS domain not a subset of the MC domain
2022-01-19 22:25:19: [ 0.346021] BUG: arch topology borken
2022-01-19 22:25:19: [ 0.346023] the MC domain not a subset of the DIE domain
2022-01-19 22:25:19: [ 0.346028] BUG: arch topology borken
2022-01-19 22:25:19: [ 0.346030] the DIE domain not a subset of the NODE domain
(the same CLS/MC/DIE triplet of warnings repeats for each of the remaining CPUs)

2. With 3-5 NUMA nodes, the guest hangs and boot fails. For the specific cmd please see comment 1.

Hello Guowen,
Is the error in comment 1 related to this bug? Do we need a new bug to track it?

Hmm, Zhenyu, I think they're different issues. Please create another bug to track the issue reported in comment #1 and assign it to me. Let's use this one to track the broken CPU topology issue.

(In reply to Guowen Shan from comment #3)
> hmm, Zhenyu, I think they're different issues. Please create another bug
> to track the issue reported from comment#1 and assign to me. Lets use this
> one to track the broken CPU topology issue.

Got it, Guowen, thanks for the reminder. I created the following bug to track the comment #1 issue:
Bug 2042780 - [aarch64][numa] With 3-5 NUMA nodes, the guest hangs and boot fails

The issue can be reproduced with the following steps. Note that the issue exists on upstream qemu as well.

(1) Provision a machine with the last RHEL-9.0.0 build from beaker.
(2) Install qemu-kvm/libvirt/virt-install.
After that, install VM1:

host# virt-install -n vm1 --vcpus 24 --memory 8192 --disk size=36 --network bridge=br0 \
    -l http://10.19.43.4/rhel-9/nightly/RHEL-9/latest-RHEL-9.0.0/compose/BaseOS/aarch64/os/

(3) Shut down vm1 and start it via command lines. Note that we have 6 NUMA nodes.

host# virsh shutdown vm1
host# cd /home/gavin/sandbox/images; ln -s disk.qcow2 vm1.qcow2
host# /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
    -accel kvm -machine virt,gic-version=host \
    -cpu host -smp 6,sockets=2,cores=3,threads=1 \
    -m 1024M,slots=16,maxmem=64G \
    -object memory-backend-ram,id=mem0,size=128M \
    -object memory-backend-ram,id=mem1,size=128M \
    -object memory-backend-ram,id=mem2,size=128M \
    -object memory-backend-ram,id=mem3,size=128M \
    -object memory-backend-ram,id=mem4,size=128M \
    -object memory-backend-ram,id=mem5,size=384M \
    -numa node,nodeid=0,memdev=mem0 \
    -numa node,nodeid=1,memdev=mem1 \
    -numa node,nodeid=2,memdev=mem2 \
    -numa node,nodeid=3,memdev=mem3 \
    -numa node,nodeid=4,memdev=mem4 \
    -numa node,nodeid=5,memdev=mem5 \
    -L /home/gavin/sandbox/qemu.main/build/pc-bios \
    -monitor none -serial mon:stdio -nographic -gdb tcp::1234 \
    -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
    -boot c -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
    -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
    -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
    -device pcie-root-port,bus=pcie.0,chassis=4,id=pcie.4 \
    -device pcie-root-port,bus=pcie.0,chassis=5,id=pcie.5 \
    -device pcie-root-port,bus=pcie.0,chassis=6,id=pcie.6 \
    -device pcie-root-port,bus=pcie.0,chassis=7,id=pcie.7 \
    -device pcie-root-port,bus=pcie.0,chassis=8,id=pcie.8 \
    -device pcie-root-port,bus=pcie.0,chassis=9,id=pcie.9 \
    -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
    -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
    -device virtio-net-pci,bus=pcie.6,netdev=unet,mac=52:54:00:f1:26:a0 \
    -netdev user,id=unet,hostfwd=tcp::50959-:22 \
    -device virtio-balloon-pci,id=balloon0,bus=pcie.7,free-page-reporting=yes \
    -device pvpanic-pci,bus=pcie.8
:
EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
SetUefiImageMemoryAttributes - 0x0000000047500000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000044190000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000044140000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x00000000474C0000 - 0x0000000000030000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x00000000440F0000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000043FB0000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000043E00000 - 0x0000000000030000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000043DC0000 - 0x0000000000030000 (0x0000000000000008)
:
alternatives: patching kernel code
BUG: arch topology borken
the CLS domain not a subset of the MC domain
<the above error log repeats>
BUG: arch topology borken
the DIE domain not a subset of the NODE domain

The root cause is an invalid command line used to start the guest. The CPU scheduler in RHEL 9.0 sees several topology levels: SMT, CLUSTER, MULTI-CORE, DIE, NUMA-NODE. Each level's span may only grow as you move up this list (every lower domain must be a subset of the one above it), and the guest kernel spews the warning messages when that rule is violated. Take the command lines used to reproduce the issue in comment 5, as below: the configuration of 2 sockets conflicts with 6 NUMA nodes, because 6 NUMA nodes cannot be seated in two sockets.
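The subset rule behind the "arch topology borken" warnings can be illustrated with a small check. This is an illustrative Python sketch, not kernel or QEMU code; the per-CPU masks below are simplified to show only the CLS/MC violation from the broken 2-socket/6-node layout.

```python
def domains_nest(masks):
    """masks: per-level CPU sets for one CPU, ordered CLS, MC, DIE, NODE.

    The scheduler requires each domain to be a subset of the next
    level up; "BUG: arch topology borken" is printed when it is not.
    """
    return all(lower <= upper for lower, upper in zip(masks, masks[1:]))

# Broken layout from the report: CPU 0 shares a 3-core cluster {0,1,2}
# (2 sockets x 3 cores), but its multi-core/die/NUMA masks are just {0}
# because 6 NUMA nodes were forced onto 2 sockets.
broken = [{0, 1, 2}, {0}, {0}, {0}]   # CLS not a subset of MC
fixed  = [{0}, {0}, {0}, {0}]         # sockets == NUMA nodes: all nest

print(domains_nest(broken))  # False -> kernel would warn
print(domains_nest(fixed))   # True
```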
-cpu host -smp 6,sockets=2,cores=3,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5 \

With the above invalid command lines, the topology seen by the CPU scheduler is as below; it is broken at every level from CLUSTER up to NUMA-NODE:

CPU   SMT   CLUSTER   MULTI-CORE   DIE   NUMA-NODE
----------------------------------------------------------------------
0     0     0,1,2     0            0     0
1     1     0,1,2     1            1     1
2     2     0,1,2     2            2     2
3     3     3,4,5     3            3     3
4     4     3,4,5     4            4     4
5     5     3,4,5     5            5     5

With the corrected command line below, the CPU topology warnings disappear. The key point is that the number of sockets must match the number of NUMA nodes.

-cpu host -smp 6,sockets=6,cores=1,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5 \

CPU   SMT   CLUSTER   MULTI-CORE   DIE   NUMA-NODE
----------------------------------------------------------------------
0     0     0         0            0     0
1     1     1         1            1     1
2     2     2         2            2     2
3     3     3         3            3     3
4     4     4         4            4     4
5     5     5         5            5     5

(In reply to Zhenyu from comment #8)
Zhenyu, with the command lines you used, the CPU topology seen by the guest kernel should be as below. It's broken.
CPU   SMT   CLUSTER   MULTI-CORE   DIE   NUMA-NODE
----------------------------------------------------------------------
0     0     0         0,1          0     0,5
1     1     1         0,1          1     1,6
2     2     2         2,3          2     2,7
3     3     3         2,3          3     3,8
4     4     4         4,5          4     4,9
5     5     5         4,5          5     0,5
6     6     6         6,7          0     1,6
7     7     7         6,7          1     2,7
8     8     8         8,9          2     3,8
9     9     9         8,9          3     4,9

The reason is that a default NUMA node ID is assigned to each CPU whose association isn't provided explicitly. The default node ID is calculated simply as (cpu_index % numa_node_num), meaning the SOCKET/DIE/MULTI-CORE/CLUSTER levels aren't considered at all in qemu/hw/arm/virt.c::virt_get_default_cpu_node_id(). We need to take SOCKET/DIE/MULTI-CORE/CLUSTER into account in virt_get_default_cpu_node_id(). I will figure out the fix and post a patch to the upstream community for review.

With the attached patch applied, all the following command lines work well and no CPU topology warnings are seen any more. I will post the patch to the upstream community for review. Besides, I think this needs to be deferred to RHEL 9.1, as the RHEL 9.0 development cycle is going to be completed pretty soon.
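The default assignment described above, and its socket-aware replacement, can be sketched in a few lines. This is an illustrative Python model, not the QEMU source: the fixed formula below approximates the behaviour by deriving the node from the CPU's socket, under the assumption of evenly sized sockets.

```python
# Old default in hw/arm/virt.c::virt_get_default_cpu_node_id():
# the node is simply the CPU index modulo the node count, ignoring
# the socket/die/core/cluster layout entirely.
def node_id_old(cpu_index, num_nodes):
    return cpu_index % num_nodes

# Socket-aware assignment (approximating the fixed behaviour): CPUs
# that share a socket land in the same NUMA node.
def node_id_new(cpu_index, num_cpus, num_nodes, num_sockets):
    cpus_per_socket = num_cpus // num_sockets
    return (cpu_index // cpus_per_socket) % num_nodes

# 6 CPUs in 2 sockets (3 cores each) with 6 NUMA nodes configured:
print([node_id_old(i, 6) for i in range(6)])        # [0, 1, 2, 3, 4, 5]
print([node_id_new(i, 6, 6, 2) for i in range(6)])  # [0, 0, 0, 1, 1, 1]
```

The second line matches the SRAT output in the verification below: with the fix, the 6 CPUs are spread over 2 nodes (matching the 2 sockets) instead of over all 6 nodes.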
-smp 6,sockets=2,cores=3,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5
:
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x0 -> Node 0   <<< 6 CPUs are associated with 2 nodes evenly
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x3 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x4 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x5 -> Node 1

-smp 10,sockets=5,cores=2,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5
:
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x0 -> Node 0   <<< 10 CPUs are associated with 5 nodes evenly
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x2 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x3 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 2 -> MPIDR 0x4 -> Node 2
[ 0.000000] ACPI: NUMA: SRAT: PXM 2 -> MPIDR 0x5 -> Node 2
[ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x6 -> Node 3
[ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x7 -> Node 3
[ 0.000000] ACPI: NUMA: SRAT: PXM 4 -> MPIDR 0x8 -> Node 4
[ 0.000000] ACPI: NUMA: SRAT: PXM 4 -> MPIDR 0x9 -> Node 4

The attached patch also fixes bug 2042780.

The v1 patch was posted for review:
https://lists.nongnu.org/archive/html/qemu-arm/2022-01/msg00269.html

The last posted upstream series is v8, pending review or merge:
https://lists.nongnu.org/archive/html/qemu-arm/2022-04/msg00623.html

The series has been merged into upstream QEMU v7.1. Let's see if QEMU-RHEL9.1 has a chance to be rebased to upstream QEMU v7.1.

ae9141d4a3 hw/acpi/aml-build: Use existing CPU topology to build PPTT table
4c18bc1923 hw/arm/virt: Fix CPU's default NUMA node ID
e280ecb39b qtest/numa-test: Correct CPU and NUMA association in aarch64_numa_cpu()
c9ec4cb5e4 hw/arm/virt: Consider SMP configuration in CPU topology
ac7199a252 qtest/numa-test: Specify CPU topology in aarch64_numa_cpu()
1dcf7001d4 qapi/machine.json: Add cluster-id

Hi Gavin,
Congrats on getting the series merged! We won't rebase QEMU in 9.1 (i.e. we'll keep v7.0), so you'll need to backport the series.

(In reply to Luiz Capitulino from comment #15)
> Congrats on getting the series merged! We won't rebase QEMU in 9.1 (ie.
> we'll keep v7.0), so you'll need to backport the series.

Luiz, thanks. Let's create an MR to backport them into our downstream QEMU for RHEL 9.1.

Thanks, Gavin

Hello Gavin,
With the below cmd, 'arch topology borken' was resolved, but I hit another error:
kdump.service - Crash recovery kernel arming
kdump: No memory reserved for crash kernel
Is this as expected?
qemu-kvm-7.0.0-2.el9.gwshan202205111802
host kernel:  5.14.0-85.el9.aarch64
guest kernel: 5.14.0-86.el9.aarch64

-smp 6,sockets=2,cores=3,threads=1 \
-m 1024M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=128M \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-object memory-backend-ram,id=mem4,size=128M \
-object memory-backend-ram,id=mem5,size=384M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5 \

(qemu) info numa
6 nodes
node 0 cpus: 0 1 2
node 0 size: 128 MB
node 0 plugged: 0 MB
node 1 cpus: 3 4 5
node 1 size: 128 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 128 MB
node 2 plugged: 0 MB
node 3 cpus:
node 3 size: 128 MB
node 3 plugged: 0 MB
node 4 cpus:
node 4 size: 128 MB
node 4 plugged: 0 MB
node 5 cpus:
node 5 size: 384 MB
node 5 plugged: 0 MB

[root@dhcp19-243-226 ~]# numactl --hardware
available: 6 nodes (0-5)
node 0 cpus: 0 1 2
node 0 size: 83 MB
node 0 free: 5 MB
node 1 cpus: 3 4 5
node 1 size: 125 MB
node 1 free: 10 MB
node 2 cpus:
node 2 size: 125 MB
node 2 free: 8 MB
node 3 cpus:
node 3 size: 125 MB
node 3 free: 8 MB
node 4 cpus:
node 4 size: 125 MB
node 4 free: 11 MB
node 5 cpus:
node 5 size: 367 MB
node 5 free: 29 MB
node distances:
node   0   1   2   3   4   5
  0:  10  20  20  20  20  20
  1:  20  10  20  20  20  20
  2:  20  20  10  20  20  20
  3:  20  20  20  10  20  20
  4:  20  20  20  20  10  20
  5:  20  20  20  20  20  10

[ 0.000000] smccc: KVM: hypervisor services detected (0x00000000 0x00000000 0x00000000 0x00000003)
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x0 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x3 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x4 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x5 -> Node 1
[ 0.000000] percpu: Embedded 31 pages/cpu s86680 r8192 d32104 u126976
[ 0.000000] pcpu-alloc: s86680 r8192 d32104 u126976 alloc=31*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [1] 3 [1] 4 [1] 5
[ 0.000000] Detected PIPT I-cache on CPU0
[ 0.000000] CPU features: detected: GIC system register CPU interface
[ 0.000000] CPU features: detected: Spectre-v2
[ 0.000000] CPU features: kernel page table isolation forced ON by KASLR
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] Built 6 zonelists, mobility grouping on. Total pages: 258048

[root@dhcp19-243-226 ~]# systemctl status kdump.service
× kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2022-05-12 13:43:27 CST; 4min 3s ago
    Process: 1106 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
   Main PID: 1106 (code=exited, status=1/FAILURE)
        CPU: 29ms
May 12 13:43:27 localhost.localdomain systemd[1]: Starting Crash recovery kernel arming...
May 12 13:43:27 localhost.localdomain kdumpctl[1110]: kdump: No memory reserved for crash kernel
May 12 13:43:27 localhost.localdomain kdumpctl[1110]: kdump: Starting kdump: [FAILED]
May 12 13:43:27 localhost.localdomain systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
May 12 13:43:27 localhost.localdomain systemd[1]: kdump.service: Failed with result 'exit-code'.
May 12 13:43:27 localhost.localdomain systemd[1]: Failed to start Crash recovery kernel arming.

[root@dhcp19-243-226 ~]# vim /var/log/messages
......
May 12 13:43:27 localhost kdumpctl[1110]: kdump: No memory reserved for crash kernel
May 12 13:43:27 localhost kdumpctl[1110]: kdump: Starting kdump: [FAILED]
May 12 13:43:27 localhost systemd[1]: Started Command Scheduler.
May 12 13:43:27 localhost systemd[1]: Starting GNOME Display Manager...
May 12 13:43:27 localhost systemd[1]: Starting Hold until boot process finishes up...
May 12 13:43:27 localhost systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
May 12 13:43:27 localhost systemd[1]: kdump.service: Failed with result 'exit-code'.
May 12 13:43:27 localhost systemd[1]: Failed to start Crash recovery kernel arming.
.......

(In reply to Zhenyu Zhang from comment #23)
> With below cmd, 'arch topology borken' was resolved, but hit another error.
> kdump.service - Crash recovery kernel arming
> kdump: No memory reserved for crash kernel
> Is this as expected?

It seems to be caused by a memory boundary value. I doubled the memory and no longer hit this problem, so I tested with:

-m 2048M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=256M \
-object memory-backend-ram,id=mem1,size=256M \
-object memory-backend-ram,id=mem2,size=256M \
-object memory-backend-ram,id=mem3,size=256M \
-object memory-backend-ram,id=mem4,size=256M \
-object memory-backend-ram,id=mem5,size=768M \

(In reply to Zhenyu Zhang from comment #24)
> Seems to be the cause of the memory boundary value.
> I doubled the memory and no longer had this problem [...]

The kexec/crash issue has nothing to do with this bug and its fixes. I think there is not enough memory in the NUMA nodes (128 MB capacity) for kexec/crash to reserve memory from.
Zhenyu, let's create another bug asking the kexec/crash developers to investigate.

Thanks, Gavin

With qemu-kvm-7.0.0-4.el9:

-m 2048M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=256M \
-object memory-backend-ram,id=mem1,size=256M \
-object memory-backend-ram,id=mem2,size=256M \
-object memory-backend-ram,id=mem3,size=256M \
-object memory-backend-ram,id=mem4,size=256M \
-object memory-backend-ram,id=mem5,size=768M \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-numa node,nodeid=2,memdev=mem2 \
-numa node,nodeid=3,memdev=mem3 \
-numa node,nodeid=4,memdev=mem4 \
-numa node,nodeid=5,memdev=mem5 \

(qemu) info numa
6 nodes
node 0 cpus: 0 1 2
node 0 size: 256 MB
node 0 plugged: 0 MB
node 1 cpus: 3 4 5
node 1 size: 256 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 256 MB
node 2 plugged: 0 MB
node 3 cpus:
node 3 size: 256 MB
node 3 plugged: 0 MB
node 4 cpus:
node 4 size: 256 MB
node 4 plugged: 0 MB
node 5 cpus:
node 5 size: 768 MB
node 5 plugged: 0 MB

(qemu) info status
VM status: running

[ 0.000000] smccc: KVM: hypervisor services detected (0x00000000 0x00000000 0x00000000 0x00000003)
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x0 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2 -> Node 0
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x3 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x4 -> Node 1
[ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x5 -> Node 1

/usr/libexec/qemu-kvm -cpu host \
    -smp 8,sockets=2,clusters=2,cores=2,threads=1 \
    -m 8192 \
    -object memory-backend-ram,size=1024M,id=mem-mem0 \
    -object memory-backend-ram,size=3072M,id=mem-mem1 \
    -object memory-backend-ram,size=1024M,id=mem-mem2 \
    -object memory-backend-ram,size=3072M,id=mem-mem3 \
    -numa node,memdev=mem-mem0,nodeid=0 \
    -numa node,memdev=mem-mem1,nodeid=1 \
    -numa node,memdev=mem-mem2,nodeid=2 \
    -numa node,memdev=mem-mem3,nodeid=3 \
    -numa cpu,node-id=0,socket-id=0,cluster-id=0,core-id=0,thread-id=0 \
    -numa cpu,node-id=0,socket-id=0,cluster-id=0,core-id=1,thread-id=0 \
    -numa cpu,node-id=1,socket-id=0,cluster-id=1,core-id=0,thread-id=0 \
    -numa cpu,node-id=1,socket-id=0,cluster-id=1,core-id=1,thread-id=0 \
    -numa cpu,node-id=2,socket-id=1,cluster-id=0,core-id=0,thread-id=0 \
    -numa cpu,node-id=2,socket-id=1,cluster-id=0,core-id=1,thread-id=0 \
    -numa cpu,node-id=3,socket-id=1,cluster-id=1,core-id=0,thread-id=0 \
    -numa cpu,node-id=3,socket-id=1,cluster-id=1,core-id=1,thread-id=0 \
    -monitor stdio

(qemu) info numa
4 nodes
node 0 cpus: 0 1
node 0 size: 1024 MB
node 0 plugged: 0 MB
node 1 cpus: 2 3
node 1 size: 3072 MB
node 1 plugged: 0 MB
node 2 cpus: 4 5
node 2 size: 1024 MB
node 2 plugged: 0 MB
node 3 cpus: 6 7
node 3 size: 3072 MB
node 3 plugged: 0 MB

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7967
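As a footnote on the explicit "-numa cpu" mapping used in the last verification above: with explicit mappings, each CPU's node comes from its (socket, cluster, core, thread) tuple rather than any default. The sketch below (illustrative Python, not QEMU code; the node-id = socket * clusters + cluster formula simply mirrors the mapping chosen in that command line) reconstructs the "info numa" CPU lists.

```python
from itertools import product

sockets, clusters, cores, threads = 2, 2, 2, 1

# QEMU enumerates CPUs in (socket, cluster, core, thread) order; the
# command line above assigned node-id = socket * clusters + cluster.
nodes = {}
for cpu, (s, cl, co, t) in enumerate(
        product(range(sockets), range(clusters), range(cores), range(threads))):
    nodes.setdefault(s * clusters + cl, []).append(cpu)

print(nodes)  # {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
```

This matches the verified "info numa" output: node 0 holds CPUs 0-1, node 1 holds 2-3, node 2 holds 4-5, node 3 holds 6-7.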