Bug 2176010 - Guest reports CPU #4's llc-sibling CPU #3 is not on the same node! [node: 1 != 0]
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Igor Mammedov
QA Contact: Mario Casquero
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Duplicates: 2203821 (view as bug list)
Depends On:
Blocks:
 
Reported: 2023-03-07 07:03 UTC by Mario Casquero
Modified: 2023-07-12 12:23 UTC (History)
CC List: 15 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.NUMA node mapping not working correctly on AMD EPYC CPUs

QEMU does not handle NUMA node mapping on AMD EPYC CPUs correctly. As a result, the performance of virtual machines (VMs) with these CPUs might be negatively impacted if using a NUMA node configuration. In addition, the VMs display a warning similar to the following during boot:

----
sched: CPU #4's llc-sibling CPU #3 is not on the same node! [node: 1 != 0]. Ignoring dependency.
WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smpboot.c:415 topology_sane.isra.0+0x6b/0x80
----

To work around this issue, do not use AMD EPYC CPUs for NUMA node configurations.
Clone Of:
Environment:
Last Closed: 2023-05-23 12:28:21 UTC
Type: ---
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-150884 0 None None None 2023-03-07 07:06:29 UTC

Description Mario Casquero 2023-03-07 07:03:50 UTC
Description of problem:

The guest kernel warns that CPUs assigned to the other NUMA node are not on the same node as their LLC siblings.

Version-Release number of selected component (if applicable):
Host & Guest RHEL.9.2.0
kernel-5.14.0-268.el9.x86_64
qemu-kvm-7.2.0-10.el9.x86_64
libvirt-9.0.0-6.el9.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Get a RHEL 9.2.0 image
2. Boot the guest with the qemu-kvm command line shown in [1]
3. Boot the system and check the guest serial console

Actual results:

[stdlog] 2023-03-06 02:07:13,545 avocado.virttest.virt_vm DEBUG| ----------[ cut here ]------------
[stdlog] [    0.094913] sched: CPU #4's llc-sibling CPU #3 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[stdlog] [    0.094913] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smpboot.c:415 topology_sane.isra.0+0x6b/0x80
[stdlog] [    0.094913] Modules linked in:
[stdlog] [    0.094913] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.2.0 #1
[stdlog] [    0.094913] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20221207gitfff6d81270b5-6.el9 12/07/2022
[stdlog] [    0.094913] RIP: 0010:topology_sane.isra.0+0x6b/0x80
[stdlog] [    0.094913] Code: 80 3d 25 9e 0b 02 00 75 f2 48 83 ec 08 4c 89 da 44 89 d6 48 c7 c7 e8 ee f5 8f 88 44 24 07 c6 05 07 9e 0b 02 01 e8 96 fd b8 00 <0f> 0b 0f b6 44 24 07 48 83 c4 08 c3 cc cc cc cc 0f 1f 44 00 00 90
[stdlog] [    0.094913] RSP: 0000:ffffa3dcc00f7ec8 EFLAGS: 00010082
[stdlog] [    0.094913] RAX: 0000000000000000 RBX: ffff8a167ec198c0 RCX: c0000000ffff7fff
[stdlog] [    0.094913] RDX: 0000000000000000 RSI: 0000000000027ffb RDI: 0000000000000001
[stdlog] [    0.094913] RBP: 0000000000000004 R08: 0000000000000000 R09: 00000000ffff7fff
[stdlog] [    0.094913] R10: ffffa3dcc00f7d78 R11: ffffffff909e59c8 R12: 0000000000000003
[stdlog] [    0.094913] R13: ffff8a17bcd998c0 R14: 0000000000000004 R15: 0000000000000003
[stdlog] [    0.094913] FS:  0000000000000000(0000) GS:ffff8a167ec00000(0000) knlGS:0000000000000000
[stdlog] [    0.094913] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[stdlog] [    0.094913] CR2: 0000000000000000 CR3: 000000010f410001 CR4: 0000000000770ee0
[stdlog] [    0.094913] PKRU: 55555554
[stdlog] [    0.094913] Call Trace:
[stdlog] [    0.094913]  <TASK>
[stdlog] [    0.094913]  set_cpu_sibling_map+0x12d/0x5e0
[stdlog] [    0.094913]  start_secondary+0x5b/0x130
[stdlog] [    0.094913]  secondary_startup_64_no_verify+0xe5/0xeb
[stdlog] [    0.094913]  </TASK>
[stdlog] [    0.094913] ---[ end trace 0000000000000000 ]---

Expected results:
The guest should boot without warnings

Additional info:

[1]
    -m 4096 \
    -object '{"size": 1073741824, "id": "mem-mem0", "qom-type": "memory-backend-ram"}' \
    -object '{"size": 3221225472, "id": "mem-mem1", "qom-type": "memory-backend-ram"}'  \
    -smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2  \
    -numa node,memdev=mem-mem0,cpus=4,cpus=5  \
    -numa node,memdev=mem-mem1,cpus=0,cpus=1,cpus=2,cpus=3  \
    -cpu 'EPYC-Milan',x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,vaes=on,vpclmulqdq=on,spec-ctrl=on,stibp=on,arch-capabilities=on,ssbd=on,cmp-legacy=on,virt-ssbd=on,lbrv=on,tsc-scale=on,vmcb-clean=on,pause-filter=on,pfthreshold=on,v-vmsave-vmload=on,vgif=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,erms=off,fsrm=off,kvm_pv_unhalt=on \

Comment 2 Mario Casquero 2023-03-10 09:38:04 UTC
This bug is also reproducible with libvirt, using the following domain.xml configuration [1]

Test environment
kernel-5.14.0-283.el9.x86_64
qemu-kvm-7.2.0-10.el9.x86_64
libvirt-9.0.0-7.el9.x86_64

Guest RHEL.9.2.0

[1]
<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
<vcpu placement='static'>6</vcpu>
<cpu mode='host-model' check='partial'>
    <topology sockets='2' dies='1' cores='3' threads='1'/>
    <numa>
      <cell id='0' cpus='4-5' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='0-3' memory='3145728' unit='KiB'/>
    </numa>
</cpu>

guest dmesg

[    0.115882] x86: Booting SMP configuration:
[    0.116096] .... node  #0, CPUs:      #1 #2 #3
[    0.117834] .... node  #1, CPUs:   #4
[    0.064550] ------------[ cut here ]------------
[    0.064550] sched: CPU #4's llc-sibling CPU #3 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.064550] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smpboot.c:424 topology_sane.isra.0+0x6b/0x80
[    0.064550] Modules linked in:
[    0.064550] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.14.0-284.el9.x86_64 #1
[    0.064550] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20221207gitfff6d81270b5-7.el9 12/07/2022
[    0.064550] RIP: 0010:topology_sane.isra.0+0x6b/0x80
[    0.064550] Code: 80 3d 63 c7 eb 01 00 75 f2 48 83 ec 08 4c 89 da 44 89 d6 48 c7 c7 50 6a 1e 84 88 44 24 07 c6 05 45 c7 eb 01 01 e8 25 80 a7 00 <0f> 0b 0f b6 44 24 07 48 83 c4 08 c3 cc cc cc cc 0f 1f 44 00 00 0f
[    0.064550] RSP: 0000:ffffb8b5000f3ec8 EFLAGS: 00010082
[    0.064550] RAX: 0000000000000000 RBX: 0000000000000004 RCX: c0000000ffff7fff
[    0.064550] RDX: 0000000000000000 RSI: 0000000000027ffb RDI: 0000000000000001
[    0.064550] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000ffff7fff
[    0.064550] R10: ffffb8b5000f3d68 R11: ffffffff84be9608 R12: 0000000000000004
[    0.064550] R13: ffff8d4a3fd938c0 R14: 0000000000000003 R15: ffff8d48fec138c0
[    0.064550] FS:  0000000000000000(0000) GS:ffff8d48fec00000(0000) knlGS:0000000000000000
[    0.064550] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.064550] CR2: 0000000000000000 CR3: 000000017ba10001 CR4: 0000000000770ee0
[    0.064550] PKRU: 55555554
[    0.064550] Call Trace:
[    0.064550]  <TASK>
[    0.064550]  set_cpu_sibling_map+0x179/0x5f0
[    0.064550]  start_secondary+0x5b/0x140
[    0.064550]  secondary_startup_64_no_verify+0xe5/0xeb
[    0.064550]  </TASK>
[    0.064550] ---[ end trace a4c33a80ba20ed7c ]---

Comment 3 Mario Casquero 2023-03-10 10:44:59 UTC
Hello Igor,

Does this need to be fixed for RHEL 9.2?
Depending on the cores/sockets/dies combination it is reproducible or not, and as you know it does not have a big impact on the guest...

Comment 4 Igor Mammedov 2023-03-10 12:05:48 UTC
(In reply to Mario Casquero from comment #2)
> This bug is also reproducible with libvirt and the following domain.xml
> configuration[1]
> 
> Test environment
> kernel-5.14.0-283.el9.x86_64
> qemu-kvm-7.2.0-10.el9.x86_64
> libvirt-9.0.0-7.el9.x86_64
> 
> Guest RHEL.9.2.0
> 
> [1]
> <memory unit='KiB'>4194304</memory>
> <currentMemory unit='KiB'>4194304</currentMemory>
> <vcpu placement='static'>6</vcpu>
> <cpu mode='host-model' check='partial'>

what host CPU this is running on?

>     <topology sockets='2' dies='1' cores='3' threads='1'/>
>     <numa>
>       <cell id='0' cpus='4-5' memory='1048576' unit='KiB'/>
>       <cell id='1' cpus='0-3' memory='3145728' unit='KiB'/>
>     </numa>
> </cpu>

Comment 5 Mario Casquero 2023-03-10 12:12:51 UTC
(In reply to Igor Mammedov from comment #4)
> what host CPU this is running on?

AMD EPYC 7313

Comment 6 Igor Mammedov 2023-03-21 14:49:22 UTC
CCing Babu,

Can you look into this issue, please?

PS:
(we've had similar issue with Intel CPU models and host passthrough
https://www.mail-archive.com/qemu-devel@nongnu.org/msg890062.html,
though you mentioned that for AMD cpus fix might be complicated/different)

Comment 7 Babu Moger 2023-03-21 15:16:33 UTC
(In reply to Igor Mammedov from comment #6)
> CCing Babu,
> 
> Can you look into this issue, please?
> 
> PS:
> (we've had similar issue with Intel CPU models and host passthrough
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg890062.html,
> though you mentioned that for AMD cpus fix might be complicated/different)

Igor, this is a kind of odd case. I don't think we can fix this.

 -smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2  \
    -numa node,memdev=mem-mem0,cpus=4,cpus=5  \
    -numa node,memdev=mem-mem1,cpus=0,cpus=1,cpus=2,cpus=3  \

The command line says there are 6 CPUs: 4 CPUs (0,1,2,3) in one node and 2 CPUs (4,5) in the other. That is a weird scenario; the encoding will not work that way.

Comment 8 Igor Mammedov 2023-03-22 12:08:03 UTC
Mario,

can you try to reproduce with a bit more sane topology (something that resembles real-life AMD CPUs)?
i.e. if -smp says cores=3, then put all 3 cores on the same node.
It's not clear which vCPU belongs to which core when specifying the NUMA
mapping with 'cpus=X-Y', but you can use the alternative numa command
to do the right thing:

 -numa cpus,node-id=0,socket-id=0
 -numa cpus,node-id=1,socket-id=1

Comment 9 Mario Casquero 2023-03-24 07:34:04 UTC
(In reply to Igor Mammedov from comment #8)
> Mario,
> 
> can you try to reproduce with a bit more sane topology (something that
> resembles real-life AMD CPUs)?
> i.e. if -smp says cores=3, then put all 3 cores on the same node.
> It's not clear which vCPU belongs to which core when specifying the NUMA
> mapping with 'cpus=X-Y', but you can use the alternative numa command
> to do the right thing:
> 
>  -numa cpus,node-id=0,socket-id=0
>  -numa cpus,node-id=1,socket-id=1

Hello Igor,

I've tried with the following qemu-kvm cmd[1] according to your suggestion:

[1]
/usr/libexec/qemu-kvm \
...
-smp 8,maxcpus=8,cores=2,threads=1,sockets=4  \
-m 4096 \
-object '{"size": 1073741824, "id": "mem-mem0", "qom-type": "memory-backend-ram"}' \
-object '{"size": 3221225472, "id": "mem-mem1", "qom-type": "memory-backend-ram"}'  \
-smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2  \
-numa node,memdev=mem-mem0,nodeid=0  \
-numa node,memdev=mem-mem1,nodeid=1  \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=1,socket-id=1 \
-cpu 'EPYC' \
...

Now the guest numa topology:
[root@localhost ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2
node 0 size: 750 MB
node 0 free: 126 MB
node 1 cpus: 3 4 5
node 1 size: 2897 MB
node 1 free: 2035 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

The same WARNING trace can be seen in guest dmesg

[    0.073274] ------------[ cut here ]------------
[    0.073274] sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.073274] WARNING: CPU: 3 PID: 0 at arch/x86/kernel/smpboot.c:424 topology_sane.isra.0+0x6b/0x80
[    0.073274] Modules linked in:
[    0.073274] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.14.0-284.4.1.el9_2.x86_64 #1
[    0.073274] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20221207gitfff6d81270b5-9.el9_2 12/07/2022
[    0.073274] RIP: 0010:topology_sane.isra.0+0x6b/0x80
[    0.073274] Code: 80 3d 93 c7 eb 01 00 75 f2 48 83 ec 08 4c 89 da 44 89 d6 48 c7 c7 c8 6b fe 91 88 44 24 07 c6 05 75 c7 eb 01 01 e8 a3 7f a7 00 <0f> 0b 0f b6 44 24 07 48 83 c4 08 e9 65 1e da 00 0f 1f 44 00 00 0f
[    0.073274] RSP: 0000:ffffa3b3800f3ec8 EFLAGS: 00010082
[    0.073274] RAX: 0000000000000000 RBX: 0000000000000003 RCX: c0000000ffff7fff
[    0.073274] RDX: 0000000000000000 RSI: 0000000000027ffb RDI: 0000000000000001
[    0.073274] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffff7fff
[    0.073274] R10: ffffa3b3800f3d68 R11: ffffffff929e9608 R12: 0000000000000003
[    0.073274] R13: ffff914bffc138c0 R14: 0000000000000000 R15: ffff914d3cc138c0
[    0.073274] FS:  0000000000000000(0000) GS:ffff914d3cc00000(0000) knlGS:0000000000000000
[    0.073274] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.073274] CR2: 0000000000000000 CR3: 0000000108410000 CR4: 00000000003506e0
[    0.073274] Call Trace:
[    0.073274]  <TASK>
[    0.073274]  set_cpu_sibling_map+0x179/0x5f0
[    0.073274]  start_secondary+0x5b/0x140
[    0.073274]  secondary_startup_64_no_verify+0xe5/0xeb
[    0.073274]  </TASK>
[    0.073274] ---[ end trace 29640d700c71645a ]---

Comment 10 Igor Mammedov 2023-03-27 12:39:24 UTC
(In reply to Babu Moger from comment #7)
> (In reply to Igor Mammedov from comment #6)
> > CCing Babu,
> > 
> > Can you look into this issue, please?
> > 
> > PS:
> > (we've had similar issue with Intel CPU models and host passthrough
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg890062.html,
> > though you mentioned that for AMD cpus fix might be complicated/different)
> 
> Igor, this is a kind of odd case. I don't think we can fix this.
> 
>  -smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2  \
>     -numa node,memdev=mem-mem0,cpus=4,cpus=5  \
>     -numa node,memdev=mem-mem1,cpus=0,cpus=1,cpus=2,cpus=3  \
> 
> The command line says there are 6 CPUs: 4 CPUs (0,1,2,3) in one node and
> 2 CPUs (4,5) in the other. That is a weird scenario; the encoding will
> not work that way.

Babu,

It still reproduces with a sane topology, with either 3 or 4 cores per socket on a 2-socket setup.
It also reproduces on current upstream QEMU master branch (e3debd5e7d0ce03).

Note:
it starts complaining as soon as it gets to bringing up cores on the 2nd numa node.

Comment 11 Babu Moger 2023-03-27 19:54:41 UTC
(In reply to Igor Mammedov from comment #10)
> It still reproduces with a sane topology, with either 3 or 4 cores per
> socket on a 2-socket setup.
> It also reproduces on current upstream QEMU master branch (e3debd5e7d0ce03).
> 
> Note:
> it starts complaining as soon as it gets to bringing up cores on the 2nd
> numa node.

Yes. It is because of the way the kernel calculates the number of CPUs sharing the L3 cache (num_sharing_cache), which must be a power of 2 (for example 1, 2, 4, 8, 16). The above command line says there are only 3 CPUs sharing the L3 cache; looking at the APIC IDs, the kernel says that cannot happen.

The messages come from these files:
https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/cpu/cacheinfo.c#L662
https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/smpboot.c#L411

We may not be able to fix this.
Thanks
Babu
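[Editor's note] The power-of-2 rounding described above can be sketched numerically. The following is an illustrative Python model only, not actual kernel or QEMU code; the contiguous APIC ID layout and the exact field widths are simplifying assumptions:

```python
import math

def count_order(n):
    # analogous to the kernel's get_count_order(): smallest s with 2**s >= n
    return math.ceil(math.log2(n))

CORES_PER_SOCKET = 3  # from -smp cores=3,sockets=2

def apic_id(socket, core):
    # APIC ID fields are padded to power-of-2 widths, so a 3-core socket
    # still occupies 4 APIC ID slots (core field width == 2 bits)
    return (socket << count_order(CORES_PER_SOCKET)) | core

# QEMU assigns vCPU n -> (socket, core) in linear order
topo = {n: divmod(n, CORES_PER_SOCKET) for n in range(6)}
apic = {n: apic_id(s, c) for n, (s, c) in topo.items()}

# The guest derives the LLC ID by shifting the APIC ID by
# count_order(CPUs sharing the L3) == count_order(3) == 2
llc = {n: apic[n] >> count_order(CORES_PER_SOCKET) for n in range(6)}

# Node assignment taken from the -numa options in comment 0
node = {4: 0, 5: 0, 0: 1, 1: 1, 2: 1, 3: 1}

# vCPUs 3, 4, 5 (APIC IDs 4, 5, 6) share LLC 1, but vCPU 3 is on node 1
# while vCPUs 4 and 5 are on node 0 -- the mismatch the guest warns about
print(llc[3] == llc[4], node[3] != node[4])  # -> True True
```

Under this model the 3-core socket is padded to 4 APIC ID slots, so the LLC derived from the APIC ID spans vCPUs that the -numa options place on different nodes, matching the warning "CPU #4's llc-sibling CPU #3 is not on the same node! [node: 1 != 0]".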

Comment 12 Babu Moger 2023-03-27 20:01:17 UTC
(In reply to Babu Moger from comment #11)
> Yes. It is because of the way the kernel calculates the number of CPUs
> sharing the L3 cache (num_sharing_cache), which must be a power of 2
> (for example 1, 2, 4, 8, 16). The above command line says there are only
> 3 CPUs sharing the L3 cache; looking at the APIC IDs, the kernel says
> that cannot happen.

Based on the APIC ID decoding, the kernel thinks CPU 3 and CPU 0 should be on the same node, but the command line says otherwise. That is why we are seeing the warning here.
 


Comment 13 Igor Mammedov 2023-03-28 10:54:02 UTC
(In reply to Babu Moger from comment #12)
> Based on the APIC ID decoding, the kernel thinks CPU 3 and CPU 0 should be
> on the same node, but the command line says otherwise. That is why we are
> seeing the warning here.

We have the same warning with '-smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2'

> > We may not be able to fix this.

It seems QEMU got broken somewhere between 5.0.1 and 5.2.


Comment 14 Igor Mammedov 2023-03-28 11:48:00 UTC
1st offending commit:

commit 081599cb9f9b2043aa543907e932bc6f2bb315ba
Author: Babu Moger <babu.moger>
Date:   Mon Aug 31 13:42:17 2020 -0500

    Revert "target/i386: Enable new apic id encoding for EPYC based cpus models"

Comment 15 Babu Moger 2023-03-28 14:07:44 UTC
(In reply to Igor Mammedov from comment #14)
> 1st offending commit:
> 
> commit 081599cb9f9b2043aa543907e932bc6f2bb315ba
> Author: Babu Moger <babu.moger>
> Date:   Mon Aug 31 13:42:17 2020 -0500
> 
>     Revert "target/i386: Enable new apic id encoding for EPYC based cpus
> models"

Igor, yes. We tried to fix all the weird configurations by modifying the APIC ID, but we ran into lots of issues, so we went back to the generic decoding and reverted all the changes.
Here is the thread: https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg08270.html

Comment 16 Babu Moger 2023-03-28 14:11:34 UTC
(In reply to Igor Mammedov from comment #13)
> We have the same warning with '-smp
> 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2'

This should have worked. I will try to recreate it locally. I use the qemu command line directly. Please send me the full qemu command line.
Thanks

Comment 17 Babu Moger 2023-03-28 16:13:02 UTC
(In reply to Babu Moger from comment #16)
> > We have the same warning with '-smp
> > 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2'
> 
> This should have worked. I will try to recreate it locally. I use the qemu
> command line directly. Please send me the full qemu command line.

I am able to run a similar configuration without any issues.

qemu-system-x86_64 -name rh85 -m 1024M \
  -smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2 \
  -object memory-backend-ram,id=mem0,size=512M \
  -object memory-backend-ram,id=mem1,size=512M \
  -numa node,nodeid=0,cpus=4-7,memdev=mem0 \
  -numa node,nodeid=1,cpus=0-3,memdev=mem1 \
  -hda vdisk.qcow2 -enable-kvm \
  -net nic -net bridge,br=virbr0,helper=/usr/libexec/qemu-bridge-helper \
  -cpu EPYC-Milan,+svm,+pmu -nographic

# uname -r
4.18.0-348.el8.x86_64
================================================================

[    0.025391] smp: Bringing up secondary CPUs ...
[    0.025880] x86: Booting SMP configuration:
[    0.026001] .... node  #0, CPUs:      #1
[    0.001000] kvm-clock: cpu 1, msr 39201041, secondary cpu clock
[    0.027181] kvm-guest: stealtime: cpu 1, msr 3f4ac080
[    0.028055]  #2
[    0.001000] kvm-clock: cpu 2, msr 39201081, secondary cpu clock
[    0.029038] kvm-guest: stealtime: cpu 2, msr 3f52c080
[    0.030055]  #3
[    0.001000] kvm-clock: cpu 3, msr 392010c1, secondary cpu clock
[    0.030536] kvm-guest: stealtime: cpu 3, msr 3f5ac080
[    0.031066] .... node  #1, CPUs:   #4
[    0.001000] kvm-clock: cpu 4, msr 39201101, secondary cpu clock
[    0.001000] smpboot: CPU 4 Converting physical 0 to logical die 1
[    0.033601] kvm-guest: stealtime: cpu 4, msr 1f42c080
[    0.034059]  #5
[    0.001000] kvm-clock: cpu 5, msr 39201141, secondary cpu clock
[    0.034564] kvm-guest: stealtime: cpu 5, msr 1f4ac080
[    0.035052]  #6
[    0.001000] kvm-clock: cpu 6, msr 39201181, secondary cpu clock
[    0.036395] kvm-guest: stealtime: cpu 6, msr 1f52c080
[    0.037235]  #7
[    0.001000] kvm-clock: cpu 7, msr 392011c1, secondary cpu clock
[    0.038238] kvm-guest: stealtime: cpu 7, msr 1f5ac080
[    0.039004] smp: Brought up 2 nodes, 8 CPUs
[    0.039410] smpboot: Max logical packages: 2
[    0.040001] smpboot: Total of 8 processors activated (38399.93 BogoMIPS)
[    0.041416] node 0 deferred pages initialised in 0ms
[    0.041514] node 1 deferred pages initialised in 0ms

Comment 18 Igor Mammedov 2023-03-29 15:35:05 UTC
upstream QEMU, boot RHEL9.0 cdrom into rescue mode shell and see dmesg

./x86_64-softmmu/qemu-system-x86_64 -m 1024M \
  -smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2 \
  -object memory-backend-ram,id=mem0,size=512M -object memory-backend-ram,id=mem1,size=512M \
  -numa node,nodeid=0,cpus=4-7,memdev=mem0 -numa node,nodeid=1,cpus=0-3,memdev=mem1 \
  -enable-kvm  \
  -cpu EPYC \
  -cdrom ~/RHEL-9.0.0-20220420.0-x86_64-dvd1.iso

Pay attention to the CPU model (it works fine with EPYC-Milan, but not with plain EPYC).
(The same happens with the RHEL 8.0 install CD, so it's likely not guest dependent.)

host if that matters:
RHEL9.2 beta on
AMD EPYC 7542 32-Core Processor

Comment 19 Babu Moger 2023-03-29 15:43:54 UTC
(In reply to Igor Mammedov from comment #18)
> upstream QEMU, boot RHEL9.0 cdrom into rescue mode shell and see dmesg

OK, that looks like a different issue. I will look into it.

Does it work with the Red Hat QEMU build?

> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 1024M \
>   -smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2 \
>   -object memory-backend-ram,id=mem0,size=512M -object
> memory-backend-ram,id=mem1,size=512M \
>   -numa node,nodeid=0,cpus=4-7,memdev=mem0 -numa
> node,nodeid=1,cpus=0-3,memdev=mem1 \
>   -enable-kvm  \
>   -cpu EPYC \
>   -cdrom ~/RHEL-9.0.0-20220420.0-x86_64-dvd1.iso
> 
> pay attn to cpumodel (it works fine with EPYC-Milan, but not with plain EPYC)
> (same with rhel8.0 install cd, so it's likely not guest dependent)
> 
> host if that matters:
> RHEL9.2 beta on
> AMD EPYC 7542 32-Core Processor

Comment 20 Babu Moger 2023-03-29 19:19:02 UTC
(In reply to Igor Mammedov from comment #18)
> upstream QEMU, boot RHEL9.0 cdrom into rescue mode shell and see dmesg
> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 1024M \
>   -smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2 \
>   -object memory-backend-ram,id=mem0,size=512M -object
> memory-backend-ram,id=mem1,size=512M \
>   -numa node,nodeid=0,cpus=4-7,memdev=mem0 -numa
> node,nodeid=1,cpus=0-3,memdev=mem1 \
>   -enable-kvm  \
>   -cpu EPYC \
>   -cdrom ~/RHEL-9.0.0-20220420.0-x86_64-dvd1.iso
> 
> pay attn to cpumodel (it works fine with EPYC-Milan, but not with plain EPYC)
> (same with rhel8.0 install cd, so it's likely not guest dependent)
> 
> host if that matters:
> RHEL9.2 beta on
> AMD EPYC 7542 32-Core Processor


I had no problem installing RHEL 9 with the EPYC model.

# /usr/local/bin/qemu-system-x86_64 -version
QEMU emulator version 7.2.92 (v8.0.0-rc2-16-gf00506aeca-dirty)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

/usr/local/bin/qemu-system-x86_64 -name rhel90 -m 1024M -smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2 -object memory-backend-ram,id=mem0,size=512M -object memory-backend-ram,id=mem1,size=512M -numa node,nodeid=0,cpus=4-7,memdev=mem0 -numa node,nodeid=1,cpus=0-3,memdev=mem1 -hda vdisk-rh90.qcow2 -enable-kvm -net nic -net bridge,br=virbr0,helper=/usr/libexec/qemu-bridge-helper -cpu EPYC,+svm -nographic -boot d -cdrom RHEL-9.0.0-20220420.0-x86_64-dvd1.iso

[root@Milan-b1-vm ~]# uname -r
5.14.0-70.13.1.el9_0.x86_64
[root@Milan-b1-vm ~]# lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               AuthenticAMD
  BIOS Vendor ID:        QEMU
  Model name:            AMD EPYC Processor

Comment 23 Babu Moger 2023-04-14 15:59:18 UTC
(In reply to Igor Mammedov from comment #22)
> I got access to the same host again
> (dell-per6515-01.khw2.lab.eng.bos.redhat.com)
> and yes reproducer requires AMD host (any host that leads guest kernel to
> cacheinfo_amd_init_llc_id() should do),
> which in my case is:
> 
>  Vendor ID:               AuthenticAMD
>   BIOS Vendor ID:        AMD
>   Model name:            AMD EPYC 7542 32-Core Processor
>     BIOS Model name:     AMD EPYC 7542 32-Core Processor                
>     CPU family:          23
>     Model:               49

This is a Rome system.

>     Thread(s) per core:  2
>     Core(s) per socket:  32
>     Socket(s):           1
>     Stepping:            0
> 
> With some printk kernel instrumentation following shows:
> [    0.001000] SRAT: PXM 0 -> APIC 0x00 -> Node 0
> [    0.001000] SRAT: PXM 0 -> APIC 0x01 -> Node 0
> [    0.001000] SRAT: PXM 0 -> APIC 0x02 -> Node 0
> [    0.001000] SRAT: PXM 0 -> APIC 0x03 -> Node 0
> [    0.001000] SRAT: PXM 1 -> APIC 0x04 -> Node 1
> [    0.001000] SRAT: PXM 1 -> APIC 0x05 -> Node 1
> [    0.001000] SRAT: PXM 1 -> APIC 0x06 -> Node 1
> [    0.001000] SRAT: PXM 1 -> APIC 0x07 -> Node 1
> ...
> [    0.231503] smp: Bringing up secondary CPUs ...
> [    0.231838] x86: Booting SMP configuration:
> [    0.232319] .... node  #0, CPUs:      #1
> [    0.097768] cacheinfo_amd_init_llc_id: cpu: 1. apicid: 1
> [    0.097768] c->x86 == 0x17 && c->x86_model <= 0x1F
> [    0.097768] cpu_llc_id 0
> [    0.235053]  #2
> [    0.097768] cacheinfo_amd_init_llc_id: cpu: 2. apicid: 2
> [    0.097768] c->x86 == 0x17 && c->x86_model <= 0x1F
> [    0.097768] cpu_llc_id 0
> [    0.237401]  #3
> [    0.097768] cacheinfo_amd_init_llc_id: cpu: 3. apicid: 3
> [    0.097768] c->x86 == 0x17 && c->x86_model <= 0x1F
> [    0.097768] cpu_llc_id 0
> [    0.239585] 
> [    0.239770] .... node  #1, CPUs:   #4
> [    0.097768] cacheinfo_amd_init_llc_id: cpu: 4. apicid: 4
> [    0.097768] c->x86 == 0x17 && c->x86_model <= 0x1F
> [    0.097768] cpu_llc_id 0
> [    0.097768] ------------[ cut here ]------------
> [    0.097768] sched: CPU #4's llc-sibling CPU #0 is not on the same node!
> [node: 1 != 0]. Ignoring dependency.
> [    0.097768] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smpboot.c:418
> topology_sane.isra.6+0x5f/0x70
> 
> 
> which tells us that 2nd branch is being followed:
> 
> void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu)
> {
> [...]
> 	} else if (c->x86 == 0x17 && c->x86_model <= 0x1F) {
> 		/*
> 		 * LLC is at the core complex level.
> 		 * Core complex ID is ApicId[3] for these processors.
> 		 */
> 		per_cpu(cpu_llc_id, cpu) = c->apicid >> 3;
> [...]
> 
> 
> and that can't possibly work since all 8 vcpus have APIC ID in the 1st 3
> bits, and the rest is 0.
> 
> 1) -smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2   \
>    -numa node,nodeid=1,cpus=4-7,memdev=mem0 \
>    -numa node,nodeid=0,cpus=0-3,memdev=mem1

Looks like you are using the EPYC model for the guest. Rome models use a slightly different way to calculate the LLC ID.

 /*
  * LLC ID is calculated from the number of threads sharing the
  * cache.
  */

Can you try with EPYC-Rome?

Comment 24 Igor Mammedov 2023-04-17 09:41:53 UTC
(In reply to Babu Moger from comment #23)
> (In reply to Igor Mammedov from comment #22)
> > I got access to the same host again
> > (dell-per6515-01.khw2.lab.eng.bos.redhat.com)
> > and yes reproducer requires AMD host (any host that leads guest kernel to
> > cacheinfo_amd_init_llc_id() should do),
> > which in my case is:
> > 
> >  Vendor ID:               AuthenticAMD
> >   BIOS Vendor ID:        AMD
> >   Model name:            AMD EPYC 7542 32-Core Processor
> >     BIOS Model name:     AMD EPYC 7542 32-Core Processor                
> >     CPU family:          23
> >     Model:               49
> 
> This is a Rome system.

It's the host CPU, though, so it shouldn't influence the guest's 'EPYC' CPU model.
 
> >     Thread(s) per core:  2
> >     Core(s) per socket:  32
> >     Socket(s):           1
> >     Stepping:            0
> > 
> > With some printk kernel instrumentation following shows:
> > [    0.001000] SRAT: PXM 0 -> APIC 0x00 -> Node 0
> > [    0.001000] SRAT: PXM 0 -> APIC 0x01 -> Node 0
> > [    0.001000] SRAT: PXM 0 -> APIC 0x02 -> Node 0
> > [    0.001000] SRAT: PXM 0 -> APIC 0x03 -> Node 0
> > [    0.001000] SRAT: PXM 1 -> APIC 0x04 -> Node 1
> > [    0.001000] SRAT: PXM 1 -> APIC 0x05 -> Node 1
> > [    0.001000] SRAT: PXM 1 -> APIC 0x06 -> Node 1
> > [    0.001000] SRAT: PXM 1 -> APIC 0x07 -> Node 1
> > ...
> > [    0.231503] smp: Bringing up secondary CPUs ...
> > [    0.231838] x86: Booting SMP configuration:
> > [    0.232319] .... node  #0, CPUs:      #1
> > [    0.097768] cacheinfo_amd_init_llc_id: cpu: 1. apicid: 1
> > [    0.097768] c->x86 == 0x17 && c->x86_model <= 0x1F
> > [    0.097768] cpu_llc_id 0
> > [    0.235053]  #2
> > [    0.097768] cacheinfo_amd_init_llc_id: cpu: 2. apicid: 2
> > [    0.097768] c->x86 == 0x17 && c->x86_model <= 0x1F
> > [    0.097768] cpu_llc_id 0
> > [    0.237401]  #3
> > [    0.097768] cacheinfo_amd_init_llc_id: cpu: 3. apicid: 3
> > [    0.097768] c->x86 == 0x17 && c->x86_model <= 0x1F
> > [    0.097768] cpu_llc_id 0
> > [    0.239585] 
> > [    0.239770] .... node  #1, CPUs:   #4
> > [    0.097768] cacheinfo_amd_init_llc_id: cpu: 4. apicid: 4
> > [    0.097768] c->x86 == 0x17 && c->x86_model <= 0x1F
> > [    0.097768] cpu_llc_id 0
> > [    0.097768] ------------[ cut here ]------------
> > [    0.097768] sched: CPU #4's llc-sibling CPU #0 is not on the same node!
> > [node: 1 != 0]. Ignoring dependency.
> > [    0.097768] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smpboot.c:418
> > topology_sane.isra.6+0x5f/0x70
> > 
> > 
> > which tells us that 2nd branch is being followed:
> > 
> > void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu)
> > {
> > [...]
> > 	} else if (c->x86 == 0x17 && c->x86_model <= 0x1F) {
> > 		/*
> > 		 * LLC is at the core complex level.
> > 		 * Core complex ID is ApicId[3] for these processors.
> > 		 */
> > 		per_cpu(cpu_llc_id, cpu) = c->apicid >> 3;
> > [...]
> > 
> > 
> > and that can't possibly work since all 8 vcpus have APIC ID in the 1st 3
> > bits, and the rest is 0.
> > 
> > 1) -smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2   \
> >    -numa node,nodeid=1,cpus=4-7,memdev=mem0 \
> >    -numa node,nodeid=0,cpus=0-3,memdev=mem1
> 
> Looks like you are using the EPYC model for the guest.  Rome models use a
> slightly different way to calculate the LLC ID.

Yep, I'm using the plain EPYC model, as it was the problematic one on this host
(as stated in earlier comments).

> 
>  /*
>   * LLC ID is calculated from the number of threads sharing the
>   * cache.
>   */
> 
> Can you try with EPYC-Rome?

I've already returned the loaned system.
If you have access to Beaker, I can try to get the machine again for you to play with.
(It might take a few days.)

Comment 25 Babu Moger 2023-04-18 15:49:45 UTC
(In reply to Igor Mammedov from comment #24)
> (In reply to Babu Moger from comment #23)
> > (In reply to Igor Mammedov from comment #22)
> > > I got access to the same host again
> > > (dell-per6515-01.khw2.lab.eng.bos.redhat.com)
> > > and yes reproducer requires AMD host (any host that leads guest kernel to
> > > cacheinfo_amd_init_llc_id() should do),
> > > which in my case is:
> > > 
> > >  Vendor ID:               AuthenticAMD
> > >   BIOS Vendor ID:        AMD
> > >   Model name:            AMD EPYC 7542 32-Core Processor
> > >     BIOS Model name:     AMD EPYC 7542 32-Core Processor                
> > >     CPU family:          23
> > >     Model:               49
> > 
> > This is a Rome system.
> 
> It's host CPU though => shouldn't influence guest's 'EPYC' cpu model  

That is correct. However, we have no plans to make the decoding specific to certain configurations. We tried to support that earlier, but it proved too problematic.

Comment 26 Mario Casquero 2023-05-03 07:30:30 UTC
After booting up a guest [1] on the dell-per6515-01.khw2.lab.eng.bos.redhat.com host with the EPYC-Rome CPU model, no warning trace is observed.

Host
kernel-5.14.0-284.11.1.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2
libvirt-9.0.0-10.1.el9_2.x86_64

Guest
RHEL.9.2.0
[root@dhcp16-211-245 home]# lscpu
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               AuthenticAMD
  BIOS Vendor ID:        Red Hat
  Model name:            AMD EPYC-Rome Processor
    BIOS Model name:     RHEL-9.2.0 PC (Q35 + ICH9, 2009)
    CPU family:          23
    Model:               49
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           2

guest dmesg
[    0.303463] x86: Booting SMP configuration:
[    0.303648] .... node  #0, CPUs:      #1 #2 #3
[    0.305647] .... node  #1, CPUs:   #4 #5 #6 #7
[    0.307991] smp: Brought up 2 nodes, 8 CPUs
[    0.308647] smpboot: Max logical packages: 2
[    0.309034] smpboot: Total of 8 processors activated (46312.99 BogoMIPS)
[    0.312618] node 0 deferred pages initialised in 3ms
[    0.312974] node 1 deferred pages initialised in 3ms

[1]
/usr/libexec/qemu-kvm \
...
-smp 8,maxcpus=8,cores=4,threads=1,dies=1,sockets=2  \
-m 4096,maxmem=40G,slots=20 \
-object '{"size": 2147483648, "id": "mem0", "qom-type": "memory-backend-ram"}' \
-object '{"size": 2147483648, "id": "mem1", "qom-type": "memory-backend-ram"}' \
-numa node,nodeid=1,cpus=4-7,memdev=mem0 \
-numa node,nodeid=0,cpus=0-3,memdev=mem1 \
-cpu 'EPYC-Rome' \
...

Comment 27 Mario Casquero 2023-05-08 07:15:47 UTC
Hello Igor,

Based on the results from comment 26, what is the plan for this bug?

Comment 28 Igor Mammedov 2023-05-17 15:18:03 UTC
*** Bug 2203821 has been marked as a duplicate of this bug. ***

Comment 29 Igor Mammedov 2023-05-23 12:28:21 UTC
Based on comment 25, closing this bug as WONTFIX.

I've filled in the 'Known issues' doc comment in the BZ.

Babu,
 please fix it up if it isn't described well enough or is wrong.

Comment 30 Yanhui Ma 2023-05-25 09:38:00 UTC
Hi Jiri,

It seems this bug needs documentation, but I am not sure about the process for adding it.
The requires_doc_text flag has now been set to '?' and the Doc Text field has been filled in. What else should we do?
Could you please help check this?

Thanks in advance.

Comment 31 Babu Moger 2023-05-31 14:19:36 UTC
(In reply to Igor Mammedov from comment #29)
> Based on comment 25, closing bug as wontfix.
> 
> I've filled in 'Known issues' doc comment in BZ.
> 
> Babu,
>  please fix it up if it isn't described well enough/wrong.

Igor, please update the Workaround. Feel free to modify it.

Workaround (if any):
Try creating a topology that closely matches the bare-metal topology,
or try using the latest CPU model that supports the required topology.

Comment 32 Igor Mammedov 2023-05-31 15:34:15 UTC
(In reply to Babu Moger from comment #31)
> (In reply to Igor Mammedov from comment #29)
> > Based on comment 25, closing bug as wontfix.
> > 
> > I've filled in 'Known issues' doc comment in BZ.
> > 
> > Babu,
> >  please fix it up if it isn't described well enough/wrong.
> 
> Igor, please update the Workaround.  Feel free to modify it.
> 
> Workaround (if any):
> Try creating a topology that closely matches the bare-metal topology,
> or try using the latest CPU model that supports the required topology.

If you look at comment 22, the topology looks sane to me
(now imagine the end user's confusion when reading a workaround that suggests
some 'sane' topology), so we need to define 'sane' or plainly admit that the
'EPYC' model is not usable with a NUMA config.

Comment 33 Babu Moger 2023-05-31 15:47:39 UTC
(In reply to Igor Mammedov from comment #32)
> (In reply to Babu Moger from comment #31)
> > (In reply to Igor Mammedov from comment #29)
> > > Based on comment 25, closing bug as wontfix.
> > > 
> > > I've filled in 'Known issues' doc comment in BZ.
> > > 
> > > Babu,
> > >  please fix it up if it isn't described well enough/wrong.
> > 
> > Igor, please update the Workaround.  Feel free to modify it.
> > 
> > Workaround (if any):
> > Try creating a topology that closely matches the bare-metal topology,
> > or try using the latest CPU model that supports the required topology.
> 
> If you look at comment 22, the topology looks sane to me
> (now imagine the end user's confusion when reading a workaround that suggests
> some 'sane' topology), so we need to define 'sane' or plainly admit that the
> 'EPYC' model is not usable with a NUMA config.

I would say the EPYC model can still support a NUMA config, with some limitations.

Comment 34 Igor Mammedov 2023-06-01 12:11:35 UTC
(In reply to Babu Moger from comment #33)
> (In reply to Igor Mammedov from comment #32)
> > (In reply to Babu Moger from comment #31)
> > > (In reply to Igor Mammedov from comment #29)
> > > > Based on comment 25, closing bug as wontfix.
> > > > 
> > > > I've filled in 'Known issues' doc comment in BZ.
> > > > 
> > > > Babu,
> > > >  please fix it up if it isn't described well enough/wrong.
> > > 
> > > Igor, please update the Workaround.  Feel free to modify it.
> > > 
> > > Workaround (if any):
> > > Try creating a topology that closely matches the bare-metal topology,
> > > or try using the latest CPU model that supports the required topology.
> > 
> > If you look at comment 22, the topology looks sane to me
> > (now imagine the end user's confusion when reading a workaround that suggests
> > some 'sane' topology), so we need to define 'sane' or plainly admit that the
> > 'EPYC' model is not usable with a NUMA config.
> 
> I would say the EPYC model can still support a NUMA config, with some limitations.

In that case, I'd suggest amending the 'workaround' section with something
more concrete than 'sane topology'. (Asking the end user to set up a sane
topology when even the vendor/folks who implemented the CPU model can't do
that is just not right.)

