Bug 2088311 - Guest reports "CPU #5's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 9.2
Assignee: Igor Mammedov
QA Contact: Mario Casquero
Docs Contact: Parth Shah
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-19 07:46 UTC by Lukáš Doktor
Modified: 2023-11-20 04:34 UTC

Fixed In Version: qemu-kvm-7.2.0-10
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-19 07:28:44 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Libvirt xml file of a machine that emits this warning (3.87 KB, text/plain)
2022-05-19 07:46 UTC, Lukáš Doktor


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-122588 | 0 | None | None | None | 2022-05-19 12:47:19 UTC

Description Lukáš Doktor 2022-05-19 07:46:11 UTC
Created attachment 1881213
Libvirt xml file of a machine that emits this warning

Description of problem:
I'm trying to map a NUMA topology to the guest, but the guest kernel always complains about CPUs on the other node not being on the same node. A similar configuration worked well in RHEL8.

Version-Release number of selected component (if applicable):
* Host & guest use the same OS version
* RHEL-9.0.0-20211026.10
* qemu-kvm-core-6.0.0-13.el9_b.5.x86_64 as well as various upstream qemus (6621441db50d5bae7e34dbd04bf3c57a27a71b32)
* libvirt-7.6.0-2.el9.x86_64

How reproducible:
* Always, tried various setups

Steps to Reproduce:
1. Get a bootable RHEL image
2. Adjust the provided XML file to match your hardware, but keep at least 2 NUMA nodes
3. Boot the system and check the guest serial console

Actual results:
[    0.141952] ------------[ cut here ]------------
[    0.141952] sched: CPU #5's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.141952] WARNING: CPU: 5 PID: 0 at arch/x86/kernel/smpboot.c:421 topology_sane.isra.0+0x67/0x80
[    0.141952] Modules linked in:
[    0.141952] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-1.7.1.el9.x86_64 #1
[    0.141952] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014
[    0.141952] RIP: 0010:topology_sane.isra.0+0x67/0x80
[    0.141952] Code: 80 3d 3d 9c f2 01 00 75 f6 48 83 ec 08 4c 89 da 44 89 d6 48 c7 c7 30 b7 b1 b9 88 44 24 07 c6 05 1f 9c f2 01 01 e8 bc 64 99 00 <0f> 0b 0f b6 44 24 07 48 83 c4 08 c3 66 66 2e 0f 1f 84 00 00 00 00
[    0.141952] RSP: 0000:ffffa161431cfed0 EFLAGS: 00010086
[    0.141952] RAX: 0000000000000000 RBX: ffff8acdb5a11460 RCX: ffffffffba527a08
[    0.141952] RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: ffffffffba267a00
[    0.141952] RBP: 0000000000000005 R08: 0000000000000000 R09: ffffa161431cfd08
[    0.141952] R10: ffffa161431cfd00 R11: ffffffffba5e7a48 R12: 0000000000000000
[    0.141952] R13: ffff8acb35811460 R14: 0000000000011010 R15: 0000000000000000
[    0.141952] FS:  0000000000000000(0000) GS:ffff8acdb5a00000(0000) knlGS:0000000000000000
[    0.141952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.141952] CR2: 0000000000000000 CR3: 0000000315010001 CR4: 0000000000770ee0
[    0.141952] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.141952] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.141952] PKRU: 55555554
[    0.141952] Call Trace:
[    0.141952]  set_cpu_sibling_map+0x176/0x590
[    0.141952]  start_secondary+0x5b/0x150
[    0.141952]  secondary_startup_64_no_verify+0xc2/0xcb
[    0.141952] ---[ end trace 83f8500fc1ba2966 ]---
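
When checking a saved guest console log for this problem, the "sched:" line is the part worth extracting: it names the CPU, the kind of sibling relation, and the two node numbers. The following is only a minimal sketch, assuming the message keeps the format shown in the trace above; the function name and the returned field names are made up for illustration.

```python
import re

# Matches the scheduler warning as it appears in the trace above.
# Assumption: the wording of the kernel message is unchanged.
WARNING_RE = re.compile(
    r"sched: CPU #(?P<cpu>\d+)'s (?P<kind>\w+)-sibling CPU #(?P<sibling>\d+) "
    r"is not on the same node! \[node: (?P<node>\d+) != (?P<sibling_node>\d+)\]"
)


def find_topology_warnings(log_text):
    """Return one dict per topology warning found in a console log."""
    warnings = []
    for match in WARNING_RE.finditer(log_text):
        fields = match.groupdict()
        warnings.append({
            "cpu": int(fields["cpu"]),
            "kind": fields["kind"],
            "sibling": int(fields["sibling"]),
            "node": int(fields["node"]),
            "sibling_node": int(fields["sibling_node"]),
        })
    return warnings


if __name__ == "__main__":
    sample = ("[    0.141952] sched: CPU #5's llc-sibling CPU #0 is not on "
              "the same node! [node: 1 != 0]. Ignoring dependency.")
    print(find_topology_warnings(sample))
```

The kernel appears to print this warning only once per boot, so an empty result after a full boot is a reasonable sign that the guest accepted the topology.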

Expected results:
Should boot with no issues

Additional info:

Comment 1 Lukáš Doktor 2022-05-19 07:48:18 UTC
Including reply from Igor from preliminary check:

I can reproduce it with following QEMU CLI:
 /usr/libexec/qemu-kvm \
        -machine pc-q35-rhel8.5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \
        -cpu host,migratable=on,kvm-hint-dedicated=on,kvm-poll-control=on,host-cache-info=on,l3-cache=off \
        -m 4G \
        -smp 10,sockets=2,dies=1,cores=5,threads=1 \
        -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,memdev=ram-node0 \
        -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,memdev=ram-node1 \
        -numa cpu,node-id=0,socket-id=0 \
        -numa cpu,node-id=1,socket-id=1 \
        path_to_rhel9_disk_image

But the SRAT table in the guest contains only 8 vCPUs, and the warning we get complains about
a CPU that's not described in the SRAT.

[    0.011045] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.011047] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[    0.011048] SRAT: PXM 0 -> APIC 0x02 -> Node 0
[    0.011049] SRAT: PXM 0 -> APIC 0x03 -> Node 0
[    0.011050] SRAT: PXM 0 -> APIC 0x04 -> Node 0
[    0.011051] SRAT: PXM 1 -> APIC 0x08 -> Node 1
[    0.011052] SRAT: PXM 1 -> APIC 0x09 -> Node 1
[    0.011052] SRAT: PXM 1 -> APIC 0x0a -> Node 1
[    0.011053] SRAT: PXM 1 -> APIC 0x0b -> Node 1
[    0.011054] SRAT: PXM 1 -> APIC 0x0c -> Node 1
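
For reading the listing above: it has ten entries, and the jump from APIC 0x04 to 0x08 is what the usual x86 APIC ID layout would produce, since each topology level gets a bit field wide enough for its count rounded up to a power of two. The sketch below is only an illustration under that assumption (it is not QEMU's code, and the function names are made up); with the -smp values from the command line it yields the same APIC IDs as the SRAT output.

```python
def field_width(count):
    """Number of bits needed to hold `count` distinct IDs (ceil(log2(count)))."""
    return (count - 1).bit_length()


def guest_apic_ids(sockets, dies, cores, threads):
    """List APIC IDs for every vCPU, assuming the layout socket|die|core|thread."""
    die_bits = field_width(dies)
    core_bits = field_width(cores)
    thread_bits = field_width(threads)
    ids = []
    for socket in range(sockets):
        for die in range(dies):
            for core in range(cores):
                for thread in range(threads):
                    # Pack the fields from most to least significant.
                    package = (socket << die_bits) | die
                    apic = (((package << core_bits) | core) << thread_bits) | thread
                    ids.append(apic)
    return ids


if __name__ == "__main__":
    # -smp 10,sockets=2,dies=1,cores=5,threads=1
    print([hex(i) for i in guest_apic_ids(2, 1, 5, 1)])
```

Five cores need a 3-bit core field, so the second socket starts at APIC ID 8 and IDs 5 to 7 are simply unused.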

Comment 3 Igor Mammedov 2022-05-25 08:41:27 UTC
(In reply to Lukáš Doktor from comment #1)

So the SRAT observation in comment 1 turned out to be a red herring (the SRAT table is correct).

What is really happening is that the host's L3 cache info is passed through as is, which confuses the guest. A possible fix is on the way:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg890062.html
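
To illustrate why a passed-through host L3 description can confuse the guest, here is a rough sketch modelled on how a Linux guest groups CPUs by last-level cache on Intel-style CPUID: it takes the "number of addressable IDs sharing this cache" from CPUID leaf 4 and drops that many low bits of the APIC ID. The host value of 16 is purely an assumption for illustration (the real host value is not given in this bug), and the function name is made up.

```python
def llc_id(apic_id, ids_sharing_cache):
    """Group APIC IDs by cache: drop ceil(log2(ids_sharing_cache)) low bits."""
    shift = (ids_sharing_cache - 1).bit_length()
    return apic_id >> shift


if __name__ == "__main__":
    cpu0_apic = 0x00  # first vCPU, node 0
    cpu5_apic = 0x08  # sixth vCPU, node 1 (presumably "CPU #5" in the warning)

    # Assumed host value: L3 shared by up to 16 addressable IDs.
    host_sharing = 16
    print(llc_id(cpu0_apic, host_sharing) == llc_id(cpu5_apic, host_sharing))

    # A value consistent with the guest topology: 5 cores rounded up to 8 IDs.
    guest_sharing = 8
    print(llc_id(cpu0_apic, guest_sharing) == llc_id(cpu5_apic, guest_sharing))
```

With the assumed host value both vCPUs land in the same cache group although they sit on different NUMA nodes, which is the situation the guest warning describes. With a value derived from the guest topology they fall into separate groups.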

Comment 13 Igor Mammedov 2022-11-02 11:22:52 UTC
upstream commits:
d7caf13b5f x86: cpu: fixup number of addressable IDs for logical processors sharing cache
efb3934adf x86: cpu: make sure number of addressable IDs for processor cores meets the spec

Comment 16 Igor Mammedov 2023-02-23 09:38:16 UTC
We have inherited the fix with the rebase to 7.2.

Comment 24 Mario Casquero 2023-03-02 13:22:29 UTC
Moving to VERIFIED based on comment 20 and comment 22

Comment 26 RHEL Program Management 2023-11-19 07:28:44 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

