Bug 1095203 - rhel guest will call trace if specify NUMA nodes in qemu-kvm command line
Summary: rhel guest will call trace if specify NUMA nodes in qemu-kvm command line
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Habkost
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 1100103
Depends On:
Blocks:
Reported: 2014-05-07 09:36 UTC by Sibiao Luo
Modified: 2014-05-22 13:48 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-05-07 17:47:57 UTC
Target Upstream Version:
Embargoed:


Attachments:
my-qemu-kvm-command-line.txt (4.74 KB, text/plain), 2014-05-07 09:39 UTC, Sibiao Luo
guest-dmesg-with-NUMA-nodes.txt (61.13 KB, text/plain), 2014-05-07 09:41 UTC, Sibiao Luo
guest-dmesg-without-NUMA-nodes.txt (57.80 KB, text/plain), 2014-05-07 09:43 UTC, Sibiao Luo

Description Sibiao Luo 2014-05-07 09:36:26 UTC
Description of problem:
A RHEL guest prints a kernel call trace when a KVM guest is booted with a huge qemu-kvm command line that specifies NUMA nodes. I can hit it on both my SandyBridge and Opteron_G4 hosts. If I remove the NUMA nodes from the command line, the issue does not occur.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm
3.10.0-123.el7.x86_64
qemu-kvm-1.5.3-60.el7.x86_64
guest info:
3.10.0-123.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a KVM guest with a huge qemu-kvm command line that specifies NUMA nodes.
e.g.: ...-smp 4,sockets=2,cores=2,threads=1,maxcpus=160 -numa node,cpus=0 -numa node,cpus=1 -numa node,cpus=2 -numa node,cpus=3
2. Check the guest dmesg.
# dmesg

Actual results:
After step 2, the guest dmesg shows a call trace. I will attach the full command line and the dmesg logs later.
...
[    0.073513] smpboot: CPU0: AMD Opteron 62xx class CPU (fam: 15, model: 01, stepping: 02)
[    0.074000] Performance Events: Broken PMU hardware detected, using software events only.
[    0.074007] Failed to access perfctr msr (MSR c0010001 is ffffffffffffffff)
[    0.086663] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.086617] kvm-clock: cpu 1, msr 1:3ff87041, secondary cpu clock
[    0.086617] ------------[ cut here ]------------
[    0.086617] WARNING: at arch/x86/kernel/smpboot.c:326 topology_sane.isra.1+0x6f/0x80()
[    0.086617] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.086846] smpboot: Booting Node   1, Processors  #1
[    0.086617] Modules linked in:

[    0.086617] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-123.el7.x86_64 #1
[    0.086617] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[    0.086617]  ffff88003db59e38 f0eab5104b2364a6 ffff88003db59df0 ffffffff815e19ba
[    0.086617]  ffff88003db59e28 ffffffff8105dee1 0000000000000001 0000000000013c80
[    0.086617]  0000000000000001 0000000000000000 0000000000000000 ffff88003db59e90
[    0.086617] Call Trace:
[    0.086617]  [<ffffffff815e19ba>] dump_stack+0x19/0x1b
[    0.086617]  [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
[    0.086617]  [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
[    0.086617]  [<ffffffff8102ee8d>] ? __mcheck_cpu_init_timer+0x4d/0x60
[    0.086617]  [<ffffffff815cf731>] topology_sane.isra.1+0x6f/0x80
[    0.086617]  [<ffffffff815cfaa6>] set_cpu_sibling_map+0x32a/0x500
[    0.086617]  [<ffffffff815cfe17>] start_secondary+0x19b/0x27b
[    0.086617] ---[ end trace e95be5cf08152738 ]---
[    0.099291] KVM setup async PF for cpu 1
[    0.099291] kvm-stealtime: cpu 1, msr 7e80dfc0
[    0.099485]  OK
...

Expected results:
The guest should not print any call trace.

Additional info:

Comment 1 Sibiao Luo 2014-05-07 09:39:47 UTC
Created attachment 893175 [details]
my-qemu-kvm-command-line.txt

Comment 2 Sibiao Luo 2014-05-07 09:41:40 UTC
Created attachment 893177 [details]
guest-dmesg-with-NUMA-nodes.txt

Comment 3 Sibiao Luo 2014-05-07 09:43:15 UTC
(In reply to Sibiao Luo from comment #1)
> Created attachment 893175 [details]
> my-qemu-kvm-command-line.txt
If I remove '-numa node,cpus=0 -numa node,cpus=1 -numa node,cpus=2 -numa node,cpus=3' from attachment 893175 and test again, the issue does not occur. I will attach the guest dmesg log later.

Comment 4 Sibiao Luo 2014-05-07 09:43:51 UTC
Created attachment 893188 [details]
guest-dmesg-without-NUMA-nodes.txt

Comment 5 Sibiao Luo 2014-05-07 09:51:36 UTC
If I test with "-smp 4 -numa node,cpus=0 -numa node,cpus=1 -numa node,cpus=2 -numa node,cpus=3" instead, the issue does not occur.

Comment 6 Andrew Jones 2014-05-07 14:15:15 UTC
The kernel assumes that if the last level of cache (llc) is shared between two processors (i.e. the llc_id of both processors match), then they can't be on separate numa nodes. Since separate numa nodes means separate memory, then I think that's a pretty valid assumption.
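
To make that assumption concrete, here is a minimal Python model of the check, not the kernel code itself. The `Cpu` class and `llc_sibling_warning` function are hypothetical names invented for illustration; only the warning text is taken from the guest dmesg in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cpu:
    """Hypothetical stand-in for the per-CPU data the check looks at."""
    index: int    # CPU number as printed in dmesg
    llc_id: int   # ID of the last-level cache this CPU uses
    node: int     # NUMA node this CPU belongs to


def llc_sibling_warning(cpu: Cpu, other: Cpu) -> Optional[str]:
    """Return the warning text if two CPUs share an LLC but sit on
    different NUMA nodes; return None if the pair looks sane."""
    if cpu.llc_id != other.llc_id:
        return None  # not LLC siblings, nothing to check
    if cpu.node == other.node:
        return None  # siblings on the same node: the expected case
    return (
        f"sched: CPU #{cpu.index}'s llc-sibling CPU #{other.index} "
        f"is not on the same node! [node: {cpu.node} != {other.node}]. "
        "Ignoring dependency."
    )


# CPU 1 and CPU 0 share an LLC but were placed on nodes 1 and 0.
print(llc_sibling_warning(Cpu(1, 0, 1), Cpu(0, 0, 0)))
```

With the values from this report (CPU 1 on node 1, CPU 0 on node 0, same LLC) the model produces the same message that appears in the guest log.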

The kernel also assumes that the socket_id can be used for the llc_id. This is a pretty good assumption too (usually). I did some googling, though, and found that Magny-Cours may be a counter example. I think this warning would output when booting a bare-metal Magny-Cours machine too.

But, this bug was opened using SandyBridge and Opteron_G4 models, which are most likely not weird, and we shouldn't make them so. With that in mind, I believe the test case is flawed. The number of numa nodes should be <= number of sockets, in order to maintain the sanity of the Linux kernel.
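
A rough way to screen a command line for this before booting is sketched below in Python. It is not part of QEMU; `sockets_spanning_nodes` is a hypothetical helper, and it assumes that CPU indexes are packed into sockets contiguously, i.e. that CPU i lands in socket i // (cores * threads). That mapping is an assumption for illustration and should be verified against the QEMU version in use.

```python
def sockets_spanning_nodes(smp_cpus, cores, threads, node_cpus):
    """Return {socket_id: [node ids]} for every socket whose CPUs are
    spread over more than one NUMA node.

    smp_cpus  -- number of CPUs given to -smp
    cores     -- cores per socket
    threads   -- threads per core
    node_cpus -- one list of CPU indexes per '-numa node,cpus=...' option

    Assumption: CPU i belongs to socket i // (cores * threads).
    """
    cpus_per_socket = cores * threads
    nodes_by_socket = {}
    for node, cpus in enumerate(node_cpus):
        for cpu in cpus:
            if not 0 <= cpu < smp_cpus:
                raise ValueError(f"CPU {cpu} is outside 0..{smp_cpus - 1}")
            socket = cpu // cpus_per_socket
            nodes_by_socket.setdefault(socket, set()).add(node)
    return {
        socket: sorted(nodes)
        for socket, nodes in sorted(nodes_by_socket.items())
        if len(nodes) > 1
    }


# Layout from the description: sockets=2,cores=2,threads=1, one CPU per node.
print(sockets_spanning_nodes(4, 2, 1, [[0], [1], [2], [3]]))

# Same -smp topology with one node per socket (cpus=0-1 and cpus=2-3).
print(sockets_spanning_nodes(4, 2, 1, [[0, 1], [2, 3]]))
```

Under that assumption, the layout from the description puts both sockets on two nodes each, while grouping CPUs 0-1 and 2-3 into one node per socket keeps the number of nodes within the number of sockets and reports nothing. If a plain "-smp 4" ends up with one core and one thread per socket (again an assumption about QEMU's defaults), four single-CPU nodes also pass, which would be consistent with comment #5.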

Comment 7 Eduardo Habkost 2014-05-07 17:47:57 UTC
QEMU currently allows threads from the same socket to be on different NUMA nodes. But doing it is a bad idea (as explained by Drew on comment #6) and is likely to confuse guests (and you just found out that it really does confuse Linux guests). Not a bug.

Future versions of QEMU may even disallow this because there are plans to add a node/socket/core/thread QOM object hierarchy.

Comment 8 Andrew Jones 2014-05-22 13:48:30 UTC
*** Bug 1100103 has been marked as a duplicate of this bug. ***

