Bug 519431
Summary: | Single socket Nehalem-EP causes issues in /proc/cpuinfo | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Jon Thomas <jthomas> |
Component: | kernel | Assignee: | Luming Yu <luyu> |
Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.8 | CC: | hui.xiao, james.brown, jane.lv, jvillalo, jwest, jwilleford, luyu, lwoodman, peterm, rdoty, tao |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2012-06-14 20:22:39 UTC | Type: | --- |
Bug Blocks: | 499416 |
Description
Jon Thomas
2009-08-26 15:28:19 UTC
Apparently there is a numa issue between the two kernels outlined in IT 339779. Numa in regular smp is unbalanced....which is causing bad performance of parallel workloads.

alancha@caliph6:~> sudo dmihardware
Cisco Systems Inc N20-B6620-1
alancha@caliph6:~> cat /etc/motd
Cisco Linux 5.03-4 Kickstarted on: Sat Sep 5 05:03:44 PDT 2009.
alancha@caliph6:~> uname -a
Linux caliph6 2.6.9-89.0.10.ELsmp #1 SMP Fri Aug 21 17:14:28 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
alancha@caliph6:~> numastat
                  node1     node0
numa_hit          89486   2498428
numa_miss             0         0
numa_foreign          0         0
interleave_hit    89486     86840
local_node            0   2498428
other_node        89486         0
## - showing more than 1 node, that means numa is turned on...
alancha@caliph6:~> ls /sys/devices/system/node/node[01]
/sys/devices/system/node/node0:
cpu0 cpu1 cpu2 cpu3 cpu4 cpu5 cpu6 cpu7 cpumap meminfo numastat

/sys/devices/system/node/node1:
cpumap meminfo numastat
## - but... it's totally unbalanced, node0 has all the cpus... Now I'm rebooting with the rc2's kernel... ##

alancha@caliph6:~> uname -a
Linux caliph6 2.6.9-89.0.9.ELlargesmp #1 SMP Wed Aug 19 08:12:11 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
alancha@caliph6:~> numastat
                  node1     node0
numa_hit        1036997   1472009
numa_miss             0         0
numa_foreign          0         0
interleave_hit    88865     86562
local_node       981327   1439606
other_node        55670     32403
## - numa is on...
alancha@caliph6:~> ls /sys/devices/system/node/node[01]
/sys/devices/system/node/node0:
cpu0 cpu2 cpu4 cpu6 cpumap meminfo numastat

/sys/devices/system/node/node1:
cpu1 cpu3 cpu5 cpu7 cpumap meminfo numastat
## - and it's perfect balance!

So in reality this is an Anaconda bug, since the wrong kernel is getting installed.

I think it should be installing the SMP kernel and not the largeSMP kernel.

There is a bug in the kernel too, but if the correct kernel is used you wouldn't see the issue, I think.

Please open necessary private comments to me (luyu) to enable me to follow up this bug.
Thanks, Luming

> The "smp_num_siblings" variable holds the maximum number of logical processors
> reported by the x86 CPUID instruction. In the Nehalem-EP case, smp_num_siblings is
> 16, not 8.
> On the other hand, "NR_CPUS" equals 8 in the RHEL 4.8 x86_64 smp
> kernel.
> Therefore, "if (smp_num_siblings > NR_CPUS)" holds true and the subsequent
> initializations are skipped.
> As a result, cpu_core_id, phys_proc_id and related variables remain
> uninitialized.

There are several options:
1. Bump NR_CPUS up to 16 for the smp kernel, but this will still fail on future processors that come with more cores.
2. Use the largesmp kernel, as John said, to use all 16 logical processors.
3. Turn off HT in BIOS to get 8 cores.
If you don't like any of the three options listed above, and really need /proc/cpuinfo to display correctly while leaving half of the logical processors unused, then please let me know. I will come up with a proper fix for this case.
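
To make the quoted check easier to follow, here is a minimal Python model of it. This is only a sketch, not the kernel's code: `CpuInfo`, `detect_topology`, the `threads_per_core` parameter and the way the two IDs are derived from the APIC ID are all assumptions for illustration. Only the names `smp_num_siblings`, `NR_CPUS`, `phys_proc_id` and `cpu_core_id` and the values 8 and 16 come from this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuInfo:
    """Subset of the per-CPU topology fields named in this report."""
    phys_proc_id: Optional[int] = None  # None models "left uninitialized"
    cpu_core_id: Optional[int] = None


def detect_topology(apic_id: int, smp_num_siblings: int, nr_cpus: int,
                    threads_per_core: int = 2) -> CpuInfo:
    """Model of the guard described above (hypothetical helper)."""
    info = CpuInfo()
    if smp_num_siblings > nr_cpus:
        # Guard trips: the remaining initialization is skipped,
        # so both fields keep their "uninitialized" value.
        return info
    # Illustrative derivation only (assumption): split the APIC ID by the
    # sibling count, then by the number of threads per core.
    info.phys_proc_id = apic_id // smp_num_siblings
    info.cpu_core_id = (apic_id % smp_num_siblings) // threads_per_core
    return info


if __name__ == "__main__":
    # smp kernel limit from the report (8) versus option 1 (16).
    for nr_cpus in (8, 16):
        print(nr_cpus, detect_topology(apic_id=5, smp_num_siblings=16,
                                       nr_cpus=nr_cpus))
```

In this model, a limit of 8 leaves both fields unset, which corresponds to the broken /proc/cpuinfo output, while a limit of 16 (option 1) lets the initialization run. The same reasoning shows why option 1 only postpones the problem: any later processor reporting more than 16 siblings would trip the guard again.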
(In reply to comment #19)
> So in reality this is an Anaconda bug, since the wrong kernel is getting
> installed.
>
> I think it should be installing the SMP kernel and not the largeSMP kernel.
>
> There is a bug in the kernel too, but if the correct kernel is used you
> wouldn't see the issue, I think.

I got that backwards. I think Anaconda should be installing the "largesmp" kernel, but it installs the "smp" kernel instead.