507033 – multi-socket Intel 5500/Nehalem systems /sys shows all cores on first node, none on 2nd (this is not correct)

Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	11
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance

Reported:	2009-06-19 22:22 UTC by erikj
Modified:	2010-06-28 13:10 UTC
CC List:	4 users
Last Closed:	2010-06-28 13:10:23 UTC

Attachments
/proc/cpuinfo output (6.21 KB, text/plain) 2009-06-19 22:23 UTC, erikj
dmesg output (57.38 KB, text/plain) 2009-06-19 22:24 UTC, erikj

Description erikj 2009-06-19 22:22:39 UTC

This issue is similar to BZ 506805. It affects Fedora 11.

On multi-socket Intel 5500 series (Nehalem) systems such as the SGI XE270,
/sys reports all of the cores on the first NUMA node and none on the
second node.

Problem system:
SGI XE270, Supermicro X8DTN v 1.1 mainboard
Memory: 8 GB total, made up of 2GB DDR3 1066 MHz DIMMs, part number 
18JSF25672PY-1G1D1
Two 4-core sockets, Hyper-Threading turned off.
cpu info: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
(/proc/cpuinfo output is attached.)

For example, for the first node:

[root@cct201 ~]# ls /sys/devices/system/node/node0
cpu0  cpu2  cpu4  cpu6  cpulist  distance  numastat
cpu1  cpu3  cpu5  cpu7  cpumap   meminfo   scan_unevictable_pages
[root@cct201 ~]# cat /sys/devices/system/node/node0/cpulist
0-7

For the second node (node2 in sysfs):

[root@cct201 ~]# ls /sys/devices/system/node/node2
cpulist  cpumap  distance  meminfo  numastat  scan_unevictable_pages
[root@cct201 ~]# cat /sys/devices/system/node/node2/cpulist

[root@cct201 ~]# 
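The per-node CPU assignment shown above can also be checked programmatically. The following is a minimal Python sketch, assuming the sysfs layout shown in this report (/sys/devices/system/node/nodeN/cpulist) and the usual cpulist range syntax such as "0-7" or "0,2,4-6"; the helper names parse_cpulist and cpus_per_node are hypothetical, not part of any existing tool.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as "0-7" or "0,2,4-6" into CPU ids.

    An empty cpulist (as seen for node2 above) yields an empty list.
    """
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpus_per_node(root="/sys/devices/system/node"):
    """Map each nodeN directory under root to the CPU ids in its cpulist.

    Returns an empty dict if root does not exist.
    """
    result = {}
    for path in sorted(glob.glob(os.path.join(root, "node[0-9]*"))):
        with open(os.path.join(path, "cpulist")) as f:
            result[os.path.basename(path)] = parse_cpulist(f.read())
    return result
```

On the system described here, cpus_per_node() would be expected to return eight CPUs for node0 and an empty list for node2, matching the ls/cat output above.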

/proc/cpuinfo confirms that there are two sockets and that Hyper-Threading
is off: "cpu cores" is 4, "siblings" is 4, and the total core count is 8.
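The /proc/cpuinfo reasoning above can be sketched in a few lines. This is a minimal Python sketch with a hypothetical helper, summarize_cpuinfo; it assumes the usual x86 fields, where "siblings" counts logical CPUs per physical package and "cpu cores" counts physical cores per package, so siblings equal to cpu cores means one thread per core (Hyper-Threading off).

```python
def summarize_cpuinfo(text):
    """Summarize /proc/cpuinfo text: sockets, logical CPUs, and HT state."""
    processors = 0
    physical_ids = set()
    cpu_cores = siblings = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank separator lines between processor blocks
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            processors += 1
        elif key == "physical id":
            physical_ids.add(value)  # one distinct id per socket
        elif key == "cpu cores":
            cpu_cores = int(value)
        elif key == "siblings":
            siblings = int(value)
    return {
        "sockets": len(physical_ids),
        "logical_cpus": processors,
        "cores_per_socket": cpu_cores,
        "threads_per_socket": siblings,
        # More logical CPUs than cores per package implies Hyper-Threading.
        "hyperthreading": siblings is not None
        and cpu_cores is not None
        and siblings > cpu_cores,
    }
```

For the machine in this report (two packages, cpu cores 4, siblings 4), the expected summary is 2 sockets, 8 logical CPUs, and hyperthreading False.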

Kernel version: 2.6.29.4-167.fc11.x86_64

Kannan Somangili, who also observed this problem on a different
multi-socket Nehalem system, reports that RHEL 5.3 and SLES 10 SP2 are
not affected.

The full dmesg output is attached. Here are some relevant dmesg segments:

[root@cct201 ~]# dmesg|grep SRAT
ACPI: SRAT BF79A4C0, 0150 (r1 052809 OEMSRAT         1 INTL        1)
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 4 -> Node 0
SRAT: PXM 0 -> APIC 6 -> Node 0
SRAT: PXM 1 -> APIC 16 -> Node 1
SRAT: PXM 1 -> APIC 18 -> Node 1
SRAT: PXM 1 -> APIC 20 -> Node 1
SRAT: PXM 1 -> APIC 22 -> Node 1
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 100000-c0000000
SRAT: Node 0 PXM 0 100000000-140000000
SRAT: Node 2 PXM 257 140000000-240000000
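The SRAT lines above can be cross-checked mechanically: CPUs are assigned to PXM 0 and PXM 1, but memory ranges are listed only for PXM 0 and PXM 257, so PXM 1 has CPUs and no memory while PXM 257 has memory and no CPUs. The following minimal Python sketch makes that mismatch explicit; the regular expressions are fitted only to the message format quoted in this report (an assumption, since other kernels may word these lines differently), and srat_summary is a hypothetical helper. It does not identify the root cause.

```python
import re
from collections import defaultdict

# Patterns fitted to the SRAT lines quoted in this report (assumed format).
# search() is used so a leading "[    0.000000] " timestamp is tolerated.
CPU_RE = re.compile(r"SRAT: PXM (\d+) -> APIC (\d+) -> Node (\d+)")
MEM_RE = re.compile(r"SRAT: Node (\d+) PXM (\d+) ([0-9a-fA-F]+)-([0-9a-fA-F]+)")


def srat_summary(dmesg_text):
    """Group SRAT CPU and memory entries by proximity domain (PXM).

    Returns (cpus, mem, cpu_only, mem_only): APIC ids per PXM, bytes of
    memory per PXM, PXMs with CPUs but no memory, and PXMs with memory
    but no CPUs.
    """
    cpus = defaultdict(list)  # PXM -> list of APIC ids
    mem = defaultdict(int)    # PXM -> bytes, assuming end-exclusive ranges
    for line in dmesg_text.splitlines():
        m = CPU_RE.search(line)
        if m:
            cpus[int(m.group(1))].append(int(m.group(2)))
            continue
        m = MEM_RE.search(line)
        if m:
            start, end = int(m.group(3), 16), int(m.group(4), 16)
            mem[int(m.group(2))] += end - start
    cpu_only = sorted(set(cpus) - set(mem))
    mem_only = sorted(set(mem) - set(cpus))
    return dict(cpus), dict(mem), cpu_only, mem_only
```

Fed the SRAT lines above, this reports cpu_only == [1] and mem_only == [257], with 4 GiB (140000000-240000000) attributed to PXM 257, which the kernel placed on Node 2, the node that appears in /sys with an empty cpulist.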


Here is a segment with tracebacks:

NUMA: Allocated memnodemap from 18000 - 1c880
NUMA: Using 20 for the hash shift.
Bootmem setup node 0 0000000000000000-0000000140000000
  NODE_DATA [000000000001c880 - 000000000003187f]
  bootmap [0000000000032000 -  0000000000059fff] pages 28
(8 early reservations) ==> bootmem [0000000000 - 0140000000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0000200000 - 0000ac700c]    TEXT DATA BSS ==> [0000200000 - 0000ac700c]
  #3 [0037c97000 - 0037fefdc0]          RAMDISK ==> [0037c97000 - 0037fefdc0]
  #4 [000009a800 - 0000100000]    BIOS reserved ==> [000009a800 - 0000100000]
  #5 [0000010000 - 0000013000]          PGTABLE ==> [0000010000 - 0000013000]
  #6 [0000013000 - 0000018000]          PGTABLE ==> [0000013000 - 0000018000]
  #7 [0000018000 - 000001c880]       MEMNODEMAP ==> [0000018000 - 000001c880]
Bootmem setup node 2 0000000140000000-0000000240000000
  NODE_DATA [0000000140000000 - 0000000140014fff]
  bootmap [0000000140015000 -  0000000140034fff] pages 20
(8 early reservations) ==> bootmem [0140000000 - 0240000000]
  #0 [0000000000 - 0000001000]   BIOS data page
  #1 [0000006000 - 0000008000]       TRAMPOLINE
  #2 [0000200000 - 0000ac700c]    TEXT DATA BSS
  #3 [0037c97000 - 0037fefdc0]          RAMDISK
  #4 [000009a800 - 0000100000]    BIOS reserved
  #5 [0000010000 - 0000013000]          PGTABLE
  #6 [0000013000 - 0000018000]          PGTABLE
  #7 [0000018000 - 000001c880]       MEMNODEMAP
found SMP MP-table at [ffff8800000ff780] 000ff780
 [ffffe20000000000-ffffe200045fffff] PMD -> [ffff880028200000-ffff88002b9fffff] on node 0
 [ffffe20004600000-ffffe20007dfffff] PMD -> [ffff880140200000-ffff8801439fffff] on node 2
Zone PFN ranges:

Comment 1 erikj 2009-06-19 22:23:47 UTC

Created attachment 348725
/proc/cpuinfo output

Comment 2 erikj 2009-06-19 22:24:59 UTC

Created attachment 348726 [details]
dmesg output

Comment 3 erikj 2009-06-19 22:29:29 UTC

I feel silly for calling those tracebacks; they are ordinary boot-time NUMA setup messages. I'm just trying to go home for the day :)

Comment 4 erikj 2009-06-20 02:14:52 UTC

Confirmed present in the 2.6.29.5 community kernel.

Comment 5 erikj 2009-06-20 03:17:06 UTC

Also still a problem in 2.6.30-git14

Comment 6 erikj 2009-06-22 14:06:21 UTC

I applied the community fix (see the LKML links in BZ 506805) against
2.6.30-git14 and confirmed that the patch set fixes this issue.

Comment 7 Bug Zapper 2010-04-27 15:06:28 UTC

This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Bug Zapper 2010-06-28 13:10:23 UTC

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
