From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051130 Fedora/1.5-1.jw Firefox/1.5

Description of problem:
I'm responsible for a number of Opteron clusters at work. Some of the older ones have compute nodes with dual Opteron 248 processors and either 16 or 8 GB of RAM; the newer clusters have dual Opteron 252 processors and either 16 or 8 GB of RAM. NUMA hash table lookups (and thus memory controller setup/assignment for numactl) work fine on both the 16 and 8 GB Opteron 248 nodes, as well as on the 8 GB Opteron 252 nodes, but fail on the 16 GB Opteron 252 nodes.

At system startup, we see the following:

<6>BIOS-provided physical RAM map:
<4> BIOS-e820: 0000000000000000 - 000000000009a800 (usable)
<4> BIOS-e820: 000000000009a800 - 00000000000a0000 (reserved)
<4> BIOS-e820: 00000000000cc000 - 0000000000100000 (reserved)
<4> BIOS-e820: 0000000000100000 - 00000000fbf7c000 (usable)
<4> BIOS-e820: 00000000fbf7c000 - 00000000fbf80000 (ACPI NVS)
<4> BIOS-e820: 00000000fbf80000 - 00000000fc000000 (reserved)
<4> BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
<4> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
<4> BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
<4> BIOS-e820: 0000000100000000 - 0000000404000000 (usable)
<6>Scanning NUMA topology in Northbridge 24
<6>Number of nodes 2 (10010)
<6>Node 0 MemBase 0000000000000000 Limit 0000000203ffffff
<6>Node 1 MemBase 0000000204000000 Limit 0000000403ffffff
<6>node 1 shift 24 addr 204000000 conflict 0
<6>node 1 shift 25 addr 204000000 conflict 0
<6>node 1 shift 26 addr 3fc000000 conflict 0
<6>node 1 shift 27 addr 204000000 conflict 0
<6>node 1 shift 28 addr 204000000 conflict 0
<6>node 1 shift 29 addr 204000000 conflict 0
<6>node 1 shift 30 addr 204000000 conflict 0
<6>node 1 shift 31 addr 204000000 conflict 0
<6>node 1 shift 32 addr 204000000 conflict 0
<6>node 1 shift 33 addr 204000000 conflict 0
<6>node 1 shift 34 addr 204000000 conflict 0
<6>node 1 shift 35 addr 204000000 conflict 0
<6>node 1 shift 36 addr 204000000 conflict 0
<6>node 1 shift 37 addr 204000000 conflict 0
<6>node 1 shift 38 addr 204000000 conflict 0
<6>node 1 shift 39 addr 204000000 conflict 0
<6>node 1 shift 40 addr 204000000 conflict 0
<6>node 1 shift 41 addr 204000000 conflict 0
<6>node 1 shift 42 addr 204000000 conflict 0
<6>node 1 shift 43 addr 204000000 conflict 0
<6>node 1 shift 44 addr 204000000 conflict 0
<6>node 1 shift 45 addr 204000000 conflict 0
<6>node 1 shift 46 addr 204000000 conflict 0
<6>node 1 shift 47 addr 204000000 conflict 0
<3>No NUMA node hash function found. Contact maintainer
<6>No NUMA configuration found
<6>Faking a node at 0000000000000000-0000000404000000
<4>Bootmem setup node 0 0000000000000000-0000000404000000
<6>No mptable found.
<4>On node 0 totalpages: 4210688
<4> DMA zone: 4096 pages, LIFO batch:1
<4> Normal zone: 4206592 pages, LIFO batch:31
<4> HighMem zone: 0 pages, LIFO batch:1

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-22.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Set up a dual Opteron 252 system with 16 GB of RAM
2. Install the latest smp kernel
3. Boot it up and check your logs and the output of 'numactl --hardware'

Actual Results:
# numactl --hardware
available: 1 nodes (0-0)
node 0 size: 16576 MB
node 0 free: 15887 MB

Expected Results:
# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 8383 MB
node 0 free: 7876 MB
node 1 size: 8191 MB
node 1 free: 7945 MB

Additional info:
This problem also presents itself with SUSE Linux Enterprise Server 9 on all kernels prior to their SP3 release, but was fixed in their SP3 kernel. I haven't tested under 2.6.9-22.0.2.EL, but I didn't see anything in the changelogs to indicate it had been addressed yet. There's approximately a 15% degradation in compute node performance when running without the proper memory controllers set up, likely due to processes on cpu0 getting assigned memory on cpu1 and vice versa, instead of sticking to the local CPU's memory controller.
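For anyone trying to make sense of those "node 1 shift N addr ... conflict" lines: they come from the kernel's brute-force search for a memnode hash shift. Below is a minimal userspace sketch of that search, not the literal RHEL4 source; the names NODEMAPSIZE and memnodemap mirror the kernel's, but find_memnode_shift() and the bounds handling are illustrative simplifications. The idea is to find a single shift such that phys_addr >> shift indexes a small table in which every slot is owned by exactly one node; if no shift in the tested range works, the kernel prints "No NUMA node hash function found" and falls back to faking a single node.

#include <stdio.h>

/* Sketch of the 2.6.9-era x86_64 memnode hash-shift search (simplified). */
#define NODEMAPSIZE 0xff          /* 256-slot phys-addr -> node table */

struct node { unsigned long long start, end; };   /* one node's memory range */

static signed char memnodemap[NODEMAPSIZE + 1];   /* -1 = slot unused */

/* Return a shift such that (addr >> shift) maps every address of every node
 * to a slot owned by that node alone, or -1 if no such shift exists. */
static int find_memnode_shift(const struct node *nodes, int numnodes)
{
    for (int shift = 24; shift < 48; shift++) {
        int conflict = 0;

        for (int i = 0; i <= NODEMAPSIZE; i++)
            memnodemap[i] = -1;

        for (int nid = 0; nid < numnodes && !conflict; nid++) {
            for (unsigned long long addr = nodes[nid].start;
                 addr < nodes[nid].end;
                 addr += 1ULL << shift) {
                unsigned long long slot = addr >> shift;

                if (slot > NODEMAPSIZE ||            /* table too small */
                    (memnodemap[slot] >= 0 &&
                     memnodemap[slot] != nid)) {     /* slot already owned */
                    conflict = 1;
                    break;
                }
                memnodemap[slot] = (signed char)nid;
            }
        }
        if (!conflict)
            return shift;       /* perfect hash found */
    }
    return -1;                  /* "No NUMA node hash function found" */
}

int main(void)
{
    /* Node ranges taken from the boot log above. */
    const struct node nodes[2] = {
        { 0x000000000ULL, 0x204000000ULL },
        { 0x204000000ULL, 0x404000000ULL },
    };
    printf("memnode shift = %d\n", find_memnode_shift(nodes, 2));
    return 0;
}

With these two ranges and a 0xff-sized table the search fails for every shift from 24 to 47, matching the log; in this simplified model, enlarging the table to 0xfff entries (as the SUSE patch further down does) lets shift 24 succeed.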
I'll see if I can't isolate the patch SUSE added to their SP3 kernel and slap it on top of 2.6.9-22.0.2.EL later tonight or tomorrow.
Dead simple patch, if this is really all that's needed. Will get this applied later today and see if the issue is resolved...

--------
From: ak
Subject: Increase NUMA node hash size
Suse-bugzilla: 106287
Patch-mainline: yes

This is needed on some systems with AMD E stepping CPUs which have memory hoisting enabled. The memory map is not uniform enough for the 256 entry hash table. Enlarge to 0xfff.

diff -u linux-2.6.5-hack/include/asm-x86_64/mmzone.h-o linux-2.6.5-hack/include/asm-x86_64/mmzone.h
--- linux-2.6.5-hack/include/asm-x86_64/mmzone.h-o	2004-04-04 05:38:00.000000000 +0200
+++ linux-2.6.5-hack/include/asm-x86_64/mmzone.h	2005-09-30 13:46:17.000000000 +0200
@@ -13,7 +13,7 @@
 #include <asm/smp.h>
 
 #define MAXNODE 8
-#define NODEMAPSIZE 0xff
+#define NODEMAPSIZE 0xfff
 
 /* Simple perfect hash to map physical addresses to node numbers */
 extern int memnode_shift;
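For reference, NODEMAPSIZE sizes the table behind the kernel's fast physical-address-to-node lookup. A rough sketch of that lookup, modeled on phys_to_nid() in the same header (simplified, not the exact source):

/* memnode_shift and memnodemap are set up at boot by the hash-shift search. */
extern int memnode_shift;
extern signed char memnodemap[];        /* NODEMAPSIZE entries */

static inline int phys_to_nid_sketch(unsigned long long addr)
{
    return memnodemap[addr >> memnode_shift];
}

Rough arithmetic for why the table size matters on this box: the node boundary in the log above (0x204000000) is 64 MB aligned, so the largest shift that keeps the two nodes in disjoint slots is 26, and a 0xff-sized table at 64 MB granularity reaches only about 16 GB. Usable RAM here tops out at 0x404000000, just past that, while a 0xfff-sized table at the same shift reaches roughly 256 GB.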
Well, apparently, that is NOT all that is required to fix this. I've verified that the kernel I'm running now does have this patch implemented, but the problem still exists. Back to the drawing board...
The NUMA hash function was re-implemented in RHEL4 Update 2. Please upgrade to Update 2 or later and inform us if the problem persists.
The problem still exists with kernel-smp-2.6.9-22.0.2.EL, as well as with a kernel built from the same sources w/the extra hash size patch (the reimplemented numa hash function may explain why that patch didn't help). All released updates have been applied to this system. I have yet to try out a U3 beta kernel though.
If that is the case, then please provide a console log of an affected system running the most recent kernel you have. Printouts of the form:

<6>node 1 shift 29 addr 204000000 conflict 0

were eliminated in the re-implementation of the NUMA hash function for U2. Any boot log that shows lines like this must be from a kernel prior to U2.
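I don't know exactly what the U2 kernel ships, but later mainline kernels rework compute_hash_shift() around a populate-and-check helper (populate_memnodemap()) that simply reports success or failure for each candidate shift instead of printk'ing every conflicting address, which would explain why those lines disappear. An illustrative version, reusing the struct node / memnodemap definitions from the sketch earlier in this report (again a simplification, not the RHEL4 code):

/* Illustrative populate-and-check helper: returns 1 on success, 0 if the
 * table is too small at this shift, -1 if two nodes collide in a slot. */
static int populate_memnodemap_sketch(const struct node *nodes, int numnodes,
                                      int shift)
{
    for (int i = 0; i <= NODEMAPSIZE; i++)
        memnodemap[i] = -1;

    for (int nid = 0; nid < numnodes; nid++) {
        if (nodes[nid].start >= nodes[nid].end)
            continue;
        if (((nodes[nid].end - 1) >> shift) > NODEMAPSIZE)
            return 0;                   /* table too small at this shift */
        for (unsigned long long addr = nodes[nid].start;
             addr < nodes[nid].end;
             addr += 1ULL << shift) {
            if (memnodemap[addr >> shift] >= 0 &&
                memnodemap[addr >> shift] != nid)
                return -1;              /* two nodes share a slot */
            memnodemap[addr >> shift] = (signed char)nid;
        }
    }
    return 1;
}

A shift search built on top of this only needs a single summary message when it fails, rather than the per-address conflict lines shown above.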
I believe the initial console log was from an earlier kernel, but 'numactl --hardware' on 2.6.9-22.0.2.EL does still show only a single memory controller. I'll grab current console output a bit later this afternoon.
Here's the console output w/kernel-smp-2.6.9-22.0.2.EL:

Scanning NUMA topology in Northbridge 24
Number of nodes 2 (10010)
Node 0 using interleaving mode 1/0
No NUMA configuration found
Faking a node at 0000000000000000-0000000420000000
Bootmem setup node 0 0000000000000000-0000000420000000
No mptable found.
On node 0 totalpages: 4325376
 DMA zone: 4096 pages, LIFO batch:1
 Normal zone: 4321280 pages, LIFO batch:16
 HighMem zone: 0 pages, LIFO batch:1
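In case it helps with diagnosis: the "Node 0 using interleaving mode 1/0" line comes from the K8 NUMA topology scan reading the northbridge DRAM Base/Limit registers and finding the node-interleave bits set, at which point it gives up on hardware NUMA detection and the kernel falls back to faking a single node. Below is a sketch of that check, modeled on the 2.6.9-era arch/x86_64/mm/k8topology.c (simplified; read_pci_config() and printk() are the kernel's own helpers, declared here only so the sketch is self-contained):

/* Kernel-context sketch; these are provided by the kernel proper. */
extern unsigned int read_pci_config(unsigned char bus, unsigned char slot,
                                    unsigned char func, unsigned char offset);
extern int printk(const char *fmt, ...);

/* The on-die K8 northbridge is PCI device 0x18 (24 decimal, hence
 * "Scanning NUMA topology in Northbridge 24"); function 1 holds the
 * DRAM Base/Limit registers at 0x40 + 8*i and 0x44 + 8*i. */
static int scan_k8_nodes_sketch(void)
{
    for (int i = 0; i < 8; i++) {
        unsigned int base  = read_pci_config(0, 0x18, 1, 0x40 + i * 8);
        unsigned int limit = read_pci_config(0, 0x18, 1, 0x44 + i * 8);
        int nodeid = limit & 7;         /* destination node ID */

        if (!(base & 3))    /* read/write enable bits clear: no DRAM here */
            continue;

        /* Interleave enable/select fields: if the BIOS programmed node
         * interleaving, per-node address ranges are meaningless, so the
         * scan bails out and the kernel later fakes one node. */
        if (((base >> 8) & 3) || ((limit >> 8) & 3)) {
            printk("Node %d using interleaving mode %x/%x\n",
                   nodeid, (base >> 8) & 3, (limit >> 8) & 3);
            return -1;
        }

        /* ...otherwise record this node's base/limit and keep scanning... */
    }
    return 0;
}

If that reading is right, it may be worth checking whether the BIOS on the 16 GB Opteron 252 nodes has node/memory interleaving enabled, since the scan gives up as soon as it sees those bits.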
Does this problem persist with the most recent kernel?
Unfortunately, I don't have access to the hardware to test this on anymore... Lemme see if I can ping someone back at my former employer to take a look though.
User jparadis's account has been closed
No access to hardware and nobody else has reported a problem in over a year. Closing INSUFFICIENT_DATA.