Description of problem:
numactl --show reports wrong CPU binding.

Version-Release number of selected component (if applicable):
0.6.4-1.25

How reproducible:
Completely reproducible on affected hardware.

Steps to Reproduce:
1. Run 'numactl --show' on a system with more CPUs than NUMA nodes.

The example below shows actual vs. expected output for an rx8620 with 16 CPUs in 4 cells.

Actual results:
nodebind: 0 1 2 3

Expected results:
nodebind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Additional info:
The numactl.c:show() function gets a list of CPUs from a call to numa_sched_getaffinity() and passes it to util.c:printmask(). But printmask() will only print up to numa_max_node() entries, so on a system with more CPUs than nodes it won't show the binding to any of the CPUs above the maximum node number. This problem and several other problems are fixed in the more current versions of numactl available from ftp://ftp.suse.com/pub/people/ak/numa/ . The most complete fix would be to incorporate a newer version of numactl.
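For illustration, here is a minimal sketch of the truncation (this mimics, and is not, the actual show()/printmask() code): the printing routine walks only up to the highest node number, so CPU bits above numa_max_node() are silently dropped.

    /*
     * Minimal sketch of the truncation described above.  printmask_like()
     * mimics util.c:printmask(): it stops at the highest node number, so
     * any CPU bits above that are never printed.  (Illustrative only; not
     * the actual numactl code.)
     */
    #include <stdio.h>

    #define NBITS (sizeof(unsigned long) * 8)

    static void printmask_like(const char *name, unsigned long mask, int maxnode)
    {
        int i;

        printf("%s:", name);
        for (i = 0; i <= maxnode && i < (int)NBITS; i++)
            if (mask & (1UL << i))
                printf(" %d", i);
        printf("\n");
    }

    int main(void)
    {
        unsigned long cpumask = 0xffffUL;  /* affinity mask: CPUs 0..15 set */
        int maxnode = 3;                   /* 4 cells -> numa_max_node() == 3 */

        /* Prints "nodebind: 0 1 2 3" even though 16 CPUs are bound. */
        printmask_like("nodebind", cpumask, maxnode);
        return 0;
    }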
This is working as designed. The nodebind output is meant to show which NUMA nodes a process is limited to allocating memory from. Unless you have modified numactl to bind only to certain nodes, running numactl --show on a system with 4 NUMA nodes should show a nodebind output of:

nodebind: 0 1 2 3

not 0 through 15 as you indicate. There is a cpubind output available in the latest version of numactl; that will be available in FC6 & RHEL5.
I chose my example poorly. The current "numactl --show" output for "nodebind:" is not showing what nodes a process is bound to; it is showing a truncated list of the CPUs that the process is bound to.

Here are three more interesting examples using an rx8620 with 16 CPUs in 4 cells. It has nodes 0 to 3, each with 4 CPUs and cell-local memory, plus node 4 with no CPUs and the system's interleaved memory.

Running
    numactl --cpubind 0 numactl --show
binds to CPUs 0,1,2,3 in node 0 and produces
    nodebind: 0 1 2 3
reporting CPUs instead of nodes.

Running
    numactl --cpubind 1 numactl --show
binds to CPUs 4,5,6,7 in node 1 and produces
    nodebind: 4
reporting CPU 4 instead of node 1. Node 4 actually has no CPUs.

Running
    numactl --cpubind 2,3 numactl --show
binds to CPUs 8,9,10,11,12,13,14,15 in nodes 2 and 3 and produces
    nodebind:
reporting that the process is bound to no nodes.

The numactl.c:show() function is calling numa_sched_getaffinity() and passing the CPU mask from that to util.c:printmask(). But printmask() will only print up to numa_max_node() entries. The 0.9.8 version of numactl has changed the show function to call libnuma.c:numa_get_run_node_mask(), which really does produce a mask of nodes to pass into printmask().

The 0.6.4-1.25 version of numa_get_run_node_mask() has its own problems. It loops through comparing NUMA_NUM_NODES/BITS_PER_LONG array elements, but only CPU_WORDS(ncpus) elements of the arrays hold real data; beyond that the loop can run past the end of the nodecpus and cpus arrays. The 0.9.8 version of numa_get_run_node_mask() loops over just the number of CPUs:
    for (k = 0; k < CPU_LONGS(ncpus); k++) {

There is also a problem in the 0.6.4-1.25 version of number_of_cpus(). If it can open /proc/cpuinfo, number_of_cpus() returns the highest processor number read from that file. It should instead return one more than the highest processor number, because the CPUs are numbered from 0 to N-1. The 0.9.8 version uses
    return maxcpus + 1;

The code for the case when /proc/cpuinfo is unreadable is still badly broken in version 0.9.8. It is changed to use
    maxcpus = i*sizeof(long)+k
which is better, but it still loops over
    for (k = 0; k < 8; k++)
bits when it should be looping over
    for (k = 0; k < sizeof(long); k++)
I don't know if that code is ever used.
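To illustrate the loop-bound point, here is a small sketch of deriving a run-node mask from the CPU affinity mask while looping only over the CPU_LONGS(ncpus) words that contain real data, in the spirit of the 0.9.8 numa_get_run_node_mask(). The array layout, sizes, and names below are assumptions for illustration, not the libnuma source.

    /*
     * Sketch: build a node mask from a CPU affinity mask.  Node n is set
     * if the process may run on any CPU belonging to node n.  The loop
     * bound is CPU_LONGS(NCPUS), not NUMA_NUM_NODES/BITS_PER_LONG, so it
     * never reads past the end of nodecpus[] or cpus[].
     */
    #include <stdio.h>

    #define NCPUS          16
    #define NNODES         4
    #define BITS_PER_LONG  (sizeof(unsigned long) * 8)
    #define CPU_LONGS(n)   (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

    static unsigned long run_node_mask(unsigned long cpus[CPU_LONGS(NCPUS)],
                                       unsigned long nodecpus[NNODES][CPU_LONGS(NCPUS)])
    {
        unsigned long nodemask = 0;
        unsigned int n, k;

        for (n = 0; n < NNODES; n++)
            for (k = 0; k < CPU_LONGS(NCPUS); k++)
                if (nodecpus[n][k] & cpus[k])
                    nodemask |= 1UL << n;
        return nodemask;
    }

    int main(void)
    {
        /* rx8620-like layout: 4 CPUs per cell, CPUs 0..15. */
        unsigned long nodecpus[NNODES][CPU_LONGS(NCPUS)] = {
            { 0x000fUL }, { 0x00f0UL }, { 0x0f00UL }, { 0xf000UL },
        };
        unsigned long cpus[CPU_LONGS(NCPUS)] = { 0xff00UL };  /* bound to CPUs 8..15 */

        /* Prints 0xc, i.e. nodes 2 and 3, matching the --cpubind 2,3 example. */
        printf("run node mask: 0x%lx\n", run_node_mask(cpus, nodecpus));
        return 0;
    }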
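And a hypothetical rewrite of the /proc/cpuinfo path of number_of_cpus() showing the off-by-one fix (again an assumption-laden sketch, not the libnuma code; the parsing details are made up for illustration):

    /*
     * Hypothetical number_of_cpus(): CPUs are numbered 0..N-1, so the
     * count is the highest "processor" number seen plus one.
     */
    #include <stdio.h>

    static int number_of_cpus(void)
    {
        FILE *f = fopen("/proc/cpuinfo", "r");
        char line[256];
        int maxcpus = -1;
        int n;

        if (!f)
            return -1;  /* the unreadable-/proc/cpuinfo fallback is not shown */
        while (fgets(line, sizeof(line), f))
            if (sscanf(line, "processor : %d", &n) == 1 && n > maxcpus)
                maxcpus = n;
        fclose(f);

        /* The 0.6.4-1.25 bug was returning maxcpus here instead of maxcpus + 1. */
        return maxcpus + 1;
    }

    int main(void)
    {
        printf("CPUs: %d\n", number_of_cpus());
        return 0;
    }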