Bug 193803 - numactl --show reports wrong CPU binding
Red Hat Enterprise Linux 4
Assigned To: Neil Horman
Reported: 2006-06-01
Modified: 2007-11-30
Last Closed: 2006-06-14
Description Mike Stroyan 2006-06-01 14:25:32 EDT
Description of problem:  numactl --show reports wrong CPU binding

Version-Release number of selected component (if applicable): 0.6.4-1.25

How reproducible:

  Problem is completely reproducible on affected hardware.

Steps to Reproduce:
1. Run 'numactl --show' on a system with more CPUs than NUMA nodes.
   This example shows actual versus expected output for an rx8620
   with 16 CPUs in 4 cells.
Actual results:

nodebind: 0 1 2 3

Expected results:

nodebind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Additional info:
The "numactl.c:show()" function is getting a list of CPUs from a call to
numa_sched_getaffinity().  It then passes that to util.c:printmask().
But printmask() will only print up to numa_max_node() entries.
On a system with more CPUs than nodes it won't show the binding to any
of the CPUs above the maximum node number.
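
The truncation can be illustrated with a minimal sketch.  This is not
the numactl source: format_mask() is a hypothetical stand-in for
util.c:printmask(), and the single-word mask and max_index parameter
are assumptions made for illustration.  It only examines bit positions
0..max_index, so a CPU mask printed with the maximum node number as
the limit loses every CPU above that number.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for util.c:printmask().  Writes the set bits of
 * mask into out as a space-separated list, but only looks at bit
 * positions 0..max_index, the way a node-sized limit would. */
void format_mask(unsigned long mask, int max_index, char *out, size_t outlen)
{
    size_t used = 0;
    int bits = (int)(sizeof mask * CHAR_BIT);

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (int i = 0; i <= max_index && i < bits; i++) {
        if (mask & (1UL << i)) {
            int n = snprintf(out + used, outlen - used, "%s%d",
                             used ? " " : "", i);
            if (n < 0 || (size_t)n >= outlen - used)
                break;  /* output buffer is full; stop formatting */
            used += (size_t)n;
        }
    }
}
```

With a 16-CPU mask of 0xFFFF and a limit of 3 (the highest node number
on a 4-node system), the buffer holds "0 1 2 3".  With a limit of 15 it
holds all sixteen CPU numbers.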

This problem and several other problems are fixed in the more current
versions of numactl available from ftp://ftp.suse.com/pub/people/ak/numa/ .
The most complete fix would be to incorporate a newer version of numactl.
Comment 1 Neil Horman 2006-06-14 07:16:14 EDT
This is working as designed.  The nodemask output is meant to show which numa
nodes a process is limited to allocating memory from.  Unless you have modified
numactl to only bind to certain nodes, running numactl --show on a system with 4
numa nodes should show a nodebind output of:
nodebind: 0 1 2 3
not 0 through 15 as you indicate.

There is a cpubind output available in the latest version of numactl.  That will
be available in FC6 and RHEL5.
Comment 2 Mike Stroyan 2006-06-14 12:01:03 EDT
I chose my example poorly.  The current "numactl --show" output for
"nodebind:" is not showing what nodes a process is bound to.  It is
showing a truncated list of the CPUs that a process is bound to.
Here are three more interesting examples using an rx8620 with 16 CPUs
in 4 cells.  It has nodes 0 to 3, each with 4 CPUs and cell-local memory,
plus node 4 with no CPUs and the system's interleaved memory.

 numactl --cpubind 0 numactl --show
binds to CPUs 0,1,2,3 in node 0 and produces
 nodebind: 0 1 2 3
reporting cpus instead of nodes.

 numactl --cpubind 1 numactl --show
binds to CPUs 4,5,6,7 in node 1 and produces
 nodebind: 4
reporting cpu 4 instead of node 1.  Node 4 actually has no CPU.

 numactl --cpubind 2,3 numactl --show
binds to CPUs 8,9,10,11,12,13,14,15 in nodes 2 and 3 and produces
 nodebind:
reporting the process is bound to no nodes.

The numactl.c:show() function is calling numa_sched_getaffinity() and
passing the CPU mask from that to util.c:printmask().  But printmask()
will only print up to numa_max_node() entries.  The 0.9.8 version of
numactl has changed the show function to call
libnuma.c:numa_get_run_node_mask() to really get a mask of nodes to use
for the node mask passed into printmask().

The 0.6.4-1.25 version of numa_get_run_node_mask() has its own problems.
It loops over NUMA_NUM_NODES/BITS_PER_LONG array elements when comparing
the masks, but only CPU_WORDS(ncpus) elements of the arrays hold real
data.  Looping beyond that can run past the end of the nodecpus and cpus
arrays.  The 0.9.8 version of numa_get_run_node_mask() loops over just the
words needed for the number of CPUs:
for (k = 0; k < CPU_LONGS(ncpus); k++) {
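
A minimal sketch of the idea, not the libnuma implementation:
run_node_mask() is a hypothetical helper, the flat node_cpus layout is
an assumption, and the CPU_LONGS() macro here is my own definition
rather than the one in the numactl source.  A node is reported if any
CPU in the affinity mask belongs to it, and the inner loop reads only
the words that can hold real CPU bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned long words needed to hold n CPU bits (assumed form). */
#define CPU_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical helper.  affinity holds CPU_LONGS(ncpus) words.
 * node_cpus holds nnodes consecutive CPU masks of CPU_LONGS(ncpus)
 * words each.  Returns a bitmask of the nodes that share at least one
 * CPU with the affinity mask. */
unsigned long run_node_mask(const unsigned long *affinity,
                            const unsigned long *node_cpus,
                            size_t nnodes, size_t ncpus)
{
    size_t words = CPU_LONGS(ncpus);
    unsigned long nodes = 0;

    assert(affinity != NULL && node_cpus != NULL);
    assert(nnodes <= BITS_PER_LONG);
    for (size_t n = 0; n < nnodes; n++) {
        /* Bounded by the CPU word count, not by NUMA_NUM_NODES. */
        for (size_t k = 0; k < words; k++) {
            if (affinity[k] & node_cpus[n * words + k]) {
                nodes |= 1UL << n;
                break;
            }
        }
    }
    return nodes;
}
```

For the rx8620 layout described above (nodes 0 to 3 with 4 CPUs each,
node 4 with none), an affinity mask of CPUs 4 to 7 yields node 1 only,
and a mask of CPUs 8 to 15 yields nodes 2 and 3.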

There is also a problem in the 0.6.4-1.25 version of number_of_cpus().
If it can open /proc/cpuinfo then number_of_cpus returns the highest
processor number read from that file.  In that case it should return
one higher than the highest processor number because the cpus are
numbered from 0 to N-1.
The 0.9.8 version uses
        return maxcpus + 1;
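
To make the off-by-one concrete, here is a small sketch under stated
assumptions.  count_cpus_from_text() is a hypothetical helper, not the
numactl function; it parses /proc/cpuinfo-style text from a string so
the behavior is easy to check, and it assumes each CPU has a line of
the form "processor : N".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper.  Scans text line by line for "processor : N"
 * entries and returns the highest N plus one, because CPUs are
 * numbered from 0.  Returns 0 if no processor line is found. */
int count_cpus_from_text(const char *text)
{
    int maxcpu = -1;
    const char *line = text;

    assert(text != NULL);
    while (*line) {
        int n;
        const char *nl;

        /* Whitespace in the format matches tabs and spaces alike. */
        if (sscanf(line, "processor : %d", &n) == 1 && n > maxcpu)
            maxcpu = n;
        nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return maxcpu + 1;
}
```

Returning maxcpu itself would report 15 CPUs on a machine whose
highest processor entry is 15, which is one short of the real count.
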
The code for the case when /proc/cpuinfo is unreadable is still
badly broken in version 0.9.8.  It is changed to use
maxcpus = i*sizeof(long)+k
which is better, but it still loops over
for (k = 0; k< 8; k++)
bits when it should be looping over
for (k = 0; k< sizeof(long); k++)
I don't know if that code is ever used.

