Bug 193803 - numactl --show reports wrong CPU binding
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: numactl
Version: 4.0
Hardware: All  OS: Linux
Priority: medium  Severity: high
Assigned To: Neil Horman
Reported: 2006-06-01 14:25 EDT by Mike Stroyan
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2006-06-14 07:16:14 EDT
Description Mike Stroyan 2006-06-01 14:25:32 EDT
Description of problem:  numactl --show reports wrong CPU binding


Version-Release number of selected component (if applicable): 0.6.4-1.25


How reproducible:

  Problem is completely reproducible on affected hardware.

Steps to Reproduce:
1. Run 'numactl --show' on a system with more CPUs than NUMA nodes.
   This example shows actual vs. expected output for a 16-CPU rx8620
   with 16 CPUs in 4 cells.
  
Actual results:

nodebind: 0 1 2 3


Expected results:

nodebind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Additional info:
The "numactl.c:show()" function is getting a list of CPUs from a call to
numa_sched_getaffinity().  It then passes that to util.c:printmask().
But printmask() will only print up to numa_max_node() entries.
On a system with more CPUs than nodes it won't show the binding to any
of the CPUs above the maximum node number.

This problem and several other problems are fixed in the more current
versions of numactl available from ftp://ftp.suse.com/pub/people/ak/numa/ .
The most complete fix would be to incorporate a newer version of numactl.
Comment 1 Neil Horman 2006-06-14 07:16:14 EDT
This is working as designed.  The nodemask output is meant to show which numa
nodes a process is limited to allocating memory from.  Unless you have modified
numactl to only bind to certain nodes, running numactl --show on a system with 4
numa nodes should show a nodebind output of:
nodebind: 0 1 2 3
not 0 through 15 as you indicate.

There is a cpubind output available in the latest version of numactl.  That
will be available in FC6 and RHEL5.
Comment 2 Mike Stroyan 2006-06-14 12:01:03 EDT
I chose my example poorly.  The current "numactl --show" output for
"nodebind:" is not showing what nodes a process is bound to.  It is
showing a truncated list of the CPUs that a process is bound to.
Here are three more interesting examples using a rx8620 with 16 CPUs
in 4 cells.  It has nodes 0 to 3, each with 4 CPUs and cell-local memory,
plus node 4 with no CPUs and the system's interleaved memory.

Running
 numactl --cpubind 0 numactl --show
binds to CPUs 0,1,2,3 in node 0 and produces
 nodebind: 0 1 2 3
reporting CPUs instead of nodes.

Running
 numactl --cpubind 1 numactl --show
binds to CPUs 4,5,6,7 in node 1 and produces
 nodebind: 4
reporting CPU 4 instead of node 1.  Node 4 actually has no CPUs.

Running
 numactl --cpubind 2,3 numactl --show
binds to CPUs 8,9,10,11,12,13,14,15 in nodes 2 and 3 and produces
 nodebind:
reporting the process is bound to no nodes.

The numactl.c:show() function is calling numa_sched_getaffinity() and
passing the CPU mask from that to util.c:printmask().  But printmask()
will only print up to numa_max_node() entries.  The 0.9.8 version of
numactl has changed the show function to call
libnuma.c:numa_get_run_node_mask() to really get a mask of nodes to use
for the node mask passed into printmask().

The 0.6.4-1.25 version of numa_get_run_node_mask has its own problems.
numa_get_run_node_mask is looping through comparing
NUMA_NUM_NODES/BITS_PER_LONG array elements.  But only
CPU_WORDS(ncpus) elements of the arrays have real data; comparisons
beyond that can run past the end of the nodecpus and cpus arrays.
The 0.9.8 version of numa_get_run_node_mask loops over
just the number of CPUs.
for (k = 0; k < CPU_LONGS(ncpus); k++) {

There is also a problem in the 0.6.4-1.25 version of number_of_cpus().
If it can open /proc/cpuinfo then number_of_cpus returns the highest
processor number read from that file.  In that case it should return
one higher than the highest processor number because the cpus are
numbered from 0 to N-1.
The 0.9.8 version uses
        return maxcpus + 1;
The code for the case when /proc/cpuinfo is unreadable is still
badly broken in version 0.9.8.  It is changed to use
maxcpus = i*sizeof(long)+k
which is better, but it still loops over
for (k = 0; k < 8; k++)
bits when it should be looping over
for (k = 0; k < sizeof(long); k++)
I don't know if that code is ever used.
