Description of problem:
numactl --physcpubind fails for cpus >= 64

Version-Release number of selected component (if applicable):
numactl 0.9.x

How reproducible:
100%

Steps to Reproduce:
1. # numactl --physcpubind=64 /bin/true

Actual results:
sched_setaffinity: Invalid argument

Expected results:
No error; the bind succeeds.

Additional info:
sched_setaffinity(0, 8, { 0, 0 }) = -1 EINVAL (Invalid argument)

The second argument is the size of the bitmask in bytes, and numactl has it set at 8 bytes, which is only enough for 64 bits (cpu0-cpu63).
Created attachment 302434 [details] Fix cpumask() to return number of cpus
Adam, do you have a >= 64 cpu machine I can look at this on? I'm looking at the code, and I can see how intuitively your patch would make sense, but I think the variable is just poorly named. It's called ncpus, but it's consistently passed around to mean the number of bytes in the cpumask (which is what sched_setaffinity requires). Looking at the error above, clearly the appropriate bit never got set in the cpumask (since it's all zeros). My guess is that this is instead either an off-by-one error, in which case we should be setting bit <cpu-1> in the array (given that the set_bit ops seem to assume setting bit 0 is valid), or we aren't computing the cpu buffer size correctly when we first allocate it (we would need to add 1 to numcpus at the start of cpumask() to make this work).
Created attachment 302483 [details] patch to properly check for bounding condition

The more I look at it, the more I'm convinced this is a bounds-check problem. We're checking for arg > numcpus in cpumask(), and that's wrong. We should be checking for >=, since on a 64-cpu system cpu64 doesn't exist, only cpu0-cpu63. Please try this patch and see if it fixes the problem. Thanks!
Neil,

From customer:

Hi, I'm not sure I agree with the analysis. Looking at where cpumask() is used:

    case 'C': /* --physcpubind */ {
        int ncpus;
        unsigned long *cpubuf;
        cpubuf = cpumask(optarg, &ncpus);
        errno = 0;
        check_cpubind(do_shm);
        did_cpubind = 1;
        numa_sched_setaffinity(0, CPU_BYTES(ncpus), cpubuf);

Notice how we do CPU_BYTES(ncpus) a few lines down. If ncpus were meant to be the number of bytes, we shouldn't have used CPU_BYTES() on it.

-- still trying to get hold of the right hardware for further testing. Thanks

Internal Status set to 'Waiting on Engineering'
This event sent from IssueTracker by astokes
issue 174584
Created attachment 302581 [details] updated patch

Oops, thank you. CPU_BYTES is going to round the value of ncpus up to the next highest word size, and since we're passing a buffer length here that's harmless (the kernel will correct our mistake when it sees our buffer is too large for the number of cpus we pass). Regardless, you're right, it's not correct to have it there, since it is meant to translate the number of cpus we have into the size of the cpumask buffer we are passing. The bottom line is that this is the only call site for cpumask(), and the only reason cpumask() takes that second parameter is so that we can know how big cpubuf is and pass that information on to sched_setaffinity (which takes a number of bytes, not a number of CPUs). This new patch takes that into account and makes it clear that we're passing a buffer size, not a cpu count. It also retains my previous change, since on a 64-CPU system --physcpubind=64 should fail, indicating the cpu number is out of range (only cpu0-cpu63 exist).
----- Additional Comments From antonb.com (prefers email at anton.com) 2008-04-21 15:36 EDT -------

Thanks, the patch works:

    LD_LIBRARY_PATH=. strace ./numactl --physcpubind=64 /bin/true 2>&1 | grep affinity
    sched_setaffinity(0, 32, { 0, 0, 1, 0, 0, 0, 0, 0 }) = 0

This event sent from IssueTracker by jkachuck
issue 174584
OK, thanks. I'll flag this for 5.3 inclusion.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
fixed in -3.el5
~~ Snapshot 3 is now available ~~

Snapshot 3 is now available for Partner Testing, and should contain a fix that resolves this bug. ISOs are available as usual at ftp://partners.redhat.com. Your testing feedback is vital! Please let us know if you encounter any NEW issues (file a new bug) or if you have VERIFIED the fix is present and functioning as expected (add the PartnerVerified keyword). Ping your Partner Manager with any additional questions. Thanks!
~~ Attention ~~ Snapshot 4 is now available for testing @ partners.redhat.com ~~

Partners, it is vital that we get your testing feedback on this important bug fix / feature request. If you are unable to test, please clearly indicate this in a comment on this bug or directly to your partner manager. If we do not receive your test feedback, this bug is at risk of being dropped from the release.

If you have VERIFIED the fix, please add PartnerVerified to the Bugzilla Keywords field, along with a description of the test results. If you encounter a new bug, CLONE this bug and ask your Partner manager to review it. We are no longer accepting new bugs into the release, barring critical regressions.
Sorry for the spam. Moving to VERIFIED, as the patch is present and equal to that which was reviewed/verified by IBM in comment #5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0139.html