Description of problem:
numactl --physcpubind fails for cpus >= 64

Version-Release number of selected component (if applicable):
numactl 0.9.x

How reproducible:
100%

Steps to Reproduce:
1. # numactl --physcpubind=64 /bin/true

Actual results:
sched_setaffinity: Invalid argument

Expected results:
No error; the bind succeeds.

Additional info:
sched_setaffinity(0, 8, { 0, 0 }) = -1 EINVAL (Invalid argument)

The second argument is the size of the bitmask in bytes, and numactl has it set at 8 bytes, which is only enough for 64 bits (cpu0-cpu63).
Created attachment 302434 [details] Fix cpumask() to return number of cpus
Adam, do you have a >= 64 cpu machine I can look at this on? I'm looking at the code, and I can see how intuitively your patch would make sense, but I think the variable is just poorly named. It's called ncpus, but it's consistently passed around to mean the number of bytes in the cpumask (which is what sched_setaffinity requires). Looking at the error above, clearly the appropriate bit never got set in the cpumask (since it's all zeros). My guess is that this is instead either an off-by-one error, in which case we should be setting bit <cpu-1> in the array (given that the set_bit ops seem to assume setting bit 0 is valid), or we aren't computing the cpu buffer size correctly when we first allocate it (we would need to add 1 to numcpus at the start of cpumask() to make this work).
Created attachment 302483 [details] patch to properly check for bounding condition

The more I look at it, the more I'm convinced this is a bounds-check problem. We're checking for arg > numcpus in cpumask(), and that's wrong. We should be checking for >=, since on a 64-cpu system cpu64 doesn't exist, only cpu0-cpu63. Please try this patch and see if it fixes the problem. Thanks!
Neil,

From customer:

Hi, I'm not sure I agree with the analysis. Looking at where cpumask() is used:

    case 'C': /* --physcpubind */ {
        int ncpus;
        unsigned long *cpubuf;
        cpubuf = cpumask(optarg, &ncpus);
        errno = 0;
        check_cpubind(do_shm);
        did_cpubind = 1;
        numa_sched_setaffinity(0, CPU_BYTES(ncpus), cpubuf);

Notice how we do CPU_BYTES(ncpus) a few lines down. If ncpus were meant to be the number of bytes, we shouldn't have used CPU_BYTES() on it.

-- still trying to get hold of the right hardware for further testing. Thanks

Internal Status set to 'Waiting on Engineering'
This event sent from IssueTracker by astokes
issue 174584
Created attachment 302581 [details] updated patch

Oops, thank you. CPU_BYTES is going to round the value of ncpus up to the next highest word size, and since we're passing a buffer length here that's harmless (the kernel will correct our mistake when it sees our buffer is too large for the number of cpus we pass). Regardless, you're right, it's not correct to have it there, since it is meant to translate the number of cpus we have into the size of the cpumask buffer we are passing. The bottom line is that this is the only call site for cpumask(), and the only reason cpumask() takes that second parameter is so that we can know how big cpubuf is and pass that information on to sched_setaffinity (which takes a number of bytes, not a number of CPUs). This new patch takes that into account and makes it clear that we're passing a buffer size, not a cpu count. It also retains my previous change, since on a 64-CPU system --physcpubind=64 should fail, indicating the cpu number is out of range (only cpu0-cpu63 exist).
----- Additional Comments From antonb.com (prefers email at anton.com) 2008-04-21 15:36 EDT -------

Thanks, the patch works:

    LD_LIBRARY_PATH=. strace ./numactl --physcpubind=64 /bin/true 2>&1 | grep affinity
    sched_setaffinity(0, 32, { 0, 0, 1, 0, 0, 0, 0, 0 }) = 0

This event sent from IssueTracker by jkachuck
issue 174584
OK, thanks. I'll flag this for 5.3 inclusion.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
fixed in -3.el5
~~ Snapshot 3 is now available ~~

Snapshot 3 is now available for Partner Testing, and should contain a fix that resolves this bug. ISOs are available as usual at ftp://partners.redhat.com. Your testing feedback is vital! Please let us know if you encounter any NEW issues (file a new bug) or if you have VERIFIED the fix is present and functioning as expected (add the PartnerVerified keyword). Ping your Partner Manager with any additional questions. Thanks!
~~ Attention ~~ Snapshot 4 is now available for testing @ partners.redhat.com ~~

Partners, it is vital that we get your testing feedback on this important bug fix / feature request. If you are unable to test, please clearly indicate this in a comment on this bug or directly to your partner manager. If we do not receive your test feedback, this bug is at risk of being dropped from the release.

If you have VERIFIED the fix, please add PartnerVerified to the Bugzilla Keywords field, along with a description of the test results. If you encounter a new bug, CLONE this bug and ask your Partner manager to review it. We are no longer accepting new bugs into the release, barring critical regressions.
Sorry for the spam. Moving to VERIFIED, as the patch is present and equal to that which was reviewed/verified by IBM in comment #5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0139.html