Bug 707113

Summary: No error when setting a NUMA memory policy with an obviously incorrect nodemask.
Product: Red Hat Enterprise Linux 6
Reporter: Osier Yang <jyang>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: unspecified
Version: 6.1
CC: amwang, dallan, lwoodman, veillard
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-06-03 02:14:46 UTC
Bug Blocks: 698825

Description Osier Yang 2011-05-24 06:30:13 UTC
Description of problem:
While adding NUMA tuning support to libvirt, I found that no error is returned
when setting a NUMA memory policy with an obviously incorrect nodemask. I originally thought it was a libnuma problem, but calling the syscall directly gives no error either. Fixing this is a high priority, as NUMA tuning is a libvirt feature request for RHEL 6.2.

=== System NUMA info ===

$ numactl --show
policy: default
preferred node: current
physcpubind: 0 1 
cpubind: 0 
nodebind: 0 
membind: 0 

=== Testing program to reproduce the problem ===

#define _GNU_SOURCE        /* must come before any #include to take effect */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>        /* strerror() */
#include <unistd.h>
#include <sys/syscall.h>   /* For SYS_xxx definitions */
#include <numa.h>
#include <numaif.h>        /* MPOL_BIND */

#define NUMA_NODES 10

int
main(void)
{
	struct bitmask *mask = NULL;
	long ret = -1;
	int i = 0;

	if (numa_available() < 0) {
		fprintf(stderr, "NUMA is unavailable on your host\n");
		exit(EXIT_FAILURE);
	}

	numa_exit_on_error = 1;
	numa_exit_on_warn = 1;

	mask = numa_bitmask_alloc(NUMA_NODES);

	/* Set bits 0, 2, 4, 6 and 8; the test host only has node 0. */
	for (i = 0; i < NUMA_NODES; i++) {
		if (i % 2 == 0) {
			numa_bitmask_setbit(mask, i);
		}
	}

#if 0
	/* Same request through the libnuma API instead of the raw syscall. */
	numa_set_bind_policy(1);
	numa_set_membind(mask);
#endif
	ret = syscall(__NR_set_mempolicy, MPOL_BIND, mask->maskp, NUMA_NODES);

	/* Check the return value; errno is only meaningful when ret < 0. */
	if (ret < 0) {
		fprintf(stderr, "Failed to set NUMA memory policy: %s\n",
			strerror(errno));
		numa_bitmask_free(mask);
		exit(EXIT_FAILURE);
	}
	numa_set_bind_policy(0);

	printf("No error returned when setting NUMA memory policy. ret = %ld\n", ret);

	numa_bitmask_free(mask);
	exit(EXIT_SUCCESS);
}

Version-Release number of selected component (if applicable):
kernel-2.6.32-122.el6.x86_64

How reproducible:
always


Comment 2 Cong Wang 2011-05-24 09:41:02 UTC
The numa(3) man page says:
       numa_get_mems_allowed() returns the mask of nodes from which the process is allowed to allocate memory in it's current cpuset context.  Any nodes that are not included in the returned bitmask will be ignored in any of the following libnuma memory policy calls.

So I think this is not a bug: all other nodes are silently ignored by the kernel.

Comment 3 Osier Yang 2011-05-25 06:09:48 UTC
No, the test program doesn't invoke numa_get_mems_allowed(), and according to both the code and the man page of numa_set_membind(), an error is expected:

<quote>
       numa_set_membind() sets the memory allocation mask.  The  thread  will
       only allocate memory from the nodes set in nodemask.  Passing an empty
       nodemask or a nodemask that contains nodes other  than  those  in  the
       mask returned by numa_get_mems_allowed() will result in an error.
</quote>

Comment 4 Osier Yang 2011-05-29 13:01:33 UTC
More info:

If I set the NUMA nodemask to something like "000000010" on my laptop, which has only one NUMA node, an error is returned:

<snip>
set_mempolicy: Invalid argument
</snip>

The fix we need: if I specify a nodemask like "010101010101" on my laptop, a similar error should be returned.
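The strict behaviour requested here could be approximated in userspace (for example in libvirt) before calling set_mempolicy(). The sketch below is illustrative only: `parse_nodemask` and `check_nodemask_strict` are hypothetical helpers, a single `unsigned long` limits it to 64 nodes, and it assumes the rightmost character of strings like "010101010101" is node 0, which matches the two cases described above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse a bit string such as "010101010101" into a
 * node mask, treating the rightmost character as node 0 (assumed notation).
 * Returns 0 on success, -1 on malformed or too-long input. */
static int parse_nodemask(const char *s, unsigned long *out)
{
	size_t len, i;
	unsigned long mask = 0;

	assert(s != NULL && out != NULL);
	len = strlen(s);
	if (len == 0 || len > sizeof(unsigned long) * CHAR_BIT)
		return -1;

	for (i = 0; i < len; i++) {
		char c = s[len - 1 - i];   /* walk from node 0 upward */

		if (c == '1')
			mask |= 1UL << i;
		else if (c != '0')
			return -1;
	}
	*out = mask;
	return 0;
}

/* Hypothetical strict check: reject an empty mask or any requested node
 * outside "allowed" (e.g. the nodes reported by numa_get_mems_allowed()),
 * instead of letting extra nodes be ignored. Returns 0 or -EINVAL. */
static int check_nodemask_strict(unsigned long requested, unsigned long allowed)
{
	unsigned long extra = requested & ~allowed;

	if (requested == 0)
		return -EINVAL;
	if (extra != 0) {
		fprintf(stderr, "nodemask contains disallowed nodes: %#lx\n", extra);
		return -EINVAL;
	}
	return 0;
}
```

With only node 0 allowed (mask 0x1), "010101010101" parses to 0x555 and is rejected because bits 2, 4, 6, 8 and 10 fall outside the allowed set, whereas the kernel itself accepts it.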

Comment 5 Cong Wang 2011-06-03 02:14:46 UTC
For your 000000010 case, the man page explicitly says this:

EINVAL mode is invalid. <...> Or, none of the node IDs specified by nodemask are on-line and allowed by the process's current cpuset context, or none of the specified nodes contain memory.

But in your program, you set bits 0, 2, 4, 6 and 8, which include a valid bit for node 0, and node 0 is online. Thus there is no error.

I also just spoke with Andi Kleen at LinuxCon; he confirmed this is not a bug, because we want a mask of all F's (e.g. 0xffffffff) to express all the nodes of the system. So I'm closing this as NOTABUG.

Comment 6 Dave Allan 2011-06-03 03:51:32 UTC
Fair enough; thanks for the detailed explanation.