Bug 707113 - No error when setting with NUMA memory policy with obvious incorrect nodemask.
Summary: No error when setting with NUMA memory policy with obvious incorrect nodemask.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: rc
: ---
Assignee: Larry Woodman
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 698825
Reported: 2011-05-24 06:30 UTC by Osier Yang
Modified: 2011-06-03 06:28 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-03 02:14:46 UTC
Target Upstream Version:


Description Osier Yang 2011-05-24 06:30:13 UTC
Description of problem:
While adding NUMA tuning to the libvirt layer, I found that no error is raised
when a NUMA memory policy is set with an obviously incorrect nodemask. I originally thought it was a libnuma problem, but calling the syscall directly gives no error either. This is a high-priority request, as NUMA tuning is a libvirt feature request for RHEL6.2.

=== System NUMA info ===

$ numactl --show
policy: default
preferred node: current
physcpubind: 0 1 
cpubind: 0 
nodebind: 0 
membind: 0 

=== Testing program to reproduce the problem ===

#define _GNU_SOURCE        /* must be defined before any system header */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>        /* For strerror() */
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>   /* For SYS_xxx definitions */
#include <numa.h>

#define NUMA_NODES 10

int
main (int argc, char **argv) {
	struct bitmask *mask = NULL;
	int i = 0;
	int mode = 2;              /* MPOL_BIND */
	int ret = -1;

	if (numa_available() < 0) {
		fprintf(stderr, "NUMA is unavailable on your host\n");
		exit(EXIT_FAILURE);
	}

	numa_exit_on_error = 1;
	numa_exit_on_warn = 1;

	mask = numa_bitmask_alloc(NUMA_NODES);

	/* Set every even bit (0, 2, 4, 6, 8); this host only has node 0. */
	for (i = 0; i < NUMA_NODES; i++) {
		if (i % 2 == 0) {
			numa_bitmask_setbit(mask, i);
		}
	}

	errno = 0;

#if 0
	numa_set_bind_policy(1);
	numa_set_membind(mask);
#endif
	/* Call the syscall directly to rule out libnuma. */
	ret = syscall(__NR_set_mempolicy, mode, mask->maskp, NUMA_NODES);
	numa_set_bind_policy(0);

	/* syscall() returns -1 and sets errno on failure. */
	if (ret < 0) {
		fprintf(stderr, "Failed to set NUMA memory policy: %s\n",
			strerror(errno));
		exit(EXIT_FAILURE);
	}

	printf("No error thrown when setting NUMA memory policy. ret = %d\n", ret);

	numa_bitmask_free(mask);
	exit(EXIT_SUCCESS);
}

Version-Release number of selected component (if applicable):
kernel-2.6.32-122.el6.x86_64

How reproducible:
always


Comment 2 Cong Wang 2011-05-24 09:41:02 UTC
The numa(3) man page says:
     numa_get_mems_allowed() returns the mask of nodes from which the process is allowed to allocate memory in it's current cpuset context.  Any nodes
       that are not included in the returned bitmask will be ignored in any of the following libnuma memory policy calls.

So I think this is not a bug: all other nodes are silently ignored by the kernel.

Comment 3 Osier Yang 2011-05-25 06:09:48 UTC
No, the test program doesn't invoke numa_get_mems_allowed(), and both the code and the manual for numa_set_membind() indicate that an error is expected.

<quote>
       numa_set_membind() sets the memory allocation mask.  The  thread  will
       only allocate memory from the nodes set in nodemask.  Passing an empty
       nodemask or a nodemask that contains nodes other  than  those  in  the
       mask returned by numa_get_mems_allowed() will result in an error.
</quote>

Comment 4 Osier Yang 2011-05-29 13:01:33 UTC
More info:

If I set the NUMA nodemask to something like "000000010" on my laptop, which has only one NUMA node, an error is raised:

<snip>
set_mempolicy: Invalid argument
</snip>

The fix we need: if I specify a nodemask like "010101010101" on my laptop, a similar error should be raised.
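
For illustration only, a caller that wants this stricter behaviour could validate the mask itself before calling set_mempolicy. The sketch below is not libnuma or kernel code; nodemask_is_subset is a hypothetical helper, and it assumes both masks are arrays of unsigned long in the layout used by struct bitmask's maskp. On a real system the allowed mask would presumably come from numa_get_mems_allowed().

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libnuma or the kernel): returns true
 * only if every bit set in `requested` is also set in `allowed`.
 * Both masks are arrays of `nlongs` unsigned longs.
 */
static bool
nodemask_is_subset(const unsigned long *requested,
                   const unsigned long *allowed,
                   size_t nlongs)
{
	size_t i;

	for (i = 0; i < nlongs; i++) {
		/* A bit present in requested but absent from allowed is invalid. */
		if (requested[i] & ~allowed[i])
			return false;
	}
	return true;
}
```

With an allowed mask containing only node 0, a requested mask of 0x155 (bits 0, 2, 4, 6, 8) would be rejected by this check, while a mask of 0x1 would pass.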

Comment 5 Cong Wang 2011-06-03 02:14:46 UTC
For your 000000010 case, the man page explicitly says this:

EINVAL mode is invalid. <...> Or, none of the node IDs specified by nodemask are on-line and allowed by the process's current cpuset context, or none of the specified nodes contain memory.

But in your program, the bits you set are 0, 2, 4, 6, and 8, which include a valid bit for node 0, which is online. Thus no error.

I also just spoke with Andi Kleen at LinuxCon, and he confirmed this is not a bug: a mask of all F's (e.g. 0xffffffff) is meant to express all the nodes of the system. So, closing this as NOTABUG.
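
As a rough illustration of the behaviour described in this comment, here is a simplified single-word model. It is a sketch under stated assumptions, not the kernel's implementation: model_set_mempolicy_mask is a hypothetical name, the masks are reduced to one unsigned long, and "allowed" stands for the nodes that are online, have memory, and are permitted by the current cpuset.

```c
#include <errno.h>

/*
 * Simplified model (NOT kernel code): nodes outside `allowed` are
 * silently dropped, and -EINVAL is returned only when none of the
 * requested nodes is usable.
 */
static int
model_set_mempolicy_mask(unsigned long requested,
                         unsigned long allowed,
                         unsigned long *effective)
{
	unsigned long usable = requested & allowed;

	if (usable == 0)
		return -EINVAL;   /* e.g. the "000000010" case on a one-node host */

	*effective = usable;  /* e.g. even bits or all F's reduce to node 0 */
	return 0;
}
```

Under this model, with only node 0 allowed, the mask 0x155 from the test program and an all-F's mask both succeed with an effective mask of 0x1, while a mask with only bit 1 set fails with EINVAL.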

Comment 6 Dave Allan 2011-06-03 03:51:32 UTC
Fair enough; thanks for the detailed explanation.

