Description of problem:
While adding NUMA tuning to the libvirt layer, I found that no error is returned when a NUMA memory policy is set with an obviously incorrect nodemask. I originally thought this was a libnuma problem, but calling the syscall directly gives the same result: no error. This is a high-priority fix, because NUMA tuning is a libvirt feature request for RHEL 6.2.

=== System NUMA info ===
$ numactl --show
policy: default
preferred node: current
physcpubind: 0 1
cpubind: 0
nodebind: 0
membind: 0

=== Testing program to reproduce the problem ===
#define _GNU_SOURCE             /* must come before any #include to take effect */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>             /* strerror() */
#include <errno.h>
#include <numa.h>
#include <unistd.h>
#include <sys/syscall.h>        /* for SYS_xxx / __NR_xxx definitions */

#define NUMA_NODES 10

int main(int argc, char **argv)
{
    struct bitmask *mask = NULL;
    int i = 0;
    int mode = 2;               /* MPOL_BIND */
    long ret = -1;
    int saved_errno;

    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is unavailable on your host\n");
        exit(EXIT_FAILURE);
    }

    numa_exit_on_error = 1;
    numa_exit_on_warn = 1;

    /* Set every even bit: nodes 0, 2, 4, 6, 8 */
    mask = numa_bitmask_alloc(NUMA_NODES);
    for (i = 0; i < NUMA_NODES; i++) {
        if (i % 2 == 0) {
            numa_bitmask_setbit(mask, i);
        }
    }

#if 0
    numa_set_bind_policy(1);
    numa_set_membind(mask);
#endif

    errno = 0;
    ret = syscall(__NR_set_mempolicy, mode, mask->maskp, NUMA_NODES);
    saved_errno = errno;        /* capture errno before any other library call */
    numa_set_bind_policy(0);
    numa_bitmask_free(mask);

    if (ret < 0) {
        fprintf(stderr, "Failed to set NUMA memory policy: %s\n",
                strerror(saved_errno));
        exit(EXIT_FAILURE);
    }

    printf("No error returned when setting NUMA memory policy. ret = %ld\n", ret);
    exit(EXIT_SUCCESS);
}

Version-Release number of selected component (if applicable):
kernel-2.6.32-122.el6.x86_64

How reproducible:
always
numa(3) says: "numa_get_mems_allowed() returns the mask of nodes from which the process is allowed to allocate memory in its current cpuset context. Any nodes that are not included in the returned bitmask will be ignored in any of the following libnuma memory policy calls." So I think this is not a bug: any other nodes are silently ignored by the kernel.
No: the test program doesn't call numa_get_mems_allowed(), and according to both the code and the manual page of numa_set_membind(), an error is expected. <quote> numa_set_membind() sets the memory allocation mask. The thread will only allocate memory from the nodes set in nodemask. Passing an empty nodemask or a nodemask that contains nodes other than those in the mask returned by numa_get_mems_allowed() will result in an error. </quote>
More info: if I set the NUMA nodemask to "000000010" on my laptop, which has only one NUMA node, an error is returned: <snip> set_mempolicy: Invalid argument </snip> The fix we need: if I specify a nodemask like "010101010101" on my laptop, a similar error should be returned.
For your 000000010 case, the set_mempolicy(2) man page says explicitly: EINVAL mode is invalid. <...> Or, none of the node IDs specified by nodemask are on-line and allowed by the process's current cpuset context, or none of the specified nodes contain memory. In your program, however, the bits set are 0, 2, 4, 6 and 8, and bit 0 is node 0, which is online. Hence no error. I also just spoke with Andi Kleen at LinuxCon, and he confirmed this is not a bug: the intent is that a mask of all F's (e.g. 0xffffffff) can express all the nodes of the system. So, closing this as NOTABUG.
Fair enough; thanks for the detailed explanation.