Created attachment 1205274
Milind Changire has reported that GlusterFS fails to build on RHEL5 because RHEL5 does not ship the linux/oom.h header.
This header is used purely to obtain OOM-related constants.
This issue also raises the question of how OOM should be managed under old kernels. From man 5 proc we see this:
/proc/[pid]/oom_adj (since Linux 2.6.11)
This file can be used to adjust the score used to select which process should be killed in an out-of-memory (OOM) situation. The kernel uses this value
for a bit-shift operation of the process's oom_score value: valid values are in the range -16 to +15, plus the special value -17, which disables OOM-killing
altogether for this process. A positive score increases the likelihood of this process being killed by the OOM-killer; a negative score decreases the likelihood.
The default value for this file is 0; a new process inherits its parent's oom_adj setting. A process must be privileged (CAP_SYS_RESOURCE) to update this file.
Since Linux 2.6.36, use of this file is deprecated in favor of /proc/[pid]/oom_score_adj.
/proc/[pid]/oom_score_adj (since Linux 2.6.36)
This file can be used to adjust the badness heuristic used to select which process gets killed in out-of-memory conditions.
The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted.
The units are roughly a proportion along that range of allowed memory the process may allocate from, based on an estimation of its current memory and swap
use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.
There is an additional factor included in the badness score: root processes are given 3% extra memory over other tasks.
The amount of "allowed" memory depends on the context in which the OOM-killer was called. If it is due to the memory assigned to the allocating task's
cpuset being exhausted, the allowed memory represents the set of mems assigned to that cpuset (see cpuset(7)). If it is due to a mempolicy's node(s)
being exhausted, the allowed memory represents the set of mempolicy nodes. If it is due to a memory limit (or swap limit) being reached, the allowed
memory is that configured limit. Finally, if it is due to the entire system being out of memory, the allowed memory represents all allocatable resources.
The value of oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000
(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX). This allows user space to control the preference for OOM-killing, ranging from always preferring a
certain task or completely disabling it from OOM killing. The lowest possible value, -1000, is equivalent to disabling OOM-killing entirely for that task,
since it will always report a badness score of 0.
Consequently, it is very simple for user space to define the amount of memory to consider for each task. Setting an oom_score_adj value of +500, for
example, is roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least
50% more memory. A value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task's allowed memory from being considered as
scoring against the task.
For backward compatibility with previous kernels, /proc/[pid]/oom_adj can still be used to tune the badness score. Its value is scaled linearly with oom_score_adj.
Writing to /proc/[pid]/oom_score_adj or /proc/[pid]/oom_adj will change the other with its scaled value.
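To make the "scaled linearly" remark concrete, here is a minimal sketch of how a legacy oom_adj value could be mapped onto the oom_score_adj range. The helper name oom_adj_to_score_adj is hypothetical, and the exact rounding and the special-casing of the maximum value are assumptions about how the scaling behaves, not something the man page excerpt spells out.

```c
/* Sketch only: constants repeated here so the example is self-contained. */
#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MAX  1000

/* Hypothetical helper: map a legacy oom_adj value (-17..15) onto the
 * oom_score_adj scale (-1000..1000) with a plain linear scale. */
static int
oom_adj_to_score_adj(int oom_adj)
{
    /* Assumption: the top of the legacy range maps to the top of the
     * new range instead of 15 * 1000 / 17 = 882. */
    if (oom_adj == OOM_ADJUST_MAX)
        return OOM_SCORE_ADJ_MAX;

    /* Integer division truncates toward zero in C. */
    return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}
```

Under these assumptions -17 (OOM disabled) maps to -1000 and 0 maps to 0, so "disable OOM-killing" means the same thing in both interfaces.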
In summary, for kernels older than 2.6.11 we must disable OOM-related code completely, for kernels from 2.6.11 to 2.6.35 inclusive we must use the old interface (/proc/[pid]/oom_adj), and starting from 2.6.36 we must use /proc/[pid]/oom_score_adj.
Obviously, it is not that simple. For example, RHEL6 has a 2.6.32 kernel but still provides the /proc/[pid]/oom_score_adj interface. So, I guess, we must do this:
1) if there is no /proc/self/oom_adj and no /proc/self/oom_score_adj, consider this kernel to be too old and disable OOM-related code (i.e. not define HAVE_LINUX_OOM_PROC);
2) if there is /proc/self/oom_adj, but no /proc/self/oom_score_adj, consider this kernel to be old and use old OOM /proc interface (define HAVE_LINUX_OOM_PROC and HAVE_LINUX_OOM_PROC_V1, for example);
3) if there is /proc/self/oom_score_adj, work as we do now (and define HAVE_LINUX_OOM_PROC_V2 or so);
4) if there is linux/oom.h, use it for constants (define HAVE_LINUX_OOM_H), otherwise define necessary constants manually.
Not defining HAVE_LINUX_OOM_PROC compiles out the OOM-related code completely. HAVE_LINUX_OOM_PROC_V1/HAVE_LINUX_OOM_PROC_V2 select which /proc file the code writes to and which constants it uses. If we have HAVE_LINUX_OOM_PROC (V1 or V2) but not HAVE_LINUX_OOM_H, we might end up doing this:
#define OOM_DISABLE (-17)
#define OOM_ADJUST_MIN (-16)
#define OOM_ADJUST_MAX 15
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000
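As a rough sketch of how these fallback definitions could be wired up, assuming configure defines HAVE_LINUX_OOM_H when the header is found (the macro name comes from the proposal above; the per-constant #ifndef guards are my assumption, not part of any posted patch):

```c
/* Use the kernel header for the constants when configure found it. */
#ifdef HAVE_LINUX_OOM_H
#include <linux/oom.h>
#endif

/* Fall back to manual definitions for anything still missing.
 * Negative values are parenthesized so they expand safely in expressions. */
#ifndef OOM_DISABLE
#define OOM_DISABLE (-17)
#endif
#ifndef OOM_ADJUST_MIN
#define OOM_ADJUST_MIN (-16)
#endif
#ifndef OOM_ADJUST_MAX
#define OOM_ADJUST_MAX 15
#endif
#ifndef OOM_SCORE_ADJ_MIN
#define OOM_SCORE_ADJ_MIN (-1000)
#endif
#ifndef OOM_SCORE_ADJ_MAX
#define OOM_SCORE_ADJ_MAX 1000
#endif
```

Guarding each constant separately, rather than using a single #else branch, should also cope with a header that exists but predates the OOM_SCORE_ADJ_* constants.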
With these changes we'll cover all the possibilities one may face while compiling GlusterFS against a relatively old kernel.
Attaching Milind's initial patch as a proof of concept, but I will take care of implementing everything written above if there are no objections.
As ndevos pointed out in #gluster-dev, /proc is a run-time dependency rather than a compile-time one, so we need to check for oom.h in configure.ac and then decide at run time which file to write to (or whether to skip the OOM code completely).
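The run-time decision could look roughly like the sketch below. All names here (enum oom_api, oom_api_detect, oom_write_value) are hypothetical and only illustrate the idea; this is not the code from the actual patch. The paths are passed in as parameters so the probe can be exercised without a real /proc.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result type: which OOM /proc interface is available. */
enum oom_api {
    OOM_API_NONE,      /* neither file exists: skip OOM adjustment */
    OOM_API_LEGACY,    /* only oom_adj exists */
    OOM_API_SCORE_ADJ  /* oom_score_adj exists: preferred */
};

/* Probe for the newer interface first, then fall back to the old one. */
static enum oom_api
oom_api_detect(const char *score_adj_path, const char *adj_path)
{
    if (access(score_adj_path, F_OK) == 0)
        return OOM_API_SCORE_ADJ;
    if (access(adj_path, F_OK) == 0)
        return OOM_API_LEGACY;
    return OOM_API_NONE;
}

/* Write an integer value to the chosen file; returns 0 on success, -1 on error. */
static int
oom_write_value(const char *path, int value)
{
    FILE *fp = fopen(path, "w");
    int ok;

    if (fp == NULL)
        return -1;
    ok = fprintf(fp, "%d", value) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would probe once with oom_api_detect("/proc/self/oom_score_adj", "/proc/self/oom_adj") and then write either a value in the -1000..1000 range or one in the -17..15 range to the matching file, or do nothing for OOM_API_NONE.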
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#1) for review on master by Oleksandr Natalenko (firstname.lastname@example.org)
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#2) for review on master by Oleksandr Natalenko (email@example.com)
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#3) for review on master by Oleksandr Natalenko (firstname.lastname@example.org)
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#4) for review on master by Oleksandr Natalenko (email@example.com)
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#5) for review on master by Oleksandr Natalenko (firstname.lastname@example.org)
COMMIT: http://review.gluster.org/15587 committed in master by Kaleb KEITHLEY (email@example.com)
Author: Oleksandr Natalenko <firstname.lastname@example.org>
Date: Wed Sep 28 14:29:23 2016 +0200
glusterfsd/main: fix OOM adjustment for older kernels
Milind Changire reported that GlusterFS fails to build on RHEL5
because linux/oom.h is unavailable.
Milind's initial patch disables OOM adjustment completely
for those environments that do not have this header. However,
I'd take another approach that:
1) checks for linux/oom.h at compile time and defines necessary
constants if the header is not present;
2) checks for available OOM API in /proc at run time and uses it
This allows OOM to be adjusted properly on RHEL5 (the kernel is new enough
to present a /proc API for that) as well as RHEL6 (the kernel has many things
backported, including the new /proc API).
Signed-off-by: Oleksandr Natalenko <email@example.com>
Tested-by: Oleksandr Natalenko <firstname.lastname@example.org>
Reviewed-by: Niels de Vos <email@example.com>
Smoke: Gluster Build System <firstname.lastname@example.org>
NetBSD-regression: NetBSD Build System <email@example.com>
CentOS-regression: Gluster Build System <firstname.lastname@example.org>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.
glusterfs-3.10.0 has been announced on the Gluster mailing lists; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.