Bug 1379769 - GlusterFS fails to build on old Linux distros with linux/oom.h missing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: All
OS: Linux
medium
low
Target Milestone: ---
Assignee: Oleksandr Natalenko
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-27 15:38 UTC by Oleksandr Natalenko
Modified: 2017-03-06 17:28 UTC

Fixed In Version: glusterfs-3.10.0
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-06 17:28:10 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
Initial patch (3.94 KB, application/mbox)
2016-09-27 15:38 UTC, Oleksandr Natalenko

Description Oleksandr Natalenko 2016-09-27 15:38:54 UTC
Created attachment 1205274
Initial patch

Milind Changire has reported that GlusterFS fails to build under RHEL5 because RHEL5 does not have the linux/oom.h header.

This header is used purely to obtain OOM-related constants.

Also, this issue raises the question of how OOM should be managed under old kernels. From man 5 proc we see this:

===
       /proc/[pid]/oom_adj (since Linux 2.6.11)
              This file can be used to adjust the score used to select which process should be killed in an out-of-memory (OOM) situation.  The kernel uses this value
              for a bit-shift operation of the process's oom_score value: valid values are in the range -16 to +15, plus the special value -17, which disables
              OOM-killing altogether for this process.  A positive score increases the likelihood of this process being killed by the OOM-killer; a negative score
              decreases the likelihood.

              The default value for this file is 0; a new process inherits its parent's oom_adj setting.  A process must be privileged (CAP_SYS_RESOURCE) to update this
              file.

              Since Linux 2.6.36, use of this file is deprecated in favor of /proc/[pid]/oom_score_adj.

...

       /proc/[pid]/oom_score_adj (since Linux 2.6.36)
              This file can be used to adjust the badness heuristic used to select which process gets killed in out-of-memory conditions.

              The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted.
              The units are roughly a proportion along that range of allowed memory the process may allocate from, based on an estimation of its current memory and swap
              use.  For example, if a task is using all allowed memory, its badness score will be 1000.  If it is using half of its allowed memory, its score will be
              500.

              There is an additional factor included in the badness score: root processes are given 3% extra memory over other tasks.

              The amount of "allowed" memory depends on the context in which the OOM-killer was called.  If it is due to the memory assigned to the allocating task's
              cpuset being exhausted, the allowed memory represents the set of mems assigned to that cpuset (see cpuset(7)).  If it is due to a mempolicy's node(s)
              being exhausted, the allowed memory represents the set of mempolicy nodes.  If it is due to a memory limit (or swap limit) being reached, the allowed
              memory is that configured limit.  Finally, if it is due to the entire system being out of memory, the allowed memory represents all allocatable resources.

              The value of oom_score_adj is added to the badness score before it is used to determine which task to kill.  Acceptable values range from -1000
              (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows user space to control the preference for OOM-killing, ranging from always preferring a
              certain task or completely disabling it from OOM killing.  The lowest possible value, -1000, is equivalent to disabling OOM-killing entirely for that task,
              since it will always report a badness score of 0.

              Consequently, it is very simple for user space to define the amount of memory to consider for each task.  Setting a oom_score_adj value of +500, for
              example, is roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least
              50% more memory.  A value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task's allowed memory from being considered as
              scoring against the task.

              For backward compatibility with previous kernels, /proc/[pid]/oom_adj can still be used to tune the badness score.  Its value is scaled linearly with
              oom_score_adj.

              Writing to /proc/[pid]/oom_score_adj or /proc/[pid]/oom_adj will change the other with its scaled value.
===
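
The "scaled linearly" relationship at the end of the excerpt can be sketched as below. This is only an illustration: the helper name oom_adj_to_score_adj is made up, the constants are written out by hand, and the formula (multiply by OOM_SCORE_ADJ_MAX / -OOM_DISABLE, with the maximum mapped specially) is my understanding of what the kernel does when oom_adj is written, so it should be checked against the sources of the kernel in question.

```c
/* Values as found in linux/oom.h, defined by hand so that this sketch
 * compiles without that header. */
#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MAX  1000

/* Hypothetical helper: approximate the oom_score_adj value that corresponds
 * to a legacy oom_adj value. Assumes oom_adj is within [-17, 15]. */
static int
oom_adj_to_score_adj (int oom_adj)
{
        /* The top of the old range maps to the top of the new one. */
        if (oom_adj == OOM_ADJUST_MAX)
                return OOM_SCORE_ADJ_MAX;

        /* Integer scaling: -17 maps to -1000, 0 stays 0. */
        return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}
```

With this scaling, the "disable" value -17 of the old interface lands exactly on -1000, the value the new interface uses for the same purpose.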

In summary, for kernels older than 2.6.11 we must disable OOM-related code completely; for kernels from 2.6.11 to 2.6.35 inclusive we must use the old interface (/proc/[pid]/oom_adj); and starting from 2.6.36 we must use /proc/[pid]/oom_score_adj.

It is obviously not that simple. For example, RHEL6, while having a 2.6.32 kernel, provides the /proc/[pid]/oom_score_adj interface. So, I guess, we must do this:

1) if there is no /proc/self/oom_adj and no /proc/self/oom_score_adj, consider this kernel to be too old and disable OOM-related code (i.e. not define HAVE_LINUX_OOM_PROC);
2) if there is /proc/self/oom_adj, but no /proc/self/oom_score_adj, consider this kernel to be old and use old OOM /proc interface (define HAVE_LINUX_OOM_PROC and HAVE_LINUX_OOM_PROC_V1, for example);
3) if there is /proc/self/oom_score_adj, work as we do now (and define HAVE_LINUX_OOM_PROC_V2 or so);
4) if there is linux/oom.h, use it for constants (define HAVE_LINUX_OOM_H), otherwise define necessary constants manually.
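
Points 1-3 amount to a small decision table, which might look roughly like this. It is a sketch, not code from the patch: the enum and function names are hypothetical, and the directory is passed in as a parameter (normally "/proc/self") only to make the function easy to exercise. The same decision applies whether the check is made by configure or by the running process.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical names for the three cases in points 1-3. */
enum oom_api {
        OOM_API_NONE,      /* neither file exists: skip OOM code */
        OOM_API_ADJ,       /* only oom_adj exists: old interface */
        OOM_API_SCORE_ADJ  /* oom_score_adj exists: new interface */
};

/* Pure decision: the new interface wins whenever it is present. */
static enum oom_api
oom_api_select (int has_oom_adj, int has_oom_score_adj)
{
        if (has_oom_score_adj)
                return OOM_API_SCORE_ADJ;
        if (has_oom_adj)
                return OOM_API_ADJ;
        return OOM_API_NONE;
}

/* Check which of the two files exist under proc_dir. */
static enum oom_api
oom_api_probe (const char *proc_dir)
{
        char adj[256];
        char score_adj[256];

        snprintf (adj, sizeof (adj), "%s/oom_adj", proc_dir);
        snprintf (score_adj, sizeof (score_adj), "%s/oom_score_adj",
                  proc_dir);

        return oom_api_select (access (adj, F_OK) == 0,
                               access (score_adj, F_OK) == 0);
}
```

Checking for the files rather than the kernel version is what handles the RHEL6 case, where a 2.6.32 kernel already offers oom_score_adj.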

Not defining HAVE_LINUX_OOM_PROC will throw away OOM-related code completely. The HAVE_LINUX_OOM_PROC_V1/HAVE_LINUX_OOM_PROC_V2 option will switch the code to write to the specific /proc file and select the constants to deal with. In case we have HAVE_LINUX_OOM_PROC (V1 or V2) but do not have HAVE_LINUX_OOM_H, we might end up doing this:

===
#define OOM_DISABLE             (-17)
#define OOM_ADJUST_MIN          (-16)
#define OOM_ADJUST_MAX          15
#define OOM_SCORE_ADJ_MIN       (-1000)
#define OOM_SCORE_ADJ_MAX       1000
===
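
Combined with the header check, the fallback could be arranged as in the following sketch. It assumes that configure defines HAVE_LINUX_OOM_H when linux/oom.h is found (the name AC_CHECK_HEADERS would generate for that header); guarding each constant separately is my own choice, not something taken from the patch.

```c
/* HAVE_LINUX_OOM_H is assumed to come from configure, for example via
 * AC_CHECK_HEADERS([linux/oom.h]). */
#ifdef HAVE_LINUX_OOM_H
#include <linux/oom.h>
#endif

/* Define by hand whatever the header did not provide. Negative values are
 * parenthesized so they expand safely inside larger expressions. */
#ifndef OOM_DISABLE
#define OOM_DISABLE             (-17)
#endif
#ifndef OOM_ADJUST_MIN
#define OOM_ADJUST_MIN          (-16)
#endif
#ifndef OOM_ADJUST_MAX
#define OOM_ADJUST_MAX          15
#endif
#ifndef OOM_SCORE_ADJ_MIN
#define OOM_SCORE_ADJ_MIN       (-1000)
#endif
#ifndef OOM_SCORE_ADJ_MAX
#define OOM_SCORE_ADJ_MAX       1000
#endif
```

Per-constant guards also cover a header that exists but defines only some of these names, which may be the case on kernels that predate oom_score_adj.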

With these changes we'll cover all the possibilities one may face while compiling GlusterFS against a relatively old kernel.

Attaching Milind's initial patch as a proof of concept; I will take care of implementing everything written above if there are no objections.

Comment 1 Oleksandr Natalenko 2016-09-27 15:56:30 UTC
As ndevos pointed out in #gluster-dev, /proc is a run-time dependency rather than a compile-time one, so we need to check for oom.h in configure.ac and then decide at run time which file to write to (or whether to skip the OOM code completely).
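
A run-time decision of that kind might be sketched as follows. This is not the code from the review: the function names, the result enum and the proc_dir parameter (normally "/proc/self") are all hypothetical. The sketch tries the new file first, falls back to the old one only if the new one does not exist, and reports "skipped" if neither is there.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes for the sketch. */
enum oom_result {
        OOM_FAILED = -1,        /* a file exists but could not be written */
        OOM_SKIPPED = 0,        /* neither interface is available */
        OOM_USED_SCORE_ADJ = 1, /* wrote to oom_score_adj */
        OOM_USED_ADJ = 2        /* wrote to oom_adj */
};

/* Write value to an existing file. O_CREAT is deliberately not used, so a
 * missing file fails with ENOENT instead of being created. */
static int
write_value (const char *path, int value)
{
        char    buf[16];
        int     len = snprintf (buf, sizeof (buf), "%d", value);
        int     fd = open (path, O_WRONLY);
        ssize_t written;
        int     saved_errno;

        if (fd < 0)
                return -1;

        written = write (fd, buf, (size_t) len);
        saved_errno = errno;
        close (fd);

        if (written != (ssize_t) len) {
                errno = saved_errno;
                return -1;
        }
        return 0;
}

static enum oom_result
oom_adjust (const char *proc_dir, int score_adj, int legacy_adj)
{
        char path[256];

        snprintf (path, sizeof (path), "%s/oom_score_adj", proc_dir);
        if (write_value (path, score_adj) == 0)
                return OOM_USED_SCORE_ADJ;
        if (errno != ENOENT)
                return OOM_FAILED;

        snprintf (path, sizeof (path), "%s/oom_adj", proc_dir);
        if (write_value (path, legacy_adj) == 0)
                return OOM_USED_ADJ;
        if (errno != ENOENT)
                return OOM_FAILED;

        return OOM_SKIPPED;
}
```

A caller would pass both forms of the desired value, for example oom_adjust ("/proc/self", -1000, -17), keeping in mind that lowering the value requires CAP_SYS_RESOURCE according to the man page excerpt above.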

Comment 2 Worker Ant 2016-09-28 12:35:36 UTC
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#1) for review on master by Oleksandr Natalenko (oleksandr@natalenko.name)

Comment 3 Worker Ant 2016-09-28 12:51:28 UTC
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#2) for review on master by Oleksandr Natalenko (oleksandr@natalenko.name)

Comment 4 Worker Ant 2016-10-01 13:31:52 UTC
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#3) for review on master by Oleksandr Natalenko (oleksandr@natalenko.name)

Comment 5 Worker Ant 2016-10-02 09:25:55 UTC
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#4) for review on master by Oleksandr Natalenko (oleksandr@natalenko.name)

Comment 6 Worker Ant 2016-10-10 11:10:34 UTC
REVIEW: http://review.gluster.org/15587 (glusterfsd/main: fix OOM adjustment for older kernels) posted (#5) for review on master by Oleksandr Natalenko (oleksandr@natalenko.name)

Comment 7 Worker Ant 2016-10-11 12:18:09 UTC
COMMIT: http://review.gluster.org/15587 committed in master by Kaleb KEITHLEY (kkeithle@redhat.com) 
------
commit de07155bfae3c5846797cbb19ee044751cbe6f6e
Author: Oleksandr Natalenko <onatalen@redhat.com>
Date:   Wed Sep 28 14:29:23 2016 +0200

    glusterfsd/main: fix OOM adjustment for older kernels
    
    Milind Changire reported that GlusterFS fails to build on RHEL5
    because linux/oom.h is unavailable.
    
    Milind's initial patch disables OOM adjustment completely
    for those environments that do not have this header. However,
    I'd take another approach that:
    
    1) checks for linux/oom.h in compile-time and defines necessary
    constants if the header is not present;
    2) checks for available OOM API in /proc in run-time and uses it
    accordingly.
    
    This allows OOM to be adjusted properly on RHEL5 (the kernel is new enough
    to present the /proc API for that) as well as RHEL6 (the kernel has many
    things backported, including the new /proc API).
    
    Change-Id: I1bc610586872d208430575c149a7d0c54bd82370
    BUG: 1379769
    Signed-off-by: Oleksandr Natalenko <onatalen@redhat.com>
    Reviewed-on: http://review.gluster.org/15587
    Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

Comment 8 Shyamsundar 2017-03-06 17:28:10 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

