Bug 360341 - add tunable to sshd initscript for protection from oom-kill
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openssh
Version: 5.0
Hardware: All Linux
Priority: low
Severity: low
Assigned To: Tomas Mraz
QA Contact: Brian Brock
Keywords: EasyFix, Patch
Depends On:
Blocks:
Reported: 2007-10-31 10:51 EDT by Paul Morgan
Modified: 2009-07-20 05:59 EDT (History)

Doc Type: Bug Fix
Last Closed: 2007-11-01 12:48:30 EDT

Attachments
patch for /etc/init.d/sshd to use oom-kill tunable (340 bytes, patch)
2007-10-31 10:51 EDT, Paul Morgan

Description Paul Morgan 2007-10-31 10:51:17 EDT
Description of problem:

Under heavy memory pressure, oom-kill starts killing non-interactive processes
based on sleep average. Unfortunately, sshd is usually one of the first
processes to be killed, which means the admin must hike out to the datacenter
floor or hope that a console switch is in place.

The attached patch to /etc/init.d/sshd uses the rhel5 kernel's oom_adj tunable
to provide moderate protection from oom-kill so that sshd remains running for a
bit longer.
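
For reference, a minimal sketch of how the current values can be inspected for one process before and after such a change. The function name `show_oom` and its optional second argument (an alternate proc root) are hypothetical helpers, not part of the attached patch:

```shell
# show_oom PID [PROC_ROOT]: print oom_adj and oom_score for one process.
# PROC_ROOT defaults to /proc; the second argument is a hypothetical
# convenience for trying the function against a fake directory tree.
show_oom() {
    pid=$1
    root=${2:-/proc}
    for f in oom_adj oom_score; do
        if [ -r "$root/$pid/$f" ]; then
            printf '%s %s=%s\n' "$pid" "$f" "$(cat "$root/$pid/$f")"
        else
            # Kernels without the tunable simply lack the file.
            printf '%s %s=unavailable\n' "$pid" "$f"
        fi
    done
}
```

Example use, assuming the pidfile lives at /var/run/sshd.pid: `show_oom "$(cat /var/run/sshd.pid)"`.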

Version-Release number of selected component (if applicable):
all versions beginning with rhel 5.0 (this tunable was added in rhel 5.0)


Steps to patch initscript:
1. cp sshd-oom_adj-example.patch /etc/init.d
2. cd /etc/init.d
3. cp sshd{,.orig}
4. patch -p0 < sshd-oom_adj-example.patch 


Expected results: sshd gains a tunable amount of protection from oom-kill


See also:

https://bugzilla.redhat.com/show_bug.cgi?id=244739 
HOWTO - oom kill policy tuning

https://bugzilla.redhat.com/show_bug.cgi?id=239313
Document oom_adj & oom_score
Comment 1 Paul Morgan 2007-10-31 10:51:17 EDT
Created attachment 244601
patch for /etc/init.d/sshd to use oom-kill tunable
Comment 2 Paul Morgan 2007-10-31 11:32:40 EDT
This BZ is an enhancement request for the sshd init script in openssh as
shipped by Red Hat. It has no effect on normal production; it only comes into
play when oom-kill starts killing non-interactive PIDs.
Comment 3 Tomas Mraz 2007-10-31 11:39:17 EDT
Since the oom_adj value seems to be inherited by child processes from their
parent, I propose a slightly different approach:

echo 3 >/proc/self/oom_adj 

just before the ssh daemon is executed.
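
A minimal sketch of how that suggestion might look inside an initscript, not the actual Red Hat change. The function names `set_oom_adj` and `start_sshd_protected`, the optional file argument, and the `SSHD`, `OPTIONS` and `OOM_ADJ` variables are assumptions made for illustration. The subshell is a choice made here so that the calling shell keeps its own value. The value 3 is taken from the comment as-is; per the mainline kernel documentation, positive oom_adj values raise the likelihood of being killed and negative values lower it, so the sign is worth checking before relying on this.

```shell
# set_oom_adj VALUE [FILE]: write VALUE to the caller's oom_adj file.
# FILE defaults to /proc/self/oom_adj; the override is a hypothetical
# hook for trying the function without touching /proc.
set_oom_adj() {
    value=$1
    file=${2:-/proc/self/oom_adj}
    if [ -w "$file" ]; then
        echo "$value" > "$file"
    else
        echo "cannot write $file" >&2
        return 1
    fi
}

# Hypothetical start fragment: adjust oom_adj in a subshell, then exec
# sshd from that same process so the daemon inherits the adjusted value.
start_sshd_protected() {
    (
        # A failed adjustment should not prevent sshd from starting.
        set_oom_adj "${OOM_ADJ:-3}" ||
            echo "starting sshd without oom_adj change" >&2
        # $OPTIONS is left unquoted on purpose so it splits into arguments.
        exec "${SSHD:-/usr/sbin/sshd}" $OPTIONS
    )
}
```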
Comment 4 RHEL Product and Program Management 2007-10-31 11:44:49 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 5 Paul Morgan 2007-10-31 11:46:50 EDT
Excellent! Your suggestion also gets around the nastiness of interfering with
RETVAL from starting sshd. Have you tested it yet, or should I?
Comment 6 Tomas Mraz 2007-10-31 11:56:04 EDT
On rawhide it seems to work fine.
Comment 7 Paul Morgan 2007-10-31 12:08:01 EDT
Also works on RHEL 5 updated to the 2.6.18-8.1.15.el5 kernel; I will kickstart
a 5.0 box and test there as well.

To be thorough, I also ran `find /proc -name oom_adj -exec cat {} \; | grep 3`
to make sure that only sshd and its task had the new oom_adj value (as expected).
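
Note that `grep 3` also matches values such as 13 or -3 and does not show which process owns each value. A stricter variant of that check could be sketched as follows; the function name `list_oom_adj` and its optional proc-root argument are hypothetical, and unlike the `find` command it only looks at top-level PIDs, not at per-task entries:

```shell
# list_oom_adj VALUE [PROC_ROOT]: print "PID NAME" for every process whose
# oom_adj equals VALUE exactly. PROC_ROOT defaults to /proc.
list_oom_adj() {
    want=$1
    root=${2:-/proc}
    for f in "$root"/[0-9]*/oom_adj; do
        [ -r "$f" ] || continue
        # A process may exit between the glob and the read.
        val=$(cat "$f" 2>/dev/null) || continue
        [ "$val" = "$want" ] || continue
        dir=${f%/oom_adj}
        # The Name: line of the status file holds the command name.
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$dir/status" 2>/dev/null)
        [ -n "$name" ] || name='?'
        printf '%s %s\n' "${dir##*/}" "$name"
    done
}
```

Example use: `list_oom_adj 3`.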
Comment 8 Paul Morgan 2007-10-31 12:36:51 EDT
works on rhel 5.0, too.

Unfortunately, inheritance means that the bash shells of ssh client sessions
pick up the oom_adj value, too. The setting propagates beyond sshd, which is
not the intention, imho.

Hmmmm....
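
The inheritance described in this comment can be reproduced without sshd. This is only a sketch under assumptions: the function name `check_oom_adj_inherited` is hypothetical, and the child is compared against the value the parent reads back rather than the value written, since some kernels may report a different number than the one that was written.

```shell
# check_oom_adj_inherited VALUE
# Returns 0 if a child process reports the same oom_adj as its parent,
# 1 if the values differ, 2 if oom_adj is unavailable or not writable.
check_oom_adj_inherited() {
    value=$1
    (
        # Work in a subshell so the calling shell keeps its own oom_adj.
        [ -w /proc/self/oom_adj ] || exit 2
        echo "$value" 2>/dev/null > /proc/self/oom_adj || exit 2
        # read is a builtin, so /proc/self is this subshell itself.
        read -r parent < /proc/self/oom_adj || exit 2
        # cat runs as a separate descendant process of this subshell.
        child=$(sh -c 'cat /proc/self/oom_adj') || exit 2
        echo "parent=$parent child=$child"
        [ "$parent" = "$child" ]
    )
}
```

Example use: `check_oom_adj_inherited 3`.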
Comment 9 Paul Morgan 2007-10-31 12:37:51 EDT
Larry, any comments or suggestions?
TIA.
Comment 10 Paul Morgan 2007-10-31 13:53:08 EDT
After experimenting with the effects of oom_adj on an ssh client session, it
appears that this is a useful side effect.

That is, on an active session, oom_score gets artificially raised above the
normal sleep avg adjustment. This is good since it actually gives the admin a
chance to get a server under control if it's undergoing oom-kill.

On an idle session, having oom_adj=3 artificially lowers the normal sleep avg
adjustment. This, too, seems good since an idle shell is not being used to
recover the machine.

The net effect on a box that's trying not to die seems like a positive.

Comments?
Comment 11 Paul Morgan 2007-10-31 14:50:46 EDT
No go; restarting a service from within the ssh session causes the restarted
service to have oom_adj=3, as well. 

This dangerously impacts normal production. It seems there is no transparent
way to protect sshd and just sshd at the moment.
Comment 12 Paul Morgan 2007-11-01 12:48:30 EDT
This should not be implemented; closing as WONTFIX.
