Bug 360341

Summary: add tunable to sshd initscript for protection from oom-kill
Product: Red Hat Enterprise Linux 5
Reporter: Paul Morgan <pmorgan>
Component: openssh
Assignee: Tomas Mraz <tmraz>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: low
Docs Contact:
Priority: low
Version: 5.0
CC: lwoodman, tburke
Target Milestone: ---
Keywords: EasyFix, Patch
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-11-01 16:48:30 UTC
Attachments:
patch for /etc/init.d/sshd to use oom-kill tunable (flags: none)

Description Paul Morgan 2007-10-31 14:51:17 UTC
Description of problem:

Under heavy memory pressure, oom-kill starts killing non-interactive processes
based on sleep average. Unfortunately, sshd is usually one of the first
processes to be killed, which means the admin must walk out to the datacenter
floor or hope that a console switch is in place.

The attached patch to /etc/init.d/sshd uses the rhel5 kernel's oom_adj tunable
to provide moderate protection from oom-kill so that sshd remains running for a
bit longer.

Version-Release number of selected component (if applicable):
all versions beginning with rhel 5.0 (this tunable was added in rhel 5.0)


Steps to patch initscript:
1. cp sshd-oom_adj-example.patch /etc/init.d
2. cd /etc/init.d
3. cp sshd{,.orig}
4. patch -p0 < sshd-oom_adj-example.patch 


Expected results: sshd gains a tunable amount of protection from oom-kill


See also:

https://bugzilla.redhat.com/show_bug.cgi?id=244739 
HOWTO - oom kill policy tuning

https://bugzilla.redhat.com/show_bug.cgi?id=239313
Document oom_adj & oom_score

Comment 1 Paul Morgan 2007-10-31 14:51:17 UTC
Created attachment 244601 [details]
patch for /etc/init.d/sshd to use oom-kill tunable

Comment 2 Paul Morgan 2007-10-31 15:32:40 UTC
This BZ is an enhancement request for the sshd init script in openssh as
shipped by Red Hat. It has no effect on normal production; it only comes into
play when oom-kill starts killing non-interactive PIDs.

Comment 3 Tomas Mraz 2007-10-31 15:39:17 UTC
As the oom_adj value seems to be inherited from parent process to child, I
propose a slightly different approach:

echo 3 >/proc/self/oom_adj 

just before the ssh daemon is executed.
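
The approach in this comment could be sketched as a small helper in the init script. This is a minimal sketch, not the attached patch: `set_oom_adj` and its optional second argument are hypothetical names, and the value is passed in rather than hard-coded, since the sign and magnitude that actually give protection should be checked against the kernel documentation for the release in use.

```shell
# Hypothetical helper (not part of the shipped init script).
# Writes an oom_adj value for the calling shell; processes started afterwards
# inherit it. The optional second argument overrides the target file so the
# helper can be tried against an ordinary file.
set_oom_adj() {
    value=$1
    target=${2:-/proc/self/oom_adj}
    # Skip quietly when the kernel does not expose the tunable.
    [ -w "$target" ] || return 0
    echo "$value" > "$target"
}
```

Calling `set_oom_adj 3` in start(), on the line before the daemon is launched, would leave the script's RETVAL handling untouched, because the write happens before sshd is started.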


Comment 4 RHEL Program Management 2007-10-31 15:44:49 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Paul Morgan 2007-10-31 15:46:50 UTC
Excellent! Your suggestion also avoids the nastiness of interfering with the
RETVAL from starting sshd. Have you tested it yet, or should I?

Comment 6 Tomas Mraz 2007-10-31 15:56:04 UTC
On rawhide it seems to work fine.

Comment 7 Paul Morgan 2007-10-31 16:08:01 UTC
Also works on rhel5 updated to the 2.6.18-8.1.15.el5 kernel; will kickstart a
5.0 box and test there as well.

To be thorough, I also ran `find /proc -name oom_adj -exec cat {} \; | grep 3`
to make sure only sshd and its task had the new oom_adj score (as expected).
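
The find/grep check in this comment prints bare values, and `grep 3` would also match values such as 13 or -13. A stricter variant, as a sketch only, prints the PID and name of each process whose oom_adj equals the requested value exactly. `list_oom_adj` and its `proc_root` argument are hypothetical names; unlike the find command, it looks only at per-process entries, not at the per-task ones under /proc/PID/task.

```shell
# Hypothetical helper: list "PID NAME" for every process whose oom_adj
# matches the requested value exactly.
# proc_root defaults to /proc; it is a parameter only so the function can be
# pointed at a test directory.
list_oom_adj() {
    want=$1
    proc_root=${2:-/proc}
    for f in "$proc_root"/[0-9]*/oom_adj; do
        [ -r "$f" ] || continue
        # A process may exit between the glob and the read.
        val=$(cat "$f" 2>/dev/null) || continue
        [ "$val" = "$want" ] || continue
        dir=${f%/oom_adj}
        pid=${dir##*/}
        # The process name is taken from the "Name:" line of the status file.
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$dir/status" 2>/dev/null)
        echo "$pid ${name:-?}"
    done
}
```

Running `list_oom_adj 3` as root after restarting sshd would then show which processes carry the new value, by name.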

Comment 8 Paul Morgan 2007-10-31 16:36:51 UTC
works on rhel 5.0, too.

unfortunately: inheritance means that ssh client sessions pick up the oom_adj
for their bash shell, too. This means that oom_adj propagates...not the
intention imho.

Hmmmm....

Comment 9 Paul Morgan 2007-10-31 16:37:51 UTC
Larry, any comments or suggestions?
TIA.

Comment 10 Paul Morgan 2007-10-31 17:53:08 UTC
After experimenting with the effects of oom_adj on an ssh client session, it
appears that this is a useful side effect.

That is, on an active session, oom_score gets artificially raised above the
normal sleep avg adjustment. This is good since it actually gives the admin a
chance to get a server under control if it's undergoing oom-kill.

On an idle session, having oom_adj=3 artificially lowers the normal sleep avg
adjustment. This, too, seems good since an idle shell is not being used to
recover the machine.

The net effect on a box that's trying not to die seems like a positive.

Comments?

Comment 11 Paul Morgan 2007-10-31 18:50:46 UTC
No go; restarting a service from within the ssh session causes the restarted
service to have oom_adj=3, as well. 

Impacts normal production dangerously. It seems there is no transparent way to
protect sshd and just sshd at the moment.
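
The inheritance described in this comment can be observed from inside a session with a sketch like the following. `child_oom_adj` is a hypothetical name, and its optional argument exists only so the function can be tried where the tunable is not available.

```shell
# Hypothetical check: print the oom_adj value seen by a newly started child
# process. With no argument, the child reads its own /proc/self/oom_adj,
# which it inherited from the calling shell.
child_oom_adj() {
    sh -c 'cat "$1"' sh "${1:-/proc/self/oom_adj}"
}
```

Run from an ssh session on a box with the modified init script, a result of 3 would match the behaviour reported here: any service restarted from that shell starts with the same value.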

Comment 12 Paul Morgan 2007-11-01 16:48:30 UTC
Should not be implemented; closing as WONTFIX.