Bug 360341 - add tunable to sshd initscript for protection from oom-kill

Summary: add tunable to sshd initscript for protection from oom-kill

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	openssh
Sub Component:
Version:	5.0
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	Tomas Mraz
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-10-31 14:51 UTC by Paul Morgan
Modified:	2009-07-20 09:59 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-11-01 16:48:30 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
patch for /etc/init.d/sshd to use oom-kill tunable (340 bytes, patch) 2007-10-31 14:51 UTC, Paul Morgan

Description Paul Morgan 2007-10-31 14:51:17 UTC

Description of problem:

Under heavy memory pressure, oom-kill starts killing non-interactive processes
based on sleep average. Unfortunately, sshd is usually one of the early
processes to get killed, which means the admin must either walk out to the
datacenter floor or hope that a console switch is in place.

The attached patch to /etc/init.d/sshd uses the RHEL 5 kernel's oom_adj tunable
to provide moderate protection from oom-kill so that sshd remains running for a
bit longer.

Version-Release number of selected component (if applicable):
All versions beginning with RHEL 5.0 (this tunable was added in RHEL 5.0)


Steps to patch initscript:
1. cp sshd-oom_adj-example.patch /etc/init.d
2. cd /etc/init.d
3. cp sshd{,.orig}
4. patch -p0 < sshd-oom_adj-example.patch 


Expected results: sshd gains a tunable amount of protection from oom-kill


See also:

https://bugzilla.redhat.com/show_bug.cgi?id=244739 
HOWTO - oom kill policy tuning

https://bugzilla.redhat.com/show_bug.cgi?id=239313
Document oom_adj & oom_score

Comment 1 Paul Morgan 2007-10-31 14:51:17 UTC

Created attachment 244601
patch for /etc/init.d/sshd to use oom-kill tunable

Comment 2 Paul Morgan 2007-10-31 15:32:40 UTC

This BZ is an enhancement request for the sshd init script in openssh as
shipped by Red Hat. It has no effect on normal production; it only comes into
play when oom-kill starts killing non-interactive PIDs.

Comment 3 Tomas Mraz 2007-10-31 15:39:17 UTC

As the oom_adj value seems to be inherited by child processes from their
parent, I propose a slightly different approach:

echo 3 >/proc/self/oom_adj

just before the ssh daemon is executed.
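
A minimal sketch of how this might look inside an init script. The helper name set_oom_adj and its optional second argument are hypothetical and are not taken from the attached patch or from the shipped script; the second argument exists only so the function can be tried against a scratch file instead of /proc. The bounds check reflects the range oom_adj accepts (-17 to 15).

```shell
# Hypothetical helper: write an oom_adj value for the calling shell so that
# a daemon started afterwards inherits it.
#   $1  value to write (integer, -17..15)
#   $2  target file (assumption: defaults to /proc/self/oom_adj)
set_oom_adj() {
    value=$1
    target=${2:-/proc/self/oom_adj}

    # Reject anything that is not a plain, optionally negative, integer.
    case $value in
        ''|-|*[!0-9-]*|?*-*)
            echo "set_oom_adj: not an integer: $value" >&2
            return 1
            ;;
    esac

    # Reject values outside the range the tunable accepts.
    if [ "$value" -lt -17 ] || [ "$value" -gt 15 ]; then
        echo "set_oom_adj: out of range: $value" >&2
        return 1
    fi

    # Do nothing on kernels that lack the tunable.
    [ -w "$target" ] || return 0

    echo "$value" > "$target"
}
```

In a start() function this would be called as "set_oom_adj 3" on the line before the daemon is launched. It has to run in the script's own shell, not in a command substitution or subshell, because /proc/self refers to whichever process opens the file.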

Comment 4 RHEL Program Management 2007-10-31 15:44:49 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Paul Morgan 2007-10-31 15:46:50 UTC

Excellent! Your suggestion also gets around the nastiness of interfering with
RETVAL from starting sshd. Have you tested it yet, or should I?

Comment 6 Tomas Mraz 2007-10-31 15:56:04 UTC

On rawhide it seems to work fine.

Comment 7 Paul Morgan 2007-10-31 16:08:01 UTC

Also works on RHEL 5 updated to the 2.6.18-8.1.15.el5 kernel; I will kickstart
a 5.0 box and test there as well.

To be thorough, I also ran `find /proc -name oom_adj -exec cat {} \; | grep 3'
to make sure only sshd and its task had the new oom_adj score (as expected).
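
The find pipeline above prints only the values, and "grep 3" would also match values such as 13 or -3. A rough sketch of a stricter check that prints the PID and process name for each exact match follows. The function name list_oom_adj and its optional second argument are hypothetical; the second argument is there only so the function can be pointed at a scratch directory. Unlike the find command, it looks at top-level PIDs only and does not descend into per-task directories.

```shell
# Hypothetical helper: print "PID NAME" for each process whose oom_adj
# is exactly the requested value.
#   $1  oom_adj value to look for
#   $2  proc root (assumption: defaults to /proc)
list_oom_adj() {
    want=$1
    proc_root=${2:-/proc}

    for f in "$proc_root"/[0-9]*/oom_adj; do
        # Skip unreadable entries and the unexpanded glob when nothing matches.
        [ -r "$f" ] || continue
        # A process may exit between the glob and the read.
        val=$(cat "$f" 2>/dev/null) || continue
        [ "$val" = "$want" ] || continue

        dir=${f%/oom_adj}
        pid=${dir##*/}
        # The process name comes from the "Name:" line of the status file.
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$dir/status" 2>/dev/null)
        printf '%s %s\n' "$pid" "${name:-?}"
    done
}
```

Running "list_oom_adj 3" as root after starting sshd should list sshd alone if nothing else has picked up the value.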

Comment 8 Paul Morgan 2007-10-31 16:36:51 UTC

Works on RHEL 5.0, too.

Unfortunately, inheritance means that the bash shell of each ssh login session
picks up the oom_adj value, too. This means that oom_adj propagates to
everything started from that session, which is not the intention imho.

Hmmmm....

Comment 9 Paul Morgan 2007-10-31 16:37:51 UTC

Larry, any comments or suggestions?
TIA.

Comment 10 Paul Morgan 2007-10-31 17:53:08 UTC

After experimenting with the effects of oom_adj on an ssh client session, it
appears that this is a useful side effect.

That is, on an active session, oom_score gets artificially raised above the
normal sleep avg adjustment. This is good since it actually gives the admin a
chance to get a server under control if it's undergoing oom-kill.

On an idle session, having oom_adj=3 artificially lowers the normal sleep avg
adjustment. This, too, seems good since an idle shell is not being used to
recover the machine.

The net effect on a box that's trying not to die seems like a positive.

Comments?

Comment 11 Paul Morgan 2007-10-31 18:50:46 UTC

No go; restarting a service from within the ssh session causes the restarted
service to have oom_adj=3, as well.

This impacts normal production dangerously. It seems there is no transparent
way to protect sshd and just sshd at the moment.

Comment 12 Paul Morgan 2007-11-01 16:48:30 UTC

should not implement, closing as wontfix