Bug 1326068

Summary: watchman gear_state_plugin.rb needs more elaborate process-monitoring inspection of gear state
Product: OpenShift Container Platform
Reporter: Dave Sullivan <dsulliva>
Component: RFE
Assignee: Mike Barrett <mbarrett>
Status: CLOSED WONTFIX
QA Contact: Johnny Liu <jialiu>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.2.0
CC: aos-bugs, erich, jokerman, mmccomas, xtian
Target Milestone: ---
Keywords: RFE
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-12 13:41:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Dave Sullivan 2016-04-11 17:27:55 UTC
Description of problem:

If you kill processes under a gear with kill -9 or kill -15, watchman does not automatically restart the gear.

For example, create an application with the jbosseap and mysql5-5 cartridges, then run:

kill -15 <mysqld_pid> <mysqld_safe_pid>

watchman never restarts the application

A more elaborate approach is described here

https://bugzilla.redhat.com/show_bug.cgi?id=1133629#c4

The watchman gear_state_plugin.rb should be able to monitor processes via a regex for each cartridge type. If those "required" processes are killed, watchman should restart the gear.
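
To make the request concrete, here is a minimal Ruby sketch of the per-cartridge regex check. It is not code from gear_state_plugin.rb and does not use the watchman plugin API. The REQUIRED_PROCESSES table, its patterns, and the method names are all hypothetical, and the command lines are assumed to have been collected elsewhere (for example from ps output for the gear's user).

```ruby
# Hypothetical mapping from cartridge name to the process patterns that
# must be present for the gear to count as healthy. The patterns are
# illustrative guesses, not taken from any real cartridge.
REQUIRED_PROCESSES = {
  'mysql'    => [/\bmysqld_safe\b/, /\bmysqld\b/],
  'jbosseap' => [/\bjava\b.*jboss/i]
}.freeze

# Returns the required patterns that match none of the given command lines.
# Cartridges without an entry have no required processes.
def missing_processes(cartridge, command_lines)
  REQUIRED_PROCESSES.fetch(cartridge, []).reject do |pattern|
    command_lines.any? { |cmd| pattern =~ cmd }
  end
end

# True when at least one required process is gone, which is the
# condition under which this RFE asks for a gear restart.
def restart_needed?(cartridge, command_lines)
  !missing_processes(cartridge, command_lines).empty?
end

# Example: mysqld was killed, only mysqld_safe is still running.
running = ['/usr/bin/mysqld_safe --datadir=/data']
puts restart_needed?('mysql', running)
```

In the example, `/\bmysqld\b/` does not match `mysqld_safe` because the underscore is a word character, so the two processes are tracked separately.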


Version-Release number of selected component (if applicable):

2.2.8


How reproducible:


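Steps to Reproduce:
1. Create an application with the jbosseap and mysql5-5 cartridges.
2. Run kill -15 <mysqld_pid> <mysqld_safe_pid> against the gear's MySQL processes.
3. Wait for watchman to act on the gear.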

Actual results:

After a gear's important processes are killed, watchman does not restart the gear.




Expected results:

Watchman should track the processes deemed "required" for a gear (defined by regex; see control--->status), and possibly other processes such as logshifter and rhscl.

If one or more of those processes go away, watchman should restart the gear.


Additional info:

The OOM plugin restarts gears on OOM events, so it would make sense for any plugin to restart the gear on a damaging event.

If a gear's critical processes are killed, watchman or the plugin should take action.

Comment 1 Dave Sullivan 2016-04-11 17:38:09 UTC
So define a proper regex for the processes to monitor.

If one of these processes goes away, put the gear in the Unknown state.

The Unknown state already triggers a gear restart.


https://github.com/openshift/origin-server/blob/master/node-util/conf/watchman/plugins.d/gear_state_plugin.rb
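
The approach in this comment could be sketched as follows. This is an illustration under stated assumptions, not the plugin's actual logic: the state names, `effective_state`, and `action_for` are hypothetical, and `action_for` only stands in for the existing restart path that the Unknown state is said to trigger.

```ruby
# Hypothetical state names; the real plugin's state constants may differ.
STARTED = 'started'
UNKNOWN = 'unknown'

# Returns the state the monitor should act on. A gear recorded as
# started but missing any required process is reported as unknown.
# Gears in any other recorded state are left unchanged.
def effective_state(recorded_state, required_patterns, command_lines)
  return recorded_state unless recorded_state == STARTED

  all_present = required_patterns.all? do |pattern|
    command_lines.any? { |cmd| pattern =~ cmd }
  end
  all_present ? STARTED : UNKNOWN
end

# Stand-in for the existing handling: only the unknown state
# leads to a restart in this sketch.
def action_for(state)
  state == UNKNOWN ? :restart : :none
end

# Example: the gear is recorded as started, but mysqld is gone.
patterns = [/\bmysqld\b/]
state = effective_state(STARTED, patterns, ['/usr/bin/mysqld_safe'])
puts action_for(state)
```

Routing the check through the state keeps the restart decision in one place, so no new restart mechanism is needed.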