Bug 1326068 - watchman gear_state_plugin.rb needs more elaborate processing monitoring inspection of gear state
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Mike Barrett
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-11 17:27 UTC by Dave Sullivan
Modified: 2019-10-10 11:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-12 13:41:15 UTC
Target Upstream Version:



Description Dave Sullivan 2016-04-11 17:27:55 UTC
Description of problem:

If you kill (-9 or -15) processes running under a gear, watchman does not automatically restart the gear.

For example, create an application with the jbosseap and mysql5-5 cartridges, then run:

kill -15 <mysqld_pid> <mysqld_safe_pid>

watchman never restarts the application.

A more elaborate approach is described here

https://bugzilla.redhat.com/show_bug.cgi?id=1133629#c4

watchman's gear_state_plugin.rb should be able to monitor processes via a regex for each cartridge type. If any of those "required" processes are killed, watchman should restart the gear.
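
A minimal sketch of what such a per-cartridge regex check could look like. Everything here is an assumption for illustration: the REQUIRED_PROCESSES table, its patterns, and the missing_required helper are hypothetical and are not part of gear_state_plugin.rb. The list of command lines is assumed to have been collected elsewhere for the gear's user.

```ruby
# Hypothetical mapping from cartridge name to regexes that must each match
# at least one running command line. Names and patterns are illustrative only.
REQUIRED_PROCESSES = {
  'mysql'    => [/mysqld_safe/, /mysqld\b/],
  'jbosseap' => [/java .*jboss/]
}.freeze

# Returns a hash of cartridge name => patterns that matched no command line.
# An empty hash means every required process was found.
def missing_required(cartridges, command_lines, required = REQUIRED_PROCESSES)
  cartridges.each_with_object({}) do |cart, missing|
    absent = required.fetch(cart, []).reject do |pattern|
      command_lines.any? { |cmd| cmd =~ pattern }
    end
    missing[cart] = absent unless absent.empty?
  end
end
```

With this shape, a gear whose mysqld was killed while mysqld_safe survived would report the mysqld pattern as missing, which is the signal the plugin would need in order to act.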


Version-Release number of selected component (if applicable):

2.2.8


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

After a gear's important processes are killed, watchman does not restart the gear.




Expected results:

A gear's "required" processes should be defined by regex (see control--->status, and possibly other processes such as logshifter and rhscl).

If one or more of those processes go away, watchman should restart the gear.


Additional info:

The oom plugin restarts gears on OOM events, so it would make sense for any plugin to restart the gear when it detects a damaging event.

If a gear's critical processes are killed, watchman or one of its plugins should take action.

Comment 1 Dave Sullivan 2016-04-11 17:38:09 UTC
So: define proper regexes for the processes to monitor.

If one of these processes goes away, put the gear in the Unknown state.

The Unknown state already triggers a gear restart.


https://github.com/openshift/origin-server/blob/master/node-util/conf/watchman/plugins.d/gear_state_plugin.rb
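
The state handling suggested in this comment could be sketched roughly as follows. This is an assumption-laden illustration, not the plugin's code: the STARTED and UNKNOWN constants, effective_state, and restart_needed? are hypothetical names, and the list of missing patterns is assumed to come from a separate process check.

```ruby
# Hypothetical state names for this sketch; not taken from the plugin source.
STARTED = :started
UNKNOWN = :unknown

# A gear recorded as started but missing any required process is treated as
# unknown. Other recorded states (for example a stopped gear) are left alone.
def effective_state(recorded_state, missing_patterns)
  return UNKNOWN if recorded_state == STARTED && !missing_patterns.empty?
  recorded_state
end

# Per the comment above, the unknown state is the one that leads to a restart.
def restart_needed?(state)
  state == UNKNOWN
end
```

Only gears recorded as started are downgraded here, on the assumption that a deliberately stopped gear with no running processes should not be restarted.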

