1133629 – [watchman] Watchman GearStartPlugin does not filter logshifter or haproxy processes

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Online
Classification:	Red Hat
Component:	Containers
Sub Component:
Version:	2.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	2.x
Assignee:	Andy Goldstein
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1183070

Reported:	2014-08-25 15:41 UTC by Jhon Honce
Modified:	2015-05-14 23:37 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1183070
Environment:
Last Closed:	2015-03-05 19:56:13 UTC
Target Upstream Version:
Embargoed:

Description Jhon Honce 2014-08-25 15:41:40 UTC

Description of problem:
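When determining whether a gear's expected processes are running, Watchman does not filter out logshifter or haproxy processes. A gear whose application process has died but whose logshifter or haproxy processes are still alive is therefore treated as running.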

Version-Release number of selected component (if applicable):


How reproducible:
every time

logshifter issue
Steps to Reproduce:
1. create javaews application
2. kill -9 the java process
3. ps ax --format 'uid,pid=,ppid=,ucmd=' |grep <gear uid>
4. logshifter is still running and now has ppid == 1 (it has been reparented to init)

haproxy issue
Steps to Reproduce:
1. create scaled javaews application on one gear
2. kill -9 the java process
3. ps ax --format 'uid,pid=,ppid=,ucmd=' |grep <gear uid>
4. haproxy is still running and now has ppid == 1 (it has been reparented to init)
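The check in steps 3 and 4 of both reproductions can be scripted. The following is a minimal Python sketch, assuming the `ps` output layout used in the steps (`uid`, `pid`, `ppid`, then the command); `parse_ps`, `orphans_for_uid` and `current_orphans` are hypothetical helpers, not part of Watchman. Because the command is taken as the rest of the line, the parser also accepts the `args=` variant used in Comment 3.

```python
import subprocess
from typing import NamedTuple


class Proc(NamedTuple):
    uid: int
    pid: int
    ppid: int
    cmd: str


def parse_ps(text: str) -> list[Proc]:
    """Parse output of `ps ax --format 'uid,pid=,ppid=,ucmd='` (or `args=`)."""
    procs = []
    for line in text.splitlines():
        # Split into at most 4 fields: the command itself may contain spaces.
        fields = line.split(None, 3)
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip blank lines and the "UID" header row
        uid, pid, ppid, cmd = fields
        procs.append(Proc(int(uid), int(pid), int(ppid), cmd))
    return procs


def orphans_for_uid(procs: list[Proc], uid: int) -> list[Proc]:
    """Processes owned by `uid` that have been reparented to init (ppid == 1)."""
    return [p for p in procs if p.uid == uid and p.ppid == 1]


def current_orphans(uid: int) -> list[Proc]:
    """Run ps on the local host and return the orphaned processes of one gear uid."""
    out = subprocess.run(
        ["ps", "ax", "--format", "uid,pid=,ppid=,ucmd="],
        capture_output=True, text=True, check=True,
    ).stdout
    return orphans_for_uid(parse_ps(out), uid)
```

For example, `current_orphans(1002)` (1002 being the gear uid from Comment 3) would list the surviving logshifter or haproxy processes after the java process has been killed.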

Actual results:
Watchman never restarts the gear

Expected results:
Watchman restarts the gear

Additional info:

Comment 1 Andy Goldstein 2014-09-16 16:01:20 UTC

WIP: https://github.com/openshift/origin-server/pull/5814

Comment 2 openshift-github-bot 2014-09-17 15:47:43 UTC

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/7a12cdc7f0b943ff39617c430810cf47b1a73c83
Watchman filters out haproxy/logshifter

Make Watchman's GearStatePlugin filter out haproxy and logshifter
related processes so it can better determine when to start a gear whose
processes have died.

Bug 1133629
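The idea behind this fix can be illustrated with a small sketch. It is not the code from the commit above: the name set and the helpers `is_auxiliary` and `needs_restart` are assumptions made for illustration. Matching is done on the full command line rather than the bare command name, because the haproxy_ctld processes in the `ps` output of Comment 3 run as `bash` and `ruby`, so the name alone would not identify them.

```python
import os

# Hypothetical set of executable/script basenames treated as auxiliary,
# i.e. processes that do not show the gear's application is alive.
AUXILIARY_NAMES = {"logshifter", "haproxy", "haproxy_ctld", "haproxy_ctld.rb"}


def is_auxiliary(cmdline: str) -> bool:
    """True if a full command line (ps `args`) belongs to logshifter/haproxy."""
    tokens = cmdline.split()
    # Check the executable and, for interpreter-run scripts such as
    # `bash <path>/haproxy_ctld` or `ruby <path>/haproxy_ctld.rb`, the script path.
    return any(os.path.basename(t) in AUXILIARY_NAMES for t in tokens[:2])


def needs_restart(cmdlines: list[str]) -> bool:
    """True when no non-auxiliary process is left for a gear.

    Assumes the caller already knows the gear is supposed to be started.
    An empty process list also counts as needing a restart.
    """
    return all(is_auxiliary(c) for c in cmdlines)
```

With this filter, a gear left with only `/usr/bin/logshifter -tag haproxy` and `/usr/sbin/haproxy -f ...` is reported as needing a restart, while any remaining application process (for example a `java` process) keeps it classified as running.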

Comment 3 Meng Bo 2014-09-18 08:34:59 UTC

Checked on devenv_5175: Watchman restarts a gear that has only haproxy and logshifter processes running.

# ps ax --format 'uid,pid=,ppid=,args=' | grep 1002
 1002  5157     1 /usr/bin/logshifter -tag haproxy
 1002  5158     1 /usr/sbin/haproxy -f /var/lib/openshift/541ae22e53eb02da6a000233/haproxy//conf/haproxy.cfg
 1002  5159     1 bash /var/lib/openshift/541ae22e53eb02da6a000233/haproxy/usr/bin/haproxy_ctld
 1002  5160     1 /usr/bin/logshifter -tag haproxy_ctld
 1002  5167  5159 ruby /var/lib/openshift/541ae22e53eb02da6a000233/haproxy/usr/bin/haproxy_ctld.rb
 1002  9919  9905 sshd: 541ae22e53eb02da6a000233@pts/7
 1002  9921  9919 /bin/bash --init-file /usr/bin/rhcsh -i
    0 10256 23809 grep 1002


[root@ip-10-51-165-140 ~]# tailf /var/log/messages
Sep 18 10:04:42 ip-10-51-165-140 kernel: docker0: port 5(veth51e6) entering forwarding state
Sep 18 10:04:48 ip-10-51-165-140 watchman[4062]: watchman restarted user 541ae22e53eb02da6a000233: application jbews2s (retries: 0)
Sep 18 10:05:06 ip-10-51-165-140 root[12359]: user-cron-jobs :START: minutely run of all scheduled jobs

Comment 4 Andy Grimm 2014-11-12 21:01:06 UTC

I see that this code is already pushed into the product, but this fixes only a very specific case of a much more general problem.  The gear state plugin has no knowledge of whether a gear is running the "right" processes given the cartridges it contains.  A real solution to this would have to be a more rigorous check, where a cartridge manifest would describe what processes should be present (e.g., at least two httpd worker processes, one Rack process, whatever), and the gear state plugin would parse and verify that specification for each cartridge in a gear.  That's expensive, but it's the only way to verify that a running gear is really running.
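A minimal sketch of the manifest-driven check described above, assuming a hypothetical per-cartridge specification that lists a command-line pattern and a minimum process count. No such manifest field exists in this bug; `ProcessRequirement`, `missing_requirements`, `gear_is_really_running` and any cartridge names or patterns used with them are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRequirement:
    """Hypothetical manifest entry: at least `min_count` processes match `pattern`."""
    pattern: str
    min_count: int = 1


def missing_requirements(
    cartridges: dict[str, list[ProcessRequirement]],
    cmdlines: list[str],
) -> list[tuple[str, ProcessRequirement, int]]:
    """Return (cartridge, requirement, found_count) for every unmet requirement."""
    failures = []
    for cart, reqs in cartridges.items():
        for req in reqs:
            rx = re.compile(req.pattern)
            # Count the gear's processes whose full command line matches.
            found = sum(1 for c in cmdlines if rx.search(c))
            if found < req.min_count:
                failures.append((cart, req, found))
    return failures


def gear_is_really_running(
    cartridges: dict[str, list[ProcessRequirement]],
    cmdlines: list[str],
) -> bool:
    """True only if every cartridge's process requirements are satisfied."""
    return not missing_requirements(cartridges, cmdlines)
```

For instance, a specification of at least two processes matching `\bhttpd\b` plus one matching `[Rr]ack` mirrors the example in the comment. The cost concern is visible here: each check scans the gear's whole process list once per requirement, for every gear on the node.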

That's clearly an RFE, not a bug, but it seemed worth mentioning here.