Bug 1117004 - Use of "pgrep -F" in the bash SDK is unreliable
Summary: Use of "pgrep -F" in the bash SDK is unreliable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Miciah Dashiel Butler Masters
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1116135
Blocks:
 
Reported: 2014-07-07 19:30 UTC by Brenton Leanhardt
Modified: 2014-08-04 13:27 UTC

Fixed In Version: rubygem-openshift-origin-node-1.23.9.12-1
Doc Type: Bug Fix
Doc Text:
Often when a cartridge starts a runtime in a gear, the cartridge stores the pid of the runtime's process in a pidfile. Later, the cartridge may use the process_running function to determine whether that process is still running in the gear by checking whether any running process has a pid matching the pid saved in the pidfile. However, if the runtime's process had terminated and the operating system had subsequently assigned the same pid to a new process, the process_running function could return a false positive, interfering with cartridge control actions. This bug fix updates the process_running function to use the pgrep command with the -u option to restrict its search to processes belonging to the gear. As a result, the process_running function now has a much lower probability of returning a false positive.
Clone Of: 1116135
Environment:
Last Closed: 2014-08-04 13:27:40 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2014:0999
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Enterprise 2.1.4 bug fix and enhancement update
Last Updated: 2014-08-04 17:26:43 UTC

Description Brenton Leanhardt 2014-07-07 19:30:35 UTC
+++ This bug was initially created as a clone of Bug #1116135 +++

Description of problem:

The process_running function in node/misc/usr/lib/cartridge_sdk/bash/sdk uses "pgrep -F" to determine whether a cartridge's processes are running.  The problem is that with this option, pgrep checks for these PIDs by traversing /proc.  It turns out that if another gear has a process with the pid being checked, pgrep -F will find it.  As a result, if gear A has a stale pidfile containing a pid matching a long-running process belonging to gear B, gear B will effectively prevent gear A from running, unless the owner knows to go remove the stale pid file.
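Until a fixed SDK is installed, the manual workaround mentioned above (removing the stale pid file) could be scripted roughly as follows. This is only a sketch: remove_stale_pidfile is a hypothetical helper, not part of the cartridge SDK, and it assumes a Linux /proc where each /proc/<pid> directory is owned by the user the process runs as, that GNU stat is available, and that the pid is on the first line of the pidfile.

```shell
#!/bin/bash
# Hypothetical helper, not part of the cartridge SDK.
# Removes a pidfile unless the pid it records is a live process that is
# owned by the current user. Assumes Linux /proc and GNU stat.
remove_stale_pidfile() {
  local pidfile=$1 pid owner
  # Nothing to clean up if the pidfile does not exist.
  [ -f "$pidfile" ] || return 0
  # Assumption: the pid is on the first line of the file.
  pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
  case $pid in
    ''|*[!0-9]*)
      echo "unexpected content in $pidfile, leaving it alone" >&2
      return 1
      ;;
  esac
  # stat fails if no such process exists; treat that as "no owner".
  owner=$(stat -c %u "/proc/$pid" 2>/dev/null) || owner=""
  if [ "$owner" = "$(id -u)" ]; then
    # A live process of ours holds this pid, so the pidfile is not stale.
    return 0
  fi
  # The pid is unused or belongs to another user: the pidfile is stale.
  rm -f -- "$pidfile"
}
```

Inside the gear from the reproduction steps, this would be called as `remove_stale_pidfile postgresql/pid/postgres.pid` before `gear start`. Note that a pid reused by a different process of the same user is still treated as live.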

Version-Release number of selected component (if applicable):

rubygem-openshift-origin-node-1.26.8-1.el6oso.noarch

How reproducible:

Easily

Steps to Reproduce:
1. rhc app create bztest nodejs-0.10 postgresql-9.2
2. rhc app stop
3. rhc ssh bztest
4. look in /proc for a process belonging to another gear (referred to below as $PID)
5. echo $PID > postgresql/pid/postgres.pid
6. gear start
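
Step 4 can be done by hand, but a small helper makes the reproduction repeatable. The sketch below is an illustration only: find_foreign_pid is a hypothetical name, it assumes a Linux /proc and GNU stat, and it prints the first pid whose owner differs from the current user, which may be a system process rather than another gear's process. Either should trigger the unfixed behavior, since the unfixed check does not look at ownership.

```shell
#!/bin/bash
# Hypothetical helper for step 4, not part of the cartridge SDK.
# Prints the first pid under /proc that is not owned by the current user.
# Returns non-zero if no such process is visible.
find_foreign_pid() {
  local me dir owner
  me=$(id -u)
  for dir in /proc/[0-9]*; do
    # The process may exit between the glob and the stat; skip it then.
    owner=$(stat -c %u "$dir" 2>/dev/null) || continue
    if [ "$owner" != "$me" ]; then
      echo "${dir#/proc/}"
      return 0
    fi
  done
  return 1
}
```

Steps 4 and 5 then become `PID=$(find_foreign_pid) && echo $PID > postgresql/pid/postgres.pid`.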

Actual results:

start will fail for postgres with code 70

Expected results:

start should succeed

--- Additional comment from Jhon Honce on 2014-07-07 13:04:34 EDT ---

Fixed in https://github.com/openshift/origin-server/pull/5575

--- Additional comment from openshift-github-bot on 2014-07-07 13:54:29 EDT ---

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/16eb8a6e98def5a8c830757ad9fa9c0a3a3b4afe
Bug 1116135 - Add -u to bash sdk pgrep calls

* Since gears can "see" another gear's pid files in the /proc filesystem,
  a stale pid file could block a cartridge from starting via the check
  in sdk#process_running()
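
For readers without the commit at hand, the idea behind the change can be sketched as follows. This is not the SDK's actual code: process_running_sketch is a hypothetical name, and the only details taken from this report are the use of pgrep with -F (read the pid from a pidfile) and -u (restrict matches to one user).

```shell
#!/bin/bash
# Hypothetical sketch, not the SDK's actual process_running implementation.
# Returns 0 if the pid recorded in the pidfile belongs to a running process
# owned by the current user, non-zero otherwise.
process_running_sketch() {
  local pidfile=$1
  # A missing or empty pidfile means nothing is recorded as running.
  [ -s "$pidfile" ] || return 1
  # -F reads the pid from the file; -u restricts matches to the given
  # effective user, so another user's process with the same pid is ignored.
  pgrep -F "$pidfile" -u "$(id -u)" >/dev/null 2>&1
}
```

With only `pgrep -F "$pidfile"`, the check would also succeed for a process owned by another gear's user, which is the false positive described in this bug. The -u variant can still report a false positive if the pid has been reused by another process of the same user.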

Comment 1 Miciah Dashiel Butler Masters 2014-07-11 22:26:14 UTC
PR: https://github.com/openshift/enterprise-server/pull/320

Comment 5 Anping Li 2014-07-21 09:28:18 UTC
Verified; the fix passes in puddle-2-1-2014-07-18.

The bug can be reproduced on puddle-2014-05-29.3:
[bztest-hanli1dom.example.com 53ccdba3d42d02f3a70d3f50]\> gear start
Starting gear...
Could not start Postgres
An error occurred executing 'gear start' (exit code: 70)
Error message: CLIENT_ERROR: Failed to execute: 'control start' for /var/lib/openshift/53ccdba3d42d02f3a70d3f50/postgresql


Executing the same steps on puddle-2-1-2014-07-18 reported no error, and the app started:
[bztest-hanli1dom.example.com 53ccda324cfeff7254000015]\> echo 20400 >> postgresql/pid/postgres.pid
[bztest-hanli1dom.example.com 53ccda324cfeff7254000015]\> gear start
Starting gear...
Starting Postgres cartridge
Postgres started
Starting NodeJS cartridge
Mon Jul 21 2014 05:27:17 GMT-0400 (EDT): Starting application 'bztest' ...

Comment 7 errata-xmlrpc 2014-08-04 13:27:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0999.html

