Bug 1117004 - Use of "pgrep -F" in the bash SDK is unreliable
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Miciah Dashiel Butler Masters
QA Contact: libra bugs
Keywords: Upstream
Depends On: 1116135
Blocks:
Reported: 2014-07-07 15:30 EDT by Brenton Leanhardt
Modified: 2014-08-04 09:27 EDT

See Also:
Fixed In Version: rubygem-openshift-origin-node-1.23.9.12-1
Doc Type: Bug Fix
Doc Text:
Often when a cartridge starts a runtime in a gear, the cartridge stores the pid of the runtime's process in a pidfile. Later, the cartridge may use the process_running function to determine whether that process is still running in the gear by checking whether any running process has a pid matching the pid saved in the pidfile. However, if the runtime's process had terminated and the operating system had subsequently assigned the same pid to a new process, the process_running function could return a false positive, interfering with cartridge control actions. This bug fix updates the process_running function to use the pgrep command with the -u option to restrict its search to processes belonging to the gear. As a result, the process_running function now has a much lower probability of returning a false positive.
Story Points: ---
Clone Of: 1116135
Environment:
Last Closed: 2014-08-04 09:27:40 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2014:0999 | normal | SHIPPED_LIVE | Red Hat OpenShift Enterprise 2.1.4 bug fix and enhancement update | 2014-08-04 13:26:43 EDT

Description Brenton Leanhardt 2014-07-07 15:30:35 EDT
+++ This bug was initially created as a clone of Bug #1116135 +++

Description of problem:

The process_running function in node/misc/usr/lib/cartridge_sdk/bash/sdk uses "pgrep -F" to determine whether a cartridge's processes are running.  The problem is that with this option, pgrep checks for these PIDs by traversing /proc.  It turns out that if another gear has a process with the pid being checked, pgrep -F will find it.  As a result, if gear A has a stale pidfile containing a pid matching a long-running process belonging to gear B, gear B will effectively prevent gear A from running, unless the owner knows to go remove the stale pid file.
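
The failure mode described above can be illustrated with a minimal sketch. It assumes procps-style pgrep and ps; the function names unscoped_check and pidfile_owner_uid are hypothetical and are not part of the cartridge SDK.

```shell
#!/bin/bash
# Hypothetical helpers (not SDK code): show that an unscoped "pgrep -F"
# check cannot tell whose process it found.

# Succeeds if ANY visible process has the pid stored in the pidfile,
# regardless of which user (gear) owns that process.
unscoped_check() {
    local pidfile=$1
    [ -f "$pidfile" ] || return 1
    pgrep -F "$pidfile" > /dev/null 2>&1
}

# Prints the numeric uid that owns the pid stored in the pidfile,
# or nothing if the pidfile is missing or no such process exists.
pidfile_owner_uid() {
    local pidfile=$1
    ps -o uid= -p "$(cat "$pidfile" 2>/dev/null)" 2>/dev/null | tr -d ' '
}
```

In the stale-pidfile scenario, unscoped_check succeeds even though pidfile_owner_uid prints a uid different from the output of id -u, which is the false positive this bug reports.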

Version-Release number of selected component (if applicable):

rubygem-openshift-origin-node-1.26.8-1.el6oso.noarch

How reproducible:

Easily

Steps to Reproduce:
1. rhc app create bztest nodejs-0.10 postgresql-9.2
2. rhc app stop
3. rhc ssh bztest
4. look in /proc for a process belonging to another gear (referred to below as $PID)
5. echo $PID > postgresql/pid/postgres.pid
6. gear start
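
Step 4 can be scripted roughly as follows. This is only a sketch: it assumes a Linux /proc and GNU stat, relies on /proc/PID directories generally being owned by the uid of the process, and uses a hypothetical function name, foreign_pids.

```shell
#!/bin/bash
# Hypothetical helper: list pids visible in /proc whose owner differs
# from the current user. Assumes Linux /proc and GNU "stat -c".
foreign_pids() {
    local me dir owner
    me=$(id -u)
    for dir in /proc/[0-9]*; do
        [ -d "$dir" ] || continue
        # The process may exit between the glob and the stat call.
        owner=$(stat -c %u "$dir" 2>/dev/null) || continue
        if [ "$owner" != "$me" ]; then
            echo "${dir#/proc/}"
        fi
    done
    return 0
}
```

Any pid printed by foreign_pids can serve as $PID in step 5, although a long-running process is the better choice so that the pid stays valid while reproducing.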

Actual results:

start will fail for postgres with code 70

Expected results:

start should succeed

--- Additional comment from Jhon Honce on 2014-07-07 13:04:34 EDT ---

Fixed in https://github.com/openshift/origin-server/pull/5575

--- Additional comment from openshift-github-bot on 2014-07-07 13:54:29 EDT ---

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/16eb8a6e98def5a8c830757ad9fa9c0a3a3b4afe
Bug 1116135 - Add -u to bash sdk pgrep calls

* Since gears can "see" another gear's pid files in the /proc filesystem,
  a stale pid file could block a cartridge from starting via the check
  in sdk#process_running()
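
The idea behind the change can be sketched as below. This is not the code from the commit: scoped_check is a hypothetical stand-in for the SDK's process_running check, and the sketch assumes a procps-style pgrep that accepts -u and -F together.

```shell
#!/bin/bash
# Hypothetical stand-in for the SDK check (not the committed code).
# Succeeds only if the pid in the pidfile belongs to a live process
# whose effective uid is that of the current user (the gear).
scoped_check() {
    local pidfile=$1
    [ -f "$pidfile" ] || return 1
    # pgrep ANDs its criteria: the pid must come from the pidfile
    # and the process must be owned by the given uid.
    pgrep -u "$(id -u)" -F "$pidfile" > /dev/null 2>&1
}
```

With this scoping, a stale pidfile that happens to name another gear's process no longer matches. A false positive is still possible if the pid has been reused by an unrelated process of the same gear.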
Comment 1 Miciah Dashiel Butler Masters 2014-07-11 18:26:14 EDT
PR: https://github.com/openshift/enterprise-server/pull/320
Comment 5 Anping Li 2014-07-21 05:28:18 EDT
Verified; passes in puddle-2-1-2014-07-18.

The bug can be reproduced in puddle-2014-05-29.3:
[bztest-hanli1dom.example.com 53ccdba3d42d02f3a70d3f50]\> gear start
Starting gear...
Could not start Postgres
An error occurred executing 'gear start' (exit code: 70)
Error message: CLIENT_ERROR: Failed to execute: 'control start' for /var/lib/openshift/53ccdba3d42d02f3a70d3f50/postgresql


After executing the same steps in puddle-2-1-2014-07-18, no error was reported and the app started:
[bztest-hanli1dom.example.com 53ccda324cfeff7254000015]\> echo 20400 >> postgresql/pid/postgres.pid
[bztest-hanli1dom.example.com 53ccda324cfeff7254000015]\> gear start
Starting gear...
Starting Postgres cartridge
Postgres started
Starting NodeJS cartridge
Mon Jul 21 2014 05:27:17 GMT-0400 (EDT): Starting application 'bztest' ...
Comment 7 errata-xmlrpc 2014-08-04 09:27:40 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0999.html
