+++ This bug was initially created as a clone of Bug #1116135 +++

Description of problem:
The process_running function in node/misc/usr/lib/cartridge_sdk/bash/sdk uses "pgrep -F" to determine whether a cartridge's processes are running. The problem is that, with this option, pgrep matches the PIDs read from the pidfile against every process visible in /proc, regardless of which user owns it. So if another gear has a process with the PID being checked, pgrep -F will find it. As a result, if gear A has a stale pidfile containing the PID of a long-running process belonging to gear B, gear B effectively prevents gear A from starting, unless the owner knows to remove the stale pidfile.

Version-Release number of selected component (if applicable):
rubygem-openshift-origin-node-1.26.8-1.el6oso.noarch

How reproducible:
Easily

Steps to Reproduce:
1. rhc app create bztest nodejs-0.10 postgresql-9.2
2. rhc app stop
3. rhc ssh bztest
4. Look in /proc for a process belonging to another gear (referred to below as $PID)
5. echo $PID > postgresql/pid/postgres.pid
6. gear start

Actual results:
gear start fails for postgres with exit code 70

Expected results:
gear start should succeed

--- Additional comment from Jhon Honce on 2014-07-07 13:04:34 EDT ---

Fixed in https://github.com/openshift/origin-server/pull/5575

--- Additional comment from openshift-github-bot on 2014-07-07 13:54:29 EDT ---

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/16eb8a6e98def5a8c830757ad9fa9c0a3a3b4afe
Bug 1116135 - Add -u to bash sdk pgrep calls

* Since gears can "see" another gears pid files in the /proc filesystem, a stale pid file could block a cartridge from starting via the check in sdk#process_running()
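For illustration, here is a minimal sketch of what adding -u changes, assuming a procps pgrep that supports both -F (read PIDs from a file) and -u (match the process's effective UID). When both options are given, pgrep reports a process only if it satisfies both criteria. The function names pid_running_any_user and pid_running_as_user are hypothetical; this is not the actual code of process_running() in the bash sdk.

```shell
#!/bin/bash
# Hypothetical helpers contrasting the pidfile check before and after the fix.
# NOT the real sdk process_running() implementation.

# Old behaviour: the PID from the pidfile is matched against every process
# visible in /proc, whichever user (gear) owns it.
pid_running_any_user() {
  local pidfile="$1"
  [ -f "$pidfile" ] || return 1
  pgrep -F "$pidfile" >/dev/null 2>&1
}

# Fixed behaviour: the process must also have the given effective UID,
# defaulting to the caller's own UID (the gear user).
pid_running_as_user() {
  local pidfile="$1" uid="${2:-$(id -u)}"
  [ -f "$pidfile" ] || return 1
  pgrep -u "$uid" -F "$pidfile" >/dev/null 2>&1
}
```

Run inside a gear, `pid_running_as_user postgresql/pid/postgres.pid` would treat a stale pidfile holding another gear's PID as "not running", because that process's effective UID is not the gear's own.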
PR: https://github.com/openshift/enterprise-server/pull/320
Verified and passed in puddle-2-1-2014-07-18.

The bug can be reproduced in puddle-2014-05-29.3:

[bztest-hanli1dom.example.com 53ccdba3d42d02f3a70d3f50]\> gear start
Starting gear...
Could not start Postgres
An error occurred executing 'gear start' (exit code: 70)
Error message: CLIENT_ERROR: Failed to execute: 'control start' for /var/lib/openshift/53ccdba3d42d02f3a70d3f50/postgresql

After executing the same steps in puddle-2-1-2014-07-18, no error was reported and the app started:

[bztest-hanli1dom.example.com 53ccda324cfeff7254000015]\> echo 20400 >> postgresql/pid/postgres.pid
[bztest-hanli1dom.example.com 53ccda324cfeff7254000015]\> gear start
Starting gear...
Starting Postgres cartridge
Postgres started
Starting NodeJS cartridge
Mon Jul 21 2014 05:27:17 GMT-0400 (EDT): Starting application 'bztest' ...
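Step 4 of the reproduction asks for a process belonging to another gear, like PID 20400 in the verification above. A hypothetical helper, find_foreign_pid, could pick one by scanning /proc. It assumes a stat that supports `-c %u` (GNU coreutils or busybox) and relies on the owner of /proc/PID, which normally reflects the process's effective UID; a non-dumpable process may show as owned by root instead, so treat the result as a candidate only.

```shell
#!/bin/bash
# Hypothetical helper for step 4: print the first PID in /proc whose owner
# UID differs from the caller's. Prints nothing (and returns 1) if none is found.
find_foreign_pid() {
  local me dir pid owner
  me=$(id -u)
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    # The process may exit between the glob and stat; skip it if so.
    owner=$(stat -c %u "$dir" 2>/dev/null) || continue
    if [ "$owner" != "$me" ]; then
      echo "$pid"
      return 0
    fi
  done
  return 1
}
```

Inside the gear, `PID=$(find_foreign_pid)` followed by steps 5 and 6 would then exercise the stale-pidfile case.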
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0999.html