Bug 1128524

Summary: Short timeout in wait_for_pid_file function may cause app creation failure.
Product: OpenShift Container Platform
Reporter: Johnny Liu <jialiu>
Component: Containers
Assignee: Brenton Leanhardt <bleanhar>
Status: CLOSED EOL
QA Contact: libra bugs <libra-bugs>
Severity: medium
Priority: medium
Version: 2.2.0
CC: jokerman, libra-onpremise-devel, mmccomas, rthrashe
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2017-01-13 22:19:18 UTC
Type: Bug

Description Johnny Liu 2014-08-11 03:00:36 UTC
Description of problem:
In /usr/lib/openshift/cartridge_sdk/bash/sdk, wait_for_pid_file is defined as:
function wait_for_pid_file {
  [ -f "$1" ] && return 0
  for i in {1..20}; do
    sleep .5
    [ -f "$1" ] && break;
  done
}

When an app is created, this function waits about 10 seconds (20 polls at 0.5 s intervals) for the pid file to appear; if the file does not appear in time, app creation fails.

On some low-performance machines this timeout is too short. If I change {1..20} to {1..60} (about 30 seconds), the app is created successfully.

Version-Release number of selected component (if applicable):
rubygem-openshift-origin-node-1.23.9.15-1.el6op.noarch

How reproducible:
Always, on some low-performance machines

Steps to Reproduce:
1. Add "set -x" to /var/lib/openshift/.cartridge_repository/redhat-php/0.0.16.1/bin/control
2. Create an app on a low-performance machine
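To reproduce the timeout without a slow machine, one can simulate a daemon that writes its pid file late. The sketch below copies wait_for_pid_file from the SDK as quoted above; simulate_slow_start and reproduce are hypothetical helpers written only for this illustration, and the delays are assumptions chosen to fall on either side of the roughly 10-second limit.

```bash
#!/bin/bash
# Copied from /usr/lib/openshift/cartridge_sdk/bash/sdk as quoted above:
# polls up to 20 times at 0.5 s intervals, i.e. roughly 10 s in total.
function wait_for_pid_file {
  [ -f "$1" ] && return 0
  for i in {1..20}; do
    sleep .5
    [ -f "$1" ] && break;
  done
}

# Hypothetical helper: stands in for a daemon that writes its pid file
# only after DELAY seconds, as httpd might on a heavily loaded host.
simulate_slow_start() {
  local delay=$1 pid_file=$2
  ( sleep "$delay"; echo "$$" > "$pid_file" ) &
}

# Hypothetical driver: start the slow writer, then wait for the pid file
# the same way the cartridge's control script does.
reproduce() {
  local delay=$1 pid_file
  pid_file=$(mktemp -u)   # a temporary path that does not exist yet
  simulate_slow_start "$delay" "$pid_file"
  if wait_for_pid_file "$pid_file"; then
    echo "pid file appeared in time"
  else
    echo "timed out waiting for $pid_file"
  fi
  wait                    # reap the background writer
  rm -f "$pid_file"
}
```

Running `reproduce 12` should print the timeout message after about 10 seconds, while `reproduce 2` should succeed, which mirrors the behavior described in this report.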

Actual results:
$ rhc app create myapp php-5.3
<--snip-->
+ php_context 'nohup /usr/sbin/httpd -C '\''Include /var/lib/openshift/53e823a8ecf7d7b405000414/php//configuration/etc/conf.d/*.conf'\'' -f
/var/lib/openshift/53e823a8ecf7d7b405000414/php//configuration/etc/conf/httpd_nolog.conf -c '\''Include /etc/openshift/cart.conf.d/httpd/*.conf'\'' -c '\''Include
/etc/openshift/cart.conf.d/httpd/php/*.conf'\'' -D FOREGROUND |& /usr/bin/logshifter -tag php &'
+ case $OPENSHIFT_PHP_VERSION in
+ eval nohup /usr/sbin/httpd -C ''\''Include' '/var/lib/openshift/53e823a8ecf7d7b405000414/php//configuration/etc/conf.d/*.conf'\''' -f
/var/lib/openshift/53e823a8ecf7d7b405000414/php//configuration/etc/conf/httpd_nolog.conf -c ''\''Include' '/etc/openshift/cart.conf.d/httpd/*.conf'\''' -c ''\''Include'
'/etc/openshift/cart.conf.d/httpd/php/*.conf'\''' -D FOREGROUND '|&' /usr/bin/logshifter -tag php '&'
+ wait_for_pid_file /var/lib/openshift/53e823a8ecf7d7b405000414/php//run/httpd.pid
+ '[' -f /var/lib/openshift/53e823a8ecf7d7b405000414/php//run/httpd.pid ']'
++ nohup /usr/sbin/httpd -C 'Include /var/lib/openshift/53e823a8ecf7d7b405000414/php//configuration/etc/conf.d/*.conf' -f
/var/lib/openshift/53e823a8ecf7d7b405000414/php//configuration/etc/conf/httpd_nolog.conf -c 'Include /etc/openshift/cart.conf.d/httpd/*.conf' -c 'Include
/etc/openshift/cart.conf.d/httpd/php/*.conf' -D FOREGROUND
+ for i in '{1..20}'
+ sleep .5
++ /usr/bin/logshifter -tag php
+ '[' -f /var/lib/openshift/53e823a8ecf7d7b405000414/php//run/httpd.pid ']'
+ for i in '{1..20}'
<--snip-->

In the end, app creation failed because no httpd pid file had been created when the timeout was reached.

Expected results:
The timeout in wait_for_pid_file should be longer, or configurable by the user to suit the actual environment.
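A minimal sketch of the requested change, assuming a whole-second timeout taken from an optional second argument or from a hypothetical PID_FILE_TIMEOUT variable (the name is illustrative, not an existing SDK setting). It keeps the original 0.5 s polling interval and, on timeout, writes a diagnostic to stderr, since the report notes that the failure currently leaves no useful log message.

```bash
#!/bin/bash
# Sketch of a configurable variant of wait_for_pid_file.
# PID_FILE_TIMEOUT is a hypothetical setting, not part of the cartridge SDK.
#
# Usage: wait_for_pid_file PID_FILE [TIMEOUT_SECONDS]
# TIMEOUT_SECONDS falls back to $PID_FILE_TIMEOUT, then to 10, which matches
# the current 20 x 0.5 s loop. Returns 0 if the file appears, 1 on timeout,
# and 2 if the timeout is not a whole number of seconds.
wait_for_pid_file() {
  local pid_file=$1
  local timeout=${2:-${PID_FILE_TIMEOUT:-10}}
  local i polls

  if ! [[ $timeout =~ ^[0-9]+$ ]]; then
    echo "wait_for_pid_file: invalid timeout '$timeout'" >&2
    return 2
  fi

  [ -f "$pid_file" ] && return 0
  polls=$(( 10#$timeout * 2 ))   # one poll every 0.5 s; 10# avoids octal parsing
  for (( i = 0; i < polls; i++ )); do
    sleep .5
    [ -f "$pid_file" ] && return 0
  done

  # Leave a trace so a failed 'control start' is not silent.
  echo "wait_for_pid_file: $pid_file not created within ${timeout}s" >&2
  return 1
}
```

For example, `wait_for_pid_file "$pid_file" 30` would allow roughly the 30 seconds that {1..60} gave in this report. Whether the stderr message reaches the rhc client depends on how the node captures the control script's output, which this sketch does not address.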

Additional info:
Without "set -x" in /var/lib/openshift/.cartridge_repository/redhat-php/0.0.16.1/bin/control, app creation fails with only the following output:
<--snip-->
Creating application 'php53app' ... 
Starting PHP 5.3 cartridge (Apache+mod_php)
Application directory "/" selected as DocumentRoot
Failed to execute: 'control start' for /var/lib/openshift/53e82302ecf7d7b405000402/php

The log messages contain nothing useful to help the user debug the failure.

Comment 2 Brenton Leanhardt 2014-08-11 20:54:07 UTC
We should definitely improve logging for this sort of situation.  My suspicion is that if a pid file is taking longer than 10 seconds to be written the environment will likely hit other problems.

Comment 3 Johnny Liu 2014-08-12 01:44:19 UTC
I have the same suspicion. This machine has been getting slower and slower, but I have not found the root cause.

Comment 4 Rory Thrasher 2017-01-13 22:19:18 UTC
OpenShift Enterprise v2 has officially reached EoL.  This product is no longer supported and bugs will be closed.

Please look into the replacement enterprise-grade container option, OpenShift Container Platform v3.  https://www.openshift.com/container-platform/

More information can be found here: https://access.redhat.com/support/policy/updates/openshift/