Bug 1211526

Summary: HAProxy does not restart when pid is not found
Product: OpenShift Container Platform Reporter: gregory.nuyttens
Component: ImageStreams Assignee: Timothy Williams <tiwillia>
Status: CLOSED ERRATA QA Contact: libra bugs <libra-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 2.2.0 CC: adellape, bleanhar, bperkins, erich, gpei, jkaur, jokerman, libra-onpremise-devel, mmccomas, nicholas_schuetz, tiwillia
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openshift-origin-cartridge-haproxy-1.31.4.1-1.el6op Doc Type: Bug Fix
Doc Text:
In some cases, it was possible for a restart of a gear with an HAProxy cartridge to result in more than one HAProxy process running. This resulted in the HAProxy cartridge's process not being killed after a restart, and an HAProxy would be running without a proper pid file. Instead of determining the existence of an HAProxy process from the pid file, this bug fix updates the stop function to now check the process list. As a result, the HAProxy process is now properly killed if it still exists after the stop during a restart.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-17 17:09:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description gregory.nuyttens 2015-04-14 08:52:12 UTC
Description of problem:
For some reason (I suspect that idling/unidling sometimes goes wrong), the HAProxy process exists without a pid file OR with a wrong pid in the file.
If I understand the following code correctly: https://github.com/openshift/origin-server/blob/master/cartridges/openshift-origin-cartridge-haproxy/bin/control
a missing pid should not be a problem for a restart, because if the function "_stop_haproxy_service" returns anything other than exit code 0, a "pkill haproxy" should be run.

function _restart_haproxy_service() {
    _stop_haproxy_service || pkill haproxy || :
    _start_haproxy_service
}

But the problem is that this pkill is never run, because the return code is 0 after this line of the "_stop_haproxy_service" function:
echo "Warning: HAProxy process exists without a pid file.  Use force-stop to kill." 1>&2

When I put "&& return 1" at the end of this line, it works.
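
The effect described above can be reproduced outside the cartridge with a minimal sketch. The function names below (stop_with_warning_only, stop_with_warning_and_failure, fake_pkill) are hypothetical stand-ins, not the real control-script functions, and fake_pkill only records that the fallback ran instead of killing anything.

```shell
#!/bin/bash
# Hypothetical stand-ins for the real control-script functions.

# Mimics the reported behaviour: the last command is echo, so the
# function's exit status is 0 even though nothing was stopped.
stop_with_warning_only() {
    echo "Warning: HAProxy process exists without a pid file." 1>&2
}

# Same warning, but the failure is reported to the caller.
stop_with_warning_and_failure() {
    echo "Warning: HAProxy process exists without a pid file." 1>&2
    return 1
}

# Stand-in for "pkill haproxy"; it only records that the fallback ran.
fallback_ran=no
fake_pkill() { fallback_ran=yes; }

stop_with_warning_only 2>/dev/null || fake_pkill
echo "warning only: fallback_ran=$fallback_ran"
# prints: warning only: fallback_ran=no

fallback_ran=no
stop_with_warning_and_failure 2>/dev/null || fake_pkill
echo "warning and return 1: fallback_ran=$fallback_ran"
# prints: warning and return 1: fallback_ran=yes
```

A shell function returns the exit status of the last command it ran, so a trailing echo hides the failure from the "||" in the caller.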


Version-Release number of selected component (if applicable):
openshift-origin-cartridge-haproxy-1.30.0.1-1.el6op.noarch


How reproducible:
Remove the haproxy pid file or change the pid in it. Then try to restart the application and you should see that nothing was restarted.

Steps to Reproduce:
1. Remove the haproxy pid file or change the pid in it -> [gear directory]/haproxy/run/haproxy.pid
2. Run pgrep -l haproxy and note the current pid of the haproxy process
3. Inside the gear, try to restart the haproxy cartridge -> gear restart --cart haproxy
4. Run pgrep -l haproxy again and check the pid of the haproxy process; it has not changed, so nothing was restarted

Actual results:
HAProxy doesn't restart

Expected results:
HAProxy does restart

Additional info:
Note that if you remove the haproxy pid file, you can end up with two haproxy processes running at the same time. More than a pkill may be needed to handle that case.

Comment 2 Brenton Leanhardt 2015-04-14 12:59:56 UTC
Thanks for filing this.  In the haproxy control script, the code to stop the process does seem a little strange.  Looking through the git history, I expected to see that someone had added those echo lines at a later date, which could have introduced the bug.  That wasn't the case, so this is probably just a bug we need to fix.

Comment 6 Timothy Williams 2015-10-13 14:09:42 UTC
This should be fixed with https://github.com/openshift/origin-server/pull/6268

Comment 7 openshift-github-bot 2015-10-13 23:01:45 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/203d57bfddb7b56d18e0832e434d3801fd9def0d
Kill haproxy during restart even when pid does not exist

Bug 1211526
https://bugzilla.redhat.com/show_bug.cgi?id=1211526

The haproxy process was not killed with pkill as expected due to a logging message being reported instead of returning non-zero at the end of the `_stop_haproxy_service` function. `pkill` is now run if the haproxy process still exists after the stop during a restart.

The `_stop_haproxy_ctld` method was setting $pid to the haproxy_ctld pid. This same variable is set in `_stop_haproxy_service` only if the haproxy pid file does exist. If the haproxy pid file doesn't exist, the $pid variable still contains the haproxy_ctld pid, causing us to assume that the haproxy pid file does exist.
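
The variable leak described above can be illustrated with a minimal sketch. The functions and the pid file path below are hypothetical and heavily simplified; they are not the code from the control script. The sketch assumes only that bash variables assigned inside a function are global unless declared with `local`.

```shell
#!/bin/bash
# Hypothetical, simplified functions; the real control script differs.
pidfile=/nonexistent/haproxy.pid   # assumed path that does not exist

stop_ctld_leaky() {
    pid=4242            # global: still set after the function returns
}

stop_ctld_scoped() {
    local pid=4242      # local: discarded when the function returns
}

stop_service() {
    # pid is only assigned here when the pid file exists.
    if [ -f "$pidfile" ]; then
        pid=$(cat "$pidfile")
    fi
    if [ -n "$pid" ]; then
        echo "assuming a pid file exists (pid=$pid)"
    else
        echo "no pid file"
    fi
}

unset pid
stop_ctld_leaky
stop_service    # prints: assuming a pid file exists (pid=4242)

unset pid
stop_ctld_scoped
stop_service    # prints: no pid file
```

Declaring the variable `local` in each function, or using distinct variable names, is one way to keep the two stop functions from influencing each other.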

The stop function would report that the haproxy instance is stopped when the haproxy pid file does not exist. Now `stop()` attempts to force-kill the haproxy instance if the process is still running, whether the pid file exists or not.
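
As a rough illustration of stopping by process list instead of by pid file, here is a minimal sketch. It is not the code from this commit: `stop_by_process_list` is a hypothetical helper, and the five-second wait and the messages are arbitrary assumptions. It relies only on the standard `pgrep` and `pkill` options `-u` (user) and `-x` (exact process name).

```shell
#!/bin/bash
# Hypothetical helper; not the code from the actual fix.
# Usage: stop_by_process_list <process-name>
stop_by_process_list() {
    local name=$1
    local uid i
    uid=$(id -u)

    # Ask the process table, not the pid file, whether anything is running.
    if ! pgrep -u "$uid" -x "$name" >/dev/null; then
        echo "$name is not running"
        return 0
    fi

    # Polite stop first, then wait briefly for the process to exit.
    pkill -TERM -u "$uid" -x "$name"
    for i in 1 2 3 4 5; do
        if ! pgrep -u "$uid" -x "$name" >/dev/null; then
            echo "$name stopped"
            return 0
        fi
        sleep 1
    done

    # Still alive: force-kill and report whether that worked.
    pkill -KILL -u "$uid" -x "$name"
    sleep 1
    if pgrep -u "$uid" -x "$name" >/dev/null; then
        echo "could not kill $name" 1>&2
        return 1
    fi
    echo "$name force-killed"
}
```

Calling `stop_by_process_list haproxy` inside a gear would act on every haproxy process owned by the current user, with or without a pid file. The non-zero return on failure lets a caller react with `||`.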

Lastly, modified several `echo` statements to use the sdk's client_* methods. This ensures that the user performing these actions using rhc will receive the message in the rhc output.

Comment 11 Gaoyun Pei 2015-11-19 09:41:45 UTC
Verified this bug with openshift-origin-cartridge-haproxy-1.31.4.1-1.el6op.noarch.

Steps:
1. Create a scalable app
2. Delete the haproxy pid file
[test1-yes.ose22-auto.com.cn yes-test1-1]> rm haproxy/run/haproxy.pid
3. Restart the haproxy
[test1-yes.ose22-auto.com.cn yes-test1-1]> ctl_app restart haproxy
Cart to restart?
1. ruby-2.0
2. haproxy-1.4
?  2
CLIENT_MESSAGE: Warning: HAProxy process exists without a pid file.
CLIENT_MESSAGE: Could not stop HAProxy. Forcefully killing the process.
CLIENT_RESULT: Restarted HAProxy instance
4. Check the haproxy process
[test1-yes.ose22-auto.com.cn yes-test1-1]> ps -ef|grep haproxy
1734     26417     1  0 04:35 pts/0    00:00:00 /usr/bin/logshifter -tag haproxy
1734     26418     1  0 04:35 pts/0    00:00:00 /usr/sbin/haproxy -f /var/lib/openshift/yes-test1-1/haproxy//conf/haproxy.cfg
1734     26424     1  0 04:35 pts/0    00:00:00 bash /var/lib/openshift/yes-test1-1/haproxy/usr/bin/haproxy_ctld
1734     26425     1  0 04:35 pts/0    00:00:00 /usr/bin/logshifter -tag haproxy_ctld
1734     26433 26424  0 04:35 pts/0    00:00:00 ruby /var/lib/openshift/yes-test1-1/haproxy/usr/bin/haproxy_ctld.rb
1734     26483 23650  0 04:35 pts/0    00:00:00 grep haproxy
[test1-yes.ose22-auto.com.cn yes-test1-1]> cat haproxy/run/haproxy.pid
26418

Comment 13 errata-xmlrpc 2015-12-17 17:09:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2666.html