Bug 1211526 - HAProxy does not restart when pid is not found
Summary: HAProxy does not restart when pid is not found
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ImageStreams
Version: 2.2.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Timothy Williams
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-04-14 08:52 UTC by gregory.nuyttens
Modified: 2021-08-30 12:30 UTC (History)

Fixed In Version: openshift-origin-cartridge-haproxy-1.31.4.1-1.el6op
Doc Type: Bug Fix
Doc Text:
In some cases, it was possible for a restart of a gear with an HAProxy cartridge to result in more than one HAProxy process running. This resulted in the HAProxy cartridge's process not being killed after a restart, and an HAProxy would be running without a proper pid file. Instead of determining the existence of an HAProxy process from the pid file, this bug fix updates the stop function to now check the process list. As a result, the HAProxy process is now properly killed if it still exists after the stop during a restart.
Clone Of:
Environment:
Last Closed: 2015-12-17 17:09:57 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2083063 0 None None None 2017-02-21 03:46:58 UTC
Red Hat Product Errata RHSA-2015:2666 0 normal SHIPPED_LIVE Important: Red Hat OpenShift Enterprise 2.2.8 security, bug fix, and enhancement update 2015-12-17 22:07:54 UTC

Description gregory.nuyttens 2015-04-14 08:52:12 UTC
Description of problem:
For some reason (I suspect that idling/unidling sometimes goes wrong), the HAProxy process exists without a pid file or with a wrong pid in the file.
If I understand the code at https://github.com/openshift/origin-server/blob/master/cartridges/openshift-origin-cartridge-haproxy/bin/control correctly,
a missing pid should not be a problem for a restart: if the function "_stop_haproxy_service" returns anything other than exit code 0, then "pkill haproxy" is run.

function _restart_haproxy_service() {
    _stop_haproxy_service || pkill haproxy || :
    _start_haproxy_service
}

The problem is that this pkill never runs, because the return code of "_stop_haproxy_service" is 0 when it ends on this line:
echo "Warning: HAProxy process exists without a pid file.  Use force-stop to kill." 1>&2

When I put "&& return 1" at the end of this line, it works.
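
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the cartridge code: `stop_buggy`, `stop_fixed`, and `restart_with` are hypothetical stand-ins, and a harmless `echo "fallback"` takes the place of `pkill haproxy`. It relies only on the fact that a bash function's exit status is that of its last command, so a function that ends with a successful `echo` returns 0.

```shell
#!/bin/bash
# Hypothetical stand-ins for the cartridge functions; not the real control script.

# Ends with echo, so the function's exit status is 0 (echo succeeded).
stop_buggy() {
    echo "Warning: HAProxy process exists without a pid file." 1>&2
}

# Same warning, but explicitly reports failure to the caller.
stop_fixed() {
    echo "Warning: HAProxy process exists without a pid file." 1>&2
    return 1
}

# Mimics `_stop_haproxy_service || pkill haproxy || :`, with a harmless
# marker in place of pkill. Prints "fallback" only if the stop function failed.
restart_with() {
    "$1" 2>/dev/null || echo "fallback"
}

restart_with stop_buggy   # prints nothing: the fallback is skipped
restart_with stop_fixed   # prints "fallback"
```

With `stop_buggy` the right-hand side of `||` is never evaluated, which matches the observation that the pkill is skipped.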


Version-Release number of selected component (if applicable):
openshift-origin-cartridge-haproxy-1.30.0.1-1.el6op.noarch


How reproducible:
Remove or change the pid of the haproxy pid file. Then try to restart the application and you should see that nothing was restarted.

Steps to Reproduce:
1. Remove the haproxy pid file or change the pid in it -> [gear directory]/haproxy/run/haproxy.pid
2. Run pgrep -l haproxy and note the current pid of the haproxy process
3. Inside the gear, try to restart the haproxy cartridge -> gear restart --cart haproxy
4. Run pgrep -l haproxy again and check the pid of the haproxy process; it is unchanged, so nothing was restarted

Actual results:
HAProxy doesn't restart

Expected results:
HAProxy does restart

Additional info:
Note that if you remove the haproxy pid file, you can end up with two haproxy processes running at the same time. Perhaps more actions than a pkill are needed.
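
The two states described in this report (no pid file, or a pid file that names the wrong pid) can be told apart with a small check. This is only a sketch: `pidfile_status` is a hypothetical helper that is not part of the cartridge, and it tests only that the recorded pid is alive and signalable by the current user, not that the process is actually haproxy.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not part of the cartridge.

# Prints "missing" if the pid file does not exist, "stale" if it names a pid
# that is not a live process the current user can signal, and "live" otherwise.
pidfile_status() {
    local pidfile=$1
    local pid
    [ -f "$pidfile" ] || { echo "missing"; return; }
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the pid can be signalled.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "live"
    else
        echo "stale"
    fi
}
```

Inside a gear it could be called as `pidfile_status haproxy/run/haproxy.pid`, using the pid file path given in the steps to reproduce.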

Comment 2 Brenton Leanhardt 2015-04-14 12:59:56 UTC
Thanks for filing this.  In the haproxy control script, the code to stop the process does seem a little strange.  Looking through the git history, I expected to see that someone had added those echo lines at a later date, which could have introduced the bug.  That wasn't the case, so this is probably just a bug we need to fix.

Comment 6 Timothy Williams 2015-10-13 14:09:42 UTC
This should be fixed with https://github.com/openshift/origin-server/pull/6268

Comment 7 openshift-github-bot 2015-10-13 23:01:45 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/203d57bfddb7b56d18e0832e434d3801fd9def0d
Kill haproxy during restart even when pid does not exist

Bug 1211526
https://bugzilla.redhat.com/show_bug.cgi?id=1211526

The haproxy process was not killed with pkill as expected due to a logging message being reported instead of returning non-zero at the end of the `_stop_haproxy_service` function. `pkill` is now run if the haproxy process still exists after the stop during a restart.

The `_stop_haproxy_ctld` method was setting $pid to the haproxy_ctld pid. This same variable is set in `_stop_haproxy_service` only if the haproxy pid file does exist. If the haproxy pid file doesn't exist, the $pid variable still contains the haproxy_ctld pid, causing us to assume that the haproxy pid file does exist.
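
The variable leak described above can be illustrated with a minimal sketch. The function names and the pid value 4242 are hypothetical, chosen only for illustration; the point is that a bash variable assigned inside a function without `local` is global and remains set for the next function that reads it.

```shell
#!/bin/bash
# Hypothetical stand-ins; names and pid value are illustrative only.

stop_ctld() {
    pid=4242    # no `local`: pid becomes a global variable
}

# Sets pid only when the pid file exists, so a stale global pid survives.
stop_service_leaky() {
    local pidfile=$1
    [ -f "$pidfile" ] && pid=$(cat "$pidfile")
    if [ -n "$pid" ]; then
        echo "assuming pid file exists (pid=$pid)"
    else
        echo "no pid"
    fi
}

# Declares pid as local, so the value left behind by stop_ctld is not seen.
stop_service_scoped() {
    local pidfile=$1
    local pid=""
    [ -f "$pidfile" ] && pid=$(cat "$pidfile")
    if [ -n "$pid" ]; then
        echo "assuming pid file exists (pid=$pid)"
    else
        echo "no pid"
    fi
}

stop_ctld
stop_service_leaky /nonexistent/haproxy.pid    # prints "assuming pid file exists (pid=4242)"
stop_service_scoped /nonexistent/haproxy.pid   # prints "no pid"
```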

The stop function would report that the haproxy instance is stopped when the haproxy pid file does not exist. Now `stop()` attempts to force-kill the haproxy instance if the process is still running, whether the pid file exists or not.
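
The idea of consulting the process list instead of the pid file can be sketched as follows. This is an assumption-laden illustration, not the code from the commit: `force_stop_if_running` is a hypothetical helper, the one-second grace period is arbitrary, and matching by exact process name for the current user is a simplification of whatever the cartridge actually does.

```shell
#!/bin/bash
# Sketch only: force_stop_if_running is a hypothetical helper,
# not the cartridge's stop() function.

# Stops every process named $1 owned by the current user, consulting the
# process list rather than a pid file. Escalates to SIGKILL if needed.
force_stop_if_running() {
    local name=$1
    local uid
    uid=$(id -u)

    if ! pgrep -u "$uid" -x "$name" >/dev/null; then
        echo "$name is not running"
        return 0
    fi

    pkill -u "$uid" -x "$name"      # SIGTERM first
    sleep 1                         # arbitrary grace period (assumption)
    if pgrep -u "$uid" -x "$name" >/dev/null; then
        echo "Could not stop $name. Forcefully killing the process." 1>&2
        pkill -KILL -u "$uid" -x "$name"
    fi
    echo "$name stopped"
}

# Example call (not executed here): force_stop_if_running haproxy
```

Because the check never reads the pid file, a missing or wrong pid file cannot make the function report a stopped instance while the process is still alive.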

Lastly, modified several `echo` statements to use the sdk's client_* methods. This ensures that a user performing these actions using rhc will receive the message in the rhc output.

Comment 11 Gaoyun Pei 2015-11-19 09:41:45 UTC
Verified this bug with openshift-origin-cartridge-haproxy-1.31.4.1-1.el6op.noarch.

Steps:
1. Create a scalable app
2. Delete the haproxy pid file
[test1-yes.ose22-auto.com.cn yes-test1-1]\> rm haproxy/run/haproxy.pid
3. Restart the haproxy
[test1-yes.ose22-auto.com.cn yes-test1-1]\> ctl_app restart haproxy
Cart to restart?
1. ruby-2.0
2. haproxy-1.4
?  2
CLIENT_MESSAGE: Warning: HAProxy process exists without a pid file.
CLIENT_MESSAGE: Could not stop HAProxy. Forcefully killing the process.
CLIENT_RESULT: Restarted HAProxy instance
4. Check the haproxy process
[test1-yes.ose22-auto.com.cn yes-test1-1]\> ps -ef|grep haproxy
1734     26417     1  0 04:35 pts/0    00:00:00 /usr/bin/logshifter -tag haproxy
1734     26418     1  0 04:35 pts/0    00:00:00 /usr/sbin/haproxy -f /var/lib/openshift/yes-test1-1/haproxy//conf/haproxy.cfg
1734     26424     1  0 04:35 pts/0    00:00:00 bash /var/lib/openshift/yes-test1-1/haproxy/usr/bin/haproxy_ctld
1734     26425     1  0 04:35 pts/0    00:00:00 /usr/bin/logshifter -tag haproxy_ctld
1734     26433 26424  0 04:35 pts/0    00:00:00 ruby /var/lib/openshift/yes-test1-1/haproxy/usr/bin/haproxy_ctld.rb
1734     26483 23650  0 04:35 pts/0    00:00:00 grep haproxy
[test1-yes.ose22-auto.com.cn yes-test1-1]\> cat haproxy/run/haproxy.pid 
26418

Comment 13 errata-xmlrpc 2015-12-17 17:09:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2666.html

