Description of problem:
If update-cluster is called twice in a short span of time, it can expose a race condition in the haproxy control script's reload function. The reload process is currently this:
1) grabs the current PID from haproxy/run/haproxy.pid
2) pings all of the scaled web gears to ensure they are "awake" (!!!)
3) sets up logshifter
4) executes haproxy with "-sf <PID>" to cause the old haproxy instance to finish handling the current requests and exit
5) writes the new PID to haproxy/run/haproxy.pid
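If another process calls "haproxy/bin/control reload" while the first one is executing steps 2 through 4, it reads a stale PID from the PID file (the one the first reload is about to replace). The haproxy it starts therefore sends the "finish" signal to the old instance again, and the instance started by the first reload is never told to terminate.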
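One way to close this window would be to hold an exclusive lock from step 1 through step 5, so that the PID read and the PID write form a single critical section. The following is only a sketch of that idea using flock(1), not the actual control script or necessarily the fix that was merged. PID_FILE, LOCK_FILE, LOG_FILE, start_new_instance and the PID values are all made-up stand-ins so that the example runs without haproxy.

```shell
#!/bin/bash
# Hypothetical sketch: serialize reloads with flock(1).
# All names and PIDs below are illustrative, not taken from the real script.

WORK_DIR=$(mktemp -d)
PID_FILE="$WORK_DIR/haproxy.pid"
LOCK_FILE="$WORK_DIR/reload.lock"
LOG_FILE="$WORK_DIR/reload.log"
echo 100 > "$PID_FILE"             # pretend PID 100 is the running haproxy

# Stand-in for steps 2-4: slow work, then "start" a new instance.
# It only records which old PID the new instance would pass to -sf.
start_new_instance() {
    local old_pid=$1 new_pid=$2
    sleep 0.2                      # simulates pinging the web gears
    echo "new=$new_pid -sf $old_pid" >> "$LOG_FILE"
}

reload() {
    local new_pid=$1
    (
        flock -x 9                                  # wait for any running reload
        old_pid=$(cat "$PID_FILE")                  # step 1
        start_new_instance "$old_pid" "$new_pid"    # steps 2-4
        echo "$new_pid" > "$PID_FILE"               # step 5
    ) 9> "$LOCK_FILE"
}

# Two overlapping reloads, as in the failure scenario.
reload 201 &
reload 202
wait

cat "$LOG_FILE"
```

With the lock in place, whichever reload runs second reads the PID written by the first, so each instance is signaled exactly once. Removing the flock line makes both reloads read 100, which is the behavior described above.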
Version-Release number of selected component (if applicable):
Steps to Reproduce:
I do not have exact reproduction steps for a real-world scenario. We've seen this happen when an application scales to 10 gears or so, which makes sense, since that makes the "ping_server_gears" process take longer to execute. Having a deliberately slow-loading root URL in your web app would also make this easier to reproduce.
You can trivially reproduce most of the time with this script:
gear reload --cart haproxy-1.4 &
gear reload --cart haproxy-1.4
I can reproduce this on devenv_5011. With a scalable app of 10 gears, reloading haproxy twice at the same time leaves multiple haproxy processes running in the gear.
Meng, did you intend to mark this failedqa then?
fyi this appears to have gotten hung up in the merge queue and only made it into the build this morning, in devenv_5013.
Ben, yeah, I left the status as ON_QA since the PR had not been merged yesterday when I tried.
Checked again on devenv_5020 with same steps in comment#2. No such issue anymore.
Move bug to verified.
Still seeing multiple haproxy processes intermittently in devenv_5489 (ami-3c9aae54). You may have to run the reproduction steps above a few times:
> ps -ef | grep haproxy
1000 12591 1 0 16:26 ? 00:00:00 bash /var/lib/openshift/551ef750e54eae062f000297/haproxy/usr/bin/haproxy_ctld
1000 12592 1 0 16:26 ? 00:00:00 /usr/bin/logshifter -tag haproxy_ctld
1000 12598 12591 0 16:26 ? 00:00:00 ruby /var/lib/openshift/551ef750e54eae062f000297/haproxy/usr/bin/haproxy_ctld.rb
1000 19898 1 0 16:28 ? 00:00:00 /usr/bin/logshifter -tag haproxy
1000 19899 1 0 16:28 ? 00:00:00 /usr/sbin/haproxy -f /var/lib/openshift/551ef750e54eae062f000297/haproxy//conf/haproxy.cfg -sf 19711
1000 19935 1 0 16:28 ? 00:00:00 /usr/bin/logshifter -tag haproxy
1000 19936 1 0 16:28 ? 00:00:00 /usr/sbin/haproxy -f /var/lib/openshift/551ef750e54eae062f000297/haproxy//conf/haproxy.cfg -sf 19899
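Since the failure is intermittent, it may help to script the check rather than read the ps output by eye. The sketch below counts haproxy daemons in `ps -ef`-style input by matching the command column. The helper name count_haproxy is made up for illustration, and it assumes the column layout shown in the listing above (command in the eighth field); the sample input is that same listing.

```shell
#!/bin/bash
# Hypothetical helper: count haproxy daemons in `ps -ef`-style input.
# Assumes the command is the 8th whitespace-separated field, as in the
# listing above; haproxy_ctld and logshifter lines are not counted.
count_haproxy() {
    awk '$8 ~ /\/haproxy$/ { n++ } END { print n + 0 }'
}

count=$(count_haproxy <<'EOF'
1000 12591 1 0 16:26 ? 00:00:00 bash /var/lib/openshift/551ef750e54eae062f000297/haproxy/usr/bin/haproxy_ctld
1000 12592 1 0 16:26 ? 00:00:00 /usr/bin/logshifter -tag haproxy_ctld
1000 12598 12591 0 16:26 ? 00:00:00 ruby /var/lib/openshift/551ef750e54eae062f000297/haproxy/usr/bin/haproxy_ctld.rb
1000 19898 1 0 16:28 ? 00:00:00 /usr/bin/logshifter -tag haproxy
1000 19899 1 0 16:28 ? 00:00:00 /usr/sbin/haproxy -f /var/lib/openshift/551ef750e54eae062f000297/haproxy//conf/haproxy.cfg -sf 19711
1000 19935 1 0 16:28 ? 00:00:00 /usr/bin/logshifter -tag haproxy
1000 19936 1 0 16:28 ? 00:00:00 /usr/sbin/haproxy -f /var/lib/openshift/551ef750e54eae062f000297/haproxy//conf/haproxy.cfg -sf 19899
EOF
)

echo "haproxy daemons: $count"
```

On a live gear the same function could be fed from `ps -ef | count_haproxy`. A count above one shortly after a reload is not conclusive on its own, because an old instance started with -sf stays alive until it finishes its current requests, so the check is most useful once the reloads have settled.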