+++ This bug was initially created as a clone of Bug #1123054 +++

Description of problem:

If update-cluster is called twice in a short span of time, it can expose a race condition in the haproxy control script's reload function. The reload process is currently:

1) grab the current PID from haproxy/run/haproxy.pid
2) ping all of the scaled web gears to ensure they are "awake" (!!!)
3) set up logshifter
4) execute haproxy with "-sf <PID>" so the old haproxy instance finishes handling its current requests and exits
5) write the new PID to haproxy/run/haproxy.pid

If another process calls "haproxy/bin/control reload" during the execution of steps 2 through 4, it reads the stale PID from the PID file, so it starts a new haproxy without signaling the previous process to terminate.

Version-Release number of selected component (if applicable):
openshift-origin-cartridge-haproxy-1.25.3-1.el6oso.noarch

How reproducible:
Often

Steps to Reproduce:
I do not have exact reproduction steps for a real-world scenario. We have seen this happen when an application scales to around 10 gears, which makes sense, since that makes the "ping_server_gears" step take longer to execute. Having a deliberately slow-loading root URL in your web app would also make this easier to reproduce.

You can trivially reproduce it most of the time with this script:

#!/bin/bash
gear reload --cart haproxy-1.4 &
gear reload --cart haproxy-1.4
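For illustration, below is a minimal sketch of how the reload could be serialized so that a second "control reload" can never read a stale PID. This is not the actual cartridge control script and not necessarily what the upstream fix does; the paths, the lock file, and the reload_haproxy function are assumptions made for the example.

#!/bin/bash
# Illustrative only -- not the actual cartridge control script or the
# upstream fix. Paths, the lock file, and reload_haproxy are assumptions.

HAPROXY_DIR="${OPENSHIFT_HAPROXY_DIR:-$HOME/haproxy}"
HAPROXY_PID="$HAPROXY_DIR/run/haproxy.pid"
LOCK_FILE="$HAPROXY_DIR/run/reload.lock"

reload_haproxy() {
    (
        # Serialize concurrent reloads: a second "control reload" blocks
        # here until the first one has finished starting the new haproxy,
        # so it can never read a stale PID from the PID file.
        flock -x 200

        old_pid=$(cat "$HAPROXY_PID" 2>/dev/null)

        # The slow work (pinging scaled gears, logshifter setup) stays
        # inside the lock so the PID file cannot change underneath us.
        # ping_server_gears   # placeholder for the gear "wake up" step

        # -sf tells the old instance to finish in-flight requests and exit;
        # -p makes the new instance write its own PID to the PID file.
        /usr/sbin/haproxy -f "$HAPROXY_DIR/conf/haproxy.cfg" \
            -p "$HAPROXY_PID" ${old_pid:+-sf $old_pid}
    ) 200>"$LOCK_FILE"
}

reload_haproxy

With an exclusive lock held across all five steps, the slow gear-ping phase no longer opens a window in which a concurrent reload can observe a stale PID file.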
Upstream commit:

commit e1f10426e72aef21f738a80fed8f16ce08845886
Author: Ben Parees <bparees>
Date:   Thu Jul 24 15:14:48 2014 -0400

    race condition in haproxy reload

    https://bugzilla.redhat.com/show_bug.cgi?id=1123054
Verified this bug with openshift-origin-cartridge-haproxy-1.23.5.5-1.el6op.noarch; the related code has been merged into this package.

1. Create a scalable app and scale it up to 10 gears.
2. ssh into the app and execute the script below several times:

#!/bin/bash
gear reload --cart haproxy-1.4 &
gear reload --cart haproxy-1.4

Result: no multiple haproxy processes running in the gears.
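As a rough verification aid (not part of the cartridge or the errata), a loop along these lines can confirm that concurrent reloads never leave more than one haproxy process behind; the pgrep-based count and the sleep interval are assumptions about how to detect an orphaned instance inside the gear.

#!/bin/bash
# Rough verification aid (not part of the cartridge or the errata).
# Assumes it runs inside the head gear of the scaled application.

for i in 1 2 3 4 5; do
    gear reload --cart haproxy-1.4 &
    gear reload --cart haproxy-1.4
    wait

    # Give the old instance (signaled via -sf) a moment to drain and exit.
    sleep 5

    # More than one surviving haproxy for this user indicates the
    # stale-PID race left an orphaned instance behind.
    count=$(pgrep -u "$(id -un)" -x haproxy | wc -l)
    echo "iteration $i: $count haproxy process(es)"
    if [ "$count" -gt 1 ]; then
        echo "FAIL: multiple haproxy processes detected" >&2
        exit 1
    fi
done
echo "PASS: single haproxy process after every concurrent reload"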
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1095.html