Bug 1123077 - race condition in haproxy reload
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Assigned To: Brenton Leanhardt
QA Contact: libra bugs
Keywords: Upstream
Depends On: 1123054
Blocks:
Reported: 2014-07-24 15:20 EDT by Brenton Leanhardt
Modified: 2014-08-26 09:52 EDT

See Also:
Fixed In Version: openshift-origin-cartridge-haproxy-1.23.5.5-1.el6op
Doc Type: Bug Fix
Doc Text:
The HAProxy cartridge's reload logic lacked locking, so a race condition could leave multiple HAProxy processes running inside a gear. This bug fix adds proper locking to the HAProxy cartridge, and the issue no longer occurs. A cartridge upgrade is required after applying this fix.
Story Points: ---
Clone Of: 1123054
Environment:
Last Closed: 2014-08-26 09:52:56 EDT
Type: Bug

External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2014:1095
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Enterprise 2.1.5 bug fix and enhancement update
Last Updated: 2014-08-26 13:51:34 EDT
Description Brenton Leanhardt 2014-07-24 15:20:03 EDT
+++ This bug was initially created as a clone of Bug #1123054 +++

Description of problem:

If update-cluster is called twice in a short span of time, it can expose a race condition in the haproxy control script's reload function. The reload process currently works like this:

1) grabs the current PID from haproxy/run/haproxy.pid
2) pings all of the scaled web gears to ensure they are "awake" (!!!)
3) sets up logshifter
4) executes haproxy with "-sf <PID>" to cause the old haproxy instance to finish handling the current requests and exit
5) writes the new PID to haproxy/run/haproxy.pid

If another process calls "haproxy/bin/control reload" while steps 2 through 4 are executing, it will read a stale PID from the PID file, because the first reload has not yet written the new one. The haproxy instance it starts will therefore not signal the previous process to terminate.
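
One way to close this window is to hold an exclusive lock across steps 1 through 5, so a second reload cannot read the PID file until the first has rewritten it. The following is only a minimal sketch of that idea using flock(1); it is not the upstream commit. The temporary paths, the log file, the reload function, and the "sleep" stand-ins for the gear pings and for haproxy itself are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: paths, names, and the sleep-based stand-ins are assumptions
# for illustration, not the cartridge's actual code.
RUN_DIR=$(mktemp -d)
PID_FILE="$RUN_DIR/haproxy.pid"
LOCK_FILE="$RUN_DIR/reload.lock"
LOG_FILE="$RUN_DIR/reload.log"

reload() {
    (
        # Block until this caller holds the exclusive lock on fd 9.
        flock -x 9

        # Step 1: read the current PID (empty on the first start).
        old_pid=$(cat "$PID_FILE" 2>/dev/null)

        # Steps 2-3 stand-in: the slow window where the race used to occur.
        sleep 0.2

        # Step 4 stand-in for 'haproxy -sf <PID>': start the replacement.
        # fd 9 is closed for the child so it does not keep the lock held.
        sleep 60 >/dev/null 2>&1 9>&- &
        new_pid=$!
        if [ -n "$old_pid" ]; then
            kill "$old_pid" 2>/dev/null
        fi

        # Step 5: record the new PID before the lock is released.
        echo "$new_pid" > "$PID_FILE"
        echo "signaled=${old_pid:-none} started=$new_pid" >> "$LOG_FILE"
    ) 9>"$LOCK_FILE"
}

# Two overlapping reloads, as in the reproduction case.
reload &
reload
wait
cat "$LOG_FILE"

# Clean up the last stand-in worker.
kill "$(cat "$PID_FILE")" 2>/dev/null
```

With the lock in place, the second log line should always show the PID started by the first reload as the one that was signaled. Closing fd 9 for the long-lived child matters: a daemon that inherits the descriptor would otherwise keep the lock held after the reload returns.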

Version-Release number of selected component (if applicable):

openshift-origin-cartridge-haproxy-1.25.3-1.el6oso.noarch

How reproducible:

Often

Steps to Reproduce:

I do not have exact reproduction steps for a real-world scenario.  We've seen this happen when an application scales to 10 gears or so, which makes sense, since that makes the "ping_server_gears" process take longer to execute.  Having a deliberately slow-loading root URL in your web app would also make this easier to reproduce.

You can reproduce it most of the time with this trivial script:

#!/bin/bash
gear reload --cart haproxy-1.4 &
gear reload --cart haproxy-1.4
Comment 1 Brenton Leanhardt 2014-08-01 10:06:09 EDT
Upstream commit:

commit e1f10426e72aef21f738a80fed8f16ce08845886
Author: Ben Parees <bparees@redhat.com>
Date:   Thu Jul 24 15:14:48 2014 -0400

    race condition in haproxy reload
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1123054
Comment 4 Gaoyun Pei 2014-08-04 03:36:01 EDT
Verified this bug with openshift-origin-cartridge-haproxy-1.23.5.5-1.el6op.noarch.
The related code has been merged into this package.

1. Create a scalable app and scale it up to 10 gears.
2. SSH into the app and run the script below several times:
  #!/bin/bash
  gear reload --cart haproxy-1.4 &
  gear reload --cart haproxy-1.4

  Result: no gear had more than one haproxy process running.
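
The repeated check above could be automated along these lines. This is a hedged sketch, not the script used for verification: the helper names count_procs and check_reload, the round count, and the one-second settle delay are assumptions. It relies on pgrep -x matching the exact process name, and the delay is there because "-sf" lets the old instance linger briefly while it finishes in-flight requests.

```shell
#!/bin/bash
# Sketch only: helper names, round count, and the settle delay are
# illustrative assumptions.

# count_procs NAME: number of running processes named exactly NAME.
count_procs() {
    pgrep -x "$1" | wc -l
}

# check_reload NAME ROUNDS CMD...: run CMD twice concurrently, ROUNDS times,
# and fail as soon as more than one NAME process is left running.
check_reload() {
    name=$1
    rounds=$2
    shift 2
    i=1
    while [ "$i" -le "$rounds" ]; do
        "$@" &
        "$@"
        wait
        sleep 1   # let the old instance finish draining and exit
        n=$(count_procs "$name")
        if [ "$n" -gt 1 ]; then
            echo "round $i: $n $name processes running"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no duplicate $name processes after $rounds rounds"
}

# Example, run inside the gear:
#   check_reload haproxy 20 gear reload --cart haproxy-1.4
```

If the old instance is still serving long-lived connections, one second may not be enough, so a failure here should be confirmed by hand before it is treated as a regression.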
Comment 6 errata-xmlrpc 2014-08-26 09:52:56 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1095.html
