1123054 – race condition in haproxy reload

Summary: race condition in haproxy reload

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Online
Classification:	Red Hat
Component:	Image
Sub Component:
Version:	1.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Ben Parees
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1123077

Reported:	2014-07-24 18:08 UTC by Andy Grimm
Modified:	2018-12-06 17:27 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1123077
Environment:
Last Closed:	2014-10-10 00:49:36 UTC
Target Upstream Version:
Embargoed:

Description Andy Grimm 2014-07-24 18:08:45 UTC

Description of problem:

If update-cluster is called twice in quick succession, it can expose a race condition in the reload function of the haproxy control script. The reload process currently works like this:

1) grabs the current PID from haproxy/run/haproxy.pid
2) pings all of the scaled web gears to ensure they are "awake" (!!!)
3) sets up logshifter
4) executes haproxy with "-sf <PID>" to make the old haproxy instance finish handling its current requests and exit
5) writes the new PID to haproxy/run/haproxy.pid

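If another process calls "haproxy/bin/control reload" while steps 2 through 4 are running (that is, after the first reload has read the PID file but before it writes the new PID), the second reload reads the stale PID. Both new haproxy instances then signal the same old process, and whichever new PID does not end up in the PID file belongs to an instance that is never told to terminate, so it is left running.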
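One way to close this window is to hold an exclusive lock from the moment the PID file is read until the new PID is written, so a concurrent reload waits instead of reading a stale PID. Below is a minimal bash sketch assuming flock(1) from util-linux is available. `reload_sketch`, `start_new_instance`, `RUN_DIR` and the lock file are hypothetical names, `sleep` stands in for both the slow gear pings and the real haproxy daemon, and this is not necessarily how the cartridge was actually fixed.

```shell
#!/bin/bash
# Minimal sketch (hypothetical names): serialize a reload with flock(1)
# so the PID-file read (step 1) and write (step 5) happen under one lock.

RUN_DIR=${RUN_DIR:-$(mktemp -d)}      # stand-in for haproxy/run
PID_FILE="$RUN_DIR/haproxy.pid"
LOCK_FILE="$RUN_DIR/reload.lock"

# Stand-in for step 4 ("haproxy -f haproxy.cfg -sf <old_pid>"): start a
# long-lived dummy process, ask the old one to exit, print the new PID.
start_new_instance() {
    local old_pid=$1
    # 9>&- closes the lock fd in the child so the long-lived process does
    # not keep the reload lock held after this reload returns.
    sleep 30 9>&- >/dev/null 2>&1 &
    local new_pid=$!
    if [ -n "$old_pid" ]; then
        kill "$old_pid" 2>/dev/null
    fi
    echo "$new_pid"
}

reload_sketch() {
    (
        flock -x 9 || exit 1                       # wait for any other reload
        old_pid=""
        if [ -f "$PID_FILE" ]; then
            old_pid=$(cat "$PID_FILE")             # step 1: read current PID
        fi
        sleep 1                                    # steps 2-3: slow work (gear pings etc.)
        new_pid=$(start_new_instance "$old_pid")   # step 4: start new, signal old
        echo "$new_pid" > "$PID_FILE"              # step 5: record new PID
        echo "$old_pid -> $new_pid"
    ) 9>"$LOCK_FILE"
}
```

Running `reload_sketch & reload_sketch; wait` should print two lines in which the old PID on the second line matches the new PID on the first, i.e. the second reload signals the instance the first one started. The `9>&-` matters: if the daemon inherited the lock descriptor, the lock would stay held for as long as that process lived.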

Version-Release number of selected component (if applicable):

openshift-origin-cartridge-haproxy-1.25.3-1.el6oso.noarch

How reproducible:

Often

Steps to Reproduce:

I do not have exact reproduction steps for a real-world scenario. We've seen this happen when an application scales to 10 or so gears, which makes sense, since more gears make the "ping_server_gears" step take longer to execute. A deliberately slow-loading root URL in your web app would also make this easier to reproduce.

Most of the time, you can trivially reproduce it with this script:

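#!/bin/bash
# Start two haproxy reloads at nearly the same time
gear reload --cart haproxy-1.4 &
gear reload --cart haproxy-1.4
# Wait for the backgrounded reload before exiting
wait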

Comment 1 Ben Parees 2014-07-25 20:05:36 UTC

https://github.com/openshift/origin-server/pull/5657

Comment 2 Meng Bo 2014-07-28 05:36:17 UTC

I can reproduce this on devenv_5011 with a 10-gear scalable app: reloading haproxy twice at the same time leaves multiple haproxy processes running in the gear.

Comment 3 Ben Parees 2014-07-28 12:30:20 UTC

Meng, did you intend to mark this failedqa then?

Comment 4 Ben Parees 2014-07-28 13:06:21 UTC

FYI, this appears to have gotten stuck in the merge queue and only made it into the build this morning, in devenv_5013.

Comment 5 Meng Bo 2014-07-29 02:19:02 UTC

Ben, yes, I left the status as ON_QA since the PR had not been merged yet when I tried yesterday.

Checked again on devenv_5020 with the same steps as in comment #2. The issue no longer occurs.

Moving bug to VERIFIED.

Comment 6 Chris Ryan 2015-04-03 20:49:38 UTC

Still seeing multiple haproxy processes intermittently on devenv_5489 (ami-3c9aae54); you may have to repeat the reproduction steps above a few times:

> ps -ef | grep haproxy
      1000     12591     1  0 16:26 ?        00:00:00 bash /var/lib/openshift/551ef750e54eae062f000297/haproxy/usr/bin/haproxy_ctld
      1000     12592     1  0 16:26 ?        00:00:00 /usr/bin/logshifter -tag haproxy_ctld
      1000     12598 12591  0 16:26 ?        00:00:00 ruby /var/lib/openshift/551ef750e54eae062f000297/haproxy/usr/bin/haproxy_ctld.rb
      1000     19898     1  0 16:28 ?        00:00:00 /usr/bin/logshifter -tag haproxy
      1000     19899     1  0 16:28 ?        00:00:00 /usr/sbin/haproxy -f /var/lib/openshift/551ef750e54eae062f000297/haproxy//conf/haproxy.cfg -sf 19711
      1000     19935     1  0 16:28 ?        00:00:00 /usr/bin/logshifter -tag haproxy
      1000     19936     1  0 16:28 ?        00:00:00 /usr/sbin/haproxy -f /var/lib/openshift/551ef750e54eae062f000297/haproxy//conf/haproxy.cfg -sf 19899
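Because the leftover instance shows up only intermittently, it helps to count haproxy daemons after each attempt instead of reading the full `ps` listing. A minimal sketch, assuming the standard `ps -ef` columns shown above (the command path is the eighth field); `count_haproxy` is a hypothetical helper, and it deliberately skips the `logshifter -tag haproxy` and `haproxy_ctld` wrapper processes.

```shell
#!/bin/bash
# Hypothetical helper: count haproxy daemons in "ps -ef"-style input on stdin.
# Assumes the ps -ef columns UID PID PPID C STIME TTY TIME CMD, so the
# executable path is field 8. Wrappers (logshifter, haproxy_ctld) have a
# different field 8 and are not counted. Optional $1 overrides the binary path.
count_haproxy() {
    awk -v bin="${1:-/usr/sbin/haproxy}" '$8 == bin { n++ } END { print n + 0 }'
}
```

For example, `ps -ef | count_haproxy` would print 2 for the listing above. A count above 1 right after a reload can be transient, since `-sf` lets the old instance finish in-flight requests before exiting; a count that stays above 1 is the symptom reported here.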
