Bug 1155794 - [2.1 backport] Race condition in `oo-httpd-singular graceful` when using apache-vhost
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.1.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Luke Meyer
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1147054 1151744
Blocks:
Reported: 2014-10-22 20:07 UTC by Luke Meyer
Modified: 2018-12-09 18:56 UTC (History)

Fixed In Version: openshift-origin-node-util-1.22.20.5-1.el6op rubygem-openshift-origin-frontend-apachedb-0.4.1.2-1.el6op rubygem-openshift-origin-frontend-apache-vhost-0.5.2.6-1.el6op
Doc Type: Bug Fix
Doc Text:
Previously, there was a race condition when using the apache-vhost front-end server plug-in. If the "oo-httpd-singular graceful" command was run to incorporate one gear vhost update while another gear was creating its vhost configuration, the configuration was left in a bad state and the httpd service would not restart. As a result, the vhost configuration would cease being updated and newly-added gears would be unreachable via the vhost front-end server. If the httpd service was stopped, it would fail to start until the configuration was fixed. This bug fix backports an OpenShift Enterprise 2.2 fix to extend a lock around the call to the oo-httpd-singular command, and as a result the race condition no longer occurs.
Clone Of: 1148192
Environment:
Last Closed: 2014-11-25 18:19:51 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2014:1906
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Enterprise 2.1.9 security, bug fix, and enhancement update
Last Updated: 2014-11-25 23:19:05 UTC

Description Luke Meyer 2014-10-22 20:07:52 UTC
+++ This bug was initially created as a clone of Bug #1148192 +++

Description of problem:
There appears to be a race condition when using the apache-vhost frontend. If `oo-httpd-singular graceful` is run while an application is creating its frontend configuration, the configuration is left in a bad state and httpd will not start.

Version-Release number of selected component (if applicable):
2.1.6

How reproducible:
Very Rarely

Steps to Reproduce:
1. Create many applications while removing others

Actual results:
We see the following in the node's platform.log. 
-=~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~=-
September 22 13:39:32 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=0 out=
September 22 13:39:32 INFO Connecting frontend mapping for 54206ba04970fa5329000003/haproxy: [/haproxy-status] => [127.2.20.3:8080/] with options: {"protocols"=>["http"]}
September 22 13:39:32 WARN V2CartModel#connect_frontend: No such file or directory - /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003/599984_element-_haproxy-status.conf
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.22.5.11/lib/openshift-origin-common/utils/file_needs_sync.rb:36:in `initialize'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.22.5.11/lib/openshift-origin-common/utils/file_needs_sync.rb:36:in `open'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.22.5.11/lib/openshift-origin-common/utils/file_needs_sync.rb:36:in `open'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-frontend-apache-vhost-0.5.2.4/lib/openshift/runtime/frontend/http/plugins/apache-vhost.rb:152:in `block (2 levels) in connect'
  [... backtrace cut for clarity ...]
September 22 13:39:32 ERROR Unexpected error during configure: No such file or directory - /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003/599984_element-_haproxy-status.conf (Errno::ENOENT)
September 22 13:39:32 INFO openshift-agent: request end: action=cartridge_do, requestid=5373aabdbd865534bec108f8bf32d199, senderid=lae-alln-brk02, statuscode=1, data={:time=>nil, :output=>"CLIENT_ERROR: Unexpected error: No such file or directory - /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003/599984_element-_haproxy-status.conf\n", :exitcode=>1, :addtl_params=>nil}
September 22 13:39:32 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=1 out=
September 22 13:39:32 ERROR ERROR: failure from oo-httpd-singular(1): : stdout:  stderr:httpd.worker: Syntax error on line 221 of /etc/httpd/conf/httpd.conf: Syntax error on line 45 of /etc/httpd/conf.d/000001_openshift_origin_frontend_vhost.conf: Syntax error on line 29 of /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_0_54206ba04970fa5329000003.conf: Include directory '/etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003' not found
-=~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~=-


Expected results:
All applications created/removed successfully

Additional info:
This looks very similar to Bugzilla 1146194 but the same error messages are not observed.
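
The final error in the log above is an Include that points at a directory which does not exist. As a rough diagnostic for that state, the sketch below lists Include targets that are missing on disk. The helper name `dangling_includes` is hypothetical and not part of any OpenShift tool; it assumes absolute Include paths and only handles a wildcard in the last path component.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical diagnostic (not an OpenShift tool): return [conf_file, target]
# pairs for every Include whose target is missing on disk.
# Assumes absolute paths; a wildcard is only handled in the last component.
def dangling_includes(conf_glob)
  Dir.glob(conf_glob).flat_map do |conf|
    File.readlines(conf).map do |line|
      m = line.match(/\A\s*Include\s+"?([^"\s]+)"?/i)
      next unless m
      target = m[1]
      # For a pattern such as dir/*.conf only the directory has to exist.
      path = target.match?(/[*?\[]/) ? File.dirname(target) : target
      [conf, target] unless File.exist?(path)
    end.compact
  end
end

# Demonstration in a scratch directory instead of /etc/httpd/conf.d/openshift.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'gear_a'))
  File.write(File.join(root, 'gear_a.conf'), "Include #{root}/gear_a/*.conf\n")
  # gear_b's directory is never created, mimicking the broken state.
  File.write(File.join(root, 'gear_b.conf'), "Include #{root}/gear_b/*.conf\n")
  dangling_includes(File.join(root, '*.conf')).each do |conf, target|
    puts "#{File.basename(conf)}: missing #{target}"
  end
end
```

On a node, the glob would presumably be `/etc/httpd/conf.d/openshift/*.conf`, going by the paths in the log.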

--- Additional comment from Luke Meyer on 2014-10-01 15:23:28 EDT ---

Per bug 1147054 the fix should be coming to the next 2.2 rebase.

--- Additional comment from Luke Meyer on 2014-10-22 16:06:22 EDT ---

In addition to the rebase, https://github.com/openshift/origin-server/pull/5885 is required.

Comment 1 Luke Meyer 2014-10-22 20:09:04 UTC
This bug is for the 2.1.z backport.

Comment 2 Luke Meyer 2014-11-04 19:47:29 UTC
I'm going to hijack this bug and pull in a bunch of related frontend changes that ought to be fixed under vhost. These are the origin-server commits cherry-picked:

commit b2a95ec8848d9e0752f8cce1357c356696e1309a
Author: Rajat Chopra <rchopra>
Date:   Tue Aug 19 12:51:10 2014 -0700

    bz1131404 - ProxyPassReverse fix


commit 6b7c5d7c5fb28312a3d7a99ef45213d37d97ca26
Author: Rajat Chopra <rchopra>
Date:   Wed Aug 20 14:09:31 2014 -0700

    put apache reload in guard of an env variable


commit df4b6b28b7426a53fa24d1c959c32d99face842d
Author: Rajat Chopra <rchopra>
Date:   Thu Aug 21 11:30:40 2014 -0700

    move env var guard for all http plugins and not just the vhost plugin


commit 19b8b8153ac97f1e6b28230588bd4a49c7994799
Author: Rajat Chopra <rchopra>
Date:   Mon Aug 25 14:50:44 2014 -0700

    consistent trailing slashes - bz1133694


commit 3263891fb996164b0d101234ea43e229b2df2609
Author: Dan Mace <ironcladlou>
Date:   Mon Sep 8 10:32:51 2014 -0400

    Apply more restrictive permissions to cert files
    
    Resolve bug https://bugzilla.redhat.com/show_bug.cgi?id=1138652


commit 1cab19a4a88b0717b5199fcf85d148217ff198ca
Author: Rajat Chopra <rchopra>
Date:   Fri Sep 26 11:58:50 2014 -0700

    bz1147054 - use common lockfile


commit dc07a6d177263a128f9b1506db9a7a20e64df451
Author: Rajat Chopra <rchopra>
Date:   Fri Oct 17 13:43:53 2014 -0700

    bz1151744 - wrap the wait for reload to finish inside of the lockfile


commit 5294046eb63f3140a39b5d48065b495532d52159
Author: Rajat Chopra <rchopra>
Date:   Fri Oct 24 16:54:38 2014 -0700

    fix bz 1156361. Race condition between destroy-app and configure.


... thereby addressing at least these related bugs (though not all matter for Enterprise):

 * Bug 1131404 - Wordpress quickstart with phpmyadmin embedded will direct to a wrong page when using vhost plugin
 * Bug 1133694 - App vhost proxy directives need to use slashes consistently
 * Bug 1138652 - SSL alias' certificates are world readable when using vhost frontend plugin
 * Bug 1147054 - graceful restarts with vhost front end cause corrupt configurations under load
 * Bug 1156361 - (AKA) Race condition between destroy-app and configure leaves broken vhost conf
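
The two locking commits above (bz1147054 and bz1151744) hold a common lockfile across both the configuration change and the reload. The sketch below shows that pattern only; it is not the origin-server code. The helper names, file layout, and lock path are hypothetical, and `CHECK_RELOAD` is a stand-in for running `oo-httpd-singular graceful`.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: hold an exclusive lock on lock_path for the whole block.
def with_frontend_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o600) do |lock|
    lock.flock(File::LOCK_EX) # blocks while another holder has the lock
    yield
  end # closing the file releases the lock
end

# Write one gear's configuration and reload while still holding the lock,
# so no reload can observe another gear's half-written configuration.
def update_gear(conf_dir, lock_path, gear, reload)
  with_frontend_lock(lock_path) do
    File.write(File.join(conf_dir, "#{gear}.conf"), "# vhost for #{gear}\n")
    FileUtils.mkdir_p(File.join(conf_dir, gear)) # directory the vhost file refers to
    reload.call(conf_dir)
  end
end

# Stand-in for the reload: fail if any gear file lacks its directory.
CHECK_RELOAD = lambda do |conf_dir|
  Dir.glob(File.join(conf_dir, '*.conf')).each do |conf|
    dir = conf.sub(/\.conf\z/, '')
    raise "half-written config: #{dir} missing" unless Dir.exist?(dir)
  end
end

Dir.mktmpdir do |conf_dir|
  lock_path = File.join(conf_dir, '.frontend.lock')
  gears = (1..8).map { |i| "gear#{i}" }
  gears.map { |g| Thread.new { update_gear(conf_dir, lock_path, g, CHECK_RELOAD) } }
       .each(&:join)
  puts "reloads completed: #{gears.size}"
end
```

If the reload ran outside the lock, it could see a gear file whose directory had not been created yet, which is the kind of window these commits close.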

Comment 6 Johnny Liu 2014-11-07 10:00:56 UTC
Steps to reproduce:
1. Install the environment. On the node, enable the vhost frontend and the httpd worker.
2. Open two terminals, ssh into the node, and run the following commands to monitor the output.
Terminal 1:
# while :; do ps -ef|grep oo-httpd-singular|grep -v grep; sleep 0.5; echo "------------"; done
Terminal 2:
# tailf /var/log/openshift/node/platform.log|grep oo-httpd-singular
3. Create 20 scalable apps.
4. Open two terminals on the client. In one terminal, create 10 new scalable apps in parallel. In the other terminal, destroy the 20 existing apps created in step 3 in parallel.
# for i in {1..10}; do rhc app-create myapp${i} php-5.3 --no-git --no-dns -s & done
# for i in {1..20}; do rhc app delete myappcc${i} --confirm & done


Output:
Step 2:
Terminal 1:
<--snip-->
------------
root     31744  1196  8 15:43 ?        00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
root     31891  1196  7 15:43 ?        00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
root     31947  1196 11 15:43 ?        00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
root     32179  1196 18 15:43 ?        00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
root     32184  1196 21 15:43 ?        00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
------------
<--snip-->
Terminal 2:
November 07 15:55:23 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=1 out=
November 07 15:55:23 ERROR ERROR: failure from oo-httpd-singular(1): : stdout:  stderr:httpd.worker: Syntax error on line 221 of /etc/httpd/conf/httpd.conf: Syntax error on line 53 of /etc/httpd/conf.d/000001_openshift_origin_frontend_vhost.conf: Could not open configuration file /etc/httpd/conf.d/openshift/545c791987692be53a000010_jialiu_0_myapp4.conf: No such file or directory
November 07 15:55:25 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=0 out=


The above output indicates that multiple "oo-httpd-singular graceful" processes are triggered at the same time, which causes the failure shown in the Terminal 2 output.

NOTE:
The above steps may not reproduce this issue every time; several attempts may be needed.

Verified this bug using the above steps with 2.1.z/2014-11-06.1 three times. No error was found in platform.log, so PASS.
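
For repeated verification runs, the platform.log check can be scripted instead of watched by eye. The sketch below counts `oo-httpd-singular graceful` invocations by return code, using the log line format quoted in this bug. The function name `reload_results` is hypothetical, and the second sample line is abbreviated.

```ruby
# Pattern based on the platform.log lines quoted in this bug.
RELOAD_LINE = %r{Shell command '/usr/sbin/oo-httpd-singular\s+graceful' ran\. rc=(\d+)}

# Hypothetical helper: map each return code to the number of reloads that
# exited with it. Accepts any enumerable of log lines.
def reload_results(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    m = RELOAD_LINE.match(line)
    counts[m[1].to_i] += 1 if m
  end
end

sample = [
  "November 07 15:55:23 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=1 out=",
  # Abbreviated copy of the ERROR line; it must not be counted as a reload.
  "November 07 15:55:23 ERROR ERROR: failure from oo-httpd-singular(1): : stdout:  stderr:httpd.worker: Syntax error on line 221 of /etc/httpd/conf/httpd.conf",
  "November 07 15:55:25 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=0 out="
]

counts = reload_results(sample)
failed = counts.reject { |rc, _| rc.zero? }.values.sum
puts "ok=#{counts[0]} failed=#{failed}"
```

Against a real node, the same function can be fed `File.foreach('/var/log/openshift/node/platform.log')`. A nonzero failed count after the test run would mean the race was hit.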

Comment 8 errata-xmlrpc 2014-11-25 18:19:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2014-1906.html

