Bug 1148192 - Race condition in `oo-httpd-singular graceful` when using apache-vhost
Summary: Race condition in `oo-httpd-singular graceful` when using apache-vhost
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.1.0
Hardware: All
OS: Linux
medium
high
Target Milestone: ---
: ---
Assignee: Luke Meyer
QA Contact: libra bugs
URL:
Whiteboard:
Duplicates: 1154645
Depends On: 1147054 1151744
Blocks:
Reported: 2014-09-30 22:13 UTC by Timothy Williams
Modified: 2018-12-09 18:43 UTC
CC: 7 users

Fixed In Version: openshift-origin-node-util-1.30.3.1-1.el6op
Doc Type: Bug Fix
Doc Text:
Cause: There was a race condition when using the apache-vhost frontend. If "oo-httpd-singular graceful" is run to incorporate one gear's vhost update while another gear is creating its vhost configuration, the configuration is left in a bad state and httpd will not (re)start.
Consequence: When this condition is hit, vhost configuration stops being updated and newly added gears are unreachable via the vhost frontend. If httpd is stopped, it fails to start until the configuration is fixed.
Fix: The lock around the call to oo-httpd-singular was extended so that it also covers waiting for the reload to finish, preventing the race condition.
Result: This should no longer occur.
Clone Of:
Clones: 1148418 1155794
Environment:
Last Closed: 2014-11-03 19:55:18 UTC
Target Upstream Version:
Embargoed:



Links
System                 | ID             | Private | Priority | Status       | Summary                                                     | Last Updated
Red Hat Product Errata | RHSA-2014:1796 | 0       | normal   | SHIPPED_LIVE | Moderate: Red Hat OpenShift Enterprise 2.2 Release Advisory | 2014-11-04 00:52:02 UTC

Description Timothy Williams 2014-09-30 22:13:24 UTC
Description of problem:
There appears to be a race condition when using the apache-vhost frontend. If oo-httpd-singular graceful is run while an application is creating its frontend configuration, the configuration is left in a bad state and httpd will not start.

Version-Release number of selected component (if applicable):
2.1.6

How reproducible:
Very rarely

Steps to Reproduce:
1. On a node using the apache-vhost frontend, create many applications while removing others.

Actual results:
We see the following in the node's platform.log:
-=~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~=-
September 22 13:39:32 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=0 out=
September 22 13:39:32 INFO Connecting frontend mapping for 54206ba04970fa5329000003/haproxy: [/haproxy-status] => [127.2.20.3:8080/] with options: {"protocols"=>["http"]}
September 22 13:39:32 WARN V2CartModel#connect_frontend: No such file or directory - /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003/599984_element-_haproxy-status.conf
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.22.5.11/lib/openshift-origin-common/utils/file_needs_sync.rb:36:in `initialize'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.22.5.11/lib/openshift-origin-common/utils/file_needs_sync.rb:36:in `open'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.22.5.11/lib/openshift-origin-common/utils/file_needs_sync.rb:36:in `open'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-frontend-apache-vhost-0.5.2.4/lib/openshift/runtime/frontend/http/plugins/apache-vhost.rb:152:in `block (2 levels) in connect'
  [... backtrace cut for clarity ...]
September 22 13:39:32 ERROR Unexpected error during configure: No such file or directory - /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003/599984_element-_haproxy-status.conf (Errno::ENOENT)
September 22 13:39:32 INFO openshift-agent: request end: action=cartridge_do, requestid=5373aabdbd865534bec108f8bf32d199, senderid=lae-alln-brk02, statuscode=1, data={:time=>nil, :output=>"CLIENT_ERROR: Unexpected error: No such file or directory - /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003/599984_element-_haproxy-status.conf\n", :exitcode=>1, :addtl_params=>nil}
September 22 13:39:32 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=1 out=
September 22 13:39:32 ERROR ERROR: failure from oo-httpd-singular(1): : stdout:  stderr:httpd.worker: Syntax error on line 221 of /etc/httpd/conf/httpd.conf: Syntax error on line 45 of /etc/httpd/conf.d/000001_openshift_origin_frontend_vhost.conf: Syntax error on line 29 of /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_0_54206ba04970fa5329000003.conf: Include directory '/etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003' not found
-=~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~=-
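The last error above shows the bad state concretely: a gear's vhost file (54206ba04970fa5329000003_e2e_0_54206ba04970fa5329000003.conf) still contains an Include of a per-gear directory that is missing, so httpd refuses to parse its whole configuration. A minimal Ruby sketch of a check for that condition follows. `missing_includes` is a hypothetical helper, not part of OpenShift, and it assumes absolute Include paths without spaces or wildcards, as in the log above.

```ruby
# Hypothetical diagnostic: return the Include targets in one httpd config
# file that do not exist on disk. Assumes absolute paths without spaces;
# glob patterns are skipped rather than expanded.
def missing_includes(conf_path)
  File.readlines(conf_path).each_with_object([]) do |line, missing|
    m = line.match(/\A\s*Include\s+(\S+)/i) # "IncludeOptional" does not match
    next unless m
    target = m[1].delete('"')
    next if target.include?("*")
    missing << target unless File.exist?(target)
  end
end
```

Running it over the files under /etc/httpd/conf.d/openshift/ would list gears whose Include directory is gone, which is the situation that makes every later `oo-httpd-singular graceful` fail.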


Expected results:
All applications created/removed successfully

Additional info:
This looks very similar to bug 1146194, but the same error messages are not observed.

Comment 3 Luke Meyer 2014-10-01 19:23:28 UTC
Per bug 1147054, the fix should arrive in the next 2.2 rebase.

Comment 6 Luke Meyer 2014-10-22 20:06:22 UTC
In addition to the rebase, https://github.com/openshift/origin-server/pull/5885 is required.

Comment 7 Luke Meyer 2014-10-22 20:13:48 UTC
origin-server cherrypick:
commit dc07a6d177263a128f9b1506db9a7a20e64df451
Author: Rajat Chopra <rchopra>
Date:   Fri Oct 17 13:43:53 2014 -0700
    bz1151744 - wrap the wait for reload to finish inside of the lockfile
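The commit summary above describes the fix: waiting for the reload to finish now happens inside the lock file rather than after the lock is released. A minimal Ruby sketch of that pattern follows, assuming a single lock file shared by all frontend updates on a node. `with_httpd_config_lock`, `update_vhost_and_reload`, and the callables passed to it are hypothetical names, not the actual origin-server code.

```ruby
# Hypothetical sketch, not the origin-server implementation: every vhost
# update and the graceful reload that follows run under one exclusive
# flock(2) lock, so no other gear can rewrite vhost files mid-reload.
def with_httpd_config_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    f.flock(File::LOCK_EX) # blocks until no other holder remains
    yield
  end # closing the file releases the lock
end

# write_config, reload and wait_for_reload are callables supplied by the
# caller (assumed stand-ins for writing the gear's .conf files, running
# `oo-httpd-singular graceful`, and waiting for httpd to finish reloading).
def update_vhost_and_reload(lock_path, write_config, reload, wait_for_reload)
  with_httpd_config_lock(lock_path) do
    write_config.call
    reload.call
    wait_for_reload.call # the fix: the wait stays inside the lock
  end
end
```

Holding the lock through the wait means the next gear cannot start writing its vhost files until httpd has finished re-reading the current configuration; releasing it right after issuing the reload would leave a window like the one described in this bug.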

Comment 11 Anping Li 2014-10-24 12:22:21 UTC
Verified and passed.
1) Enable httpd.worker:
[root@node1 ~]# ps -ef|grep httpd
root     27730     1  0 05:01 ?        00:00:00 /usr/sbin/httpd.worker
apache   27732 27730  0 05:01 ?        00:00:00 /usr/sbin/httpd.worker
apache   27733 27730  0 05:01 ?        00:00:00 /usr/sbin/httpd.worker
apache   27735 27730  0 05:01 ?        00:00:00 /usr/sbin/httpd.worker
root     29466 26319  0 05:04 pts/0    00:00:00 grep httpd

2) Create many applications while removing others, and run regression tests.

3) Check platform.log: oo-httpd-singular was executed and no errors were reported.
[root@node1 node]# cat platform.log|grep oo-httpd
October 24 05:12:57 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=0 out=
October 24 05:12:58 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=0 out=
October 24 05:13:00 INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=0 out

[root@node1 node]# grep error platform.log
[root@node1 node]# grep warn platform.log
git archive --format=tar master | (cd /var/lib/openshift/544a4251e5fed5c217000186/app-root/runtime/repo && tar --warning=no-timestamp -xf -);
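Step 3 of the verification can also be scripted. A minimal Ruby sketch follows, assuming the platform.log line format quoted in this report; `failed_singular_runs` is a hypothetical helper that returns every oo-httpd-singular run with a non-zero rc.

```ruby
# Matches lines such as:
#   ... INFO Shell command '/usr/sbin/oo-httpd-singular  graceful' ran. rc=1 out=
SINGULAR_RE = %r{Shell command '/usr/sbin/oo-httpd-singular\s+(\w+)' ran\. rc=(\d+)}

# Hypothetical helper: collect [action, rc, line] for every
# oo-httpd-singular run that did not exit with rc=0.
def failed_singular_runs(lines)
  lines.each_with_object([]) do |line, failures|
    m = SINGULAR_RE.match(line)
    failures << [m[1], m[2].to_i, line.chomp] if m && m[2].to_i != 0
  end
end

# Example (log path assumed relative to the current directory):
# failed_singular_runs(File.foreach("platform.log")).each { |action, rc, _| puts "#{action}: rc=#{rc}" }
```

Unlike `grep error`, this reports a failed reload even when the accompanying ERROR line is worded differently.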

Comment 12 Luke Meyer 2014-10-31 15:08:19 UTC
*** Bug 1154645 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2014-11-03 19:55:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2014-1796.html

