+++ This bug was initially created as a clone of Bug #1148192 +++

Description of problem:
There appears to be a race condition when using the apache-vhost frontend. If oo-httpd-singular graceful is run while an application is creating its frontend configuration, the configuration is left in a bad state and httpd will not start.

Version-Release number of selected component (if applicable):
2.1.6

How reproducible:
Very rarely

Steps to Reproduce:
1. Create many applications while removing others

Actual results:
We see the following in the node's platform.log.

-=~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~=-
September 22 13:39:32 INFO Shell command '/usr/sbin/oo-httpd-singular graceful' ran. rc=0 out=
September 22 13:39:32 INFO Connecting frontend mapping for 54206ba04970fa5329000003/haproxy: [/haproxy-status] => [127.2.20.3:8080/] with options: {"protocols"=>["http"]}
September 22 13:39:32 WARN V2CartModel#connect_frontend: No such file or directory - /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003/599984_element-_haproxy-status.conf
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.22.5.11/lib/openshift-origin-common/utils/file_needs_sync.rb:36:in `initialize'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.22.5.11/lib/openshift-origin-common/utils/file_needs_sync.rb:36:in `open'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.22.5.11/lib/openshift-origin-common/utils/file_needs_sync.rb:36:in `open'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-frontend-apache-vhost-0.5.2.4/lib/openshift/runtime/frontend/http/plugins/apache-vhost.rb:152:in `block (2 levels) in connect'
[... backtrace cut for clarity ...]
September 22 13:39:32 ERROR Unexpected error during configure: No such file or directory - /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003/599984_element-_haproxy-status.conf (Errno::ENOENT)
September 22 13:39:32 INFO openshift-agent: request end: action=cartridge_do, requestid=5373aabdbd865534bec108f8bf32d199, senderid=lae-alln-brk02, statuscode=1, data={:time=>nil, :output=>"CLIENT_ERROR: Unexpected error: No such file or directory - /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003/599984_element-_haproxy-status.conf\n", :exitcode=>1, :addtl_params=>nil}
September 22 13:39:32 INFO Shell command '/usr/sbin/oo-httpd-singular graceful' ran. rc=1 out=
September 22 13:39:32 ERROR ERROR: failure from oo-httpd-singular(1): : stdout: stderr:httpd.worker: Syntax error on line 221 of /etc/httpd/conf/httpd.conf: Syntax error on line 45 of /etc/httpd/conf.d/000001_openshift_origin_frontend_vhost.conf: Syntax error on line 29 of /etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_0_54206ba04970fa5329000003.conf: Include directory '/etc/httpd/conf.d/openshift/54206ba04970fa5329000003_e2e_54206ba04970fa5329000003' not found
-=~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~=-

Expected results:
All applications created/removed successfully

Additional info:
This looks very similar to Bugzilla 1146194 but the same error messages are not observed.

--- Additional comment from Luke Meyer on 2014-10-01 15:23:28 EDT ---

Per bug 1147054 the fix should be coming to the next 2.2 rebase.

--- Additional comment from Luke Meyer on 2014-10-22 16:06:22 EDT ---

In addition to rebase, https://github.com/openshift/origin-server/pull/5885 is required.
This bug is for the 2.1.z backport.
I'm going to hijack this bug and pull in a bunch of related frontend changes that ought to be fixed under vhost. These are the origin-server commits cherry-picked:

commit b2a95ec8848d9e0752f8cce1357c356696e1309a
Author: Rajat Chopra <rchopra>
Date:   Tue Aug 19 12:51:10 2014 -0700

    bz1131404 - ProxyPassReverse fix

commit 6b7c5d7c5fb28312a3d7a99ef45213d37d97ca26
Author: Rajat Chopra <rchopra>
Date:   Wed Aug 20 14:09:31 2014 -0700

    put apache reload in guard of an env variable

commit df4b6b28b7426a53fa24d1c959c32d99face842d
Author: Rajat Chopra <rchopra>
Date:   Thu Aug 21 11:30:40 2014 -0700

    move env var guard for all http plugins and not just the vhost plugin

commit 19b8b8153ac97f1e6b28230588bd4a49c7994799
Author: Rajat Chopra <rchopra>
Date:   Mon Aug 25 14:50:44 2014 -0700

    consistent trailing slashes - bz1133694

commit 3263891fb996164b0d101234ea43e229b2df2609
Author: Dan Mace <ironcladlou>
Date:   Mon Sep 8 10:32:51 2014 -0400

    Apply more restrictive permissions to cert files

    Resolve bug https://bugzilla.redhat.com/show_bug.cgi?id=1138652

commit 1cab19a4a88b0717b5199fcf85d148217ff198ca
Author: Rajat Chopra <rchopra>
Date:   Fri Sep 26 11:58:50 2014 -0700

    bz1147054 - use common lockfile

commit dc07a6d177263a128f9b1506db9a7a20e64df451
Author: Rajat Chopra <rchopra>
Date:   Fri Oct 17 13:43:53 2014 -0700

    bz1151744 - wrap the wait for reload to finish inside of the lockfile

commit 5294046eb63f3140a39b5d48065b495532d52159
Author: Rajat Chopra <rchopra>
Date:   Fri Oct 24 16:54:38 2014 -0700

    fix bz 1156361. Race condition between destroy-app and configure.

...
thereby addressing at least these related bugs (though not all matter for Enterprise):

* Bug 1131404 - Wordpress quickstart with phpmyadmin embedded will direct to a wrong page when using vhost plugin
* Bug 1133694 - App vhost proxy directives need to use slashes consistently
* Bug 1138652 - SSL alias' certificates are world readable when using vhost frontend plugin
* Bug 1147054 - graceful restarts with vhost front end cause corrupt configurations under load
* Bug 1156361 - (AKA) Race condition between destroy-app and configure leaves broken vhost conf
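For readers unfamiliar with the approach named in the bz1147054 and bz1151744 commit subjects (a common lockfile, with the reload wait held inside it), here is a minimal Ruby sketch of the general idea. It is not the origin-server code: the helper names, the lockfile name, and the directory layout are all hypothetical, and it only shows how an exclusive `flock` on one shared file can keep a reload from observing a half-written include directory.

```ruby
require 'tmpdir'

# Hypothetical helper (not from origin-server): run the block while holding
# an exclusive lock on a single shared lockfile.
def with_frontend_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until no other holder remains
    yield
  end                         # closing the file releases the lock
end

# Hypothetical writer: create the per-application include directory and one
# element file under the lock, so both appear together from a reload's view.
def write_element_conf(lock_path, conf_dir, name, body)
  with_frontend_lock(lock_path) do
    Dir.mkdir(conf_dir) unless Dir.exist?(conf_dir)
    File.write(File.join(conf_dir, name), body)
  end
end

# Hypothetical reload wrapper: takes the same lock and runs the caller's
# reload action (including any wait for it to finish) inside it.
def reload_under_lock(lock_path)
  with_frontend_lock(lock_path) { yield }
end

# Demo in a scratch directory; the paths here are placeholders only.
Dir.mktmpdir do |dir|
  lock = File.join(dir, 'frontend.lock')
  conf_dir = File.join(dir, 'app_include_dir')
  write_element_conf(lock, conf_dir, 'element.conf', "# placeholder\n")
  visible = reload_under_lock(lock) { Dir.entries(conf_dir) - %w[. ..] }
  puts "reload sees: #{visible.inspect}"
end
```

The point of the design is that writers and the reloader contend on the same file, so whichever arrives second waits instead of reading or restarting against a partially created configuration.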
Reproduce steps:
1. Install the environment; on the node, enable the vhost frontend and the httpd worker (httpd.worker).
2. Open two terminals, ssh into the node, and run the following commands to monitor the output.
   Terminal 1:
   # while :; do ps -ef|grep oo-httpd-singular|grep -v grep; sleep 0.5; echo "------------"; done
   Terminal 2:
   # tailf /var/log/openshift/node/platform.log|grep oo-httpd-singular
3. Create 20 scaling apps.
4. Open two terminals on the client. In one terminal, create 10 new scaling apps in parallel; in the other, destroy the 20 existing apps created in step 3 in parallel.
   # for i in {1..10}; do rhc app-create myapp${i} php-5.3 --no-git --no-dns -s & done
   # for i in {1..20}; do rhc app delete myappcc${i} --confirm & done

Output:
Step 2:
Terminal 1:
<--snip-->
------------
root 31744 1196 8 15:43 ? 00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
root 31891 1196 7 15:43 ? 00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
root 31947 1196 11 15:43 ? 00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
root 32179 1196 18 15:43 ? 00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
root 32184 1196 21 15:43 ? 00:00:00 ruby /usr/sbin/oo-httpd-singular graceful
------------
<--snip-->
Terminal 2:
November 07 15:55:23 INFO Shell command '/usr/sbin/oo-httpd-singular graceful' ran. rc=1 out=
November 07 15:55:23 ERROR ERROR: failure from oo-httpd-singular(1): : stdout: stderr:httpd.worker: Syntax error on line 221 of /etc/httpd/conf/httpd.conf: Syntax error on line 53 of /etc/httpd/conf.d/000001_openshift_origin_frontend_vhost.conf: Could not open configuration file /etc/httpd/conf.d/openshift/545c791987692be53a000010_jialiu_0_myapp4.conf: No such file or directory
November 07 15:55:25 INFO Shell command '/usr/sbin/oo-httpd-singular graceful' ran. rc=0 out=

The above output indicates that multiple "oo-httpd-singular graceful" processes are triggered at the same time, which causes the failure shown in the Terminal 2 output.
NOTE: The above steps do not reproduce this issue every time; several attempts may be needed.

Verified this bug using the above steps with 2.1.z/2014-11-06.1 three times. No error was found in platform.log, so PASS.
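Since the issue is intermittent, checking platform.log by eye across several runs is easy to get wrong. The sketch below is one possible way to tally the exit codes of the graceful reloads, assuming the log line format quoted in this bug; the constant and method names are hypothetical and this is not part of any OpenShift tooling.

```ruby
# Matches the "Shell command ... ran. rc=N" lines as quoted in this bug.
GRACEFUL_RE = %r{Shell command '/usr/sbin/oo-httpd-singular graceful' ran\. rc=(\d+)}

# Count graceful reloads per exit code. Accepts any enumerable of lines.
def graceful_results(lines)
  counts = Hash.new(0)
  lines.each do |line|
    m = GRACEFUL_RE.match(line)
    counts[m[1].to_i] += 1 if m
  end
  counts
end

# True if at least one graceful reload exited non-zero.
def failed_reloads?(lines)
  graceful_results(lines).any? { |rc, _count| rc != 0 }
end

# Sample lines copied from the reproduction output in this bug.
sample = [
  "November 07 15:55:23 INFO Shell command '/usr/sbin/oo-httpd-singular graceful' ran. rc=1 out=",
  "November 07 15:55:25 INFO Shell command '/usr/sbin/oo-httpd-singular graceful' ran. rc=0 out="
]
puts graceful_results(sample).inspect
puts failed_reloads?(sample)
```

On a node, the same methods could be fed `File.foreach('/var/log/openshift/node/platform.log')` after each run of the steps above.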
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2014-1906.html