Bug 836395 - Apache 503 after running aeolus-upgrade - httpd restart required
Status: CLOSED WONTFIX
Product: CloudForms Cloud Engine
Classification: Red Hat
Component: aeolus-conductor
1.0.0
Unspecified Unspecified
unspecified Severity medium
: 1.0.1
: ---
Assigned To: Steve Linabery
Rehana
:
Depends On:
Blocks:
Reported: 2012-06-28 18:20 EDT by James Laska
Modified: 2013-09-02 03:02 EDT
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Conductor traffic passes through httpd+mod_proxy before reaching conductor itself (conductor runs via thin). During an upgrade (or when running aeolus-restart-services), conductor is restarted. The conductor init script returns successfully before thin has fully loaded the conductor app. Accessing conductor during this brief window results in an HTTP/503 status code (Service Unavailable). The default behavior of httpd/mod_proxy is to mark a backend worker as disabled for 60 seconds if the backend is determined to be unavailable; once the worker is marked as such, mod_proxy will not attempt to pass any traffic to the backend until the retry period expires. Therefore, if a user attempts to access conductor immediately after it starts, but before it has loaded completely, the user will receive an HTTP/503 error for approximately 60 seconds, until the mod_proxy retry interval expires and httpd starts forwarding requests to conductor again.
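The retry behavior described above can be modeled in a few lines. The sketch below is a toy simulation, not httpd's implementation: `ProxyWorker`, `simulate`, and the 5-second thin startup delay are hypothetical. The only details taken from httpd are that the ProxyPass `retry` parameter defaults to 60 seconds and that `retry=0` means a worker in the error state is retried immediately.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProxyWorker:
    """Toy model of a mod_proxy backend worker (assumed semantics, not httpd source)."""
    retry: float = 60.0  # seconds; mirrors the ProxyPass retry= default
    error_since: Optional[float] = None  # time the worker entered the error state

    def handle(self, now: float, backend_up: bool) -> int:
        # While in the error state, the backend is not contacted at all
        # until the retry period has expired.
        if self.error_since is not None and now < self.error_since + self.retry:
            return 503
        if backend_up:
            self.error_since = None  # a successful request clears the error state
            return 200
        # Backend refused the connection: put the worker in the error state now.
        self.error_since = now
        return 503


def simulate(request_times: List[float], backend_ready_at: float,
             retry: float = 60.0) -> List[Tuple[float, int]]:
    """Return (time, status) for each request; the backend is up from backend_ready_at on."""
    worker = ProxyWorker(retry=retry)
    return [(t, worker.handle(t, backend_up=t >= backend_ready_at)) for t in request_times]


if __name__ == "__main__":
    # Hypothetical timeline: thin is ready 5 s after the init script returns,
    # but the user's first request arrives at t=1 s.
    for t, status in simulate([1, 2, 10, 30, 60, 61, 62], backend_ready_at=5):
        print(f"t={t:>3}s -> HTTP {status}")
```

In this model a single request that lands before thin is listening keeps every later request failing until 60 seconds after that first failure, even though the backend itself was ready after 5 seconds. With `retry=0` the same request sequence gets a 200 as soon as the backend is listening, at the cost of httpd re-probing a dead backend on every request.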
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-08 15:45:14 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Upgrade output.txt (2.63 KB, text/plain)
2012-06-28 18:20 EDT, James Laska
aeolus-debug.tgz (397.24 KB, application/octet-stream)
2012-06-28 18:21 EDT, James Laska

Description James Laska 2012-06-28 18:20:38 EDT
Created attachment 595152
Upgrade output.txt

Description of problem:

After upgrading to CloudForms-1.0.1, and running /usr/share/aeolus-conductor/script/upgrade, the conductor web-ui presents a 503 error.  Restarting httpd resolves the problem and gets conductor working again.

Version-Release number of selected component (if applicable):
 * aeolus-conductor-0.8.33-1.el6cf.src.rpm
 * aeolus-configure-2.5.10-1.el6cf.src.rpm
 * deltacloud-core-0.5.0-10.el6_2.src.rpm
 * rubygem-aeolus-cli-0.3.3-2.el6_2.src.rpm
 * rubygem-aeolus-image-0.3.0-12.el6.src.rpm
 * rubygem-deltacloud-client-0.5.0-2.el6.src.rpm

How reproducible:


Steps to Reproduce:
1. Install CloudForms-1.0
2. Configure rhevm, vsphere and ec2 providers
3. Build/push and deploy images into each provider
4. Upgrade to CloudForms-1.0.1
5. Run /usr/share/aeolus-conductor/script/upgrade
6. Access the conductor web-ui

Actual results:

Step#6 fails with a 503 error

Expected results:

The upgrade script should either:
 1) restart httpd in the proper order, or
 2) instruct the admin to restart httpd when complete.
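The first option, restarting httpd only after conductor is actually serving, could look roughly like the Python sketch below. It assumes the upgrade step can be expressed as three callables (restart conductor, probe the thin backend directly rather than through httpd, restart httpd); `finish_upgrade`, `wait_for_backend`, and the probe are hypothetical names, not part of aeolus-conductor. The report itself only establishes that restarting httpd cleared the 503s.

```python
import time
from typing import Callable


def wait_for_backend(probe: Callable[[], bool], timeout: float = 120.0,
                     interval: float = 1.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def finish_upgrade(restart_conductor: Callable[[], None],
                   probe: Callable[[], bool],
                   restart_httpd: Callable[[], None],
                   notify: Callable[[str], None] = print,
                   **wait_kwargs) -> bool:
    """Hypothetical ordering: conductor first, wait until thin answers, then httpd."""
    restart_conductor()
    if wait_for_backend(probe, **wait_kwargs):
        # Per this report, an httpd restart clears the 503 state left by mod_proxy.
        restart_httpd()
        return True
    # Fall back to option 2: tell the admin what to do.
    notify("conductor is not responding yet; restart httpd once it is up")
    return False


if __name__ == "__main__":
    # Simulated clock so the demo runs instantly; the backend "comes up" at t=5 s.
    now = [0.0]
    calls = []
    ok = finish_upgrade(
        restart_conductor=lambda: calls.append("conductor"),
        probe=lambda: now[0] >= 5,
        restart_httpd=lambda: calls.append("httpd"),
        clock=lambda: now[0],
        sleep=lambda s: now.__setitem__(0, now[0] + s),
    )
    print(ok, calls, now[0])
```

Injecting `clock` and `sleep` keeps the polling loop testable without real waits; a real script would pass a probe that talks to the thin port directly, so the check is not itself affected by mod_proxy's retry window.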

Additional info:

 * See attached command-line output capturing the "Steps to reproduce"
 * See attached aeolus-debug.tgz
 * Restarting httpd resolved the problem

> 10.11.11.9 - - [28/Jun/2012:17:12:06 -0400] "GET /conductor/pools HTTP/1.1" 503 421
> 10.11.11.9 - - [28/Jun/2012:17:12:08 -0400] "GET /conductor/pools HTTP/1.1" 503 421

# service httpd restart

> 10.11.11.9 - - [28/Jun/2012:17:12:15 -0400] "GET /conductor/pools HTTP/1.1" 200 17252
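To see how long the 503 window lasted in a given access log, lines like the ones quoted above can be parsed. A minimal sketch, assuming httpd's common log format exactly as it appears in these lines (a leading `> ` quote marker is tolerated); `parse_clf` and `summarize` are illustrative helpers.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Common log format: host ident user [timestamp] "request" status bytes
CLF = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[Tuple[datetime, str, int]]:
    """Return (timestamp, path, status) for a log line, or None if it does not match."""
    m = CLF.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return ts, m["path"], int(m["status"])


def summarize(lines: Iterable[str]):
    """Count status codes and return (first 503 time, last 503 time), or None if no 503s."""
    entries = [e for e in map(parse_clf, lines) if e is not None]
    counts = Counter(status for _, _, status in entries)
    errors = [ts for ts, _, status in entries if status == 503]
    window = (errors[0], errors[-1]) if errors else None
    return counts, window


if __name__ == "__main__":
    log = [
        '10.11.11.9 - - [28/Jun/2012:17:12:06 -0400] "GET /conductor/pools HTTP/1.1" 503 421',
        '10.11.11.9 - - [28/Jun/2012:17:12:08 -0400] "GET /conductor/pools HTTP/1.1" 503 421',
        '10.11.11.9 - - [28/Jun/2012:17:12:15 -0400] "GET /conductor/pools HTTP/1.1" 200 17252',
    ]
    counts, window = summarize(log)
    print(dict(counts), window[1] - window[0])
```

On the three lines in this report, the observed 503s span 17:12:06 to 17:12:08, and the next request after `service httpd restart` returns 200.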
Comment 1 James Laska 2012-06-28 18:21:25 EDT
Created attachment 595153
aeolus-debug.tgz
Comment 2 Ronelle Landy 2012-06-29 11:39:13 EDT
Followed through the above reproduction steps.

I do get the 'website unavailable' message and then the Apache 503 error if I try to access conductor right after running the upgrade script. Refreshing the browser does not help right away.
But after about a minute, a refresh returned the conductor interface (according to eck, this is expected; copying the chat comment ...

<eck> yeah it's 60 seconds - http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass )

Running >> aeolus-restart-services and then accessing conductor right away shows the same behaviour: 
unavailable -> 503 -> conductor shows up

rpms tested:

>> rpm -qa |grep aeolus
aeolus-conductor-0.8.33-1.el6cf.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-conductor-daemons-0.8.33-1.el6cf.noarch
aeolus-conductor-doc-0.8.33-1.el6cf.noarch
rubygem-aeolus-cli-0.3.3-2.el6_2.noarch
aeolus-configure-2.5.10-1.el6cf.noarch
aeolus-all-0.8.33-1.el6cf.noarch
Comment 3 John Eckersberg 2012-07-03 10:57:44 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Conductor traffic passes through httpd+mod_proxy before reaching conductor itself (conductor runs via thin). During an upgrade (or when running aeolus-restart-services), conductor is restarted. The conductor init script returns successfully before thin has fully loaded the conductor app. Accessing conductor during this brief window results in an HTTP/503 status code (Service Unavailable). The default behavior of httpd/mod_proxy is to mark a backend worker as disabled for 60 seconds if the backend is determined to be unavailable; once the worker is marked as such, mod_proxy will not attempt to pass any traffic to the backend until the retry period expires. Therefore, if a user attempts to access conductor immediately after it starts, but before it has loaded completely, the user will receive an HTTP/503 error for approximately 60 seconds, until the mod_proxy retry interval expires and httpd starts forwarding requests to conductor again.
Comment 5 Mike Orazi 2012-08-03 13:22:32 EDT
Should this be closed as the release notes were added or are we trying to track something additional here?
Comment 6 James Laska 2012-08-03 14:57:26 EDT
(In reply to comment #5)
> Should this be closed as the release notes were added or are we trying to
> track something additional here?

Is there any interest in eliminating, or reducing, the 60-second window where conductor may return an HTTP/503?

If not, I believe the release note that Dan added documents this behavior. With that in place, we can close this as NOTABUG.
Comment 7 Steve Linabery 2012-08-08 15:25:36 EDT
Since the backend downtime is unpredictable (less than 60 seconds, but how much less?), my vote is CLOSED NOTABUG on this.
Comment 8 James Laska 2012-08-08 15:45:14 EDT
By the power vested in me ... I close this bug and cast it out!

I changed my mind and went with CLOSED WONTFIX. I'm waffling between that and NOTABUG. Either way ... we aren't going to fix it.
