Bug 836395 - Apache 503 after running aeolus-upgrade - httpd restart required
Summary: Apache 503 after running aeolus-upgrade - httpd restart required
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: aeolus-conductor
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: 1.0.1
Assignee: Steve Linabery
QA Contact: Rehana
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-06-28 22:20 UTC by James Laska
Modified: 2013-09-02 07:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Conductor traffic passes through httpd+mod_proxy before reaching conductor itself (conductor runs via thin). During an upgrade (or when running aeolus-restart-services), conductor is restarted. The conductor init script returns successfully before thin has fully loaded the conductor app. Accessing conductor during this brief window results in an HTTP/503 status code (Service Temporarily Unavailable). By default, httpd/mod_proxy marks a backend worker as disabled for 60 seconds if the backend is determined to be unavailable; once the worker is marked as such, mod_proxy will not attempt to pass any traffic to the backend until the retry period expires. Therefore, if a user attempts to access conductor immediately after it starts, but before it has loaded completely, the user will receive an HTTP/503 error for approximately 60 seconds, until the mod_proxy retry interval expires and httpd starts forwarding requests to conductor again.
Clone Of:
Environment:
Last Closed: 2012-08-08 19:45:14 UTC
Embargoed:


Attachments
Upgrade output.txt (2.63 KB, text/plain)
2012-06-28 22:20 UTC, James Laska
aeolus-debug.tgz (397.24 KB, application/octet-stream)
2012-06-28 22:21 UTC, James Laska

Description James Laska 2012-06-28 22:20:38 UTC
Created attachment 595152 [details]
Upgrade output.txt

Description of problem:

After upgrading to CloudForms-1.0.1, and running /usr/share/aeolus-conductor/script/upgrade, the conductor web-ui presents a 503 error.  Restarting httpd resolves the problem and gets conductor working again.

Version-Release number of selected component (if applicable):
 * aeolus-conductor-0.8.33-1.el6cf.src.rpm
 * aeolus-configure-2.5.10-1.el6cf.src.rpm
 * deltacloud-core-0.5.0-10.el6_2.src.rpm
 * rubygem-aeolus-cli-0.3.3-2.el6_2.src.rpm
 * rubygem-aeolus-image-0.3.0-12.el6.src.rpm
 * rubygem-deltacloud-client-0.5.0-2.el6.src.rpm

How reproducible:


Steps to Reproduce:
1. Install CloudForms-1.0
2. Configure rhevm, vsphere and ec2 providers
3. Build/push and deploy images into each provider
4. Upgrade to CloudForms-1.0.1
5. Run /usr/share/aeolus-conductor/script/upgrade
6. Access the conductor web-ui

Actual results:

Step#6 fails with a 503 error

Expected results:

The upgrade script should either
 1) restart httpd in the proper order, or
 2) instruct the admin to restart httpd when complete.

Additional info:

 * See attached command-line output capturing the "Steps to reproduce"
 * See attached aeolus-debug.tgz
 * Restarting httpd resolved the problem

> 10.11.11.9 - - [28/Jun/2012:17:12:06 -0400] "GET /conductor/pools HTTP/1.1" 503 421
> 10.11.11.9 - - [28/Jun/2012:17:12:08 -0400] "GET /conductor/pools HTTP/1.1" 503 421

# service httpd restart

> 10.11.11.9 - - [28/Jun/2012:17:12:15 -0400] "GET /conductor/pools HTTP/1.1" 200 17252
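
One way the first option under "Expected results" could look, purely as a sketch: wait until the backend answers on its own, and only then run the same "service httpd restart" used as the workaround. The backend URL and port, the timeout values, and the function names are assumptions made for illustration; none of this is taken from the actual upgrade script.

```python
import subprocess
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumption: thin listens locally on this URL; adjust to the real backend.
BACKEND_URL = "http://127.0.0.1:3000/conductor"


def backend_ready(url: str = BACKEND_URL) -> bool:
    """True once the backend itself (bypassing httpd) answers a request."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError as err:
        # An HTTP error below 500 still means the app is up and responding.
        return err.code < 500
    except OSError:
        # Connection refused or timed out: thin has not finished loading.
        return False


def wait_until_ready(probe: Callable[[], bool],
                     timeout: float = 120.0,
                     interval: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    if wait_until_ready(backend_ready):
        # Same workaround as in this report, but run only once thin is up.
        subprocess.call(["service", "httpd", "restart"])
    else:
        raise SystemExit("conductor backend did not come up in time")
```

The probe is passed in as a callable so the polling loop can be exercised without a running conductor; the restart only runs when the file is executed as a script.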

Comment 1 James Laska 2012-06-28 22:21:25 UTC
Created attachment 595153 [details]
aeolus-debug.tgz

Comment 2 Ronelle Landy 2012-06-29 15:39:13 UTC
Followed through the above reproduction steps.

I do get the 'website unavailable' message and then the Apache 503 error if I try to access conductor right after running the update script. Refreshing the browser does not help right away.
But after about a minute, a refresh returned the conductor interface (according to eck, this is expected - copying chat comment ...

<eck> yeah it's 60 seconds - http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass )

Running >> aeolus-restart-services and then accessing conductor right away shows the same behaviour: 
unavailable -> 503 -> conductor shows up

rpms tested:

>> rpm -qa |grep aeolus
aeolus-conductor-0.8.33-1.el6cf.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-conductor-daemons-0.8.33-1.el6cf.noarch
aeolus-conductor-doc-0.8.33-1.el6cf.noarch
rubygem-aeolus-cli-0.3.3-2.el6_2.noarch
aeolus-configure-2.5.10-1.el6cf.noarch
aeolus-all-0.8.33-1.el6cf.noarch

Comment 3 John Eckersberg 2012-07-03 14:57:44 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Conductor traffic passes through httpd+mod_proxy before reaching conductor itself (conductor runs via thin). During an upgrade (or when running aeolus-restart-services), conductor is restarted. The conductor init script returns successfully before thin has fully loaded the conductor app. Accessing conductor during this brief window results in an HTTP/503 status code (Service Temporarily Unavailable). By default, httpd/mod_proxy marks a backend worker as disabled for 60 seconds if the backend is determined to be unavailable; once the worker is marked as such, mod_proxy will not attempt to pass any traffic to the backend until the retry period expires. Therefore, if a user attempts to access conductor immediately after it starts, but before it has loaded completely, the user will receive an HTTP/503 error for approximately 60 seconds, until the mod_proxy retry interval expires and httpd starts forwarding requests to conductor again.
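
To make the timing in this note concrete, here is a toy model of the retry window. It is not Apache code: the ProxyWorker class, the simulate helper, and the example times are made up for illustration, and only the 60-second default comes from the note.

```python
from dataclasses import dataclass
from typing import List, Optional

RETRY_SECONDS = 60.0  # mod_proxy default retry interval, per the note


@dataclass
class ProxyWorker:
    """Toy model of one mod_proxy backend worker (illustrative only)."""
    retry: float = RETRY_SECONDS
    error_since: Optional[float] = None  # when the worker entered error state

    def handle(self, now: float, backend_up: bool) -> int:
        """Return the HTTP status a client would see at time `now`."""
        if self.error_since is not None:
            if now - self.error_since < self.retry:
                # Inside the retry window: the backend is not even tried.
                return 503
            self.error_since = None  # window expired, try the backend again
        if backend_up:
            return 200
        self.error_since = now  # backend unavailable: mark worker in error
        return 503


def simulate(ready_at: float, request_times: List[float]) -> List[int]:
    """Statuses seen when the backend finishes loading at `ready_at`."""
    worker = ProxyWorker()
    return [worker.handle(t, backend_up=(t >= ready_at)) for t in request_times]


if __name__ == "__main__":
    # Backend ready at t=10s; requests arrive at 2, 12, 30 and 70 seconds.
    print(simulate(10.0, [2.0, 12.0, 30.0, 70.0]))  # [503, 503, 503, 200]
```

In this model the request at 2 seconds is what trips the worker into the error state; the requests at 12 and 30 seconds get a 503 even though the backend has been up since 10 seconds, which matches the behaviour described in comment 2.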

Comment 5 Mike Orazi 2012-08-03 17:22:32 UTC
Should this be closed as the release notes were added or are we trying to track something additional here?

Comment 6 James Laska 2012-08-03 18:57:26 UTC
(In reply to comment #5)
> Should this be closed as the release notes were added or are we trying to
> track something additional here?

Is there any interest in eliminating, or reducing, the 60-second window where conductor may return an HTTP/503?

If not, I believe the release note that Dan added documents this behavior.  With that in place, we can CLOSED NOTABUG.

Comment 7 Steve Linabery 2012-08-08 19:25:36 UTC
Since the backend downtime is unpredictable (less than 60 seconds, but how much less?), my vote is CLOSED NOTABUG on this.

Comment 8 James Laska 2012-08-08 19:45:14 UTC
By the power vested in me ... I close this bug and cast it out!

I changed my mind and went with CLOSED WONTFIX.  I'm waffling between that and NOTABUG.  Either way ... we aren't going to fix it.

