Bug 836395

Summary:

Apache 503 after running aeolus-upgrade - httpd restart required

Product:

[Retired] CloudForms Cloud Engine

Reporter:

James Laska <jlaska>

Component:

aeolus-conductor

Assignee:

Steve Linabery <slinaber>

Status:

CLOSED WONTFIX

QA Contact:

Rehana <aeolus-qa-list>

Severity:

medium

Docs Contact:

Priority:

unspecified

Version:

1.0.0

CC:

cpelland, dmacpher, jeckersb, jturner, morazi, rlandy

Target Milestone:

1.0.1

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Conductor traffic passes through httpd+mod_proxy before reaching conductor itself (conductor runs via thin). During an upgrade (or when running aeolus-restart-services), conductor is restarted. The conductor init script returns successfully before thin has fully loaded the conductor app. Accessing conductor during this brief window results in an HTTP/503 status code (Service Temporarily Unavailable). The default behavior of httpd/mod_proxy is to mark a backend worker as disabled for 60 seconds if the backend is determined to be unavailable; once the worker is marked as such, mod_proxy will not attempt to pass any traffic to the backend until the retry period expires. Therefore, if a user attempts to access conductor immediately after it starts, but before it has loaded completely, the user will receive an HTTP/503 error for approximately 60 seconds, until the mod_proxy retry interval expires and httpd starts forwarding requests to conductor again.

Story Points:

---

Clone Of:

Environment:

Last Closed:

2012-08-08 19:45:14 UTC

Type:

Bug

Attachments:

Description	Flags
Upgrade output.txt	none
aeolus-debug.tgz	none

Description James Laska 2012-06-28 22:20:38 UTC

Created attachment 595152
Upgrade output.txt

Description of problem:

After upgrading to CloudForms-1.0.1, and running /usr/share/aeolus-conductor/script/upgrade, the conductor web-ui presents a 503 error.  Restarting httpd resolves the problem and gets conductor working again.

Version-Release number of selected component (if applicable):
 * aeolus-conductor-0.8.33-1.el6cf.src.rpm
 * aeolus-configure-2.5.10-1.el6cf.src.rpm
 * deltacloud-core-0.5.0-10.el6_2.src.rpm
 * rubygem-aeolus-cli-0.3.3-2.el6_2.src.rpm
 * rubygem-aeolus-image-0.3.0-12.el6.src.rpm
 * rubygem-deltacloud-client-0.5.0-2.el6.src.rpm

How reproducible:


Steps to Reproduce:
1. Install CloudForms-1.0
2. Configure rhevm, vsphere and ec2 providers
3. Build/push and deploy images into each provider
4. Upgrade to CloudForms-1.0.1
5. Run /usr/share/aeolus-conductor/script/upgrade
6. Access the conductor web-ui

Actual results:

Step 6 fails with a 503 error.

Expected results:

The upgrade script should either
 1) restart httpd in the proper order, or
 2) instruct the admin to restart httpd when complete.

Additional info:

 * See attached command-line output capturing the "Steps to reproduce"
 * See attached aeolus-debug.tgz
 * Restarting httpd resolved the problem

> 10.11.11.9 - - [28/Jun/2012:17:12:06 -0400] "GET /conductor/pools HTTP/1.1" 503 421
> 10.11.11.9 - - [28/Jun/2012:17:12:08 -0400] "GET /conductor/pools HTTP/1.1" 503 421

# service httpd restart

> 10.11.11.9 - - [28/Jun/2012:17:12:15 -0400] "GET /conductor/pools HTTP/1.1" 200 17252

Comment 1 James Laska 2012-06-28 22:21:25 UTC

Created attachment 595153
aeolus-debug.tgz

Comment 2 Ronelle Landy 2012-06-29 15:39:13 UTC

Followed through the above reproduction steps.

I do get the 'website unavailable' message and then the Apache 503 error if I try to access conductor right after running the update script. Refreshing the browser does not help right away.
But after about a minute, a refresh returned the conductor interface (according to eck, this is expected - copying chat comment ...

<eck> yeah it's 60 seconds - http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass )

Running >> aeolus-restart-services and then accessing conductor right away shows the same behaviour: 
unavailable -> 503 -> conductor shows up

rpms tested:

>> rpm -qa |grep aeolus
aeolus-conductor-0.8.33-1.el6cf.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-conductor-daemons-0.8.33-1.el6cf.noarch
aeolus-conductor-doc-0.8.33-1.el6cf.noarch
rubygem-aeolus-cli-0.3.3-2.el6_2.noarch
aeolus-configure-2.5.10-1.el6cf.noarch
aeolus-all-0.8.33-1.el6cf.noarch

Comment 3 John Eckersberg 2012-07-03 14:57:44 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Conductor traffic passes through httpd+mod_proxy before reaching conductor itself (conductor runs via thin).  During an upgrade (or when running aeolus-restart-services), conductor is restarted.  The conductor init script returns successfully before thin has fully loaded the conductor app.  Accessing conductor during this brief window results in an HTTP/503 status code (Service Temporarily Unavailable).  The default behavior of httpd/mod_proxy is to mark a backend worker as disabled for 60 seconds if the backend is determined to be unavailable; once the worker is marked as such, mod_proxy will not attempt to pass any traffic to the backend until the retry period expires.  Therefore, if a user attempts to access conductor immediately after it starts, but before it has loaded completely, the user will receive an HTTP/503 error for approximately 60 seconds, until the mod_proxy retry interval expires and httpd starts forwarding requests to conductor again.
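
To make the timing in the note concrete, here is a rough Python model of the retry behaviour it describes. This is only a toy sketch, not mod_proxy's actual code: the class name ProxyWorker, its method, and the explicit `now` argument are hypothetical and exist purely for illustration. The 60-second default comes from the note above.

```python
class ProxyWorker:
    """Toy model of a proxied backend worker with a retry timeout.

    Hypothetical illustration only; this is not how mod_proxy is implemented.
    """

    def __init__(self, retry_seconds=60):
        self.retry_seconds = retry_seconds
        self.error_since = None  # time at which the worker was marked as failed

    def handle(self, now, backend_up):
        """Return the HTTP status a client would see at time `now` (seconds)."""
        if self.error_since is not None:
            if now - self.error_since < self.retry_seconds:
                # Worker is still disabled: the backend is not even contacted.
                return 503
            # Retry period expired: forget the error and try the backend again.
            self.error_since = None
        if not backend_up:
            # Backend unavailable: mark the worker as failed from this moment.
            self.error_since = now
            return 503
        return 200


worker = ProxyWorker()
statuses = [
    worker.handle(now=0, backend_up=False),   # thin still loading the app
    worker.handle(now=5, backend_up=True),    # app is up, worker still disabled
    worker.handle(now=61, backend_up=True),   # retry period has expired
]
```

In this model the second request still fails even though the backend is already up, which matches the roughly one-minute 503 window seen in comment 2.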

Comment 5 Mike Orazi 2012-08-03 17:22:32 UTC

Should this be closed as the release notes were added or are we trying to track something additional here?

Comment 6 James Laska 2012-08-03 18:57:26 UTC

(In reply to comment #5)
> Should this be closed as the release notes were added or are we trying to
> track something additional here?

Is there any interest in eliminating, or reducing, the 60-second window where conductor may return an HTTP/503?

If not, I believe the release note that Dan added documents this behavior.  With that in place, we can close this as CLOSED NOTABUG.

Comment 7 Steve Linabery 2012-08-08 19:25:36 UTC

Since the backend downtime is unpredictable (less than 60 seconds, but how much less?), my vote is CLOSED NOTABUG on this.
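
For reference, one possible way to cope with an unpredictable load time would be to poll the backend until it answers rather than sleeping for a fixed period. The sketch below is hypothetical and is not part of the conductor init script or of aeolus-restart-services: the function names, the default timeout and interval, and the URL passed in are all assumptions. Polling thin directly, rather than through httpd, would presumably avoid tripping the mod_proxy retry timer, but that is untested here.

```python
import time
import urllib.error
import urllib.request


def http_probe(url):
    """Return True once the backend answers with any non-5xx status."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered with an error status; 4xx still means it is up.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the app has not finished loading.
        return False


def wait_until_ready(url, timeout=60.0, interval=1.0,
                     probe=http_probe, clock=time.monotonic, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the elapsed seconds on success, or None if `timeout` is reached.
    `probe`, `clock` and `sleep` are injectable so the loop can be tested
    without a real backend.
    """
    start = clock()
    while True:
        if probe(url):
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

A restart wrapper could call wait_until_ready() with the backend's own address (a placeholder here, since the thin port is not given in this report) and only report success once it returns a number.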

Comment 8 James Laska 2012-08-08 19:45:14 UTC

By the power vested in me ... I close this bug and cast it out!

I changed my mind and went with CLOSED WONTFIX.  I'm waffling between that and NOTABUG.  Either way ... we aren't going to fix it.