Bug 1286774 - Overcloud update might fail (abort) prematurely because of short timeout (90mins)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assignee: Dougal Matthews
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-30 17:19 UTC by Giulio Fidente
Modified: 2015-12-21 16:54 UTC (History)

Fixed In Version: openstack-tripleo-common-0.0.1.dev6-5.git49b57eb.el7ost
Doc Type: Bug Fix
Doc Text:
The timeout for Overcloud updates was set to 90 minutes. As a result, updates that took longer than 90 minutes were aborted before they could complete. This fix extends the timeout to 240 minutes, so longer updates now complete successfully.
Clone Of:
Environment:
Last Closed: 2015-12-21 16:54:17 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 251818 | 0 | None | None | None | Never
Red Hat Product Errata | RHBA-2015:2651 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OSP 7 director Bug Fix Advisory | 2015-12-21 21:50:26 UTC

Description Giulio Fidente 2015-11-30 17:19:12 UTC
Description of problem:
Overcloud update attempts fail when they take longer than 90 minutes. This can generally be observed in the heat event list, where the abort arrives about 90 minutes after the update started:

| overcloud | e661c9a8-b8f7-470e-b7a8-1fecbb79b23f | Stack UPDATE started | UPDATE_IN_PROGRESS | 2015-11-27T18:27:38Z

| ControllerNodesPostDeployment | d951ac6f-a804-4d14-a13c-c54ff56257c9 | UPDATE aborted | UPDATE_FAILED | 2015-11-27T19:57:46Z
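
The gap between the two events can be checked with a few lines of Python. This is only a sketch for reading the event list: the timestamps are copied from the events quoted above, while the helper name `elapsed_minutes` and the format constant are illustrative and not part of any OpenStack tool.

```python
from datetime import datetime

# Timestamps copied from the two heat events quoted above.
UPDATE_STARTED = "2015-11-27T18:27:38Z"
UPDATE_ABORTED = "2015-11-27T19:57:46Z"

# Format of the timestamps as they appear in the event list.
HEAT_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def elapsed_minutes(start, end):
    """Return (whole minutes, leftover seconds) between two event times."""
    delta = (datetime.strptime(end, HEAT_TIME_FORMAT)
             - datetime.strptime(start, HEAT_TIME_FORMAT))
    return divmod(int(delta.total_seconds()), 60)


minutes, seconds = elapsed_minutes(UPDATE_STARTED, UPDATE_ABORTED)
print("update aborted after %d min %d s" % (minutes, seconds))
# prints "update aborted after 90 min 8 s"
```

An abort that lands a few seconds past the 90-minute mark is consistent with a stack-level timeout rather than with a failure of an individual resource.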

None of the resources will be found in UPDATE_FAILED. We set a default timeout of 240 minutes [1] for new deployments, but not for updates.

I don't think updates should be given less time than new deployments; they can take even longer because of the yum upgrade and pcmk maintenance steps.
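
The idea of giving updates the same default as deployments could be sketched as follows. This is not the actual patch: `DEFAULT_TIMEOUT_MINS` and `build_stack_update_fields` are hypothetical names, and the only assumptions carried over are the 240-minute figure from [1] and that Heat takes the stack timeout as a `timeout_mins` value.

```python
# Assumption: 240 mirrors the deployment default cited in [1]. The names
# below are hypothetical and are not taken from tripleo-common.
DEFAULT_TIMEOUT_MINS = 240


def build_stack_update_fields(stack_name, timeout_mins=None):
    """Build the arguments for a stack update, always carrying a timeout."""
    if timeout_mins is None:
        # Fall back to the deployment default instead of sending no timeout.
        timeout_mins = DEFAULT_TIMEOUT_MINS
    if timeout_mins <= 0:
        raise ValueError("timeout_mins must be a positive number of minutes")
    return {"stack_name": stack_name, "timeout_mins": timeout_mins}


fields = build_stack_update_fields("overcloud")
print(fields["timeout_mins"])
```

Passing the timeout explicitly on every update keeps an update from silently inheriting a shorter limit than the one used for the initial deployment, while still allowing a caller to override it.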

[1] https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_deploy.py#L725


Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.10-19.el7ost.noarch

Comment 2 Dougal Matthews 2015-12-01 12:01:07 UTC
I've put a fix for this upstream.

Comment 3 James Slagle 2015-12-01 13:47:41 UTC
dougal, thanks for the patch. i'm assigning this one to you :)

Comment 4 James Slagle 2015-12-02 16:58:33 UTC
the assignee for this should still be dougal

Comment 6 Alexander Chuzhoy 2015-12-07 14:13:30 UTC
Verified:

Environment:
openstack-tripleo-common-0.0.1.dev6-5.git49b57eb.el7ost.noarch

Was able to complete an update from 7.1 to 7.2 that took more than 90 minutes.

Comment 8 errata-xmlrpc 2015-12-21 16:54:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651

