Bug 1242052 - Deployment hangs because no controller services are running
Summary: Deployment hangs because no controller services are running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: Director
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: ga
: Director
Assignee: Giulio Fidente
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-07-10 18:46 UTC by Ben Nemec
Modified: 2015-08-05 13:59 UTC (History)
7 users (show)

Fixed In Version: openstack-tripleo-heat-templates-0.8.6-42.el7ost
Doc Type: Bug Fix
Doc Text:
The timeout for Pacemaker service start-up was 20 seconds. Sometimes start-up exceeded this time limit and caused hung deployments. This fix increases the timeout to 60 second. Pacemaker services now start correctly and the deployment completes.
Clone Of:
Environment:
Last Closed: 2015-08-05 13:59:16 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 202085 0 None None None Never
Red Hat Product Errata RHEA-2015:1549 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform director Release 2015-08-05 17:49:10 UTC

Description Ben Nemec 2015-07-10 18:46:34 UTC
Description of problem: Overcloud deployment hangs.  When looking at the service status on the compute and control nodes, the compute node will have nova-compute hung trying to connect to the controller, while the controller will have no OpenStack services running at all.

I am hitting this on a fairly regular basis with basic 1 control, 1 compute deployments.  It may be mitigated by HA because if one controller fails the deployment can still continue.


Version-Release number of selected component (if applicable): 


How reproducible: Intermittent


Steps to Reproduce:
1. Deploy cloud with director
2. On some percentage of deployments, it will hang with the described symptoms
3.

Actual results: Hung deployment


Expected results: Successful deployment


Additional info: The current theory on this is that pacemaker is timing out starting the services on the controller.  The current timeout is 20 seconds, and we were advised that 60 would be a better value.

Comment 5 Alexander Chuzhoy 2015-07-21 14:41:03 UTC
Verified:

Environment:
instack-undercloud-2.1.2-21.el7ost.noarch

Don't reproduce the issue.

Comment 7 errata-xmlrpc 2015-08-05 13:59:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549


Note You need to log in before you can comment on or make changes to this bug.