Bug 1242052

Summary: Deployment hangs because no controller services are running
Product: Red Hat OpenStack
Reporter: Ben Nemec <bnemec>
Component: rhosp-director
Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA
QA Contact: Alexander Chuzhoy <sasha>
Severity: unspecified
Priority: high
Version: Director
CC: dmacpher, mandreou, mburns, ohochman, rhel-osp-director-maint, rrosa, sasha
Target Milestone: ga
Keywords: Triaged
Target Release: Director
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-42.el7ost
Doc Type: Bug Fix
Doc Text:
The timeout for Pacemaker service start-up was 20 seconds. Start-up sometimes exceeded this limit, which caused deployments to hang. This fix increases the timeout to 60 seconds. Pacemaker services now start correctly and the deployment completes.
Last Closed: 2015-08-05 13:59:16 UTC
Type: Bug

Description Ben Nemec 2015-07-10 18:46:34 UTC
Description of problem: Overcloud deployment hangs.  When looking at the service status on the compute and control nodes, the compute node will have nova-compute hung trying to connect to the controller, while the controller will have no OpenStack services running at all.

I am hitting this on a fairly regular basis with basic 1 control, 1 compute deployments.  It may be mitigated by HA because if one controller fails the deployment can still continue.


Version-Release number of selected component (if applicable): 


How reproducible: Intermittent


Steps to Reproduce:
1. Deploy cloud with director
2. On some percentage of deployments, it will hang with the described symptoms

Actual results: Hung deployment


Expected results: Successful deployment


Additional info: The current theory is that Pacemaker is timing out while starting the services on the controller. The current timeout is 20 seconds, and we were advised that 60 would be a better value.
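
For illustration only, here is a minimal Python sketch of how a longer start timeout could be applied to a single Pacemaker resource by hand with the pcs CLI. This is not the change shipped in openstack-tripleo-heat-templates; the helper name and the resource name `openstack-keystone` are assumptions made for the example, and the function only builds the command rather than running it.

```python
import shlex
from typing import List


def build_start_timeout_command(resource_id: str, timeout_seconds: int = 60) -> List[str]:
    """Build a pcs command that sets the start-operation timeout of one resource.

    Hypothetical helper: it returns the argument list and does not run pcs.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return [
        "pcs", "resource", "update", resource_id,
        "op", "start", f"timeout={timeout_seconds}s",
    ]


if __name__ == "__main__":
    # "openstack-keystone" is an assumed resource name, used only as an example.
    command = build_start_timeout_command("openstack-keystone")
    # Print the command for review instead of executing it.
    print(shlex.join(command))
```

Printing the command first makes it possible to review it on the controller before running it there, for example with `subprocess.run(command, check=True)`.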

Comment 5 Alexander Chuzhoy 2015-07-21 14:41:03 UTC
Verified:

Environment:
instack-undercloud-2.1.2-21.el7ost.noarch

The issue no longer reproduces.

Comment 7 errata-xmlrpc 2015-08-05 13:59:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549