Bug 1285363

Summary: Deployment failure "httpd never started after 200 seconds"
Product: Red Hat OpenStack
Reporter: Jiri Stransky <jstransk>
Component: openstack-tripleo-heat-templates
Assignee: Jiri Stransky <jstransk>
Status: CLOSED ERRATA
QA Contact: Alexander Chuzhoy <sasha>
Severity: unspecified
Priority: unspecified
Version: 7.0 (Kilo)
CC: dnavale, jcoufal, jslagle, jstransk, mburns, rhel-osp-director-maint, sasha, yeylon
Target Milestone: y2
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-85.el7ost
Doc Type: Bug Fix
Last Closed: 2015-12-21 16:53:00 UTC
Type: Bug

Description Jiri Stransky 2015-11-25 13:15:05 UTC
A deployment failed with this message in os-collect-config log:

Nov 24 18:09:38 overcloud-controller-0.localdomain os-collect-config[2921]: httpd not yet started, sleeping 3 seconds.
Nov 24 18:09:38 overcloud-controller-0.localdomain os-collect-config[2921]: httpd not yet started, sleeping 3 seconds.
Nov 24 18:09:38 overcloud-controller-0.localdomain os-collect-config[2921]: httpd never started after 200 seconds

However, when the environment was investigated, all services were already up and running.

[root@overcloud-controller-0 ~]# pcs status | grep Stopped -C2
[root@overcloud-controller-0 ~]#

Pacemaker (pcmk) logged a few monitor action timeouts, but no start/stop timeouts. The actual httpd start on one of the controllers completed about 10 seconds after the timeout had expired, which caused the deployment to fail:

Nov 24 18:09:31 overcloud-controller-0.localdomain crmd[29936]: notice: Operation httpd_start_0: ok (node=overcloud-controller-0, call=430, rc=0, cib-update=246, confirmed=true)

Nov 24 18:09:49 overcloud-controller-1.localdomain crmd[29784]: notice: Operation httpd_start_0: ok (node=overcloud-controller-1, call=425, rc=0, cib-update=403, confirmed=true)

^^ this start finished after the deployment's 200-second wait had already given up (18:09:38)

Nov 24 18:09:07 overcloud-controller-2.localdomain crmd[29500]: notice: Operation httpd_start_0: ok (node=overcloud-controller-2, call=422, rc=0, cib-update=270, confirmed=true)
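
To make the timing explicit, the timestamps above can be compared with the time of the failure message. The following is only a sketch: the timestamps are copied from the log excerpts in this report, the year 2015 is assumed from the report date, and the helper names (parse, seconds_past_deadline) are made up for illustration. It also assumes the clocks on the three controllers were reasonably in sync, which the report does not confirm.

```python
from datetime import datetime

# Timestamps copied from the log excerpts in this report.
FAILURE = "Nov 24 18:09:38"  # "httpd never started after 200 seconds"
HTTPD_START_OK = {
    "overcloud-controller-0": "Nov 24 18:09:31",
    "overcloud-controller-1": "Nov 24 18:09:49",
    "overcloud-controller-2": "Nov 24 18:09:07",
}


def parse(stamp, year=2015):
    # Syslog timestamps carry no year; 2015 is assumed from the report date.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_past_deadline(start_ok, failure=FAILURE):
    """Positive result: httpd_start_0 was confirmed after the wait gave up."""
    return int((parse(start_ok) - parse(failure)).total_seconds())


margins = {node: seconds_past_deadline(ts) for node, ts in HTTPD_START_OK.items()}
for node, margin in sorted(margins.items()):
    print(f"{node}: {margin:+d} s relative to the failure message")
```

Only overcloud-controller-1 comes out positive (+11 s), which matches the "about 10 seconds" estimate above; the other two controllers confirmed the start before the wait expired.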


The current timeout values are probably too aggressive for slow virtualized environments, and should be bumped up.
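
The log messages suggest a fixed-deadline polling loop: check, sleep 3 seconds, give up after 200 seconds. Below is a minimal sketch of that pattern, not the actual code from openstack-tripleo-heat-templates. The function names, the simulated clock, the 211-second start time (roughly 200 s plus the overrun seen above) and the 400-second alternative timeout are all assumptions chosen for illustration; this report does not state the new timeout values.

```python
import time


def wait_until(check, timeout, interval=3, clock=time.monotonic,
               sleep=time.sleep, log=print, name="httpd"):
    """Poll check() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while clock() < deadline:
        if check():
            return True
        log(f"{name} not yet started, sleeping {interval} seconds.")
        sleep(interval)
    log(f"{name} never started after {timeout} seconds")
    return False


class FakeClock:
    """Simulated time, so the example finishes instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def simulate(timeout, start_takes=211):
    # start_takes=211 is illustrative only: a service that needs slightly
    # longer than 200 seconds to come up, as on the slow controller above.
    clock = FakeClock()
    return wait_until(lambda: clock.now >= start_takes, timeout,
                      clock=clock, sleep=clock.sleep, log=lambda msg: None)


print(simulate(200), simulate(400))  # prints: False True
```

With the 200-second deadline the simulated service is never seen as started, even though it would have come up a few seconds later; with a longer deadline the same start succeeds. This is the reason a slow virtualized environment can fail the deployment while all services end up running.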

Comment 1 Jiri Stransky 2015-11-25 16:50:01 UTC
*** Bug 1284121 has been marked as a duplicate of this bug. ***

Comment 4 Alexander Chuzhoy 2015-12-03 16:07:41 UTC
Verified:

Environment:
openstack-tripleo-heat-templates-0.8.6-85.el7ost.noarch


The reported issue no longer reproduces. HA deployment succeeds.

Comment 8 errata-xmlrpc 2015-12-21 16:53:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650