Bug 1289648

Summary: Resource CREATE failed: MessagingTimeout: resources.Controller.resources[2]: Timed out waiting for a reply to message ID
Product: Red Hat OpenStack
Reporter: Ola Pavlenko <opavlenk>
Component: rhosp-director
Assignee: chris alfonso <calfonso>
Status: CLOSED DUPLICATE
QA Contact: yeylon <yeylon>
Severity: urgent
Priority: urgent
Version: 7.0 (Kilo)
CC: cylopez, hbrock, mburns, opavlenk, rhel-osp-director-maint, sbaker, shardy, srevivo
Target Milestone: ga
Keywords: Regression
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-01-07 11:51:11 UTC
Type: Bug

Description Ola Pavlenko 2015-12-08 16:25:41 UTC
Description of problem:
Tried to deploy 7.2 on a clean virt env with 3 controllers, 1 compute and 1 ceph using the following command:
openstack overcloud deploy --templates --control-scale 3 --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --ntp-server clock.redhat.com --libvirt-type qemu

without any changes in yaml files or any additional config.

deployment failed on 3 controllers


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-87.el7ost.noarch

How reproducible:
100%
Reproduced 3 times with this setup.

Steps to Reproduce:
1. Install the latest (Dec 4th) undercloud on RHEL 7.2.
2. Upload the latest pre-built images (from Dec 4) and register the nodes (follow the guide).
3. Deploy the overcloud with 3 controllers, 1 compute and 1 ceph using the following command:
openstack overcloud deploy --templates --control-scale 3 --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --ntp-server clock.redhat.com --libvirt-type qemu


Actual results:
Deployment failed:
$ openstack overcloud deploy --templates --control-scale 3 --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --ntp-server clock.redhat.com --libvirt-type qemu
Deploying templates in the directory /usr/share/openstack-tripleo-heat-templates
Stack failed with status: Resource CREATE failed: MessagingTimeout: resources.Controller.resources[2]: Timed out waiting for a reply to message ID 30aa11ea9d39478d9cefd9243a86f291
ERROR: openstack Heat Stack create failed.


Expected results:
Overcloud successfully deployed

Additional info:
from heat-api.log on instack machine:
2015-12-08 10:42:26.921 28372 INFO oslo_messaging._drivers.impl_rabbit [req-554a14d0-b2af-4ffb-b064-42c4d1e49d38 c6e5f48f6a6a4fdd8000ca2822088472 110b5499f44a48f19495ed8d9cc11ea9] Connected to AMQP server on 192.0.2.1:5672
2015-12-08 10:42:26.976 28372 DEBUG heat.common.serializers [req-554a14d0-b2af-4ffb-b064-42c4d1e49d38 c6e5f48f6a6a4fdd8000ca2822088472 110b5499f44a48f19495ed8d9cc11ea9] JSON response : {"explanation": "The resource could not be found.", "code": 404, "error": {"message": "The Stack (overcloud) could not be found.", "traceback": "Traceback (most recent call last):\n\n  File \"/usr/lib/python2.7/site-packages/heat/common/context.py\", line 300, in wrapped\n    return func(self, ctx, *args, **kwargs)\n\n  File \"/usr/lib/python2.7/site-packages/heat/engine/service.py\", line 434, in identify_stack\n    raise exception.StackNotFound(stack_name=stack_name)\n\nStackNotFound: The Stack (overcloud) could not be found.\n", "type": "StackNotFound"}, "title": "Not Found"} to_json /usr/lib/python2.7/site-packages/heat/common/serializers.py:42


from heat-engine.log on instack machine:
2015-12-08 10:50:57.703 28339 INFO heat.engine.resource [-] CREATE: ResourceGroup "Controller" [802e0f08-a865-455b-a55a-27f08a97118b] Stack "overcloud" [cc05d2af-aa97-47be-bfa2-054e85172bde]
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource Traceback (most recent call last):
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 528, in _action_recorder
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource     yield
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 598, in _do_action
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource     yield self.action_handler_task(action, args=handler_args)
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 313, in wrapper
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource     step = next(subtask)
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 572, in action_handler_task
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource     while not check(handler_data):
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 299, in check_create_complete
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource     return self._check_status_complete(resource.Resource.CREATE)
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 340, in _check_status_complete
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource     action=action)
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource ResourceFailure: MessagingTimeout: resources.Controller.resources[2]: Timed out waiting for a reply to message ID 30aa11ea9d39478d9cefd9243a86f291
2015-12-08 10:50:57.703 28339 TRACE heat.engine.resource

Comment 3 Steve Baker 2015-12-08 23:02:03 UTC
How much memory does your undercloud have? The product documentation states that 6 GB RAM is the minimum, but I suspect there are still testers running with 4 GB underclouds, which would definitely cause random undercloud failures.

Comment 4 Ola Pavlenko 2015-12-09 11:52:13 UTC
(In reply to Steve Baker from comment #3)
> How much memory does your undercloud have. The product documentation states
> that 6GB RAM is the minimum but I suspect there are still testers running
> with 4GB underclouds, which would definitely cause random undercloud
> failures.

The instack VM has 8 GB, and each overcloud VM has 5 GB.

This setup worked for me with earlier OSP releases.

Comment 5 Ola Pavlenko 2015-12-17 14:03:09 UTC
Reproduced with the latest rhel-osp-director-puddle-2015-12-16-1.

It seems that at some point the deployment fails on the second controller node and doesn't try to deploy the third at all.

Same env, but this time the undercloud VM has 7 GB and each VM 5 GB.

Comment 7 Steven Hardy 2016-01-06 18:13:50 UTC
This sounds similar to https://bugzilla.redhat.com/show_bug.cgi?id=1290949

In that case, the root cause wasn't memory, but attempting to run with a single CPU undercloud - workarounds are noted in that bz.
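For anyone hitting this on a virt setup, the workaround in that bz amounts to giving the undercloud VM more than one vCPU. A minimal sketch of the libvirt domain change, assuming the undercloud domain XML is edited with `virsh edit` (the domain name and vCPU count here are illustrative, not taken from this bug):

```xml
<!-- In the undercloud domain XML (e.g. via `virsh edit <domain>`),
     raise the vCPU count from 1 to something like 4, then restart the VM. -->
<vcpu placement='static'>4</vcpu>
```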

Comment 8 Cyril Lopez 2016-01-07 08:48:00 UTC
(In reply to Steven Hardy from comment #7)
> This sounds similar to https://bugzilla.redhat.com/show_bug.cgi?id=1290949
> 
> In that case, the root cause wasn't memory, but attempting to run with a
> single CPU undercloud - workarounds are noted in that bz.

You're right: after changing to 4 CPUs it works now. Of course, I also added some workers to nova.
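For reference, the nova worker change mentioned above is typically made in the undercloud's /etc/nova/nova.conf, followed by a restart of the nova services. A sketch with illustrative values (the worker counts below are assumptions, not taken from this bug):

```ini
# /etc/nova/nova.conf on the undercloud -- illustrative worker counts
[DEFAULT]
osapi_compute_workers = 4
metadata_workers = 4

[conductor]
workers = 4
```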

Comment 9 Steven Hardy 2016-01-07 11:50:46 UTC
Per comment #8, closing this as a duplicate of bz #1290949.

Comment 10 Steven Hardy 2016-01-07 11:51:11 UTC

*** This bug has been marked as a duplicate of bug 1290949 ***