Bug 1234153

Summary:	scale failed : Unknown status FAILED due to "AttributeError: 'module' object has no attribute 'MessagingTimeout'"
Product:	Red Hat OpenStack	Reporter:	Ola Pavlenko <opavlenk>
Component:	openstack-puppet-modules	Assignee:	Gaël Chamoulaud <gchamoul>
Status:	CLOSED ERRATA	QA Contact:	Ola Pavlenko <opavlenk>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	Director	CC:	gchamoul, jcoufal, jprovazn, jschluet, jslagle, mburns, ohochman, opavlenk, rhos-maint, rrosa, rybrown, sbaker, shardy, yeylon
Target Milestone:	ga
Target Release:	7.0 (Kilo)
Hardware:	Unspecified
OS:	Linux
Whiteboard:
Fixed In Version:	openstack-puppet-modules-2015.1.8-3.el7ost	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	1241255 (view as bug list)		Environment:
Last Closed:	2015-08-05 13:27:58 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1241255

Description Ola Pavlenko 2015-06-21 18:33:14 UTC

Description of problem:
Deployed an overcloud with 1 controller.
Tried to scale to 1 compute, 3 controllers and 1 ceph; the update failed for the Controller resource:
resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "AttributeError: 'module' object has no attribute 'MessagingTimeout'"

Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.8-1.el7ost.noarch
rhos-release-0.62-1.noarch
openstack-heat-engine-2015.1.0-3.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-3.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-api-2015.1.0-3.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-common-2015.1.0-3.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-3.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-9.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Deploy an overcloud with 1 compute and 1 controller
2. Scale the overcloud using 'openstack overcloud deploy --control-scale 3 --ceph-storage-scale 1 --plan-uuid'

Actual results:
scale failed

Expected results:
overcloud was successfully scaled

Additional info:
[stack@instack ~]$ heat stack-list
+--------------------------------------+------------+---------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        |
+--------------------------------------+------------+---------------+----------------------+
| 2d463b67-856f-4985-84b5-dac704274803 | overcloud  | UPDATE_FAILED | 2015-06-21T17:38:57Z |
+--------------------------------------+------------+---------------+----------------------+

[stack@instack ~]$ heat resource-list 2d463b67-856f-4985-84b5-dac704274803
+-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
| resource_name                     | physical_resource_id                          | resource_type                                     | resource_status | updated_time         |
+-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
| BlockStorageAllNodesDeployment    | edd4e0e9-9c1c-4888-aeea-dc1658d0e57d          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| BlockStorageNodesPostDeployment   | ccf03844-1afb-4bf4-a876-0976dd2971f9          | OS::TripleO::BlockStoragePostDeployment           | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| CephClusterConfig                 | 52aec691-4fb2-4282-8900-46cf188c7946          | OS::TripleO::CephClusterConfig::SoftwareConfig    | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| CephStorageAllNodesDeployment     | 4b2db21f-3a53-4cac-9213-3c725a94dfdb          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| CephStorageCephDeployment         | e61f20c5-d0ad-4b34-a920-996d9ae920cc          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| CephStorageNodesPostDeployment    | 4ef49e76-0780-490a-96e5-75acde342489          | OS::TripleO::CephStoragePostDeployment            | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ComputeAllNodesDeployment         | 409f0fd2-ead3-4833-a8aa-2fa26b3ef7d5          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ComputeCephDeployment             | 08b2da3a-7894-488b-857e-d3bf60a1fa8d          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ComputeNodesPostDeployment        | 4f62ba20-7ba1-4165-be8f-98d714ea247d          | OS::TripleO::ComputePostDeployment                | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControlVirtualIP                  | 903789a3-9ef5-4697-a6e2-c15ad9c79539          | OS::Neutron::Port                                 | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControllerAllNodesDeployment      | e15edc7d-03bb-4fcf-85f2-31ade7f540b9          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControllerBootstrapNodeConfig     | 723accfb-03bf-4e7b-bc90-3e6008dc53aa          | OS::TripleO::BootstrapNode::SoftwareConfig        | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControllerBootstrapNodeDeployment | f986daa9-eb49-47f0-bf67-62482aae9ce2          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControllerCephDeployment          | e129b764-f327-4555-9a2f-7cedcb9db989          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControllerClusterConfig           | 31e6e498-12e8-48c9-84d1-d53e5cb2494c          | OS::Heat::StructuredConfig                        | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControllerClusterDeployment       | 67e4b50d-73e8-4ee2-83a0-dd27b4d44ae8          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControllerIpListMap               | 2cdeae4c-f208-4acd-b457-61970d23dba0          | OS::TripleO::Network::Ports::NetIpListMap         | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControllerNodesPostDeployment     | eb272c31-c6c7-4dc8-bcd8-36129b2b7405          | OS::TripleO::ControllerPostDeployment             | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ControllerSwiftDeployment         | 6b1dfe3e-be77-4c18-9bf2-776509afda0e          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| HeatAuthEncryptionKey             | overcloud-HeatAuthEncryptionKey-x5l5jmvqeowp  | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| HorizonSecret                     | overcloud-HorizonSecret-pivvbmqbtgrn          | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| MysqlClusterUniquePart            | overcloud-MysqlClusterUniquePart-binf3mzcjnbs | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| MysqlRootPassword                 | overcloud-MysqlRootPassword-phobq6i6y7iy      | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ObjectStorageAllNodesDeployment   | 47839a59-7eb2-4777-9e86-024df964586f          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ObjectStorageNodesPostDeployment  | 9e43c9ce-e9fe-48ca-9d00-3e82630d0b14          | OS::TripleO::ObjectStoragePostDeployment          | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| ObjectStorageSwiftDeployment      | c501cb8c-04ad-41bf-9e44-f5be7e4d2377          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| PcsdPassword                      | overcloud-PcsdPassword-sgkx4az7hdek           | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| PublicVirtualIP                   | 6dd64bbc-7f0b-4f8b-b24c-1c7192121f37          | OS::Neutron::Port                                 | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| RabbitCookie                      | overcloud-RabbitCookie-jcz5yb27svc5           | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| SwiftDevicesAndProxyConfig        | 9c9dda32-a136-44c3-9d77-c9d938ed67b3          | OS::TripleO::SwiftDevicesAndProxy::SoftwareConfig | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| VipDeployment                     | ed555594-e54a-4cbf-90bf-275d8509553a          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| allNodesConfig                    | 779a0c2e-9fd9-4196-8e1c-24752258d5ff          | OS::TripleO::AllNodes::SoftwareConfig             | CREATE_COMPLETE | 2015-06-21T17:38:57Z |
| VipConfig                         | 8f91fc88-5cd4-459d-aebf-5041ad1ee49d          | OS::TripleO::VipConfig                            | UPDATE_COMPLETE | 2015-06-21T18:17:27Z |
| Ceph-Storage                      | c1b1f54f-8021-42b6-a87c-85393a1fb3f1          | OS::Heat::ResourceGroup                           | UPDATE_COMPLETE | 2015-06-21T18:17:28Z |
| Networks                          | 39a6e787-b647-4731-aed9-c583dc996655          | OS::TripleO::Network                              | UPDATE_COMPLETE | 2015-06-21T18:17:30Z |
| Swift-Storage                     | 2392fa7e-4e67-44a0-aece-df794f394caf          | OS::Heat::ResourceGroup                           | UPDATE_COMPLETE | 2015-06-21T18:17:38Z |
| InternalApiVirtualIP              | caf88942-50c1-4ca2-b9c9-03308adbbf86          | OS::TripleO::Controller::Ports::InternalApiPort   | UPDATE_COMPLETE | 2015-06-21T18:18:00Z |
| StorageMgmtVirtualIP              | 9cd3edf0-94db-4e15-b162-29c0c4012eef          | OS::TripleO::Controller::Ports::StorageMgmtPort   | UPDATE_COMPLETE | 2015-06-21T18:18:05Z |
| RedisVirtualIP                    | d9d54244-5f76-4085-860c-3d8efc2d94d4          | OS::TripleO::Controller::Ports::RedisVipPort      | UPDATE_COMPLETE | 2015-06-21T18:18:07Z |
| StorageVirtualIP                  | 8efcadb9-99d5-46e7-9e7c-8882c202e88a          | OS::TripleO::Controller::Ports::StoragePort       | UPDATE_COMPLETE | 2015-06-21T18:18:09Z |
| VipMap                            | 80d2539f-9a40-40f5-9ce0-2d3bb43bb39c          | OS::TripleO::Network::Ports::NetIpMap             | UPDATE_COMPLETE | 2015-06-21T18:18:13Z |
| Compute                           | 0fa47c06-cefb-4624-ad51-c998079421ca          | OS::Heat::ResourceGroup                           | UPDATE_COMPLETE | 2015-06-21T18:18:15Z |
| Controller                        | 862960c9-0d2d-4525-97c0-b839dfd147a0          | OS::Heat::ResourceGroup                           | UPDATE_FAILED   | 2015-06-21T18:18:18Z |
| Cinder-Storage                    | ef694b10-b9c4-4b15-8b0d-395e5bc7a676          | OS::Heat::ResourceGroup                           | UPDATE_COMPLETE | 2015-06-21T18:18:31Z |
+-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
[stack@instack ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| 0a652eb7-0f0c-406d-a72a-48973d985ac0 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.11 |
| c49e5616-dcaa-412c-8406-2da201beb591 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
| 601e74ed-cec6-44a0-bf60-868b080e920e | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.0.2.10 |
| 4b79329e-8b1e-4cfd-aa1e-90cda268fad2 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.0.2.12 |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
[stack@instack ~]$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| 14f6ce5d-3821-4be4-bc85-c758bb76e4fc | None | 4b79329e-8b1e-4cfd-aa1e-90cda268fad2 | power on    | active          | False       |
| 1021710a-9e61-46f1-b417-7d943af31839 | None | 0a652eb7-0f0c-406d-a72a-48973d985ac0 | power on    | active          | False       |
| d8219c79-2ace-4865-8220-e1853611060d | None | None                                 | power off   | available       | False       |
| 457b9740-79ef-43cb-abf7-09391fa1cde5 | None | c49e5616-dcaa-412c-8406-2da201beb591 | power on    | active          | False       |
| 60a64d3a-5d82-46e8-a653-30c7f2942975 | None | 601e74ed-cec6-44a0-bf60-868b080e920e | power on    | active          | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+

[stack@instack ~]$ heat resource-show 2d463b67-856f-4985-84b5-dac704274803 Controller
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Property               | Value                                                                                                                                            |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| attributes             | {                                                                                                                                                |
|                        |   "attributes": null,                                                                                                                            |
|                        |   "refs": null                                                                                                                                   |
|                        | }                                                                                                                                                |
| description            |                                                                                                                                                  |
| links                  | http://192.0.2.1:8004/v1/7b4f213db6ee4517aaee45249cee5fc8/stacks/overcloud/2d463b67-856f-4985-84b5-dac704274803/resources/Controller (self)      |
|                        | http://192.0.2.1:8004/v1/7b4f213db6ee4517aaee45249cee5fc8/stacks/overcloud/2d463b67-856f-4985-84b5-dac704274803 (stack)                          |
|                        | http://192.0.2.1:8004/v1/7b4f213db6ee4517aaee45249cee5fc8/stacks/overcloud-Controller-5jnzh27yznnh/862960c9-0d2d-4525-97c0-b839dfd147a0 (nested) |
| logical_resource_id    | Controller                                                                                                                                       |
| physical_resource_id   | 862960c9-0d2d-4525-97c0-b839dfd147a0                                                                                                             |
| required_by            | ControllerCephDeployment                                                                                                                         |
|                        | ControllerBootstrapNodeDeployment                                                                                                                |
|                        | ControllerNodesPostDeployment                                                                                                                    |
|                        | SwiftDevicesAndProxyConfig                                                                                                                       |
|                        | ControllerClusterConfig                                                                                                                          |
|                        | ControllerClusterDeployment                                                                                                                      |
|                        | CephClusterConfig                                                                                                                                |
|                        | ControllerAllNodesDeployment                                                                                                                     |
|                        | allNodesConfig                                                                                                                                   |
|                        | ControllerIpListMap                                                                                                                              |
|                        | ControllerBootstrapNodeConfig                                                                                                                    |
|                        | ControllerSwiftDeployment                                                                                                                        |
|                        | VipDeployment                                                                                                                                    |
| resource_name          | Controller                                                                                                                                       |
| resource_status        | UPDATE_FAILED                                                                                                                                    |
| resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "AttributeError: 'module' object has no attribute 'MessagingTimeout'"      |
| resource_type          | OS::Heat::ResourceGroup                                                                                                                          |
| updated_time           | 2015-06-21T18:18:18Z                                                                                                                             |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
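
The failed resource can also be picked out of the 'heat resource-list' output with a small script. This is a minimal sketch, assuming the CLI prints the pipe-delimited table layout shown in this report. The helper names parse_cli_table and failed_resources are hypothetical and are not part of any OpenStack client.

```python
def parse_cli_table(text):
    """Parse a pipe-delimited CLI table into a list of dicts.

    Assumption: the first line starting with '|' is the header row and
    every later line starting with '|' is a data row. Border lines
    (starting with '+') and prompts are ignored.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def failed_resources(text):
    """Return the names of resources whose status ends in FAILED."""
    return [
        row["resource_name"]
        for row in parse_cli_table(text)
        if row.get("resource_status", "").endswith("FAILED")
    ]
```

Feeding it the saved output of 'heat resource-list <stack id>' from this report would be expected to single out the Controller resource, which is the only row in UPDATE_FAILED.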

Comment 3 Jan Provaznik 2015-06-22 08:22:09 UTC

From the output it seems that nova failed to find a suitable host for one of the controller nodes. After checking the ironic nodes on this deployment, it seems the problem is that the one remaining host that should have been used (but wasn't matched by the nova filter) has wrong capabilities settings:

+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| 14f6ce5d-3821-4be4-bc85-c758bb76e4fc | None | 4b79329e-8b1e-4cfd-aa1e-90cda268fad2 | power on    | active          | False       |
| 1021710a-9e61-46f1-b417-7d943af31839 | None | 0a652eb7-0f0c-406d-a72a-48973d985ac0 | power on    | active          | False       |
| d8219c79-2ace-4865-8220-e1853611060d | None | None                                 | power off   | available       | False       |
| 457b9740-79ef-43cb-abf7-09391fa1cde5 | None | c49e5616-dcaa-412c-8406-2da201beb591 | power on    | active          | False       |
| 60a64d3a-5d82-46e8-a653-30c7f2942975 | None | 601e74ed-cec6-44a0-bf60-868b080e920e | power on    | active          | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+

[stack@instack ~]$ ironic node-show d8219c79-2ace-4865-8220-e1853611060d
<snip>
| reservation            | None                                                                     |
| properties             | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'40',      |
| instance_uuid          | None                                                                     |

(IOW the hash is cut off in the middle)

Compare with another "valid" node:
[stack@instack ~]$ ironic node-show 60a64d3a-5d82-46e8-a653-30c7f2942975
<snip>

| reservation            | None                                                                     |
| properties             | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'40',      |
|                        | u'cpus': u'1', u'capabilities': u'boot_option:local'}                    |
| instance_uuid          | 601e74ed-cec6-44a0-bf60-868b080e920e                                     |

Comment 4 Steven Hardy 2015-06-22 08:38:57 UTC

The error looks similar to this upstream heat bug:

https://bugs.launchpad.net/heat/+bug/1466239

It'd be useful to see the undercloud heat-engine logs so we can confirm if it's the same issue.

Comment 5 Jan Provaznik 2015-06-22 13:58:31 UTC

Disregard my comment #3. Although the ironic capabilities output is wrong, it seems to be irrelevant to the MessagingTimeout error; also, jcoufal has just reproduced this error on a different setup where the ironic node settings are OK.

Comment 6 Jaromir Coufal 2015-06-22 13:59:04 UTC

I had the same issue today, ryansb is looking into my deployment.

Comment 7 Steve Baker 2015-06-22 20:09:31 UTC

The fix for this has landed in upstream master https://review.openstack.org/#/c/192938

Comment 8 Ryan Brown 2015-06-23 14:22:41 UTC

The fix is there, but this issue is actually the result of a timeout happening during nova server creation.

In the logs, I still see a traceback (now with the correct error message telling us which message timed out) followed two minutes later by a response to the message that timed out.

It seems that with more machines sharing the same host, nova startup is delayed. The temporary fix is to increase the RPC reply timeout.
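
As a generic Python illustration (not Heat's actual code) of how a timeout can end up reported as an AttributeError: an except clause that names an exception class missing from a module is only evaluated when an exception actually reaches it, and the resulting AttributeError is what gets reported instead of the original timeout. The module fake_messaging and the class ReplyTimeout below are hypothetical stand-ins.

```python
import types

# Hypothetical stand-in for a module that does NOT define MessagingTimeout.
fake_messaging = types.ModuleType("fake_messaging")


class ReplyTimeout(Exception):
    """Hypothetical stand-in for an RPC reply timeout."""


def call_with_broken_handler():
    try:
        # Simulate an RPC call whose reply did not arrive in time.
        raise ReplyTimeout("timed out waiting for a reply")
    except fake_messaging.MessagingTimeout:
        # Never reached: looking up the missing attribute raises
        # AttributeError while the timeout is being handled.
        return "handled"


def describe_failure():
    """Return the message of the error the caller actually sees."""
    try:
        call_with_broken_handler()
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(describe_failure())
```

Once the handler refers to an exception class that exists, the real timeout becomes visible, which matches the observation that the logs now say which message timed out.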

Comment 9 Jaromir Coufal 2015-06-24 12:55:56 UTC

@Ryan: ACK. The problem is the timeout Ryan described. As a workaround I edited /etc/heat/heat.conf, increased the timeout to rpc_response_timeout = 600 (uncomment it!), restarted openstack-heat-engine, and the deployment passed.
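
The heat.conf edit in this workaround could be scripted roughly as follows. This is a sketch only, working on the file contents as a string. The function name set_rpc_response_timeout is hypothetical, and the code assumes the option belongs in the [DEFAULT] section and ignores section boundaries when searching for an existing setting.

```python
import re

# An active (uncommented) setting, e.g. "rpc_response_timeout = 60".
ACTIVE = re.compile(r"^\s*rpc_response_timeout\s*=")
# The commented-out default, e.g. "#rpc_response_timeout = 60".
COMMENTED = re.compile(r"^\s*#\s*rpc_response_timeout\s*=")


def set_rpc_response_timeout(conf_text, seconds):
    """Return conf_text with rpc_response_timeout set to `seconds`."""
    new_line = "rpc_response_timeout = %d" % seconds
    lines = conf_text.splitlines()

    # 1. Replace an existing active setting, if there is one.
    for i, line in enumerate(lines):
        if ACTIVE.match(line):
            lines[i] = new_line
            return "\n".join(lines) + "\n"

    # 2. Otherwise add the setting right below the commented default.
    for i, line in enumerate(lines):
        if COMMENTED.match(line):
            lines.insert(i + 1, new_line)
            return "\n".join(lines) + "\n"

    # 3. Otherwise add it directly under the [DEFAULT] header.
    for i, line in enumerate(lines):
        if line.strip() == "[DEFAULT]":
            lines.insert(i + 1, new_line)
            return "\n".join(lines) + "\n"

    # 4. No [DEFAULT] section at all: create one at the top.
    return "[DEFAULT]\n" + new_line + "\n" + conf_text
```

The result would still have to be written back to /etc/heat/heat.conf, and openstack-heat-engine restarted, before the new timeout takes effect.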

Comment 10 Ryan Brown 2015-06-29 20:53:30 UTC

Here's an instack-undercloud patch that bumps the timeout for you. https://code.engineering.redhat.com/gerrit/#/c/51906/

Comment 11 James Slagle 2015-07-01 14:38:48 UTC

Could this be a dupe of bz#1231825?

Comment 13 Gaël Chamoulaud 2015-07-08 14:43:48 UTC

Added

Comment 14 Ryan Brown 2015-07-08 14:46:53 UTC

This also requires https://code.engineering.redhat.com/gerrit/#/c/51906/ before it can be closed.

Comment 16 Ola Pavlenko 2015-07-13 15:11:10 UTC

The rpc_response_timeout is still 60 in /etc/heat/heat.conf in the latest puddle from July 10.

The patch from comment 14 is failing the Jenkins build.

Returning to Modified.

Comment 17 Ryan Brown 2015-07-14 15:50:10 UTC

Ola,

Can you tell me what puddle version you were on, and (if possible) the OPM and instack-undercloud versions? 

I tried on puddle 2015-07-13.1 (today's puddle) and the rpc_response_timeout is correct.

Comment 18 Ola Pavlenko 2015-07-14 15:52:54 UTC

I had the puddle from July 10, i.e. 2015-07-10.1

Comment 19 Ola Pavlenko 2015-07-14 15:55:21 UTC

Now, in puddle 2015-07-13.1, it's:
# Seconds to wait for a response from a call. (integer value)
#rpc_response_timeout = 60
rpc_response_timeout = 600
I assume the fix was dropped from the previous puddle, but the bug was set to ON_QA...
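
A check like this can be automated when verifying a puddle. The sketch below reads the effective value with Python's standard configparser. It assumes heat.conf parses as plain INI text with the option in [DEFAULT], and that 60 applies when the option is not set (the commented-out default shown in the file). The function name effective_rpc_response_timeout is hypothetical.

```python
import configparser


def effective_rpc_response_timeout(conf_text, default=60):
    """Return the rpc_response_timeout that [DEFAULT] actually sets.

    Commented-out lines (starting with '#') are ignored by the parser,
    so only an uncommented setting counts. `default` is returned when
    the option is absent.
    """
    # strict=False tolerates duplicate options; interpolation=None
    # keeps any '%' characters in other values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return int(parser.defaults().get("rpc_response_timeout", default))


if __name__ == "__main__":
    # Assumes the script runs on the undercloud with read access.
    with open("/etc/heat/heat.conf") as handle:
        print(effective_rpc_response_timeout(handle.read()))
```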

Comment 20 Ryan Brown 2015-07-14 15:58:26 UTC

After IRC discussion & verifying myself, setting back to ON_QA for verification.

Comment 21 Ola Pavlenko 2015-07-21 08:29:03 UTC

Verified with puddle 2015-07-17-1

Comment 23 errata-xmlrpc 2015-08-05 13:27:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548