Bug 1234153
Summary: | scale failed : Unknown status FAILED due to "AttributeError: 'module' object has no attribute 'MessagingTimeout'" | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Ola Pavlenko <opavlenk> | |
Component: | openstack-puppet-modules | Assignee: | Gaël Chamoulaud <gchamoul> | |
Status: | CLOSED ERRATA | QA Contact: | Ola Pavlenko <opavlenk> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | Director | CC: | gchamoul, jcoufal, jprovazn, jschluet, jslagle, mburns, ohochman, opavlenk, rhos-maint, rrosa, rybrown, sbaker, shardy, yeylon | |
Target Milestone: | ga | |||
Target Release: | 7.0 (Kilo) | |||
Hardware: | Unspecified | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | openstack-puppet-modules-2015.1.8-3.el7ost | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1241255 (view as bug list) | Environment: | ||
Last Closed: | 2015-08-05 13:27:58 UTC | Type: | Bug | |
Bug Blocks: | 1241255 |
Description
Ola Pavlenko
2015-06-21 18:33:14 UTC
From the output it seems that nova failed to find a suitable host for one of the controller hosts. After checking the ironic nodes on this deployment, the problem appears to be that the one remaining host that should have been used (but wasn't matched by the nova filter) has wrong capabilities settings:

+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| 14f6ce5d-3821-4be4-bc85-c758bb76e4fc | None | 4b79329e-8b1e-4cfd-aa1e-90cda268fad2 | power on    | active          | False       |
| 1021710a-9e61-46f1-b417-7d943af31839 | None | 0a652eb7-0f0c-406d-a72a-48973d985ac0 | power on    | active          | False       |
| d8219c79-2ace-4865-8220-e1853611060d | None | None                                 | power off   | available       | False       |
| 457b9740-79ef-43cb-abf7-09391fa1cde5 | None | c49e5616-dcaa-412c-8406-2da201beb591 | power on    | active          | False       |
| 60a64d3a-5d82-46e8-a653-30c7f2942975 | None | 601e74ed-cec6-44a0-bf60-868b080e920e | power on    | active          | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+

[stack@instack ~]$ ironic node-show d8219c79-2ace-4865-8220-e1853611060d
<snip>
| reservation | None |
| properties | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'40', |
| instance_uuid | None |

(In other words, the properties hash is cut off in the middle.) Compare this with another, "valid" node:

[stack@instack ~]$ ironic node-show 60a64d3a-5d82-46e8-a653-30c7f2942975
<snip>
| reservation | None |
| properties | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'40', |
| | u'cpus': u'1', u'capabilities': u'boot_option:local'} |
| instance_uuid | 601e74ed-cec6-44a0-bf60-868b080e920e |

The error looks similar to this upstream heat bug: https://bugs.launchpad.net/heat/+bug/1466239
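The node-show comparison above (one node with `cpus` and `capabilities`, one without) can be checked mechanically. What follows is only a minimal sketch: the helper names `parse_capabilities` and `missing_keys` are hypothetical, the dictionaries are copied by hand from the two outputs rather than fetched from ironic, and the choice of "required" keys is an assumption based on what the valid node shows.

```python
def parse_capabilities(raw):
    """Split a 'key:value,key:value' capabilities string into a dict."""
    caps = {}
    for item in raw.split(","):
        key, sep, value = item.strip().partition(":")
        if sep:  # skip fragments that have no 'key:value' shape
            caps[key] = value
    return caps


def missing_keys(properties, required=("cpus", "capabilities")):
    """Return the required property names absent from a node's properties."""
    return [name for name in required if name not in properties]


# Properties as displayed for node d8219c79-... (hand-copied, hash cut off).
truncated = {"memory_mb": "4096", "cpu_arch": "x86_64", "local_gb": "40"}
# Properties as displayed for node 60a64d3a-... (hand-copied, complete).
valid = dict(truncated, cpus="1", capabilities="boot_option:local")

print(missing_keys(truncated))                     # ['cpus', 'capabilities']
print(missing_keys(valid))                         # []
print(parse_capabilities(valid["capabilities"]))   # {'boot_option': 'local'}
```

This only compares what the CLI displayed; it does not tell you whether the stored node record is really incomplete or whether the output was just shortened.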
It'd be useful to see the undercloud heat-engine logs so we can confirm whether it's the same issue.

Disregard my comment #3: although the ironic capabilities output is wrong, it seems to be irrelevant to the MessagingTimeout error. Also, jcoufal has just reproduced this error on a different setup where the ironic node settings are OK.

I had the same issue today; ryansb is looking into my deployment.

The fix for this has landed in upstream master: https://review.openstack.org/#/c/192938

The fix is there, but this issue is actually the result of a timeout happening during nova server creation. In the logs, I still see a traceback (now with the correct error message telling us which message timed out) followed two minutes later by a response to the message that timed out. It seems that with more machines sharing the same host, nova startup is delayed. The temporary fix is to increase the RPC reply timeout.

@Ryan: ACK. The problem is the timeout Ryan described. As a workaround I edited /etc/heat/heat.conf, increased the timeout to rpc_response_timeout = 600 (uncomment it!), restarted openstack-heat-engine, and the deployment passed.

Here's an instack-undercloud patch that bumps the timeout for you: https://code.engineering.redhat.com/gerrit/#/c/51906/

Could be a dupe of bz#1231825?

Added

This also requires https://code.engineering.redhat.com/gerrit/#/c/51906/ before it can be closed.

The rpc_response_timeout is still 60 in /etc/heat/heat.conf in the latest puddle from July 10.

The patch from comment 14 is failing the Jenkins build. Returning to MODIFIED.

Ola, can you tell me what puddle version you were on, and (if possible) the OPM and instack-undercloud versions? I tried on puddle 2015-07-13.1 (today's puddle) and the rpc_response_timeout is correct.

I had the puddle from July 10, i.e. 2015-07-10.1. Now, in puddle 2015-07-13.1, it's:

# Seconds to wait for a response from a call. (integer value)
#rpc_response_timeout = 60
rpc_response_timeout = 600

I assume the fix was dropped from the previous puddle, but the bug was set to ON_QA...

After IRC discussion and verifying myself, setting back to ON_QA for verification.

Verified with puddle 2015-07-17-1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548
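For anyone verifying the workaround on their own undercloud, a small script can report which `rpc_response_timeout` value is actually in effect. This is a sketch under assumptions, not part of the fix: the function name `effective_rpc_timeout` is hypothetical, the option is assumed to live in the `[DEFAULT]` section, and the fallback of 60 is taken from the commented-out default quoted in this report. It parses text passed in as a string; to check a real host you would read /etc/heat/heat.conf yourself and pass in its contents.

```python
import configparser


def effective_rpc_timeout(conf_text, default=60):
    """Return rpc_response_timeout from [DEFAULT], or `default` if unset."""
    # strict=False tolerates repeated options; interpolation=None keeps any
    # '%' characters in other values from being treated as substitutions.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.getint("DEFAULT", "rpc_response_timeout", fallback=default)


# Sample text modelled on the snippets quoted in this report.
before = (
    "[DEFAULT]\n"
    "# Seconds to wait for a response from a call. (integer value)\n"
    "#rpc_response_timeout = 60\n"
)
after = before + "rpc_response_timeout = 600\n"

print(effective_rpc_timeout(before))  # 60  (line is commented out)
print(effective_rpc_timeout(after))   # 600 (workaround applied)
```

A commented-out line is ignored by the parser, which matches the "(uncomment it!)" remark above: the value only changes once an uncommented `rpc_response_timeout` line is present, and openstack-heat-engine still has to be restarted to pick it up.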