Bug 1299613
Field | Value
---|---
Summary | rhel-osp-director: Scale-up Ceph from 1 to 3 fails when Overcloud is deployed with SSL (resources.EndpointMap: Timed out).
Product | Red Hat OpenStack
Component | openstack-heat
Reporter | Omri Hochman <ohochman>
Assignee | Ben Nemec <bnemec>
QA Contact | Amit Ugol <augol>
Status | CLOSED ERRATA
Severity | high
Priority | urgent
Version | 7.0 (Kilo)
CC | apevec, dyasny, gfidente, jslagle, lhh, mburns, ohochman, rhel-osp-director-maint, sbaker, shardy, ssainkar, yeylon, zbitter
Target Milestone | z4
Target Release | 7.0 (Kilo)
Keywords | TestOnly, ZStream
Hardware | x86_64
OS | Linux
Doc Type | Bug Fix
Clone Of |
 | 1301627 1301629 1302593 1309816
Last Closed | 2016-02-18 16:43:10 UTC
Type | Bug
Bug Depends On | 1302880, 1305947, 1308562, 1309823
Bug Blocks | 1309816
Description
Omri Hochman
2016-01-18 18:45:09 UTC
Created attachment 1115955 [details]
heat-engine.log
This same problem exists when trying to scale Ceph storage nodes in a regular IPv4 deployment.

The same error happens with 2/4/6/12 workers, and it is always the same resource that fails with the Timed out message: https://github.com/openstack/tripleo-heat-templates/blob/master/network/endpoints/endpoint_map.yaml, which defines a large number of https://github.com/openstack/tripleo-heat-templates/blob/master/network/endpoints/endpoint.yaml resources.

Adding a depends_on across the Endpoint resources doesn't help; the error is the same.

Trying to deploy and update a standalone stack from endpoint_map does *not* trigger the same error, yet on a full overcloud this is reproducible consistently on both IPv4 and IPv6, but only when scaling Ceph nodes. If you try to scale compute nodes, the update completes successfully.

Created attachment 1118860 [details]
heat-engine.log.gz
extract from heat-engine.log (debug) during a failed update attempt
The only useful message I can find in the logs, in addition to the TRACE, is:

INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : 4fc5bf4064674d04832ef3d638665979

Possible alternative to generating the endpoint map proposed upstream.

I suspect the remainder of this issue will be fixed by bug 1302880.

Ben, can you test the Endpoint map patch and, if it looks good, propose it downstream in tripleo-heat-templates?

This should be resolved by the fixes for bug 1302880 and bug 1305947.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-0266.html
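
For anyone trying to confirm the same symptom in their own heat-engine.log, the oslo.messaging line quoted in the comments above can be pulled out of a large debug log with a short script. This is only a rough sketch, not part of the fix: the function names `orphaned_msg_ids` and `scan_log` are made up for illustration, and the pattern assumes the message text and the 32-character hex msg_id look exactly like the single example in this report.

```python
import re
from collections import Counter

# Matches the oslo.messaging line quoted in this report. The exact wording and
# the 32-hex-digit msg_id format are assumed from that one example line.
NO_WAITER = re.compile(
    r"No calling threads waiting for msg_id\s*:\s*([0-9a-f]{32})"
)


def orphaned_msg_ids(lines):
    """Return a Counter of msg_ids seen in 'No calling threads waiting' lines."""
    counts = Counter()
    for line in lines:
        match = NO_WAITER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def scan_log(path):
    """Convenience wrapper (hypothetical helper): scan a log file on disk."""
    with open(path, errors="replace") as log:
        return orphaned_msg_ids(log)
```

Calling `scan_log("/var/log/heat/heat-engine.log")` (the path is an assumption; use wherever your heat-engine log lives) returns the msg_ids and how often each was reported, which can then be matched against the timestamps of the EndpointMap timeout in the TRACE.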