Bug 1393802
Summary: | heat-engine takes 100% CPU when calling resource-list and the operation times out
---|---
Product: | Red Hat OpenStack
Component: | openstack-heat
Version: | 10.0 (Newton)
Status: | CLOSED CURRENTRELEASE
Severity: | urgent
Priority: | medium
Target Milestone: | async
Target Release: | 10.0 (Newton)
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Udi Kalifon <ukalifon>
Assignee: | Zane Bitter <zbitter>
QA Contact: | Amit Ugol <augol>
CC: | jcoufal, jprovazn, jschluet, jslagle, jtrowbri, mburns, mcornea, rhel-osp-director-maint, sasha, sbaker, shardy, srevivo, therve, ukalifon
Keywords: | Triaged, ZStream
Type: | Bug
Last Closed: | 2017-02-09 15:25:27 UTC
Description
Udi Kalifon
2016-11-10 11:22:21 UTC
Unfortunately, it is an expensive operation. What's your deployment like? How many nodes do you have in the overcloud? What's the undercloud sizing? Can you attach logs from the heat-engine? Thanks.

Yeah, this is expensive and not likely to get cheaper any time soon. It's worth looking at whether we need to adjust the timeout on the load balancer though, assuming that's what's timing out.

Created attachment 1219454: heat logs
I am attaching the logs. Sorry that it's the full set of logs; they also include other tests I ran.
I was trying to deploy 3 controllers and 2 computes on a bare metal setup. I tried to ssl-ize the overcloud and I think the deployment failed because I didn't set the keys and certificates in the template.
Can we release with such a basic operation not working?

We can see the 2 failures in the API log when nested_depth is specified. The calls took around 64s to finish, so just after the HAProxy timeout, I presume. It is slow, but I'm not sure we'll get around to fixing that now. The command "openstack stack failures list" is meant to do what you'd like to do, I think; maybe that mitigates this issue.

A patch has been proposed upstream to bump the HAProxy timeout to 2 minutes.

So it's *really* not helping that we reduced the number of engine workers to 2. That's going to make everything painfully slow. Fix here: https://review.openstack.org/#/c/399619/

Is there any desire to backport https://review.openstack.org/399619 to stable/newton, or should we just close this bug?

*** Bug 1396391 has been marked as a duplicate of this bug. ***

Apparently we are not going to increase the number of workers back to 4 on OSP 10, so I'm going to close this bug since all of the other patches have been released.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
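
For anyone hitting this, here is a rough sketch of the two calls discussed in the comments above: the slow nested resource listing and the suggested "openstack stack failures list" alternative. The stack name "overcloud", the nested depth of 5, and the 120-second client-side timeout (chosen only to mirror the proposed 2-minute HAProxy timeout) are assumptions for illustration, not values taken from this bug. The helper names are hypothetical.

```python
import subprocess


def resource_list_cmd(stack="overcloud", nested_depth=5):
    """Build the expensive call: a resource list that recurses into nested stacks.

    The stack name and depth are illustrative assumptions.
    """
    return ["openstack", "stack", "resource", "list",
            "--nested-depth", str(nested_depth), stack]


def failures_list_cmd(stack="overcloud"):
    """Build the alternative suggested in the comments above."""
    return ["openstack", "stack", "failures", "list", stack]


def run(cmd, timeout_s=120):
    """Run cmd and give up after timeout_s seconds.

    Returns (returncode, stdout), or None if the call timed out.
    The 120 s default is an assumption mirroring the proposed proxy timeout.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Only print the commands here; call run(...) on a real undercloud.
    print(" ".join(resource_list_cmd()))
    print(" ".join(failures_list_cmd()))
```

On an undercloud with credentials sourced, `run(failures_list_cmd())` would be the cheaper thing to try first when the goal is only to see why a deployment failed.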