Bug 1265653
Summary: | Heat consumes high CPU and Memory | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Pablo Iranzo Gómez <pablo.iranzo>
Component: | python-eventlet | Assignee: | Jon Schlueter <jschluet>
Status: | CLOSED ERRATA | QA Contact: | Amit Ugol <augol>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 6.0 (Juno) | CC: | adahms, dmaley, fahmed, jschluet, lhh, mburns, mschuppe, nlevinki, pablo.iranzo, rhel-osp-director-maint, sasha, sbaker, sclewis, shardy, yeylon, zbitter
Target Milestone: | async | Keywords: | Rebase, Triaged, ZStream
Target Release: | 6.0 (Juno) | |
Hardware: | Unspecified | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | python-eventlet-0.17.4-2.el7ost | Doc Type: | Rebase: Bug Fixes and Enhancements
Doc Text: | Rebase package to version: 0.17.4. This version resolves the following error: "maximum recursion depth exceeded in GreenSocket.__del__" | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2015-10-15 12:30:10 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Pablo Iranzo Gómez 2015-09-23 12:22:38 UTC
From the traceback it looks like RabbitMQ died. Possibly related to bug 1265418.

Could you please compare the following configuration settings in /etc/heat/heat.conf against the other OpenStack services? They should be consistent (or consistently using the default value):

- rpc_backend
- rabbit_*

It's highly unlikely that max_resources_per_stack is the problem here. In RHOS 7 the problem was that each nested stack ran in its own RPC request, and that each stack counted its resources by loading every stack in the tree, with the result that the total memory consumption was O(n^2) in the number of nested stacks in the tree. The reason that bug even happened is simply not the case in RHOS 6: all of the stacks in the tree are handled in a single RPC context, which just counts all of the resources in memory, so max_resources_per_stack doesn't change the amount that is loaded into memory at all.

It's far more likely that the patch to eliminate reference loops (so that objects are deleted immediately when they are no longer referenced, rather than waiting for the garbage collector) might make a difference. Most likely of all, though, is that something is leaking memory, with eventlet and qpid being the prime suspects. Could you please supply the output of the following command?

    rpm -qa | egrep "eventlet|greenlet|requests|qpid"

    Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method GreenSocket.__del__ of <eventlet.greenio.GreenSocket object at 0x54bee10>> ignored

This ^ is a known bug in eventlet 0.15.2. It was identified upstream by our OpenStack team: https://bugs.launchpad.net/oslo.messaging/+bug/1369999

That bug's description states that it occurs after an RPC timeout with QPID, and that the result is "The load average shoots up, lots of services hang up and chew through CPU spewing the same message into the logs." That sounds like exactly what you're seeing here.

Here is the upstream eventlet bug: https://github.com/eventlet/eventlet/issues/137 (note that a follow-up fix was also required: https://github.com/eventlet/eventlet/issues/154). This was fixed in eventlet 0.16.0: http://eventlet.net/doc/changelog.html#id7

To verify the bug, following the upstream report:

- Do a standard install with qpid, with HOST_IP on a NIC that can be taken down (the upstream reporter used a Wi-Fi NIC), and with neutron for networking:
  https://bugs.launchpad.net/oslo.messaging/+bug/1369999/comments/4
  https://bugs.launchpad.net/oslo.messaging/+bug/1369999/comments/5
- Once everything is up and running, bring down the NIC.
- CPU usage will spike for a number of services, including heat.

Backported to OSP 6, and also picked up the patch for another bug that was originally pushed out on OSP 6 - rhbz 1259758. New build: python-eventlet-0.17.4-2.el7ost

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1903.html
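As a convenience for the configuration comparison requested above, here is a minimal sketch (not part of the original report) that dumps the rpc_backend and rabbit_* settings from a few service configuration files so they can be compared side by side. The file list is an assumption, and it assumes these options live in the [DEFAULT] section as they normally do on Juno; adjust both for the actual deployment.

```python
# Illustrative sketch: list messaging-related options per service config
# so that inconsistencies stand out. The set of files is an assumption.
import configparser

FILES = [
    "/etc/heat/heat.conf",
    "/etc/nova/nova.conf",
    "/etc/neutron/neutron.conf",
]

def messaging_options(path):
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.read(path)
    # Juno-era oslo.messaging options normally sit in [DEFAULT]; extend the
    # lookup if they have been moved to another section.
    return {key: value
            for key, value in cfg.defaults().items()
            if key == "rpc_backend" or key.startswith("rabbit_")}

for path in FILES:
    print(path)
    for option, value in sorted(messaging_options(path).items()):
        print("    %s = %s" % (option, value))
```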
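The point above about the patch that eliminates reference loops rests on how CPython frees memory: an object with no remaining references is freed immediately by reference counting, while objects caught in a reference cycle linger until the cyclic garbage collector runs. The following is a minimal, self-contained illustration of that behaviour on CPython; it is not the Heat patch itself.

```python
# Illustrative sketch only: acyclic objects are freed as soon as the last
# reference goes away, while objects in a reference cycle survive until
# the cyclic garbage collector runs.
import gc
import weakref


class Node(object):
    pass


# Acyclic object: freed immediately by reference counting.
plain = Node()
probe_plain = weakref.ref(plain)
del plain
print("acyclic object freed:", probe_plain() is None)             # True

# Two objects referencing each other: freed only once gc runs.
a, b = Node(), Node()
a.other, b.other = b, a
probe_cycle = weakref.ref(a)
del a, b
print("cycle freed before gc.collect():", probe_cycle() is None)  # False
gc.collect()
print("cycle freed after gc.collect():", probe_cycle() is None)   # True
```

In a long-running service, memory held by such cycles only comes back when the collector happens to run, which is why breaking the loops keeps the footprint flatter.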
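When verifying, it may also help to confirm which eventlet release the services actually import. A rough sketch, under the assumption that the version string is a plain "major.minor.patch" value: the upstream fix landed in 0.16.0, and the downstream build named above is python-eventlet-0.17.4-2.el7ost.

```python
# Rough check that the interpreter picks up an eventlet release that already
# contains the GreenSocket.__del__ fix (upstream 0.16.0; shipped downstream
# as python-eventlet-0.17.4-2.el7ost).
import eventlet

major, minor = (int(part) for part in eventlet.__version__.split(".")[:2])
if (major, minor) >= (0, 16):
    print("eventlet %s: includes the upstream fix" % eventlet.__version__)
else:
    print("eventlet %s: affected; update to python-eventlet-0.17.4-2.el7ost"
          % eventlet.__version__)
```

The packaged version can also be read directly with `rpm -q python-eventlet`, matching the `rpm -qa` check requested earlier in the report.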