
Bug 1797436

Summary: unbounded memory usage in collectd when it's not configured with any write plugin
Product: Red Hat OpenStack
Reporter: Jaison Raju <jraju>
Component: collectd
Assignee: Matthias Runge <mrunge>
Status: CLOSED DUPLICATE
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: astupnik, csibbitt, dhill, dsedgmen, jbadiapa, jraju, lars, mmagr, mmethot, mrunge, pkilambi, rmccabe, uemit.seren
Target Milestone: z13
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Last Closed: 2020-09-25 05:59:40 UTC
Bug Depends On: 1790928, 1817124, 1859630

Description Jaison Raju 2020-02-03 07:25:39 UTC
This bug was initially created as a copy of Bug #1771994

I am copying this bug because: 

We noticed this issue in RHOS 13 in a customer environment where collectd was configured to send data to gnocchi but was having trouble communicating with it.
So it seems this issue can also occur in environments that are deployed with a write plugin (to send metrics) when the destination endpoint is not accessible.
In the customer environment, this led to an ovs-vswitchd crash.


Description of problem:
If the overcloud is deployed with collectd, but collectd is not configured with any write plugin or any destination collectd server to which it can send data, a memory leak is observed.
The resident memory of the collectd process grows to 20 GB within a few hours.
With every collection cycle, the process visibly grows in memory usage.

Collectd should have a configuration option to discard collected data, rather than storing it in memory, when it is not configured with a destination collectd server or a write plugin.
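
To illustrate the growth pattern described above, here is a toy model in Python. It is not collectd code; the function name `simulate_queue` and all numbers are made up for illustration. It only shows why a queue that is filled every cycle and never drained keeps growing, while a capped queue stays flat. Note that `deque(maxlen=...)` discards the oldest entries, which may differ from how collectd chooses what to drop.

```python
from collections import deque


def simulate_queue(intervals, values_per_interval, max_len=None):
    """Return the queue length after each collection cycle when nothing drains it."""
    # max_len=None gives an unbounded queue; an integer caps its length.
    queue = deque(maxlen=max_len)
    lengths = []
    for cycle in range(intervals):
        for i in range(values_per_interval):
            queue.append((cycle, i))  # placeholder for one collected value
        lengths.append(len(queue))
    return lengths


if __name__ == "__main__":
    print(simulate_queue(5, 1000))                # keeps growing every cycle
    print(simulate_queue(5, 1000, max_len=2500))  # stops growing at the cap
```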

Version-Release number of selected component (if applicable):
RHOS15z1

How reproducible:
Always


Comment 2 Martin Magr 2020-02-19 16:30:28 UTC
We will need to backport a fix that allows setting a maximum size for the in-memory queue that collectd holds.

Comment 15 Matthias Runge 2020-09-25 05:52:21 UTC
There are two BZs for OSP 13 that address a memory issue with amqp1; both are targeted for the OSP 13 z13 release.

- https://bugzilla.redhat.com/show_bug.cgi?id=1817124 (fix a memory issue with amqp1)
- https://bugzilla.redhat.com/show_bug.cgi?id=1861716 (puppet-collectd change to add the SendQueueLimit)


In general, there is https://access.redhat.com/solutions/4855731 which mentions
~~~
collectd::write_queue_limit_high: 1000000
collectd::write_queue_limit_low: 800000
~~~
These settings are intended to limit the write queue length in collectd (the values should probably be much lower, such as 80 - 100).
Unfortunately, these settings affect neither the python plugin (used for writing to gnocchi) nor the amqp1 plugin; that gap was addressed in the bugs mentioned above.
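
As a rough sketch of how a low/high pair of queue limits can behave, the Python function below assumes that the two hiera keys map to collectd's `WriteQueueLimitLow` and `WriteQueueLimitHigh` options. Per the collectd.conf documentation, nothing is dropped below the low mark, everything is dropped above the high mark, and in between values are dropped with a probability that grows with the queue length. The exact interpolation used here is an assumption, and the name `drop_probability` is hypothetical.

```python
def drop_probability(queue_length, limit_low, limit_high):
    """Probability that a newly dispatched value is dropped (illustrative model)."""
    if queue_length < limit_low:
        return 0.0  # below the low-water mark: keep everything
    if queue_length >= limit_high:
        return 1.0  # at or above the high-water mark: drop everything
    # Between the marks: assumed linear ramp from just above 0 towards 1.
    position = 1 + queue_length - limit_low
    span = 1 + limit_high - limit_low
    return position / span


if __name__ == "__main__":
    # Limits taken from the hiera example above.
    for length in (500000, 800000, 900000, 1000000):
        print(length, drop_probability(length, 800000, 1000000))
```

With much smaller limits, such as a low mark of 80 and a high mark of 100 (one possible reading of the suggestion above), the ramp covers only about twenty queued values, so the queue is capped almost immediately.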