This bug was initially created as a copy of Bug #1771994

I am copying this bug because:

We noticed this issue in RHOS13 in a customer environment where collectd was configured to send data to gnocchi, but collectd was facing an issue communicating with gnocchi. So it seems this issue can be seen in environments which are deployed with a write plugin (to send metrics) but where the destination endpoint is not accessible. In the customer environment, this led to an ovs-vswitchd crash.

Description of problem:
If the overcloud is deployed with collectd, but collectd is not configured with any write plugin or any destination collectd server it can send data to, a memory leak is noticed. The collectd process's resident memory grows to 20 GB within a few hours: during every collection cycle the process visibly grows in memory usage. Collectd should have some configuration to discard collected data, rather than storing it in memory, when it is not configured with a destination collectd server or write plugin.

Version-Release number of selected component (if applicable):
RHOS15z1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
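For context, upstream collectd provides global options that cap the internal write queue so metrics are dropped instead of piling up in memory when no writer can consume them. A minimal collectd.conf sketch (values are illustrative, not taken from the customer environment):

~~~
# collectd.conf (global section) -- illustrative values only
# Above WriteQueueLimitHigh the daemon drops new metrics outright;
# between the Low and High watermarks it drops them probabilistically.
WriteQueueLimitHigh 100
WriteQueueLimitLow  80
~~~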
We will need to backport a fix that allows setting a maximum size for the in-memory queue which collectd holds.
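The fix in question is presumably the SendQueueLimit option for the amqp1 write plugin referenced below. A hedged sketch of what the resulting plugin configuration might look like (host, port, transport/instance names and values are placeholders, and the exact placement of the option may differ between collectd versions):

~~~
<Plugin amqp1>
  <Transport "metrics">
    Host "qdr.example.com"   # placeholder AMQP 1.0 router
    Port "5666"
    Address "collectd"
    # Cap the number of messages queued for sending so an unreachable
    # endpoint does not cause unbounded memory growth.
    SendQueueLimit 50
    <Instance "telemetry">
      Format JSON
    </Instance>
  </Transport>
</Plugin>
~~~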
There are two bzs for osp13 to address a memory issue with amqp1; these are targeted for the osp13z13 release.

- https://bugzilla.redhat.com/show_bug.cgi?id=1817124 (fix a memory issue with amqp1)
- https://bugzilla.redhat.com/show_bug.cgi?id=1861716 (puppet-collectd change to add the SendQueueLimit)

In general, there is https://access.redhat.com/solutions/4855731 which mentions

~~~
collectd::write_queue_limit_high: 1000000
collectd::write_queue_limit_low: 800000
~~~

These settings are intended to limit the write queue length in collectd (the values should probably be much lower, e.g. 80 - 100). Unfortunately, that setting does not affect either the python plugin (used for writing to gnocchi) or the amqp1 plugin, which is what the above-mentioned bugs address.
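For OSP/TripleO deployments, one way to apply the KCS settings with the lower values suggested above is through hieradata in an environment file. This is a minimal sketch under the assumption that ExtraConfig is used to pass the puppet-collectd parameters (file name and values are illustrative):

~~~
# collectd-queue-limits.yaml -- hypothetical environment file
parameter_defaults:
  ExtraConfig:
    collectd::write_queue_limit_high: 100
    collectd::write_queue_limit_low: 80
~~~

Such a file would be included at deploy time with the usual `openstack overcloud deploy ... -e collectd-queue-limits.yaml`.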