Bug 1939112

Summary: Memory leak in ceilometer-agent-notification (Train)
Product: [Community] RDO
Reporter: Petr Tuma <p.tuma>
Component: openstack-ceilometer
Assignee: Matthias Runge <mrunge>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: trunk
CC: apevec, mrunge, srevivo
Target Release: trunk
Hardware: x86_64
OS: Linux
Last Closed: 2021-07-08 16:10:19 UTC
Type: Bug

Description Petr Tuma 2021-03-15 16:15:43 UTC
Description of problem:
The ceilometer_notification container workers leak memory under load. A synthetic load equivalent to the metrics of ~250 VMs is enough to reliably trigger the issue.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Build RHEL 8 binary containers with Kolla, using UBI 8 as the base image.
2. Deploy these containers with kolla-ansible.
3. Run a high-load test and measure the memory consumption of the ceilometer-agent-notification workers.
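Step 3 does not say how memory was measured; the snapshots under "Actual results" have the column layout of `ps aux`. A minimal sketch of one way to record the same RSS numbers over time, assuming a Linux host where the worker processes are visible under /proc (inside the container or on the host). The match string comes from the process titles in the snapshots; the function names, the 60-second interval and the sample count are illustrative assumptions, not part of the original test setup.

```python
import os
import time


def worker_pids(pattern: str = "NotificationService worker") -> list[int]:
    """Return PIDs whose command line contains `pattern` (Linux /proc only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                # Arguments are NUL-separated; join them with spaces for matching.
                cmdline = f.read().replace(b"\x00", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable by us
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def rss_kib(pid: int) -> int:
    """Read the resident set size (VmRSS, in KiB) from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # line looks like "VmRSS:   101576 kB"
    return 0  # kernel threads and zombies have no VmRSS line


def sample(interval_s: float = 60.0, count: int = 60) -> dict[int, list[int]]:
    """Sample the RSS of every matching worker `count` times, `interval_s` apart."""
    history: dict[int, list[int]] = {}
    for i in range(count):
        for pid in worker_pids():
            try:
                history.setdefault(pid, []).append(rss_kib(pid))
            except OSError:
                pass  # worker exited between listing and reading
        if i < count - 1:
            time.sleep(interval_s)
    return history
```

For a one-hour run, `sample(interval_s=60, count=60)` yields one RSS reading per worker per minute, keyed by PID.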

Actual results:
Below are a few measurements taken during 1 hour of testing with a synthetic load of 250 VMs running on the system. Memory keeps growing until the process is eventually killed by the OOM killer (in the first detected occurrence, the processes used 25 GB of RAM).

USER       PID %CPU %MEM     VSZ    RSS TTY    STAT START   TIME COMMAND
42405      31998 27.2  0.6 5186236 101576 ?      SNl  12:17   0:14 ceilometer-agent-notification: NotificationService worker(0)
42405      32001 26.9  0.6 5186236 101660 ?      SNl  12:17   0:14 ceilometer-agent-notification: NotificationService worker(1)

42405      31998 16.3  0.7 5196220 122636 ?      SNl  12:17   0:33 ceilometer-agent-notification: NotificationService worker(0)
42405      32001 16.1  0.7 5196220 122272 ?      SNl  12:17   0:33 ceilometer-agent-notification: NotificationService worker(1)

42405      31998  8.9  1.1 5253052 180588 ?      SNl  12:17   3:54 ceilometer-agent-notification: NotificationService worker(0)
42405      32001  8.9  1.1 5252540 181372 ?      SNl  12:17   3:53 ceilometer-agent-notification: NotificationService worker(1)

Expected results:
USER       PID %CPU %MEM     VSZ    RSS TTY    STAT START   TIME COMMAND
42405      33544 29.6  0.6 5186240 101828 ?      SNl  12:17   0:14 ceilometer-agent-notification: NotificationService worker(0)
42405      33547 28.9  0.6 5186240 101552 ?      SNl  12:17   0:14 ceilometer-agent-notification: NotificationService worker(1)

42405      33544 16.2  0.7 5193408 119248 ?      SNl  12:17   0:33 ceilometer-agent-notification: NotificationService worker(0)
42405      33547 15.7  0.7 5192384 115240 ?      SNl  12:17   0:32 ceilometer-agent-notification: NotificationService worker(1)

42405      33544  8.4  0.7 5193152 120136 ?      SNl  12:17   3:40 ceilometer-agent-notification: NotificationService worker(0)
42405      33547  8.4  0.7 5191872 118324 ?      SNl  12:17   3:39 ceilometer-agent-notification: NotificationService worker(1)

Memory consumption does not grow indefinitely.
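The difference between the two runs is easiest to see in the RSS column: it keeps climbing in the leaking run and levels off in the expected one. A rough sketch of that check, using the RSS values (KiB) copied from the snapshots above. The 5% tolerance and the "compare the last two samples" rule are assumptions chosen for illustration; the time between snapshots is not recorded here, so a real check would want more samples taken at known intervals.

```python
def still_growing(rss_kib: list[int], tolerance: float = 0.05) -> bool:
    """Heuristic: True if RSS rose by more than `tolerance` (as a fraction)
    between the last two samples, i.e. memory has not levelled off yet."""
    if len(rss_kib) < 2:
        raise ValueError("need at least two samples")
    prev, last = rss_kib[-2], rss_kib[-1]
    return (last - prev) / prev > tolerance


# RSS column (KiB) copied from the three ps snapshots in this report.
actual = {
    "worker(0)": [101576, 122636, 180588],
    "worker(1)": [101660, 122272, 181372],
}
expected = {
    "worker(0)": [101828, 119248, 120136],
    "worker(1)": [101552, 115240, 118324],
}

leaking = {name: still_growing(rss) for name, rss in actual.items()}
settled = {name: still_growing(rss) for name, rss in expected.items()}
print(leaking)  # {'worker(0)': True, 'worker(1)': True}
print(settled)  # {'worker(0)': False, 'worker(1)': False}
```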

Additional info:
We tracked the issue down to the version of the 'ujson' library. The build available at http://mirror.centos.org/centos/8/cloud/x86_64/openstack-train/Packages/ (python3-ujson-2.0-0.2.20170206git2f1d487.el8.x86_64.rpm) has the memory leak.

The officially suggested version of the library for the Train release is 1.35 (https://github.com/openstack/requirements/blob/stable/train/upper-constraints.txt), and the package available for RHEL / CentOS 7 is that version. The RHEL / CentOS 8 repositories ship version 2.0.

The issue does not occur in these containers when ujson==1.35 is used, nor when ujson==2.0.3 (the version suggested for the Ussuri release) is used.
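A minimal sketch for checking which ujson a container actually has, assuming it runs with the same Python interpreter the ceilometer workers use. It relies only on `importlib.metadata` from the standard library (Python 3.8+). The helper names are hypothetical, and the classification uses only the versions named in this report: 1.35 and 2.0.3 are treated as not leaking, and anything else is flagged as untested rather than known-bad. The exact version string that the EL8 RPM's Python metadata reports is not given here, so the sketch does not try to match it.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# ujson versions this report found NOT to leak (Train and Ussuri upper-constraints).
NOT_LEAKING = {"1.35", "2.0.3"}


def installed_version(dist: str = "ujson") -> Optional[str]:
    """Return the installed distribution's version, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def classify(installed: Optional[str]) -> str:
    """Hypothetical triage helper based only on the versions named in this report."""
    if installed is None:
        return "absent"
    if installed in NOT_LEAKING:
        return "ok"
    # Anything else (e.g. the EL8 2.0 pre-release snapshot) was not shown to be safe.
    return "untested"


print(classify(installed_version()))
```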

Comment 1 Matthias Runge 2021-03-29 07:12:39 UTC
Thank you for this report. I'll follow up with upstream, and we'll make sure this gets fixed in later releases. As far as I understand, Train on CentOS is not supported.
You should be able to rebuild the container. ujson is not a direct dependency of ceilometer. So far, I have found that gnocchiclient (and gnocchi) use ujson.

Comment 2 Matthias Runge 2021-03-29 08:11:32 UTC
I've proposed https://review.rdoproject.org/r/c/rdoinfo/+/32832

Comment 4 Matthias Runge 2021-07-08 16:10:19 UTC
This issue has been addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1944027 and in https://bugzilla.redhat.com/show_bug.cgi?id=1948452

RDO only supports the latest release.