Bug 1939112 - Memory leak in ceilometer-agent-notification (Train)
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RDO
Classification: Community
Component: openstack-ceilometer
Version: trunk
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: trunk
Assignee: Matthias Runge
Reported: 2021-03-15 16:15 UTC by Petr Tuma
Modified: 2024-10-01 17:41 UTC (History)
Last Closed: 2021-07-08 16:10:19 UTC

Description Petr Tuma 2021-03-15 16:15:43 UTC
Description of problem:
The workers in the ceilometer_notification container leak memory under load. A synthetic load of about 250 VMs' worth of metrics is enough to hit the issue reliably.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Build RHEL 8 binary containers with kolla, using UBI8 as the base container.
2. Deploy the containers using kolla-ansible.
3. Run a high-load test and measure the memory consumption of the ceilometer-agent-notification workers.
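
One way to take the measurement in step 3 is to read VmRSS from /proc for each worker PID. This is only a minimal sketch, assuming a Linux host; the helper names parse_vmrss_kib and read_rss_kib are made up for illustration and are not part of ceilometer or kolla.

```python
import re
from pathlib import Path


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None.

    Kernel threads have no VmRSS line, hence the None case.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling read_rss_kib(pid) for each worker PID at a fixed interval during the load test gives a series that can be checked for steady growth.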

Actual results:
The samples below were taken over 1 hour of testing with a synthetic load of 250 VMs running on the system. Memory keeps growing until the OOM killer eventually kills the processes (in the first detected occurrence, the processes used 25 GB of RAM).

42405      31998 27.2  0.6 5186236 101576 ?      SNl  12:17   0:14 ceilometer-agent-notification: NotificationService worker(0)
42405      32001 26.9  0.6 5186236 101660 ?      SNl  12:17   0:14 ceilometer-agent-notification: NotificationService worker(1)

42405      31998 16.3  0.7 5196220 122636 ?      SNl  12:17   0:33 ceilometer-agent-notification: NotificationService worker(0)
42405      32001 16.1  0.7 5196220 122272 ?      SNl  12:17   0:33 ceilometer-agent-notification: NotificationService worker(1)

42405      31998  8.9  1.1 5253052 180588 ?      SNl  12:17   3:54 ceilometer-agent-notification: NotificationService worker(0)
42405      32001  8.9  1.1 5252540 181372 ?      SNl  12:17   3:53 ceilometer-agent-notification: NotificationService worker(1)

Expected results:
42405      33544 29.6  0.6 5186240 101828 ?      SNl  12:17   0:14 ceilometer-agent-notification: NotificationService worker(0)
42405      33547 28.9  0.6 5186240 101552 ?      SNl  12:17   0:14 ceilometer-agent-notification: NotificationService worker(1)

42405      33544 16.2  0.7 5193408 119248 ?      SNl  12:17   0:33 ceilometer-agent-notification: NotificationService worker(0)
42405      33547 15.7  0.7 5192384 115240 ?      SNl  12:17   0:32 ceilometer-agent-notification: NotificationService worker(1)

42405      33544  8.4  0.7 5193152 120136 ?      SNl  12:17   3:40 ceilometer-agent-notification: NotificationService worker(0)
42405      33547  8.4  0.7 5191872 118324 ?      SNl  12:17   3:39 ceilometer-agent-notification: NotificationService worker(1)

Memory consumption does not grow indefinitely.
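
To put numbers on the two sets of samples above, the RSS growth of worker(0) can be computed from its first and last sample in each set. This is a rough sketch that assumes the samples are `ps aux` lines, so the second field is the PID and the sixth is RSS in KiB; the function names are illustrative only.

```python
def pid_and_rss_kib(ps_line):
    """Return (PID, RSS in KiB) from one `ps aux` style line."""
    # Ten splits keep the COMMAND column (which contains spaces) intact.
    fields = ps_line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("not a full ps aux line: %r" % ps_line)
    return int(fields[1]), int(fields[5])


def rss_growth_kib(first_line, last_line):
    """RSS difference in KiB between two samples of the same process."""
    first_pid, first_rss = pid_and_rss_kib(first_line)
    last_pid, last_rss = pid_and_rss_kib(last_line)
    if first_pid != last_pid:
        raise ValueError("samples belong to different processes")
    return last_rss - first_rss


# First and last worker(0) samples copied from this report.
CMD = "ceilometer-agent-notification: NotificationService worker(0)"
LEAKING_FIRST = "42405      31998 27.2  0.6 5186236 101576 ?      SNl  12:17   0:14 " + CMD
LEAKING_LAST = "42405      31998  8.9  1.1 5253052 180588 ?      SNl  12:17   3:54 " + CMD
HEALTHY_FIRST = "42405      33544 29.6  0.6 5186240 101828 ?      SNl  12:17   0:14 " + CMD
HEALTHY_LAST = "42405      33544  8.4  0.7 5193152 120136 ?      SNl  12:17   3:40 " + CMD

print("leaking:", rss_growth_kib(LEAKING_FIRST, LEAKING_LAST), "KiB")
print("healthy:", rss_growth_kib(HEALTHY_FIRST, HEALTHY_LAST), "KiB")
```

Under that assumption, worker(0) grows by roughly 77 MiB in the leaking case and roughly 18 MiB in the expected case over the same period.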

Additional info:
We traced the issue to the version of the 'ujson' library. The one available in http://mirror.centos.org/centos/8/cloud/x86_64/openstack-train/Packages/ (i.e. python3-ujson-2.0-0.2.20170206git2f1d487.el8.x86_64.rpm) has the memory leak.

The officially suggested version of the library for the Train release is v1.35 (https://github.com/openstack/requirements/blob/stable/train/upper-constraints.txt), and the package available for RHEL / CentOS 7 is that version. Version 2.0 is what the RHEL / CentOS 8 repositories provide.

The issue is not present in these containers when ujson==1.35 is used, nor when ujson==2.0.3 is used (the version suggested for the Ussuri release).
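
A quick check of which ujson a container actually ships could look like the sketch below. It only knows the two versions reported here as leak-free; anything else is labelled "unverified" rather than bad, because the version string reported by the 2.0 snapshot package is not known from this report. The names KNOWN_GOOD_UJSON, ujson_status and installed_ujson_version are hypothetical.

```python
from importlib import metadata

# Versions this report found to be free of the leak.
KNOWN_GOOD_UJSON = {"1.35", "2.0.3"}


def ujson_status(version):
    """Classify a ujson version string against the findings in this report."""
    if version is None:
        return "not installed"
    return "known good" if version in KNOWN_GOOD_UJSON else "unverified"


def installed_ujson_version():
    """Return the installed ujson version, or None if it is not installed."""
    try:
        return metadata.version("ujson")
    except metadata.PackageNotFoundError:
        return None


print("ujson:", ujson_status(installed_ujson_version()))
```

Running this inside the ceilometer_notification container shows whether the image carries one of the two versions tested above.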

Comment 1 Matthias Runge 2021-03-29 07:12:39 UTC
Thank you for this report. I'll follow up with upstream and we'll make sure this gets fixed in later releases. As I understand it, Train on CentOS is not supported.
You should be able to rebuild the container. ujson is not a direct dependency of ceilometer. So far, I have found gnocchiclient (and gnocchi) using ujson.
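
To see which installed packages pull in ujson, the declared requirements of every installed distribution can be scanned. This is a minimal sketch using the standard importlib.metadata module; it only sees dependencies declared in package metadata, so optional or undeclared imports would be missed. The helper names are illustrative.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if not match:
        return None
    # Treat runs of '-', '_' and '.' as equivalent, as packaging tools do.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def dependents_of(package):
    """List installed distributions that declare a requirement on package."""
    target = re.sub(r"[-_.]+", "-", package).lower()
    found = set()
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            if requirement_name(requirement) == target:
                name = dist.metadata["Name"]
                if name:
                    found.add(name)
    return sorted(found)


print(dependents_of("ujson"))
```

Run inside the container, the printed list shows which components would be affected by swapping the ujson package during a rebuild.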

Comment 2 Matthias Runge 2021-03-29 08:11:32 UTC
I've proposed https://review.rdoproject.org/r/c/rdoinfo/+/32832

Comment 4 Matthias Runge 2021-07-08 16:10:19 UTC
This issue has been addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1944027 and in https://bugzilla.redhat.com/show_bug.cgi?id=1948452

RDO only supports the latest release.

