Description of problem:
The ceilometer_notification container's workers leak memory under load. A synthetic load of ~250 VMs' worth of metrics is enough to reliably hit the issue.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Build RHEL8 binary containers with kolla, using UBI8 as the base container.
2. Deploy these containers using kolla-ansible.
3. Run a high-load test and measure the memory consumption of the ceilometer-agent-notification workers.

Actual results:
Below are a few measurements taken during 1 hour of testing with a synthetic load of 250 VMs running on the system. Memory keeps growing until the process is eventually killed by the OOM killer (in the first detected occurrence the processes used 25 GB of RAM).

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
42405 31998 27.2 0.6 5186236 101576 ? SNl 12:17 0:14 ceilometer-agent-notification: NotificationService worker(0)
42405 32001 26.9 0.6 5186236 101660 ? SNl 12:17 0:14 ceilometer-agent-notification: NotificationService worker(1)
42405 31998 16.3 0.7 5196220 122636 ? SNl 12:17 0:33 ceilometer-agent-notification: NotificationService worker(0)
42405 32001 16.1 0.7 5196220 122272 ? SNl 12:17 0:33 ceilometer-agent-notification: NotificationService worker(1)
42405 31998 8.9 1.1 5253052 180588 ? SNl 12:17 3:54 ceilometer-agent-notification: NotificationService worker(0)
42405 32001 8.9 1.1 5252540 181372 ? SNl 12:17 3:53 ceilometer-agent-notification: NotificationService worker(1)

Expected results:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
42405 33544 29.6 0.6 5186240 101828 ? SNl 12:17 0:14 ceilometer-agent-notification: NotificationService worker(0)
42405 33547 28.9 0.6 5186240 101552 ? SNl 12:17 0:14 ceilometer-agent-notification: NotificationService worker(1)
42405 33544 16.2 0.7 5193408 119248 ? SNl 12:17 0:33 ceilometer-agent-notification: NotificationService worker(0)
42405 33547 15.7 0.7 5192384 115240 ? SNl 12:17 0:32 ceilometer-agent-notification: NotificationService worker(1)
42405 33544 8.4 0.7 5193152 120136 ? SNl 12:17 3:40 ceilometer-agent-notification: NotificationService worker(0)
42405 33547 8.4 0.7 5191872 118324 ? SNl 12:17 3:39 ceilometer-agent-notification: NotificationService worker(1)

Memory consumption does not grow indefinitely.

Additional info:
We tracked the issue down to the version of the 'ujson' library. The one available in http://mirror.centos.org/centos/8/cloud/x86_64/openstack-train/Packages/ (i.e. python3-ujson-2.0-0.2.20170206git2f1d487.el8.x86_64.rpm) has the memory leak. The officially suggested version of the library for the Train release is v1.35 (https://github.com/openstack/requirements/blob/stable/train/upper-constraints.txt), and that is the version packaged for RHEL / CentOS 7. Version 2.0 is what ships in the RHEL / CentOS 8 repositories. The issue does not occur in these containers when ujson==1.35 is used, nor when ujson==2.0.3 (the version suggested for the Ussuri release) is used.
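Step 3 does not say how memory was measured; the rows quoted above look like `ps aux` output, where RSS is in KiB. One way to collect the same RSS series over a test run is the minimal, Linux-only sketch below, which reads `/proc` directly. The helper names (`rss_kib`, `find_pids`, `sample`, `growth_after_warmup`) and the growth heuristic are illustrative assumptions, not part of Ceilometer or kolla.

```python
import os
import time
from typing import Dict, List


def rss_kib(pid: int) -> int:
    """Return the resident set size of `pid` in KiB (Linux only).

    Reads the VmRSS line of /proc/<pid>/status, which the kernel reports in kB,
    the same unit as the RSS column of `ps aux`.
    """
    with open("/proc/{}/status".format(pid)) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def find_pids(pattern: str) -> List[int]:
    """Return the PIDs whose command line contains `pattern`, excluding this script."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open("/proc/{}/cmdline".format(entry), "rb") as f:
                # argv entries are NUL-separated; join them with spaces like ps does
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return pids


def sample(pattern: str, interval_s: float, count: int) -> Dict[int, List[int]]:
    """Record the RSS of every matching process `count` times, `interval_s` seconds apart."""
    history = {}  # type: Dict[int, List[int]]
    for i in range(count):
        for pid in find_pids(pattern):
            try:
                rss = rss_kib(pid)
            except OSError:
                continue  # the worker exited between listing and reading
            history.setdefault(pid, []).append(rss)
        if i < count - 1:
            time.sleep(interval_s)
    return history


def growth_after_warmup(samples: List[int], warmup: int = 1) -> int:
    """RSS growth in KiB between the end of the warm-up period and the last sample.

    A crude leak signal (an assumption, not a standard metric): a healthy worker
    plateaus after warm-up, while a leaking one keeps climbing.
    """
    if len(samples) <= warmup:
        return 0
    return samples[-1] - samples[warmup]
```

For example, `sample("NotificationService worker", interval_s=60, count=60)` would take one reading per minute for an hour. Applied to the worker(0) RSS values quoted in the report, the heuristic gives 180588 - 122636 = 57952 KiB of growth for the leaking build and 120136 - 119248 = 888 KiB for the build without the leak. Skipping the first sample avoids counting normal start-up allocation as a leak.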
Thank you for this report. I'll follow up with upstream, and we'll make sure this gets fixed in later releases. As I understand it, Train on CentOS is not supported; in the meantime, you should be able to rebuild the container. ujson is not a direct dependency of ceilometer. So far, I have found gnocchiclient (and gnocchi) using ujson.
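Because ujson arrives only as a transitive dependency, it can help to check, inside the container's Python, which installed distributions declare a requirement on it and which ujson version is actually installed. The sketch below assumes Python 3.8+ (`importlib.metadata`) or the `importlib_metadata` backport on older interpreters; the helper names are hypothetical, and the name parsing is a simplified take on PEP 508 rather than a full parser.

```python
import re
from typing import List, Optional

try:
    from importlib import metadata  # standard library since Python 3.8
except ImportError:  # older interpreters: the importlib_metadata backport from PyPI
    import importlib_metadata as metadata


def normalize(name: str) -> str:
    """Normalize a project name as in PEP 503, so 'UJson' and 'ujson' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Return the project name at the start of a PEP 508 requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return match.group(1) if match else ""


def installed_version(project: str) -> Optional[str]:
    """Return the installed version of `project`, or None if it is not installed."""
    try:
        return metadata.version(project)
    except metadata.PackageNotFoundError:
        return None


def reverse_dependencies(project: str) -> List[str]:
    """Names of installed distributions that declare a requirement on `project`.

    Optional requirements (those guarded by an `extra == ...` marker) are counted too.
    """
    target = normalize(project)
    dependents = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if not name:
            continue  # skip broken installs with no readable metadata
        for requirement in dist.requires or []:
            if normalize(requirement_name(requirement)) == target:
                dependents.add(name)
    return sorted(dependents)
```

Running `print(installed_version("ujson"), reverse_dependencies("ujson"))` in the notification container would show which ujson build is in use and what pulls it in. The exact version string that the CentOS 8 RPM registers in its package metadata is not stated in this report, so compare it against the RPM name rather than assuming it reads "2.0".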
I've proposed https://review.rdoproject.org/r/c/rdoinfo/+/32832
This issue has been addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1944027 and in https://bugzilla.redhat.com/show_bug.cgi?id=1948452 RDO only supports the latest release.