Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1364193

Summary: Ceilometer Collector grows in memory with gnocchi or mongo as backend
Product: Red Hat OpenStack
Component: openstack-ceilometer
Version: 9.0 (Mitaka)
Reporter: Alex Krzos <akrzos>
Assignee: Julien Danjou <jdanjou>
QA Contact: Yurii Prokulevych <yprokule>
CC: akrzos, fbaudin, jruzicka, pkilambi, srevivo
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2016-09-02 14:51:45 UTC
Type: Bug
Attachments: System resource graphs

Description Alex Krzos 2016-08-04 16:39:46 UTC
Description of problem:
Ceilometer-collector grows in memory rapidly with mongo or gnocchi as a backend

Version-Release number of selected component (if applicable):
OpenStack Mitaka (OSPd-deployed overcloud)

openstack-ceilometer-notification-6.0.0-2.el7ost.noarch
openstack-ceilometer-api-6.0.0-2.el7ost.noarch
openstack-ceilometer-polling-6.0.0-2.el7ost.noarch
openstack-ceilometer-compute-6.0.0-2.el7ost.noarch
openstack-ceilometer-common-6.0.0-2.el7ost.noarch
openstack-ceilometer-collector-6.0.0-2.el7ost.noarch
openstack-ceilometer-central-6.0.0-2.el7ost.noarch
python-ceilometer-6.0.0-2.el7ost.noarch
python-ceilometerclient-2.3.0-1.el7ost.noarch


How reproducible:
With a large enough environment/deployment (200 instances)

Steps to Reproduce:
1. Deploy an HA overcloud (3 controllers) with 2 compute nodes
2. Tune the nova allocation ratios to allow more overcommitting of the compute nodes (if needed for your hardware)
3. Configure ceilometer with gnocchi as the backend (if you want to see ceilometer-collector memory growth with gnocchi)
4. Tune ceilometer to poll more often (the default polling interval is 600s; I have tested 5s, 10s, and 60s)
5. Boot small instances on the overcloud at a rate of 20 every 1200s or so until you have 200 total instances
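Steps 2 and 4 amount to configuration edits. A hedged sketch follows; the file locations and values are illustrative, based on Mitaka-era defaults, and are not necessarily the exact settings used in these tests:

```shell
# Illustrative only: paths and values below are assumptions, not the
# exact test setup from this report.
#
# Step 2 -- nova.conf ([DEFAULT] section in Mitaka), e.g.:
#   cpu_allocation_ratio = 16.0
#   ram_allocation_ratio = 4.0
#
# Step 4 -- /etc/ceilometer/pipeline.yaml holds the polling cadence
# (default 600 s). A 60 s meter source would look roughly like:
PIPELINE_SNIPPET='sources:
    - name: meter_source
      interval: 60
      meters:
          - "*"
      sinks:
          - meter_sink'
printf '%s\n' "$PIPELINE_SNIPPET"
# Restart the ceilometer agents afterwards so the new interval takes effect.
```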

Actual results:
The ceilometer collector was observed growing in memory from ~100MiB to over 5GiB, and as high as 65GiB. Eventually this brings down the entire cloud: with no swap space for relief on the controllers and computes, the Linux OOM killer terminates processes, which causes pacemaker to restart services.
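The growth described above can be tracked with a simple RSS sampler on each controller. This is a minimal sketch; the process name pattern and the 60 s cadence are assumptions to adjust for your deployment:

```shell
# rss_kib PID: print the resident set size (in KiB) of a process,
# read from /proc/PID/status.
rss_kib() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Example sampling loop (commented out so the function can be sourced
# safely; "ceilometer-collector" as the pgrep pattern is an assumption):
# while pid=$(pgrep -of ceilometer-collector); do
#     echo "$(date -u +%FT%TZ) pid=$pid rss_kib=$(rss_kib "$pid")"
#     sleep 60
# done
```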

Expected results:
The ceilometer collector should not spike in memory usage.

Additional info:


I understand that mongo as a backend is going away; however, this behavior is seen with both ceilometer backends (mongo and gnocchi).

See the attached screenshots of graphs from the tests:

Test 1: 1 OSPd, 3 Controllers, 2 computes - mongo ceilometer backend, 10s interval, 200 instances booted
Test 2: 1 OSPd, 3 Controllers, 2 computes - gnocchi ceilometer backend, 10s interval, 200 instances booted
Test 3: 1 OSPd, 3 Controllers, 2 computes - gnocchi ceilometer backend, 60s interval, 200 instances booted

Comment 2 Alex Krzos 2016-08-04 16:45:28 UTC
Created attachment 1187576 [details]
System resource graphs

System Resource Graphs of Overcloud with 200 instances and ceilometer with mongo and then gnocchi as a backend

Comment 10 Julien Danjou 2016-09-02 14:51:45 UTC

*** This bug has been marked as a duplicate of bug 1336664 ***