Bug 1372732

Summary: ceilometer-polling thread count growing
Product: Red Hat OpenStack
Reporter: Alex Krzos <akrzos>
Component: openstack-ceilometer
Assignee: Mehdi ABAAKOUK <mabaakou>
Status: CLOSED ERRATA
QA Contact: Mehdi ABAAKOUK <mabaakou>
Severity: unspecified
Priority: unspecified
Version: 10.0 (Newton)
CC: jjoyce, jruzicka, mabaakou, pkilambi, srevivo
Target Milestone: rc
Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-ceilometer-7.0.0-0.2.0rc2.el7ost
Doc Type: If docs needed, set a value
Last Closed: 2016-12-14 15:55:58 UTC
Type: Bug
Attachments: Graphs showing thread count and rss memory (flags: none)

Description Alex Krzos 2016-09-02 13:56:27 UTC
Created attachment 1197219
Graphs showing thread count and rss memory

Description of problem:
The ceilometer-polling process keeps gaining threads over time, both when the cloud is under load and when it is not.

Comparing two clouds:

Cloud under load (16 hours), booting 20 instances every 20 minutes:
Controllers (3 controllers) grew from 7 threads to 1k threads
Computes (7 computes) grew from 7 threads to 1k threads
This results in a total of 10k sleeping threads across the 10 machines


Cloud not under load (13 hours):
Over about 13 hours, thread counts grew:
Controllers (3 controllers) - 15 to 63 threads
Compute (1 compute) - 16 to 95 threads
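
For a rough sense of scale, the no-load figures above can be turned into an average growth rate. This is only a back-of-the-envelope sketch: it assumes linear growth over the 13 hours, which the report does not claim, and the helper name is made up for illustration.

```python
def threads_per_hour(start, end, hours):
    """Average number of threads gained per hour, assuming linear growth."""
    return (end - start) / hours

# Figures taken from the no-load cloud above (13 hours).
controller_rate = threads_per_hour(15, 63, 13)
compute_rate = threads_per_hour(16, 95, 13)

print(round(controller_rate, 1))  # 3.7
print(round(compute_rate, 1))     # 6.1
```

The under-load cloud is left out because its counts stopped at about 1k, so an average over the full 16 hours would understate how fast it grew.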

Version-Release number of selected component (if applicable):
OSP10 deployed from OSPd: builds - 2016-08-29.1 and 2016-08-30
openstack-ceilometer-polling-7.0.0-0.20160818153837.4ce3339.el7ost.noarch

How reproducible:
Always

Additional info:

The attached graphs show the thread count of the ceilometer-polling daemon over time, along with the RSS memory of the same processes. One concern is that each leaked thread holds a little memory that is never released, and the graphs do show RSS memory growing in both the under-load and no-load cases. This does not directly confirm a memory leak, though.

Further investigation shows that the threads are in the sleeping state when viewed with top and ps.

[root@overcloud-controller-0 ~]# ps afx | grep ceilometer-polling
24013 pts/0    S+     0:00                          \_ grep --color=auto ceilometer-polling
 9697 ?        Ss     4:38 /usr/bin/python2 /usr/bin/ceilometer-polling --polling-namespaces central --logfile /var/log/ceilometer/central.log
 9917 ?        Sl    21:29  \_ ceilometer-polling - AgentManager(0)
[root@overcloud-controller-0 ~]# cat /proc/9917/status | grep -i threads
Threads:        1005
[root@overcloud-controller-0 ~]# ps -T -p 9917 -o pid,lwp,state,rss,pcpu,cmd
  PID   LWP S   RSS %CPU CMD
 9917  9917 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  9918 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  9945 S 201200 0.2 ceilometer-polling - AgentManager(0)
 9917  9946 S 201200 0.8 ceilometer-polling - AgentManager(0)
 9917  9948 S 201200 0.1 ceilometer-polling - AgentManager(0)
 9917 17550 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 21764 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 25958 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 29950 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 33829 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 37656 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 41507 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 45586 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917   844 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  5158 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  9237 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 13068 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 16867 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 21349 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 25498 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 29287 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 33148 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 36955 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 40808 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 44784 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 48765 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  4192 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  8277 S 201200 0.0 ceilometer-polling - AgentManager(0)
....
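
The manual check above (grepping the Threads line out of /proc/<pid>/status) could be scripted to sample the count over time. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names are hypothetical and are not part of Ceilometer or any monitoring tool.

```python
def parse_thread_count(status_text):
    """Return the integer from the 'Threads:' line of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' line found")

def thread_count(pid):
    """Read the kernel's thread count for a PID (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_thread_count(f.read())

# Sample text in the same format as the status output shown above.
sample = "Name:\tceilometer-poll\nState:\tS (sleeping)\nThreads:\t1005\n"
print(parse_thread_count(sample))  # 1005
```

Calling thread_count(9917) on the controller above would be expected to return the same number as the grep, as long as the process is still running.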

Comment 2 Mehdi ABAAKOUK 2016-09-09 15:52:21 UTC
The bug has been fixed upstream. It is not critical: the thread pool used by ceilometer had a size of 1000, so the thread count was safely capped (but a bit too big :p). The upstream fix limits the number of threads to the exact number of pollsters.

When eventlet was used, the pool size was 1000 too, but greenthreads do not show up in ps. Also, the eventlet pool does not recycle its workers the way the concurrent.futures one does, so memory usage was not as high.
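
The idea of sizing the pool to the number of pollsters can be illustrated with a small concurrent.futures sketch. This is not the upstream patch: the pollster names and the poll() stand-in are hypothetical, and the only point is that a ThreadPoolExecutor never creates more worker threads than max_workers, however many polling cycles run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pollster names, for illustration only.
POLLSTERS = ["cpu", "memory", "disk", "network"]

def poll(name):
    """Stand-in for a real pollster; reports which worker thread ran it."""
    return threading.current_thread().name

def run_cycles(max_workers, cycles):
    """Run several polling cycles and return the set of worker threads used."""
    seen = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(cycles):
            futures = [pool.submit(poll, name) for name in POLLSTERS]
            seen.update(f.result() for f in futures)
    return seen

# Pool sized to the number of pollsters rather than a fixed 1000.
workers_used = run_cycles(max_workers=len(POLLSTERS), cycles=50)
print(len(workers_used) <= len(POLLSTERS))  # True
```

With max_workers=1000, the same loop would be allowed to grow to as many as 1000 OS threads, which matches the ceiling seen in the ps output in the description.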

Comment 3 Mehdi ABAAKOUK 2016-09-21 15:24:52 UTC
This is part of the upstream Ceilometer 7.0.0.0rc1 release.

Comment 5 Mehdi ABAAKOUK 2016-10-19 13:06:48 UTC
After two hours, I still have 7 threads; everything looks OK.


[heat-admin@overcloud-controller-1 ~]$ uptime
 13:02:02 up  2:37,  1 user,  load average: 6.74, 6.35, 7.16
[heat-admin@overcloud-controller-1 ~]$ ps -T -p 13028 -o pid,lwp,state,rss,pcpu,cmd
  PID   LWP S   RSS %CPU CMD
13028 13028 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028 13032 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028 13090 R 52892  0.1 ceilometer-polling - AgentManager(0)
13028 13092 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028 13093 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028 13213 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028  9295 S 52892  0.0 ceilometer-polling - AgentManager(0)

Comment 7 errata-xmlrpc 2016-12-14 15:55:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html