Bug 1372732 - ceilometer-polling thread count growing
Summary: ceilometer-polling thread count growing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Mehdi ABAAKOUK
QA Contact: Mehdi ABAAKOUK
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-02 13:56 UTC by Alex Krzos
Modified: 2016-12-14 15:55 UTC (History)

Fixed In Version: openstack-ceilometer-7.0.0-0.2.0rc2.el7ost
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 15:55:58 UTC


Attachments
Graphs showing thread count and rss memory (342.07 KB, application/x-gzip)
2016-09-02 13:56 UTC, Alex Krzos


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2016:2948 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 10 enhancement update | 2016-12-14 19:55:27 UTC
OpenStack gerrit | 368092 | None | None | None | 2016-09-09 15:47:15 UTC

Description Alex Krzos 2016-09-02 13:56:27 UTC
Created attachment 1197219
Graphs showing thread count and rss memory

Description of problem:
The ceilometer-polling process keeps growing in thread count, both under load and when idle.

Comparing two clouds:

Cloud under load (16 hours), booting 20 instances every 20 minutes:
Controllers (3 controllers) grew from 7 threads to 1k threads each
Computes (7 computes) grew from 7 threads to 1k threads each
This results in a total of about 10k sleeping threads across the 10 machines


Cloud not under load (13 hours):
In about 13 hours thread counts grew:
Controllers (3 controllers) - 15 to 63 threads
Compute (1 compute) - 16 to 95 threads
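
For a rough sense of scale, the idle-cloud numbers above can be turned into an average growth rate. This is only a sketch: it assumes linear growth over the 13 hours, which the report does not confirm, and the helper name threads_per_hour is made up for illustration. The loaded cloud is left out because its count reached 1k, so the starting and ending values alone do not give a meaningful rate.

```python
def threads_per_hour(start, end, hours):
    """Average thread growth per hour, assuming (hypothetically) linear growth."""
    return (end - start) / float(hours)


# Figures taken from the idle-cloud observations in this report.
idle_controller_rate = threads_per_hour(15, 63, 13)
idle_compute_rate = threads_per_hour(16, 95, 13)

print("controller: %.1f threads/hour" % idle_controller_rate)
print("compute:    %.1f threads/hour" % idle_compute_rate)
```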

Version-Release number of selected component (if applicable):
OSP10 deployed from OSPd: builds - 2016-08-29.1 and 2016-08-30
openstack-ceilometer-polling-7.0.0-0.20160818153837.4ce3339.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

The attached graphs show the thread count of the ceilometer-polling daemon over time, along with RSS memory graphs for the same processes.  One concern here is that leaked threads each consume a little memory that is never released, and RSS memory does grow in both the under-load and no-load situations.  This does not directly confirm a memory leak, though.

Further investigation shows that the threads are in the sleeping state when viewed with top and ps.

[root@overcloud-controller-0 ~]# ps afx | grep ceilometer-polling
24013 pts/0    S+     0:00                          \_ grep --color=auto ceilometer-polling
 9697 ?        Ss     4:38 /usr/bin/python2 /usr/bin/ceilometer-polling --polling-namespaces central --logfile /var/log/ceilometer/central.log
 9917 ?        Sl    21:29  \_ ceilometer-polling - AgentManager(0)
[root@overcloud-controller-0 ~]# cat /proc/9917/status | grep -i threads
Threads:        1005
[root@overcloud-controller-0 ~]# ps -T -p 9917 -o pid,lwp,state,rss,pcpu,cmd
  PID   LWP S   RSS %CPU CMD
 9917  9917 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  9918 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  9945 S 201200 0.2 ceilometer-polling - AgentManager(0)
 9917  9946 S 201200 0.8 ceilometer-polling - AgentManager(0)
 9917  9948 S 201200 0.1 ceilometer-polling - AgentManager(0)
 9917 17550 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 21764 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 25958 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 29950 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 33829 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 37656 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 41507 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 45586 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917   844 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  5158 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  9237 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 13068 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 16867 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 21349 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 25498 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 29287 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 33148 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 36955 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 40808 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 44784 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917 48765 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  4192 S 201200 0.0 ceilometer-polling - AgentManager(0)
 9917  8277 S 201200 0.0 ceilometer-polling - AgentManager(0)
....
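
The same check as the `cat /proc/9917/status | grep -i threads` step above can be scripted so the count can be sampled over time. This is a minimal sketch, assuming a Linux /proc filesystem; the function names parse_thread_count and read_thread_count are made up for illustration, and the sample text is a shortened stand-in for a real status file.

```python
import re


def parse_thread_count(status_text):
    """Return the integer on the 'Threads:' line of /proc/<pid>/status text."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' line found")
    return int(match.group(1))


def read_thread_count(pid):
    """Read the current thread count of a live process (Linux only)."""
    with open("/proc/{}/status".format(pid)) as status_file:
        return parse_thread_count(status_file.read())


# Illustrative input only; a real status file has many more fields.
sample_status = "Name:\tceilometer-poll\nThreads:\t1005\n"
print(parse_thread_count(sample_status))
```

Calling read_thread_count(9917) at a fixed interval and logging the result would give a series like the one plotted in the attached graphs.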

Comment 2 Mehdi ABAAKOUK 2016-09-09 15:52:21 UTC
The bug has been fixed upstream. It's not critical: the thread pool used by ceilometer had a maximum size of 1000, so the thread count was safely capped (but a bit too big :p). The upstream fix limits the number of threads to the exact number of pollsters.

When eventlet was used, the pool was 1000 too, but greenthreads didn't show up in ps. Also, the eventlet pool does not recycle its workers the way the concurrent.futures one does, so memory usage was not as high.
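
As an illustration of the idea behind the fix described in this comment, here is a minimal sketch of sizing a concurrent.futures pool to the number of pollsters instead of a fixed 1000. It is not the upstream patch: the pollster names and the run_polling_cycles helper are hypothetical, and the code is written for Python 3 (the affected build ran python2, where the futures backport would be needed).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_polling_cycles(pollsters, cycles):
    """Run every pollster once per cycle; return the ids of worker threads used."""
    seen_thread_ids = set()
    lock = threading.Lock()

    def poll(name):
        # Stand-in for real polling work: just record which thread ran it.
        with lock:
            seen_thread_ids.add(threading.get_ident())
        return name

    # Cap the pool at the number of pollsters, so the process can never
    # hold more worker threads than it has pollsters to run.
    with ThreadPoolExecutor(max_workers=len(pollsters)) as pool:
        for _ in range(cycles):
            list(pool.map(poll, pollsters))
    return seen_thread_ids


pollsters = ["cpu", "memory", "disk", "network"]  # hypothetical names
worker_ids = run_polling_cycles(pollsters, cycles=50)
print("worker threads used:", len(worker_ids))
```

However many cycles run, the number of distinct worker threads stays at or below len(pollsters), whereas a pool with max_workers=1000 is allowed to grow to 1000 threads.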

Comment 3 Mehdi ABAAKOUK 2016-09-21 15:24:52 UTC
This is part of the Ceilometer 7.0.0.0rc1 upstream release.

Comment 5 Mehdi ABAAKOUK 2016-10-19 13:06:48 UTC
After two hours, I still have 7 threads. Everything looks OK.


[heat-admin@overcloud-controller-1 ~]$ uptime
 13:02:02 up  2:37,  1 user,  load average: 6.74, 6.35, 7.16
[heat-admin@overcloud-controller-1 ~]$ ps -T -p 13028 -o pid,lwp,state,rss,pcpu,cmd
  PID   LWP S   RSS %CPU CMD
13028 13028 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028 13032 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028 13090 R 52892  0.1 ceilometer-polling - AgentManager(0)
13028 13092 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028 13093 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028 13213 S 52892  0.0 ceilometer-polling - AgentManager(0)
13028  9295 S 52892  0.0 ceilometer-polling - AgentManager(0)

Comment 7 errata-xmlrpc 2016-12-14 15:55:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

