Bug 1021908 - [performance] mcollectived memory leak exists on node
Status: CLOSED EOL
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Brenton Leanhardt
QA Contact: libra bugs
Depends On: 1116034
Blocks:
 
Reported: 2013-10-22 06:12 EDT by nsun
Modified: 2017-01-13 17:35 EST
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1116034 (view as bug list)
Environment:
Last Closed: 2017-01-13 17:35:56 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description nsun 2013-10-22 06:12:39 EDT
Description of problem:
During longevity testing, we found that the mcollectived service on the node showed roughly a 9% increase in memory usage.

Version-Release number of selected component (if applicable):
Puddle Version :  [1.2/2013-08-21.3]

Server Environment: On BJ OpenStack; Broker and Node OS configuration:
Broker : 2 cpus/4G memory/10G
Node   : 1 cpu/8G memory/200G

How reproducible:
Refer to "OpenShift Enterprise Performance Test Plan" section 10.3.1:
https://docs.engineering.redhat.com/display/reporting/OpenShift+Enterprise+Performance+Test+Plan#OpenShiftEnterprisePerformanceTestPlan-10.3.LongevityTesting
Our script (longevity_app_create.sh) loops over one unit; the unit's logic is:
sshkey remove/add --> create an app for each cartridge type --> select all apps --> remove carts/apps
We ran the longevity test script for about 14 days and recorded 191 cycles. The first app created in each cycle was selected as the comparison point across cycles.
From the memory consumption results we can easily tell whether a memory leak exists on the broker or node. A rough sketch of one unit is shown below.
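
A hypothetical reconstruction of what one unit of longevity_app_create.sh does, based only on the description above; the real script is internal, and the key name, cartridge list, and app-naming scheme here are illustrative assumptions:

#!/bin/bash
# Hypothetical reconstruction of one longevity unit; cartridge names,
# key name, and app names are examples only.
KEYNAME=perfkey
CARTS="php-5.3 perl-5.10 python-2.6 ruby-1.9"     # example cartridge types

rhc sshkey remove "$KEYNAME"                      # sshkey remove
rhc sshkey add "$KEYNAME" ~/.ssh/id_rsa.pub       # sshkey add

for cart in $CARTS; do                            # create an app for each cartridge type
    rhc app create "perf${cart//[.-]/}" "$cart"
done

rhc apps                                          # "select all apps"

for cart in $CARTS; do                            # remove carts/apps
    rhc app delete "perf${cart//[.-]/}" --confirm
done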

Steps to Reproduce:
1. Run the longevity script for about 2 weeks

Actual results:
Based on the selected comparison point we collected 191 memory measurements, which show that the node's overall memory consumption increases noticeably. Checking the details in the server monitor log, we see that the mcollectived service has a memory leak.

mcollectived service status at three selected cycles (a sampling sketch follows the table):
CYCLE   USER   PID   %CPU   %MEM   COMMAND
1st     root   27658  1.3   3.0    mcollectived
100th   root   27658  3.5   9.0    mcollectived
191th   root   27658  4.5   12.1   mcollectived
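
Numbers like those above can be captured once per cycle with a simple ps sample against the daemon's pid file; the pid-file path (taken from the mcollectived command line) and the log location are assumptions:

PIDFILE=/opt/rh/ruby193/root/var/run/mcollectived.pid
ps -o user,pid,%cpu,%mem,rss,comm -p "$(cat $PIDFILE)" >> /var/log/mcollective-mem.log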

Expected results:
The service's memory consumption should not grow by more than 10% on its own.

Additional info:
Comment 2 Brenton Leanhardt 2013-10-22 08:58:51 EDT
I would really be interested in seeing the results of this test against 2.0. :)
Comment 3 Johnny Liu 2013-10-23 02:16:21 EDT
QE will run a round of performance testing after ose-2.0 code freeze ( ~ Nov 7th). Let us wait to see the result of 2.0 performance testing.
Comment 4 Gaoyun Pei 2013-12-10 01:59:17 EST
Memory leak of mcollectived process still exists on ose-2.0

Version-Release number of selected component (if applicable):  
2.0/2013-11-22.1
[root@node ~]# rpm -qa|grep mcollective
ruby193-mcollective-2.2.3-4.el6op.noarch
ruby193-mcollective-common-2.2.3-4.el6op.noarch
openshift-origin-msg-node-mcollective-1.17.2-2.el6op.noarch

Broker: KVM (2 VCPU|4G RAM|10G Disk)
Node  : KVM (1 VCPU|8G RAM|200G Disk)

We have a script which would keep creating and deleting all kinds of cartridges.
After running for one week, the mcollectived process was using 3.4% more memory.

mcollective usage at the start:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      5775  2.5  2.8 713884 232504 ?       Sl   Nov26  35:36 ruby /opt/rh/ruby193/root/usr/sbin/mcollectived --pid=/opt/rh/ruby193/root/var/run/mcollectived.pid --config=/opt/rh/ruby193/root/etc/mcollective/server.cfg

mcollective usage after one week:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      5775 10.2  6.2 2943768 501500 ?      Sl   Nov26 1302:55 ruby /opt/rh/ruby193/root/usr/sbin/mcollectived --pid=/opt/rh/ruby193/root/var/run/mcollectived.pid --config=/opt/rh/ruby193/root/etc/mcollective/server.cfg
Comment 6 Anping Li 2014-07-03 02:00:30 EDT
Memory leak of mcollectived process still exists on ose-2.1

Version-Release number of selected component (if applicable):  
2.1/2014-5-29.3

[root@node ~]# rpm -qa|grep mcollective
ruby193-mcollective-2.4.1-5.el6op.noarch
ruby193-mcollective-common-2.4.1-5.el6op.noarch
openshift-origin-msg-node-mcollective-1.22.2-1.el6op.noarch

Broker: KVM (2 VCPU|4G RAM|10G Disk)
Node  : KVM (1 VCPU|8G RAM|200G Disk)

Test Result:
In this test, the test matrix was divided into test cycles; within a cycle, all supported cartridges are used in sequence. For each cartridge, one application runs the following actions as a unit:
sshkey remove --> sshkey add --> add the cartridge to an application --> access the application --> run "rhc domain show" --> delete the application

The cycles are repeated in a loop, and the memory and CPU consumed by the core services are recorded after every unit is executed (a sketch of this monitoring loop follows).
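
A minimal sketch of such a monitoring loop, assuming a unit script along the lines of the longevity_app_create.sh mentioned in the description and an arbitrary log location:

#!/bin/bash
# Sketch only: run one unit, then record mcollectived CPU/memory usage.
PIDFILE=/opt/rh/ruby193/root/var/run/mcollectived.pid
LOG=/var/log/longevity-monitor.log

cycle=0
while true; do
    cycle=$((cycle + 1))
    ./longevity_app_create.sh     # one unit: sshkey remove/add, add cartridge, access app, delete app
    {
        echo "=== cycle $cycle $(date -u +%FT%TZ) ==="
        ps -o user,pid,%cpu,%mem,rss,vsz,etime,args -p "$(cat "$PIDFILE")"
    } >> "$LOG"
done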

mcollective usage at the start:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1156  8.8 32.3 3071504 2610812 ?     Sl   Jun18 731:49 ruby /opt/rh/ruby193/root/usr/sbin/mcollectived --pid=/opt/rh/ruby193/root/var/run/mcollectived.pid --config=/opt/rh/ruby193/root/etc/mcollective/server.cfg

mcollective usage after 10 days (about 10,936 units/apps had been executed by this point):
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1156 16.2 78.3 7396880 6313108 ?     Sl   Jun18 3492:53 ruby /opt/rh/ruby193/root/usr/sbin/mcollectived --pid=/opt/rh/ruby193/root/var/run/mcollectived.pid --config=/opt/rh/ruby193/root/etc/mcollective/server.cfg
Comment 7 Luke Meyer 2014-07-03 10:11:11 EDT
Good information, thanks. Given this problem persists, I would have to assume it's present upstream as well. It's just not such an urgent problem since it requires a higher level of activity than we're likely to see on a single node normally, and the memory is reclaimed any time mcollective is restarted. Still, it's a bug, and we want to reduce reasons to have to restart mcollective.
Comment 8 Rory Thrasher 2017-01-13 17:35:56 EST
OpenShift Enterprise v2 has officially reached EoL.  This product is no longer supported and bugs will be closed.

Please look into the replacement enterprise-grade container option, OpenShift Container Platform v3.  https://www.openshift.com/container-platform/

More information can be found here: https://access.redhat.com/support/policy/updates/openshift/
