Bug 1358452 - memory utilization on machine hosting both ceph monitor and calamari grows in a linear way
Summary: memory utilization on machine hosting both ceph monitor and calamari grows in a linear way
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Calamari
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 2.0
Assignee: Christina Meno
QA Contact: Martin Bukatovic
URL:
Whiteboard:
Depends On:
Blocks: Console-2-GA
 
Reported: 2016-07-20 17:52 UTC by Martin Bukatovic
Modified: 2016-08-23 19:44 UTC
CC: 10 users

Fixed In Version: RHEL: calamari-server-1.4.8-1.el7cp Ubuntu: calamari_1.4.8-2redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-23 19:44:53 UTC
Embargoed:


Attachments
screenshot 1: graphite generated chart of memory consumption of all machines of the cluster (59.78 KB, image/png)
2016-07-20 17:52 UTC, Martin Bukatovic
screenshot 2: new memory consumption chart (calamari-server-1.4.7-1.el7cp.x86_64) (187.51 KB, image/png)
2016-07-27 08:51 UTC, Martin Bukatovic
figure 1: calamari-lite rss during 2 days (65.17 KB, image/png)
2016-07-28 15:38 UTC, Martin Bukatovic
source data for figure 1 (rss and size of calamari-lite process) (91.50 KB, text/plain)
2016-07-28 15:43 UTC, Martin Bukatovic
figure 2: calamari-lite rss during one week (52.50 KB, image/png)
2016-08-02 14:47 UTC, Martin Bukatovic
source data for figure 2 (rss and size of calamari-lite process) (59.49 KB, application/x-gzip)
2016-08-02 14:50 UTC, Martin Bukatovic
figure 3: calamari-lite rss during 3 days (QE verification) (59.54 KB, image/png)
2016-08-08 09:03 UTC, Martin Bukatovic


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1358440 0 unspecified CLOSED RFE install calamari on console machine instead of monitor machine 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2016:1755 0 normal SHIPPED_LIVE Red Hat Ceph Storage 2.0 bug fix and enhancement update 2016-08-23 23:23:52 UTC

Internal Links: 1358440

Description Martin Bukatovic 2016-07-20 17:52:49 UTC
Created attachment 1182208 [details]
screenshot 1: graphite generated chart of memory consumption of all machines of the cluster

Description of problem
======================

Memory consumption on the monitor machine hosting the calamari server
*grows gradually in a linear way*.

The severity of this bug and the associated risks depend on the long-term
behavior under load, which is yet to be tested.

Version-Release
===============

On RHSC 2.0 server machine:

rhscon-ui-0.0.48-1.el7scon.noarch
rhscon-core-selinux-0.0.34-1.el7scon.noarch
rhscon-ceph-0.0.33-1.el7scon.x86_64
rhscon-core-0.0.34-1.el7scon.x86_64
ceph-installer-1.0.14-1.el7scon.noarch
ceph-ansible-1.0.5-28.el7scon.noarch

On Ceph MON machines:

rhscon-core-selinux-0.0.34-1.el7scon.noarch
rhscon-agent-0.0.15-1.el7scon.noarch
ceph-selinux-10.2.2-22.el7cp.x86_64
ceph-common-10.2.2-22.el7cp.x86_64
ceph-base-10.2.2-22.el7cp.x86_64
ceph-mon-10.2.2-22.el7cp.x86_64
calamari-server-1.4.6-1.el7cp.x86_64

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the ceph cluster.
3. Create a new ceph cluster named 'alpha'.
4. Wait for at least 10 hours.
5. Go to the graphite web interface and select memory consumption for every
   machine of the cluster (see the example query below).
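
For instance, such a chart can also be fetched directly from the graphite
render API. The snippet below is a sketch only: the host name and the metric
path (a Diamond-style servers.<host>.memory.MemFree target) are assumptions,
not values taken from this cluster.

~~~
# Sketch: fetch a 10-hour memory chart as PNG from the graphite render API.
# Host and metric path are illustrative; adjust them to your deployment.
curl -o memory-chart.png \
  'http://calamari-host.example.com/render?target=servers.mon1.memory.MemFree&from=-10hours&width=800&height=400'
~~~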

Actual results
==============

Even without any actual load, the memory consumption on the machine hosting
both the ceph monitor and calamari grows in a linear fashion, as can be seen
in attached screenshot #1 (graph generated by the graphite web interface).

Compare this with the memory consumption trends of the other machines, which
are essentially constant (the expected behavior).

Expected results
================

While memory consumption on the machine hosting both the ceph monitor and
calamari is higher compared to monitor-only machines, it should not grow in a
linear way.

Comment 1 Nishanth Thomas 2016-07-21 12:03:44 UTC
What is memory consumption of the calamari process? Does it grow linearly?

Comment 2 Martin Bukatovic 2016-07-27 08:51:27 UTC
Created attachment 1184541 [details]
screenshot 2: new memory consumption chart (calamari-server-1.4.7-1.el7cp.x86_64)

Update
======

Testing with the new build from Monday (calamari-server-1.4.7-1.el7cp.x86_64),
so far I see the same behavior as shown in the original report.

As decided at the bug triage meeting, I am going to attach logs of RSS and
size of the calamari-lite process, which is crucial here - we need to know
how the consumption behaves over longer periods of time. When I have the data
ready, I will answer the needinfo flag.

Comment 3 Martin Bukatovic 2016-07-27 08:58:46 UTC
(In reply to Nishanth Thomas from comment #1)
> What is memory consumption of the calamari process? Does it grow linearly?

Based on the evidence I already have, I can state that yes, the RSS of the
calamari-lite process grows linearly. Yesterday the RSS of the calamari-lite
process was 113756 kB; now it is already at 262500 kB - roughly 145 MB of
growth in about a day.

I will provide full logs and plot them into charts once I have long-term
data, as we decided at the bug triage meeting.

Comment 4 Martin Bukatovic 2016-07-27 14:54:44 UTC
Could the calamari dev team check this? Have you seen this in your environment?

Comment 5 Christina Meno 2016-07-28 04:55:22 UTC
I'm not seeing it grow quite as rapidly. I will investigate and advise. I had an instance up for 7 days that only reached 700 MB RSS.

Comment 6 Jeff Applewhite 2016-07-28 12:22:27 UTC
It looks like linear growth according to QE.

Comment 7 Martin Bukatovic 2016-07-28 15:38:12 UTC
Created attachment 1185165 [details]
figure 1: calamari-lite rss during 2 days

Attaching a plot of the RSS (physical memory consumption) of the
calamari-lite process after 2 days of logging.

As you can see, the trend is quite clear.

Comment 8 Martin Bukatovic 2016-07-28 15:43:28 UTC
Created attachment 1185166 [details]
source data for figure 1 (rss and size of calamari-lite process)

Attaching source data for figure 1 (calamari-lite process RSS during 2 days).

The file has the following format:

~~~
<timestamp> <rss-in-kiloBytes> <size-in-kiloBytes>
~~~
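
For reference, a log in this format can be collected with a loop along these
lines. This is a sketch, not the script actually used: it assumes a single
calamari-lite process that pgrep can find, and that "size" corresponds to the
VSZ column of ps (both columns are reported in kB).

~~~
# Sample RSS and VSZ (in kB) of the calamari-lite process once a minute.
# Assumes exactly one matching process; adjust the pattern if needed.
while true; do
    pid=$(pgrep -f calamari-lite | head -n 1)
    echo "$(date +%Y-%m-%dT%H:%M) $(ps -o rss=,vsz= -p "$pid")" >> calamari-watch.log
    sleep 60
done
~~~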

The quick summary is:

~~~
$ head calamari-watch.2days.log
2016-07-26T16:33 113756 1314692
2016-07-26T16:34 113792 1314692
2016-07-26T16:35 114036 1314884
2016-07-26T16:36 114648 1315384
2016-07-26T16:37 114724 1315528
2016-07-26T16:38 115104 1315976
2016-07-26T16:39 115124 1315976
2016-07-26T16:40 115252 1316268
2016-07-26T16:41 115840 1316432
2016-07-26T16:42 115920 1316880
$ tail calamari-watch.2days.log
2016-07-28T17:11 496996 1697944
2016-07-28T17:12 497380 1698336
2016-07-28T17:13 497384 1698336
2016-07-28T17:14 497436 1698336
2016-07-28T17:15 497512 1698664
2016-07-28T17:16 497644 1698664
2016-07-28T17:17 498188 1699120
2016-07-28T17:18 498284 1699272
2016-07-28T17:19 498600 1699440
2016-07-28T17:20 498976 1699908
~~~

Comment 9 Christina Meno 2016-07-29 16:28:04 UTC
I'm in agreement this needs a fix, and I have a few ready. While I don't expect to eliminate all growth in this time frame, I can reduce the slope and set a hard limit in systemd so that the process restarts if it grows too much. I should be able to make that infrequent.
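
Such a systemd hard limit could look like the drop-in below. This is purely
illustrative - the unit name, file path, and limit value are assumptions, not
the values shipped with the fix. With a limit like this, a process exceeding
it is OOM-killed inside its cgroup and then restarted by systemd.

~~~
# Hypothetical drop-in: /etc/systemd/system/calamari.service.d/memory.conf
# (unit name and 256M value are examples, not the actual fix)
[Service]
MemoryLimit=256M
Restart=always
~~~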

Comment 11 Christina Meno 2016-08-01 19:49:19 UTC
https://github.com/ceph/calamari/releases/tag/v1.4.8

Comment 15 Martin Bukatovic 2016-08-02 14:47:57 UTC
Created attachment 1186856 [details]
figure 2: calamari-lite rss during one week

Just for the record, I'm attaching a chart of calamari-lite memory
consumption for a whole week.

Comment 16 Martin Bukatovic 2016-08-02 14:50:36 UTC
Created attachment 1186858 [details]
source data for figure 2 (rss and size of calamari-lite process)

Attaching source data for figure 2 (from previous comment).

Comment 17 Martin Bukatovic 2016-08-08 09:02:01 UTC
Checking with
=============

On Monitor/Calamari machine:

calamari-server-1.4.8-1.el7cp.x86_64

On RHSC 2.0 server machine:

rhscon-ui-0.0.51-1.el7scon.noarch
rhscon-core-0.0.38-1.el7scon.x86_64
rhscon-ceph-0.0.38-1.el7scon.x86_64
rhscon-core-selinux-0.0.38-1.el7scon.noarch

On Ceph 2.0 machines:

rhscon-core-selinux-0.0.38-1.el7scon.noarch
rhscon-agent-0.0.16-1.el7scon.noarch

Verification
============

I observed the memory consumption of the calamari-lite process for about
3 days and noticed a significant change:

* at first, the memory consumption was still growing (though at a somewhat
  different rate than the previous behaviour),
* once the RSS reached about 190 MB, the memory consumption stopped growing
  and stayed there.

So the memory consumption of the calamari-lite process no longer grows
linearly without bound, but levels off at about 190 MB RSS.

>> VERIFIED

Comment 18 Martin Bukatovic 2016-08-08 09:03:49 UTC
Created attachment 1188580 [details]
figure 3: calamari-lite rss during 3 days (QE verification)

Attaching evidence for verification: figure 3.

Comment 20 errata-xmlrpc 2016-08-23 19:44:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html

