Bug 1846402

Summary: Pod noobaa-core-0 memory consumption is increasing 0.41 MiB per hour
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Martin Bukatovic <mbukatov>
Component: Multi-Cloud Object Gateway
Assignee: Igor Pick <ipick>
Status: CLOSED WONTFIX
QA Contact: Raz Tamir <ratamir>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.2
CC: ebenahar, etamir, kramdoss, muagarwa, nbecker, ocs-bugs, odf-bz-bot, rcyriac
Target Milestone: ---
Keywords: AutomationBackLog
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-08-31 09:19:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
screenshot #1: memory consumption of noobaa-core-0 pod for a 15 day period
screenshot #2: memory consumption of every noobaa pod on 4.6 BM cluster for a 9 day period

Description Martin Bukatovic 2020-06-11 13:53:05 UTC
Created attachment 1696783 [details]
screenshot #1: memory consumption of noobaa-core-0 pod for a 15 day period

Description of problem
======================

Pod noobaa-core-0 seems to be leaking 0.41 MiB of memory per hour.

Reported based on Nimrod's question under BZ 1799920:

https://bugzilla.redhat.com/show_bug.cgi?id=1799920#c11

Version-Release number of selected component
============================================

cluster channel: stable-4.2
cluster version: 4.2.18
cluster image: quay.io/openshift-release-dev/ocp-release@sha256:283a1625e18e0b6d7f354b1b022a0aeaab5598f2144ec484faf89e1ecb5c7498

storage namespace openshift-cluster-storage-operator
image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d9dd509754b883dab8301bc1cd50d1b902de531d012817b27bfdda2cd28c782a
 * quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d9dd509754b883dab8301bc1cd50d1b902de531d012817b27bfdda2cd28c782a

storage namespace openshift-storage
image registry.redhat.io/ocs4/cephcsi-rhel8@sha256:a2c8a48ad6c3da44dac2e700e2055e89fe8f333ecd2c6c49d21a90d9e7abd1b9
 * registry.redhat.io/ocs4/cephcsi-rhel8@sha256:a2c8a48ad6c3da44dac2e700e2055e89fe8f333ecd2c6c49d21a90d9e7abd1b9
image registry.redhat.io/openshift4/ose-csi-driver-registrar@sha256:0fe4f131214353131e44b17ee67d2c43a5cb78e24b8af1bccaefb25c734a5e84
 * registry.redhat.io/openshift4/ose-csi-driver-registrar@sha256:0fe4f131214353131e44b17ee67d2c43a5cb78e24b8af1bccaefb25c734a5e84
image registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:d5cab390ea94409337516be4a67d0b07be770a1a0be37bc852bf9ccf4effa353
 * registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:d5cab390ea94409337516be4a67d0b07be770a1a0be37bc852bf9ccf4effa353
image registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7@sha256:2eac5300e73ab43be46762178b844d6629220565a15272a215187c7d251b6fb7
 * registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7@sha256:2eac5300e73ab43be46762178b844d6629220565a15272a215187c7d251b6fb7
image registry.redhat.io/ocs4/mcg-core-rhel8@sha256:08866178f34a93b0f6a3e99aa6127d11b29bd65900f3e30fcd119b354b65fe0d
 * registry.redhat.io/ocs4/mcg-core-rhel8@sha256:08866178f34a93b0f6a3e99aa6127d11b29bd65900f3e30fcd119b354b65fe0d
image registry.redhat.io/rhscl/mongodb-36-rhel7@sha256:ad5dc22e6115adc0d875f6d2eb44b2ba594d07330a600e67bf3de49e02cab5b0
 * registry.redhat.io/rhscl/mongodb-36-rhel7@sha256:ad5dc22e6115adc0d875f6d2eb44b2ba594d07330a600e67bf3de49e02cab5b0
image registry.redhat.io/ocs4/mcg-rhel8-operator@sha256:12f8b2051f28086e97b7dd7673b65217503ecafec95739bbb101bc7054271c82
 * registry.redhat.io/ocs4/mcg-rhel8-operator@sha256:12f8b2051f28086e97b7dd7673b65217503ecafec95739bbb101bc7054271c82
image registry.redhat.io/ocs4/ocs-rhel8-operator@sha256:35da0c0bda55f6cb599da9d8b756d45ca9f0e89c29762752fd44b0da2683882d
 * registry.redhat.io/ocs4/ocs-rhel8-operator@sha256:35da0c0bda55f6cb599da9d8b756d45ca9f0e89c29762752fd44b0da2683882d
image registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:ac9d216d2d691f910f6b4772bddf5b6857d01f31595f2e929e5fcb589fa212c2
 * registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:ac9d216d2d691f910f6b4772bddf5b6857d01f31595f2e929e5fcb589fa212c2
image registry.redhat.io/ocs4/rhceph-rhel8@sha256:f42e598d3eb8b68be7344c50ac4eb6f9c6b3b161b2dba4aed297889c3f53343b
 * registry.redhat.io/ocs4/rhceph-rhel8@sha256:f42e598d3eb8b68be7344c50ac4eb6f9c6b3b161b2dba4aed297889c3f53343b
image registry.redhat.io/ocs4/rhceph-rhel8@sha256:b299023c03a0a2708970a7d752d35b97ebd651e48e80b7e751a29f65910dfc50
 * registry.redhat.io/ocs4/rhceph-rhel8@sha256:b299023c03a0a2708970a7d752d35b97ebd651e48e80b7e751a29f65910dfc50
image registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:5feddbd9232657b2828dc9ce617204cfccef09b692ded624e29ce5af9928759d
 * registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:5feddbd9232657b2828dc9ce617204cfccef09b692ded624e29ce5af9928759d

How reproducible
================

1/1

Steps to Reproduce
==================

1. Install an OCP/OCS cluster.
2. Let the cluster run idle for a while (or at least perform no MCG
   workloads).
3. Check memory consumption of noobaa-core-0 pod (via Grafana instance
   referenced in OCP Console: Monitoring -> Dashboards -> "Kubernetes/
   Compute Resources / Namespaces (Pods) for openshift-storage").
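
Alternatively, the same value can be read directly from Prometheus; a minimal
query sketch, assuming the pod:container_memory_usage_bytes:sum recording rule
is available in the cluster monitoring stack:

pod:container_memory_usage_bytes:sum{namespace='openshift-storage', pod='noobaa-core-0'}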

Actual results
==============

I see a slight increase in memory consumption of noobaa-core-0 pod (see
screenshot #1 attached to this bug):

2020-05-26 20:40 1.413 GiB
2020-06-11  8:40 1.562 GiB

That is an increase of 0.149 GiB over 372 hours, i.e. roughly 0.41 MiB per hour.
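
For reference, the arithmetic:

(1.562 - 1.413) GiB = 0.149 GiB ≈ 152.6 MiB
152.6 MiB / 372 h ≈ 0.41 MiB per hour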

Expected results
================

There is no memory leak in noobaa-core-0 pod.

Comment 2 Nimrod Becker 2020-06-18 10:43:07 UTC
4.2 still contains the DB inside the core (it was split in 4.3).
We currently think this is expected: Mongo keeps things in memory for as long as it can, and even on an idle cluster we keep metrics and statistics which are collected regularly.

We will check and verify if this is the case.
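
A rough way to check how much of the pod's memory is held by mongod itself (a
sketch only; shell access to the mongodb container is assumed, and the
container name "db" is a guess, not taken from this bug):

# container name "db" is an assumption; adjust to the actual mongodb container
oc -n openshift-storage exec noobaa-core-0 -c db -- mongo --quiet --eval 'printjson(db.serverStatus().mem)'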

Comment 3 Nimrod Becker 2020-06-30 14:39:31 UTC
Pushing out of 4.5 for now; not a blocker.

Comment 4 Nimrod Becker 2020-10-01 09:23:34 UTC
Need to test with a new noobaa image which contains more metrics for the different services.
Martin, can you sync with Elad and Ohad?

Comment 5 Nimrod Becker 2020-10-06 13:56:42 UTC
Talking with Elad, this is not a blocker at this point.

We will try to repro with a custom image (see comment #4) and then decide if this needs to be moved back to 4.6.
For now we agreed it would be moved to 4.7.

Comment 6 Martin Bukatovic 2020-11-26 19:10:44 UTC
I'm rechecking this on the following cluster:

- OCP 4.6.0-0.nightly-2020-11-05-215543
- OCS: 4.6.0-154.ci
- baremetal platform

The cluster has been running for about 9 days.

Querying Prometheus for memory consumption of the noobaa pods (see the query
below) shows that while some pods (such as noobaa-core-0) gradually allocate
memory, every now and then a pod frees some memory, so the unbounded growth
toward the numbers originally reported in this bug is not happening.

See attached screenshot #2.

Comment 7 Martin Bukatovic 2020-11-26 19:13:58 UTC
Created attachment 1733872 [details]
screenshot #2: memory consumption of every noobaa pod on 4.6 BM cluster for a 9 day period

Screenshot with chart of the following prometheus query:

pod:container_memory_usage_bytes:sum{namespace='openshift-storage', pod=~'noobaa-.*'}
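
To express the per-pod hourly growth rate directly, a possible follow-up query
(a sketch, not part of the original report; the 6h window is arbitrary):

deriv(pod:container_memory_usage_bytes:sum{namespace='openshift-storage', pod=~'noobaa-.*'}[6h]) * 3600

This returns the average growth in bytes per hour over the last 6 hours for
each noobaa pod.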

Comment 9 Nimrod Becker 2021-05-27 09:17:51 UTC
Talking to Elad, we have assigned someone to look and verify whether it's an issue or just a usage pattern.
Looking at comment #6, it doesn't seem like we risk an OOM.

Not a blocker for 4.8