Created attachment 1696783 [details]
screenshot #1: memory consumption of noobaa-core-0 pod for a 15 day period

Description of problem
======================

Pod noobaa-core-0 seems to be leaking 0.41 MiB of memory per hour.

Reported based on Nimrod's question under BZ 1799920:
https://bugzilla.redhat.com/show_bug.cgi?id=1799920#c11

Version-Release number of selected component
============================================

cluster channel: stable-4.2
cluster version: 4.2.18
cluster image: quay.io/openshift-release-dev/ocp-release@sha256:283a1625e18e0b6d7f354b1b022a0aeaab5598f2144ec484faf89e1ecb5c7498

storage namespace openshift-cluster-storage-operator
image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d9dd509754b883dab8301bc1cd50d1b902de531d012817b27bfdda2cd28c782a

storage namespace openshift-storage
image registry.redhat.io/ocs4/cephcsi-rhel8@sha256:a2c8a48ad6c3da44dac2e700e2055e89fe8f333ecd2c6c49d21a90d9e7abd1b9
image registry.redhat.io/openshift4/ose-csi-driver-registrar@sha256:0fe4f131214353131e44b17ee67d2c43a5cb78e24b8af1bccaefb25c734a5e84
image registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:d5cab390ea94409337516be4a67d0b07be770a1a0be37bc852bf9ccf4effa353
image registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7@sha256:2eac5300e73ab43be46762178b844d6629220565a15272a215187c7d251b6fb7
image registry.redhat.io/ocs4/mcg-core-rhel8@sha256:08866178f34a93b0f6a3e99aa6127d11b29bd65900f3e30fcd119b354b65fe0d
image registry.redhat.io/rhscl/mongodb-36-rhel7@sha256:ad5dc22e6115adc0d875f6d2eb44b2ba594d07330a600e67bf3de49e02cab5b0
image registry.redhat.io/ocs4/mcg-rhel8-operator@sha256:12f8b2051f28086e97b7dd7673b65217503ecafec95739bbb101bc7054271c82
image registry.redhat.io/ocs4/ocs-rhel8-operator@sha256:35da0c0bda55f6cb599da9d8b756d45ca9f0e89c29762752fd44b0da2683882d
image registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:ac9d216d2d691f910f6b4772bddf5b6857d01f31595f2e929e5fcb589fa212c2
image registry.redhat.io/ocs4/rhceph-rhel8@sha256:f42e598d3eb8b68be7344c50ac4eb6f9c6b3b161b2dba4aed297889c3f53343b
image registry.redhat.io/ocs4/rhceph-rhel8@sha256:b299023c03a0a2708970a7d752d35b97ebd651e48e80b7e751a29f65910dfc50
image registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:5feddbd9232657b2828dc9ce617204cfccef09b692ded624e29ce5af9928759d

How reproducible
================

1/1

Steps to Reproduce
==================

1. Install OCP/OCS cluster.
2. Let the cluster run idle for a while (or at least perform no MCG workloads).
3. Check memory consumption of noobaa-core-0 pod (via Grafana instance referenced in OCP Console: Monitoring -> Dashboards -> "Kubernetes/ Compute Resources / Namespaces (Pods) for openshift-storage").

Actual results
==============

I see a slight increase in memory consumption of noobaa-core-0 pod (see screenshot #1 attached to this bug):

2020-05-26 20:40  1.413 GiB
2020-06-11  8:40  1.562 GiB

That is an increase of 0.149 GiB over 372 hours, or about 0.41 MiB per hour.

Expected results
================

There is no memory leak in noobaa-core-0 pod.
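For reference, the reported rate can also be cross-checked directly in Prometheus instead of reading values off the Grafana chart. A minimal sketch, assuming the standard cAdvisor metric container_memory_working_set_bytes is exposed by the cluster monitoring stack (label names can differ between OCP versions, so treat this as illustrative only):

# current memory in use by the noobaa-core-0 pod
sum(container_memory_working_set_bytes{namespace='openshift-storage', pod='noobaa-core-0', container!='', container!='POD'})

# average growth in MiB per hour over the last two weeks (336 hours)
sum(delta(container_memory_working_set_bytes{namespace='openshift-storage', pod='noobaa-core-0', container!='', container!='POD'}[2w])) / 336 / 1024 / 1024

If the growth observed above continues linearly, the second expression should come out around 0.41.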
4.2 still contains the DB inside the core pod (it was split out in 4.3). We currently think this is expected, since Mongo keeps things in memory as long as it can, and even on an idle cluster we keep metrics and statistics which are collected regularly. We will check and verify whether this is the case.
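If we want to double-check the Mongo-cache theory on a 4.2 cluster, one possible verification (a sketch only; the "db" container name and an unauthenticated local mongod are assumptions, not something confirmed in this bug) is to compare the pod's RSS with the cache usage MongoDB reports about itself:

oc -n openshift-storage exec -it noobaa-core-0 -c db -- mongo --eval '
  // serverStatus() is server-wide, so the database we happen to be connected to does not matter
  var s = db.serverStatus();
  print("resident MB:        " + s.mem.resident);
  print("WT cache bytes:     " + s.wiredTiger.cache["bytes currently in the cache"]);
  print("WT cache max bytes: " + s.wiredTiger.cache["maximum bytes configured"]);
'

If the pod's RSS growth tracks "bytes currently in the cache" and stays below the configured maximum, that would support the "expected Mongo caching" explanation rather than a leak in the core process.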
Pushing out of 4.5 for now; not a blocker.
We need to test with a new noobaa image which contains more metrics for the different services. Martin, can you sync with Elad and Ohad?
After talking with Elad, this is not a blocker at this point. We will try to reproduce with a custom image (see comment #4) and then decide if this needs to be moved back to 4.6. For now we agreed it will be moved to 4.7.
I'm rechecking this on the following cluster:

- OCP 4.6.0-0.nightly-2020-11-05-215543
- OCS 4.6.0-154.ci
- baremetal platform

The cluster has been running for about 9 days. Querying Prometheus for memory consumption of noobaa pods (see query below) shows that while some pods (such as noobaa-core-0) gradually allocate memory, every now and then a pod frees some memory, so the unbounded growth towards the numbers originally reported in this bug is not happening. See attached screenshot #2.
Created attachment 1733872 [details]
screenshot #2: memory consumption of every noobaa pod on 4.6 BM cluster for a 9 day period

Screenshot with chart of the following prometheus query:

pod:container_memory_usage_bytes:sum{namespace='openshift-storage', pod=~'noobaa-.*'}
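For anyone re-running this check from a terminal rather than the console, a rough equivalent (a sketch; it assumes direct access to the openshift-monitoring Prometheus pod, typically named prometheus-k8s-0, and GNU date):

# forward the Prometheus API port from the monitoring stack
oc -n openshift-monitoring port-forward pod/prometheus-k8s-0 9090:9090 &

# fetch the same series for the last 9 days, one sample per hour
curl -sG 'http://localhost:9090/api/v1/query_range' \
  --data-urlencode "query=pod:container_memory_usage_bytes:sum{namespace='openshift-storage', pod=~'noobaa-.*'}" \
  --data-urlencode "start=$(date -d '9 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode "step=1h"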
After talking to Elad, we have assigned someone to look into this and verify whether it's an issue or just a usage pattern. Looking at comment #6, it doesn't seem like we risk OOM. Not a blocker for 4.8.