Bug 1599987

Summary: Growing memory utilization of tendrl-gluster-integration on one node in cluster
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Daniel Horák <dahorak>
Component: web-admin-tendrl-gluster-integration
Assignee: Shubhendu Tripathi <shtripat>
Status: CLOSED ERRATA
QA Contact: Daniel Horák <dahorak>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.4
CC: mbukatov, nchilaka, nthomas, rhs-bugs, sankarshan
Target Milestone: ---
Keywords: Regression
Target Release: RHGS 3.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tendrl-gluster-integration-1.6.3-7.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 07:08:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503137    
Attachments:
  Gluster storage node memory utilization graph (flags: none)
  For comparison: "Not affected" Gluster storage node memory utilization graph (flags: none)

Description Daniel Horák 2018-07-11 06:43:40 UTC
Created attachment 1457998
Gluster storage node memory utilization graph

Description of problem:
  With the latest builds, memory utilization of tendrl-gluster-integration
  on one Gluster Storage server in the cluster keeps growing until it reaches
  very high values (more than 80% of RAM in my case).

Version-Release number of selected component (if applicable):
  Gluster Storage Server:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  Red Hat Gluster Storage Server 3.4.0
  glusterfs-3.12.2-13.el7rhgs.x86_64
  glusterfs-api-3.12.2-13.el7rhgs.x86_64
  glusterfs-cli-3.12.2-13.el7rhgs.x86_64
  glusterfs-client-xlators-3.12.2-13.el7rhgs.x86_64
  glusterfs-events-3.12.2-13.el7rhgs.x86_64
  glusterfs-fuse-3.12.2-13.el7rhgs.x86_64
  glusterfs-geo-replication-3.12.2-13.el7rhgs.x86_64
  glusterfs-libs-3.12.2-13.el7rhgs.x86_64
  glusterfs-rdma-3.12.2-13.el7rhgs.x86_64
  glusterfs-server-3.12.2-13.el7rhgs.x86_64
  gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
  gluster-nagios-common-0.2.4-1.el7rhgs.noarch
  libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
  python2-gluster-3.12.2-13.el7rhgs.x86_64
  tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-commons-1.6.3-8.el7rhgs.noarch
  tendrl-gluster-integration-1.6.3-6.el7rhgs.noarch
  tendrl-node-agent-1.6.3-8.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  vdsm-gluster-4.19.43-2.3.el7rhgs.noarch

How reproducible:
  I've spotted it on three different clusters since yesterday, but I'm not sure
  whether it is 100% reproducible on every cluster.

Steps to Reproduce:
1. Prepare, install and configure a Gluster Storage cluster
  (my environment: 6 storage nodes with 8GB RAM, 2GB swap, 2 vCPUs, 1-3 volumes).
2. Install and configure the RHGS WA Server, and the RHGS WA Node Agents on
  the Gluster Storage nodes.
3. Import the Gluster cluster into RHGS WA.
4. Let it run for a couple of hours or a day.
5. Check the memory consumed by tendrl-gluster-integration on all Gluster Storage
  servers.
  # ps -p $(echo $(ps aux | grep [t]endrl-gluster-integration | awk '{print $2}') | sed 's/ /,/g') -o %cpu,%mem,cmd -h;
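
A single ps snapshot does not show growth, so it can help to log the resident
memory of the process at intervals. The following is only a minimal sketch,
assuming a Linux node with /proc mounted. The helper names rss_kb and log_rss
are hypothetical and are not part of Tendrl. The pgrep call in the usage
comment assumes procps is installed and that exactly one process matches.

```shell
# Print the resident set size (RSS) of PID in kB, read from /proc/PID/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print COUNT samples of the form "epoch_seconds rss_kb" for PID,
# taken INTERVAL seconds apart.
log_rss() {
    _pid=$1; _interval=$2; _count=$3; _i=0
    while [ "$_i" -lt "$_count" ]; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kb "$_pid")"
        _i=$((_i + 1))
        # Sleep only between samples, not after the last one.
        if [ "$_i" -lt "$_count" ]; then sleep "$_interval"; fi
    done
}

# Usage (assumption: one matching process): sample every 10 minutes
# for roughly one day.
#   log_rss "$(pgrep -o -f tendrl-gluster-integration)" 600 144
```

A steadily increasing second column across the samples would correspond to the
growth described in this report.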


Actual results:
  On one Gluster Storage server in the cluster, tendrl-gluster-integration
  consumes a huge amount of memory (more than 80% in my case).

  (the second number is memory utilization in percent)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # ps -p $(echo $(ps aux | grep [t]endrl-gluster-integration | awk '{print $2}') | sed 's/ /,/g') -o %cpu,%mem,cmd -h;
    9.0 82.8 /usr/bin/python /usr/bin/tendrl-gluster-integration
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
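
When checking several nodes, output in the format above can be filtered so that
only processes above a given memory percentage are printed. This is a small
sketch, not part of Tendrl: flag_high_mem is a hypothetical helper name, and
the limit of 80 is an arbitrary value chosen for illustration.

```shell
# Read ps output lines ("%cpu %mem cmd") on stdin and print only those
# whose %mem (second column) is greater than the limit given as $1.
flag_high_mem() {
    awk -v limit="$1" '($2 + 0) > (limit + 0) { print }'
}

# Usage: pipe the ps command from this report into the filter.
#   ps -p ... -o %cpu,%mem,cmd -h | flag_high_mem 80
```

The line shown above (82.8) would be printed with a limit of 80, while a
process using around 1% of memory would be filtered out.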


Expected results:
  tendrl-gluster-integration should not consume such a high amount of memory.

Additional info:
  This problem was initially spotted because of an alert similar to this one,
  shown in RHGS WA:

  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Memory utilization on node gl2.example.com in ClusterA at 80.09 % and running out of memory
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  I've also attached a graph of the memory usage of the affected node.

Comment 1 Daniel Horák 2018-07-11 06:50:19 UTC
Created attachment 1457999
For comparison: "Not affected" Gluster storage node memory utilization graph

For comparison, I'm also attaching a graph of memory usage from a "not affected" storage node.
As you can see, the graph is capped at 8GB (which is the available memory), and the biggest part of the memory is used for cache (purple color).

Comment 6 Daniel Horák 2018-07-23 08:31:01 UTC
On a cluster running for three days, tendrl-gluster-integration consumes
a significantly smaller amount of memory (1% or less on a system with 8GB RAM)
on all nodes.

RHGS WA Server:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  grafana-4.3.2-3.el7rhgs.x86_64
  tendrl-ansible-1.6.3-5.el7rhgs.noarch
  tendrl-api-1.6.3-4.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
  tendrl-commons-1.6.3-9.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
  tendrl-node-agent-1.6.3-9.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-8.el7rhgs.noarch

Gluster Storage Server:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  Red Hat Gluster Storage Server 3.4.0
  glusterfs-3.12.2-14.el7rhgs.x86_64
  glusterfs-api-3.12.2-14.el7rhgs.x86_64
  glusterfs-cli-3.12.2-14.el7rhgs.x86_64
  glusterfs-client-xlators-3.12.2-14.el7rhgs.x86_64
  glusterfs-events-3.12.2-14.el7rhgs.x86_64
  glusterfs-fuse-3.12.2-14.el7rhgs.x86_64
  glusterfs-geo-replication-3.12.2-14.el7rhgs.x86_64
  glusterfs-libs-3.12.2-14.el7rhgs.x86_64
  glusterfs-rdma-3.12.2-14.el7rhgs.x86_64
  glusterfs-server-3.12.2-14.el7rhgs.x86_64
  gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
  gluster-nagios-common-0.2.4-1.el7rhgs.noarch
  libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
  python2-gluster-3.12.2-14.el7rhgs.x86_64
  tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-commons-1.6.3-9.el7rhgs.noarch
  tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
  tendrl-node-agent-1.6.3-9.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  vdsm-gluster-4.19.43-2.3.el7rhgs.noarch

>> VERIFIED

Comment 8 errata-xmlrpc 2018-09-04 07:08:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616