Created attachment 1457998 [details]
Gluster storage node memory utilization graph

Description of problem:
With the latest builds, memory utilization of tendrl-gluster-integration on one Gluster Storage server grows to very high numbers.

Version-Release number of selected component (if applicable):
Gluster Storage Server:
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Red Hat Gluster Storage Server 3.4.0
glusterfs-3.12.2-13.el7rhgs.x86_64
glusterfs-api-3.12.2-13.el7rhgs.x86_64
glusterfs-cli-3.12.2-13.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-13.el7rhgs.x86_64
glusterfs-events-3.12.2-13.el7rhgs.x86_64
glusterfs-fuse-3.12.2-13.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-13.el7rhgs.x86_64
glusterfs-libs-3.12.2-13.el7rhgs.x86_64
glusterfs-rdma-3.12.2-13.el7rhgs.x86_64
glusterfs-server-3.12.2-13.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
python2-gluster-3.12.2-13.el7rhgs.x86_64
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-8.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-6.el7rhgs.noarch
tendrl-node-agent-1.6.3-8.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch

How reproducible:
I've spotted it on three different clusters since yesterday, but I'm not sure whether it is 100% reproducible on every cluster.

Steps to Reproduce:
1. Prepare, install and configure a Gluster Storage cluster (my environment: 6 storage nodes with 8 GB RAM, 2 GB swap, 2 vCPUs, 1-3 volumes).
2. Install and configure the RHGS WA Server and the RHGS WA Node Agents on the Gluster Storage nodes.
3. Import the Gluster cluster into RHGS WA.
4. Let it run for a couple of hours/one day.
5. Check the memory consumed by tendrl-gluster-integration on all Gluster Storage servers:

# ps -p $(echo $(ps aux | grep [t]endrl-gluster-integration | awk '{print $2}') | sed 's/ /,/g') -o %cpu,%mem,cmd -h

Actual results:
On one Gluster Storage server in the cluster, tendrl-gluster-integration consumes a huge amount of memory (more than 80% in my case; the second number in the output below is memory utilization):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ps -p $(echo $(ps aux | grep [t]endrl-gluster-integration | awk '{print $2}') | sed 's/ /,/g') -o %cpu,%mem,cmd -h
 9.0 82.8 /usr/bin/python /usr/bin/tendrl-gluster-integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expected results:
tendrl-gluster-integration should not consume such a high amount of memory.

Additional info:
This problem was initially spotted because of an alert similar to the following shown in RHGS WA:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memory utilization on node gl2.example.com in ClusterA at 80.09 % and running out of memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I've also attached a graph of memory usage of the affected node.
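As a side note, the same check from step 5 can be written more simply with pgrep, which avoids the grep/awk/sed pipeline; this is just an equivalent sketch, assuming the daemon is matched by its full command line (the log path is a placeholder I made up):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# pgrep -d, -f prints matching PIDs comma-separated, which is
# exactly the format ps -p expects.
ps -h -o %cpu,%mem,rss,cmd -p "$(pgrep -d, -f tendrl-gluster-integration)"

# To confirm the leak, sample resident set size (KiB) every 5 minutes:
while sleep 300; do
    date +%FT%T
    ps -o rss= -p "$(pgrep -d, -f tendrl-gluster-integration)"
done >> /tmp/tendrl-gluster-integration-rss.log
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~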
Created attachment 1457999 [details]
For comparison: "Not affected" Gluster storage node memory utilization graph

For comparison, I'm also attaching a graph of memory usage from a "not affected" storage node. As you can see, the graph tops out at 8 GB (the available memory), and the biggest part of the memory is used for cache (purple color).
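For anyone reproducing this without the graphs: the split between process memory and page cache (the purple area above) can also be checked from the shell; a minimal sketch using standard procps and kernel interfaces:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# free -h breaks total memory into used, free and buff/cache;
# on a healthy node most memory should sit in buff/cache.
free -h

# The same numbers straight from the kernel:
grep -E '^(MemTotal|MemFree|MemAvailable|Cached)' /proc/meminfo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~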
On a cluster running for three days, tendrl-gluster-integration consumes a significantly smaller amount of memory (1% or less on a system with 8 GB RAM) on all nodes (see the sketch below for checking all nodes in one go).

RHGS WA Server:
Red Hat Enterprise Linux Server release 7.5 (Maipo)
grafana-4.3.2-3.el7rhgs.x86_64
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch

Gluster Storage Server:
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Red Hat Gluster Storage Server 3.4.0
glusterfs-3.12.2-14.el7rhgs.x86_64
glusterfs-api-3.12.2-14.el7rhgs.x86_64
glusterfs-cli-3.12.2-14.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-14.el7rhgs.x86_64
glusterfs-events-3.12.2-14.el7rhgs.x86_64
glusterfs-fuse-3.12.2-14.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-14.el7rhgs.x86_64
glusterfs-libs-3.12.2-14.el7rhgs.x86_64
glusterfs-rdma-3.12.2-14.el7rhgs.x86_64
glusterfs-server-3.12.2-14.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
python2-gluster-3.12.2-14.el7rhgs.x86_64
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch

>> VERIFIED
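A sketch of the per-node check run across the whole cluster in one loop (gl1 to gl6 are hypothetical hostnames standing in for the real storage nodes):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The [t] bracket in the regex keeps pgrep -f from matching the
# remote shell whose command line contains the pattern itself.
for node in gl{1..6}.example.com; do
    echo "== ${node} =="
    ssh "${node}" 'ps -h -o %mem,rss,cmd -p "$(pgrep -d, -f [t]endrl-gluster-integration)"'
done
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~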
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616