Bug 1596862 - Improve performance of tendrl components
Summary: Improve performance of tendrl components
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-node-agent
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: gowtham
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks: 1503137
 
Reported: 2018-06-29 20:07 UTC by gowtham
Modified: 2018-09-04 07:09 UTC (History)

Fixed In Version: tendrl-commons-1.6.3-9.el7rhgs tendrl-node-agent-1.6.3-9.el7rhgs tendrl-monitoring-integration-1.6.3-7.el7rhgs tendrl-gluster-integration-1.6.3-7.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 07:08:24 UTC
Embargoed:




Links
System                  ID              Private  Priority     Status  Summary                         Last Updated
Red Hat Bugzilla        1538248         0        unspecified  CLOSED  [RFE] Performance Improvements  2023-09-14 04:15:57 UTC
Red Hat Product Errata  RHSA-2018:2616  0        None         None    None                            2018-09-04 07:09:30 UTC

Internal Links: 1538248

Description gowtham 2018-06-29 20:07:30 UTC
Description of problem:
The Tendrl import cluster flow takes a long time to import a cluster that has a large number of volumes. Sometimes the import fails with a timeout error. Even if the import flow succeeds, Grafana does not show correct monitoring values.
The problems that occur are:
 * Grafana hangs every time.
 * Volumes are deleted by TTL before a single gluster-integration sync completes.
 * Tendrl UI hangs for some time.

This is a serious issue because in a production environment Tendrl should be able to monitor a huge cluster.
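
The TTL problem listed above can be illustrated with a minimal sketch. It assumes, for illustration only, that one sync pass takes time proportional to the number of volumes and that each volume entry is written with a fixed TTL; the function names and the numbers are hypothetical and are not taken from the Tendrl code.

```python
# Hypothetical model: an entry written with a fixed TTL is removed if the
# sync pass that would refresh it takes longer than that TTL.

def sync_duration_sec(volume_count, per_volume_sec):
    """Estimated duration of one full sync pass (assumed linear in volumes)."""
    return volume_count * per_volume_sec


def expires_before_refresh(ttl_sec, volume_count, per_volume_sec):
    """Return True if an entry's TTL runs out before one sync pass finishes."""
    return sync_duration_sec(volume_count, per_volume_sec) > ttl_sec


if __name__ == "__main__":
    # Illustrative numbers only: 33 volumes at 10 s each against a 200 s TTL.
    print(expires_before_refresh(ttl_sec=200, volume_count=33, per_volume_sec=10))
    # A small cluster finishes its sync well inside the same TTL.
    print(expires_before_refresh(ttl_sec=200, volume_count=5, per_volume_sec=10))
```

Under this model a fixed TTL is safe only while the sync pass stays shorter than the TTL, which is why a cluster with many volumes can lose entries that a small cluster keeps.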

Version-Release number of selected component (if applicable):
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-5.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install the Tendrl components on a machine with 8 GB RAM and a 2-core CPU.
2. Import a gluster cluster with 33 volumes and 99 bricks.
3. Most import cluster attempts fail with a timeout.
4. Sometimes the import succeeds, but Grafana, tendrl-ui and the whole machine slow down.
5. carbon-cache takes 90 to 100% of the CPU.
6. Lots of httpd calls make the overall setup slow.

Actual results:
The overall system slows down when we try to import a cluster with 33 volumes and 99 bricks.

Expected results:
Tendrl should be able to import and monitor a huge cluster.

Additional info:

Comment 2 Martin Bukatovic 2018-07-03 12:01:19 UTC
This BZ has been reported based on results from perf. team investigation (as
required by RFE BZ 1538248) and will be verified in the same way.

Comment 5 gowtham 2018-07-09 11:06:41 UTC
I am not moving this issue to MODIFIED for now, because I still think the import timeout is too short for importing a huge cluster. The import timeout is currently based on the number of nodes. The problem is that importing 100 volumes with 5 or 6 nodes works, but importing 100 volumes with 2 nodes times out. So I am going to change the logic to decide the import cluster timeout value based on the number of volumes at runtime.
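
The change described in this comment can be sketched as follows. This is only an illustration of the idea, not the code from the actual fix: the constants and function names are hypothetical, and the real timeout values are not stated in this report.

```python
# Hypothetical constants for illustration; not the values used by Tendrl.
BASE_TIMEOUT_SEC = 600
PER_NODE_SEC = 60
PER_VOLUME_SEC = 30


def import_timeout_by_nodes(node_count):
    """Old approach: the timeout grows with the number of nodes only."""
    return BASE_TIMEOUT_SEC + PER_NODE_SEC * node_count


def import_timeout_by_volumes(volume_count):
    """Proposed approach: the timeout grows with the number of volumes."""
    return BASE_TIMEOUT_SEC + PER_VOLUME_SEC * volume_count


if __name__ == "__main__":
    # 100 volumes on 2 nodes get less time than 100 volumes on 6 nodes
    # when the timeout depends on nodes ...
    print(import_timeout_by_nodes(2), import_timeout_by_nodes(6))
    # ... but the same time in both cases when it depends on volumes.
    print(import_timeout_by_volumes(100))
```

With a node-based timeout, the same 100 volumes get less time on a 2-node cluster than on a 6-node cluster, which matches the failure described above; a volume-based timeout removes that dependency.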

Comment 6 gowtham 2018-07-17 07:34:33 UTC
This is fixed: https://github.com/Tendrl/commons/pull/1033

Comment 8 Daniel Horák 2018-08-23 08:13:19 UTC
As mentioned in Bug 1538248, the overall resource consumption is lower with
the new versions of RHGS WA components.
I've also tried to import a cluster with 2 Storage Nodes and 100 "Replicated"
volumes with a total of 200 bricks (100 on each node). The import
process passed correctly and all Grafana dashboards seem to be generated
properly.

Configuration of RHGS WA Server: VM with 4 vCPUs, 4GB of RAM.

RHGS WA Server:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  collectd-5.7.2-3.1.el7rhgs.x86_64
  collectd-ping-5.7.2-3.1.el7rhgs.x86_64
  etcd-3.2.7-1.el7.x86_64
  libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
  python-etcd-0.4.5-2.el7rhgs.noarch
  rubygem-etcd-0.3.0-2.el7rhgs.noarch
  tendrl-ansible-1.6.3-7.el7rhgs.noarch
  tendrl-api-1.6.3-5.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
  tendrl-commons-1.6.3-12.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-10.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-10.el7rhgs.noarch
  tendrl-node-agent-1.6.3-10.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-11.el7rhgs.noarch

Gluster Storage Server:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  Red Hat Gluster Storage Server 3.4.0
  collectd-5.7.2-3.1.el7rhgs.x86_64
  collectd-ping-5.7.2-3.1.el7rhgs.x86_64
  glusterfs-3.12.2-16.el7rhgs.x86_64
  glusterfs-api-3.12.2-16.el7rhgs.x86_64
  glusterfs-cli-3.12.2-16.el7rhgs.x86_64
  glusterfs-client-xlators-3.12.2-16.el7rhgs.x86_64
  glusterfs-events-3.12.2-16.el7rhgs.x86_64
  glusterfs-fuse-3.12.2-16.el7rhgs.x86_64
  glusterfs-geo-replication-3.12.2-16.el7rhgs.x86_64
  glusterfs-libs-3.12.2-16.el7rhgs.x86_64
  glusterfs-rdma-3.12.2-16.el7rhgs.x86_64
  glusterfs-server-3.12.2-16.el7rhgs.x86_64
  gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
  gluster-nagios-common-0.2.4-1.el7rhgs.noarch
  libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
  libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.7.x86_64
  python2-gluster-3.12.2-16.el7rhgs.x86_64
  python-etcd-0.4.5-2.el7rhgs.noarch
  tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-commons-1.6.3-12.el7rhgs.noarch
  tendrl-gluster-integration-1.6.3-10.el7rhgs.noarch
  tendrl-node-agent-1.6.3-10.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  vdsm-gluster-4.19.43-2.3.el7rhgs.noarch

>> VERIFIED

Comment 10 errata-xmlrpc 2018-09-04 07:08:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616

