Bug 1561468
| Summary: | tendrl-node-agent CPU consumption | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Daniel Horák <dahorak> |
| Component: | web-admin-tendrl-node-agent | Assignee: | Nishanth Thomas <nthomas> |
| Status: | CLOSED ERRATA | QA Contact: | Daniel Horák <dahorak> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.4 | CC: | dahorak, gshanmug, mbukatov, rhs-bugs |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | tendrl-node-agent-1.6.3-3.el7rhgs.noarch | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-04 07:03:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1503137 | | |
Description
Daniel Horák
2018-03-28 11:54:46 UTC
Output from a Gluster storage server with 2 vCPUs, running for 1 day:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%CPU %MEM
17.1  2.9 /usr/bin/python /usr/bin/tendrl-node-agent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Output from a Gluster storage server with 2 vCPUs, running for 12 days:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%CPU %MEM
42.4  3.1 /usr/bin/python /usr/bin/tendrl-node-agent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And output from the RHGS WA server with 4 vCPUs, running for 1 day:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%CPU %MEM
22.1  0.3 /usr/bin/python /usr/bin/tendrl-node-agent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The overall system load is also quite high, despite the fact that there is no data load on the Gluster volumes and no other tasks being performed.

What is the value of the config option "sync_interval" in /etc/tendrl/node-agent/node-agent.conf.yaml?

We didn't update 'sync_interval', so it contains the default value (see the tuning sketch at the end of this exchange):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# grep sync_interval /etc/tendrl/node-agent/node-agent.conf.yaml
sync_interval: 60
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@Daniel, can you re-check this with the latest build?

At first look it seems to be OK, but I'll have to keep the cluster running for a few days to be sure. I'll post an update early next week.

There seems to be a noticeable improvement between the last two versions, tendrl-node-agent-1.6.3-2.el7rhgs.noarch and tendrl-node-agent-1.6.3-3.el7rhgs.noarch. With the newer version, the CPU usage of tendrl-node-agent is below 3% on a cluster that has been running for 2 days:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ps aux | grep -E "[t]endrl-node-agent" | awk "{print \$2}" | sed "s/ /,/" | xargs -n1 ps -o %cpu,%mem,cmd -h -p
2.8 0.8 /usr/bin/python /usr/bin/tendrl-node-agent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I'll watch it further on more cluster variants and send another status update in a few days.

I have reduced the usage percentage in the latest release (https://github.com/Tendrl/commons/pull/931); please verify whether it is still happening.

(In reply to Nishanth Thomas from comment #6)
> @Daniel, can you re-check this with the latest build

On a cluster that has been running for (nearly) 5 days, the CPU usage of the tendrl-node-agent service is still below 3%, so I can confirm that it is fixed in the latest builds (tendrl-node-agent-1.6.3-3.el7rhgs.noarch).
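Tying back to the sync_interval exchange above, here is a minimal sketch of how the setting could be inspected and adjusted. Only the default of 60 seconds is confirmed by the grep output above; the value 120 is purely illustrative, and the assumption that a service restart is needed for the change to take effect is mine:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Show the current interval (seconds between sync runs):
grep sync_interval /etc/tendrl/node-agent/node-agent.conf.yaml

# Hypothetical experiment: raise the interval so the agent syncs less
# often (120 is an arbitrary example value), then restart the agent so
# it re-reads its configuration.
sed -i 's/^sync_interval: .*/sync_interval: 120/' \
    /etc/tendrl/node-agent/node-agent.conf.yaml
systemctl restart tendrl-node-agent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A longer interval would presumably trade monitoring freshness for a lower baseline CPU cost, which is likely why sync_interval was the first thing queried during triage.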
Tested and verified on a few clusters with various configurations, for example:
* cluster with 6 storage nodes, running for 6 days,
  with 2 vCPUs and 8 GB RAM per storage node
* cluster with 24 storage nodes, running for 6 days,
  with 4 vCPUs and 6 GB RAM per storage node
The tendrl-node-agent CPU utilization is around 1-3%, for example
(the first value is CPU utilization, the second memory utilization):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ ansible -i ci-usm4-gluster.hosts gluster_servers:tendrl_server -m shell \
-a 'ps aux | grep -E "[t]endrl-node-agent" | \
awk "{print \$2}" | sed "s/ /,/" | \
xargs -n1 ps -o %cpu,%mem,cmd -h -p'
ci-usm4-gl2.usmqe.example.com | SUCCESS | rc=0 >>
2.0 0.9 /usr/bin/python /usr/bin/tendrl-node-agent
ci-usm4-gl5.usmqe.example.com | SUCCESS | rc=0 >>
1.8 0.8 /usr/bin/python /usr/bin/tendrl-node-agent
ci-usm4-gl1.usmqe.example.com | SUCCESS | rc=0 >>
2.0 0.9 /usr/bin/python /usr/bin/tendrl-node-agent
ci-usm4-gl3.usmqe.example.com | SUCCESS | rc=0 >>
1.8 0.8 /usr/bin/python /usr/bin/tendrl-node-agent
ci-usm4-gl4.usmqe.example.com | SUCCESS | rc=0 >>
1.8 0.9 /usr/bin/python /usr/bin/tendrl-node-agent
ci-usm4-gl6.usmqe.example.com | SUCCESS | rc=0 >>
1.8 0.8 /usr/bin/python /usr/bin/tendrl-node-agent
ci-usm4-server.usmqe.example.com | SUCCESS | rc=0 >>
1.7 0.4 /usr/bin/python /usr/bin/tendrl-node-agent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
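To complement the one-shot ansible check above, a long-running sampler in the same spirit could record the agent's CPU usage over days, which is what verification required here. This is a hedged sketch, not part of the actual test run; the log path and the 5-minute interval are arbitrary assumptions:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#!/bin/bash
# Append a timestamped %CPU/%MEM sample for tendrl-node-agent to a log,
# so that slow growth (like the 17% -> 42% climb over 12 days reported
# above) is easy to spot afterwards.
LOG=/var/tmp/tendrl-cpu.log   # arbitrary location
while true; do
    pid=$(pgrep -of '/usr/bin/tendrl-node-agent')  # oldest matching PID
    if [ -n "$pid" ]; then
        printf '%s %s\n' "$(date -u '+%F %T')" \
            "$(ps -o %cpu=,%mem= -p "$pid")" >> "$LOG"
    fi
    sleep 300   # sample every 5 minutes
done
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~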
Version-Release number of selected component
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Red Hat Gluster Storage Server 3.4.0
collectd-5.7.2-3.1.el7rhgs.x86_64
collectd-ping-5.7.2-3.1.el7rhgs.x86_64
glusterfs-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
glusterfs-api-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
glusterfs-cli-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
glusterfs-events-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
glusterfs-fuse-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
glusterfs-libs-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
glusterfs-rdma-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
glusterfs-server-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.2.x86_64
python2-gluster-3.12.2-8.6.gite12fa69.el7rhgs.x86_64
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-4.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-2.el7rhgs.noarch
tendrl-node-agent-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616