Bug 1594899
| Summary: | Most IOPS charts in the At a Glance section of Brick Dashboards show no data for short or light workloads | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Martin Bukatovic <mbukatov> |
| Component: | web-admin-tendrl-monitoring-integration | Assignee: | Shubhendu Tripathi <shtripat> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Bukatovic <mbukatov> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.4 | CC: | nthomas, rhs-bugs, sankarshan, shtripat |
| Target Milestone: | --- | ||
| Target Release: | RHGS 3.4.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | tendrl-monitoring-integration-1.6.3-6.el7rhgs | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-09-04 07:07:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1503137 | ||
| Attachments: | |||
Description
Martin Bukatovic
2018-06-25 16:09:13 UTC
Reported while testing BZ 1581736.

Created attachment 1454405 [details]
wiki export split script
Created attachment 1454408 [details]
screenshot 1 (this looks ok)
Created attachment 1454409 [details]
screenshot 2: zeroes reported
Created attachment 1454411 [details]
screenshot 3: no data reported at all
I verified that data are present (as expected) on the bricks for which no IOPS are reported, e.g. the brick from screenshot 2:

```
[root@mbukatov-usm1-gl1 ~]# ls /mnt/brick_beta_arbiter_1/1 | wc -l
3005
```

@Martin, can you please check whether the extracted files are present on the other machines? It might be the case that all the files were extracted on the same machine.

(In reply to gowtham from comment #9)
> @Martin, can you please check whether the extracted files are present on the
> other machines? It might be the case that all the files were extracted on
> the same machine.

I'm quite sure that all bricks were utilized: I checked this via ssh on a few machines (as noted in comment 7 for the gl1 machine) and checked the utilization charts of all bricks both in WA (Brick Details page of the volume) and in Grafana in all dashboards. The point of extracting 10 000 files named using the sha1 of their content is to achieve uniform allocation of files across all bricks (a sketch of this approach follows the attachment list below).

Additional Information
======================

I scheduled the same workload to run overnight without limiting the number of extracted files:

```
[root@mbukatov-usm1-client volume_beta_arbiter_2_plus_1x2]# bzcat /tmp/enwiki-latest-pages-articles.xml.bz2 | wiki-export-split.py --noredir --filenames=sha1 --sha1sum=wikipages.sha1
```

This means that new files and data were stored on the volume at the same rate as described in the reproducer of this BZ, but over a much longer time period (more than 12 hours). And I noticed that *all IOPS charts* in the At a Glance section of Brick Dashboards report data as expected.

Created attachment 1454596 [details]
screenshot 4: short term vs long term workload

Attaching screenshot 4 as evidence for comment 11. This means that the problem is with reporting IOPS for workloads that are too light or too short, which are reported only sometimes. Long-running, high-IOPS workloads seem to be reported fine (after a while).

Created attachment 1455203 [details]
IOPS shooting up while writing a small number of small files to the volume mount
Created attachment 1456241 [details]
iops_while_no_of_small_file_being_written
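
To make the reproducer concrete: the workload writes each extracted page to the FUSE mount under a file name equal to the sha1 of its content, so Gluster's distribute (DHT) hashing spreads the files roughly evenly across all bricks. The following is a minimal, hypothetical sketch of that approach, not the attached wiki-export-split.py script; the mount path and the synthetic payloads are assumptions used only for illustration.

```python
#!/usr/bin/env python
# Minimal sketch (not the attached wiki-export-split.py): write files whose
# names are the sha1 hex digest of their content, so that Gluster DHT places
# them roughly uniformly across all bricks of the volume.
# Assumptions: /mnt/volume_beta is a FUSE mount of the tested volume, and the
# payloads below are synthetic stand-ins for the extracted wiki pages.
import hashlib
import os

MOUNT_POINT = "/mnt/volume_beta"   # placeholder mount path (assumption)
FILE_COUNT = 10000                 # matches the 10 000 files in the reproducer


def write_sha1_named_file(directory, payload):
    """Store payload in a file named by the sha1 of its content."""
    name = hashlib.sha1(payload).hexdigest()
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(payload)
    return path


if __name__ == "__main__":
    for i in range(FILE_COUNT):
        # Synthetic payload; the real workload writes extracted wiki pages.
        payload = ("page number %d\n" % i).encode("utf-8")
        write_sha1_named_file(MOUNT_POINT, payload)
```

Because sha1 digests are effectively uniformly distributed, every brick receives a comparable share of the writes, which is why missing IOPS data for an individual brick points at the monitoring stack rather than at the workload.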
Will be verified with this limitation in mind:
> So a mismatch between the starting points of trends coming up in the grafana
> dashboard is inevitable, I feel, due to these technical limitations.
That said, I expect to see some improvement here as well. We will need to
write down a known issue/limitation notice based on the results of the
testing.
Testing with
============

```
[root@mbukatov-usm1-server ~]# rpm -qa | grep tendrl | sort
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-12.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-10.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-10.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-10.el7rhgs.noarch

[root@mbukatov-usm1-gl1 ~]# rpm -qa | grep tendrl | sort
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-12.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-9.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch

[root@mbukatov-usm1-gl1 ~]# rpm -qa | grep gluster | sort
glusterfs-3.12.2-16.el7rhgs.x86_64
glusterfs-api-3.12.2-16.el7rhgs.x86_64
glusterfs-cli-3.12.2-16.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-16.el7rhgs.x86_64
glusterfs-events-3.12.2-16.el7rhgs.x86_64
glusterfs-fuse-3.12.2-16.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-16.el7rhgs.x86_64
glusterfs-libs-3.12.2-16.el7rhgs.x86_64
glusterfs-rdma-3.12.2-16.el7rhgs.x86_64
glusterfs-server-3.12.2-16.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
python2-gluster-3.12.2-16.el7rhgs.x86_64
tendrl-gluster-integration-1.6.3-9.el7rhgs.noarch
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
```

Results
=======

When I perform the steps to reproduce, I see IOPS data reported on the Brick dashboards for all bricks of the beta volume for this light workload (I checked the IOPS charts for all 18 bricks of the "beta" volume, which is arbiter 2+1x2). A supplementary backend-level check is sketched at the end of this report.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616
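
As a supplementary, backend-level sanity check (not part of the verification steps above), one could query the Graphite render API behind the Grafana dashboards to confirm that a brick IOPS series contains recent data points. This is only a sketch: the host, port, and metric path below are placeholders/assumptions, not values taken from this bug report.

```python
#!/usr/bin/env python
# Hypothetical check: ask the Graphite render API (the data source behind the
# Grafana dashboards used by tendrl-monitoring-integration) whether a brick
# IOPS series has any non-null data points in the last hour.
# The host, port, and metric path are placeholders, not values from this BZ.
import json
import urllib2  # Python 2, matching the RHEL 7 environment in this report

GRAPHITE_URL = "http://mbukatov-usm1-server:10080/render"       # placeholder
METRIC = "tendrl.clusters.*.nodes.*.bricks.*.iops"              # placeholder


def has_recent_datapoints(metric, window="-1h"):
    """Return True if any series matching the metric has non-null points."""
    url = "%s?target=%s&from=%s&format=json" % (GRAPHITE_URL, metric, window)
    series = json.load(urllib2.urlopen(url))
    for s in series:
        # Each series carries a list of [value, timestamp] pairs.
        if any(value is not None for value, _timestamp in s["datapoints"]):
            return True
    return False


if __name__ == "__main__":
    print("IOPS data present: %s" % has_recent_datapoints(METRIC))
```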