Bug 1266476 - RFE : Feature: Periodic FOP statistics dumps for v3.6.x/v3.7.x
Summary: RFE : Feature: Periodic FOP statistics dumps for v3.6.x/v3.7.x
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: rwareing
QA Contact:
URL:
Whiteboard:
Depends On: 1261700
Blocks:
 
Reported: 2015-09-25 11:23 UTC by Avra Sengupta
Modified: 2016-08-23 12:35 UTC

Fixed In Version: glusterfs-3.8rc2
Doc Type: Enhancement
Doc Text:
Clone Of: 1261700
Environment:
Last Closed: 2016-06-16 13:38:14 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Avra Sengupta 2015-09-25 11:23:58 UTC
+++ This bug was initially created as a clone of Bug #1261700 +++

Description of problem:
This patch adds periodic JSON dumps of FOP latency and hit-rate statistics from the io-stats translator.  Dumps are controlled by the diagnostics.stats-dump-interval <dump interval sec> option and are stored in /var/lib/glusterd/stats under their respective FUSE, gNFSd, or brick instance.

This is immensely useful for reliably extracting diagnostics and performance metrics from GlusterFS and feeding them into a robust analytics backend for later analysis or alarming.  It is in heavy use here at Facebook.
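
As a rough illustration of the consuming side, here is a minimal Python sketch that collects the dumps for hand-off to an analytics pipeline. It assumes each dump file holds a single JSON document; the file names, directory layout below /var/lib/glusterd/stats, and the JSON schema are not specified in this report, so the sketch treats them as opaque. The function name load_stats_dumps is hypothetical.

```python
import json
from pathlib import Path

# Directory named in the description above; everything below it is
# treated as opaque because the layout is not specified in this report.
STATS_DIR = Path("/var/lib/glusterd/stats")


def load_stats_dumps(stats_dir=STATS_DIR):
    """Return {relative path: parsed JSON} for every parseable file under stats_dir.

    Hypothetical helper: assumes one JSON document per dump file.
    """
    root = Path(stats_dir)
    dumps = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                dumps[str(path.relative_to(root))] = json.load(handle)
        except (OSError, ValueError):
            # Skip unreadable files and files that are not valid JSON
            # (for example, a dump caught while it is being rewritten).
            continue
    return dumps
```

A collector could call load_stats_dumps() once per dump interval and forward the resulting dictionaries unchanged, leaving interpretation of the fields to the backend.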

The patch applies cleanly to the release-3.6 and release-3.7 branches as of this bug's creation.

Version-Release number of selected component (if applicable):
v3.6.x or v3.7.x; it should be trivial to port to master.

How reproducible:
100%

Steps to Reproduce:
N/A

Actual results:


Expected results:


Additional info:

--- Additional comment from Ben England on 2015-09-22 16:55:47 EDT ---

Richard, 

This is an extremely good idea.  I have had to parse gluster volume profile output and it is extremely hard to do.  JSON would make it much easier.  Also, the io-stats translator can run client-side, so you get client-side latency, not server-side.  It would be great if /usr/sbin/gluster could initiate the profiling so we didn't have to edit a volfile.

Can you provide an attachment with JSON output from the patch so that lazy folks like me can see what it looks like?

thx

-Ben England, Perf. Engr., Red Hat

--- Additional comment from  on 2015-09-22 17:25 EDT ---



--- Additional comment from  on 2015-09-22 17:34:08 EDT ---

Added example output.  Also, this is automatically engaged when either of these options is enabled:

diagnostics.latency-measurement
diagnostics.count-fop-hits

and 

diagnostics.ios-dump-interval 

...is set to something non-zero.
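
One way to read the condition above, sketched in Python: the dump is active when at least one of the two measurement options is on and the interval is non-zero. This is an interpretation of the comment, not code from the patch; the function and parameter names are hypothetical, and treating negative intervals as disabled is an assumption.

```python
def periodic_dump_enabled(latency_measurement: bool,
                          count_fop_hits: bool,
                          dump_interval_sec: int) -> bool:
    """Hypothetical predicate mirroring the condition described in the comment."""
    # Either diagnostics.latency-measurement or diagnostics.count-fop-hits
    # must be enabled.
    measuring = latency_measurement or count_fop_hits
    # The dump interval must be non-zero; negative values are treated as
    # disabled here (an assumption, not stated in the report).
    return measuring and dump_interval_sec > 0
```

For example, periodic_dump_enabled(True, False, 5) is True, while periodic_dump_enabled(True, True, 0) is False because the interval is zero.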

We run with these enabled 24x7 on all clusters at all times, and as you have noted it's extremely powerful to be able to look at performance from all layers of the stack (FUSE client, gNFSd and bricks).  And with lockless counters (also in this patch) we haven't observed any perf hit.

--- Additional comment from Avra Sengupta on 2015-09-25 07:23:19 EDT ---

Cloning this bug to master.

Comment 1 Avra Sengupta 2015-09-25 11:27:57 UTC
Patch at http://review.gluster.org/#/c/12209/

Comment 2 Mike McCune 2016-03-28 23:30:47 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.

Comment 3 Niels de Vos 2016-06-16 13:38:14 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

