Bug 1261700 - RFE : Feature: Periodic FOP statistics dumps for v3.6.x/v3.7.x
Summary: RFE : Feature: Periodic FOP statistics dumps for v3.6.x/v3.7.x
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.6.6
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: rwareing
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1266476
Reported: 2015-09-10 02:13 UTC by rwareing
Modified: 2016-08-23 12:34 UTC (History)

Fixed In Version: glusterfs-3.8.0
Doc Type: Enhancement
Doc Text:
Clone Of:
Clones: 1266476
Environment:
Last Closed: 2016-08-23 12:34:51 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Patch for stats dump code. (40.27 KB, application/mbox)
2015-09-10 02:13 UTC, rwareing
Example output for the dumps (nfsd) (25.89 KB, text/plain)
2015-09-22 21:25 UTC, rwareing

Description rwareing 2015-09-10 02:13:39 UTC
Created attachment 1071977
Patch for stats dump code.

Description of problem:
Patch to add periodic JSON dumps of FOP latency and hit-rate statistics from the io-stats translator.  Dumps are controlled by the diagnostics.stats-dump-interval <dump interval sec> option and are stored in /var/lib/glusterd/stats, under the respective FUSE, gNFSd or brick instance.

This is immensely useful for reliably getting diagnostics and performance metrics out of GlusterFS and injecting them into a robust analytics backend for later analysis or alarming.  It is in heavy use here at Facebook.
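
To illustrate how such dumps might be fed to an analytics backend, here is a minimal Python sketch. It assumes each dump file is a single flat JSON object mapping metric names to numbers; that layout, the file naming under /var/lib/glusterd/stats, and the metric key used in the usage note are assumptions for illustration and are not taken from the patch.

```python
import json
from pathlib import Path

# Directory named in the description; the layout beneath it is an assumption.
STATS_DIR = Path("/var/lib/glusterd/stats")


def load_dump(text):
    """Parse one dump, assumed to be a single JSON object of metric -> value."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def select_metrics(dump, needle):
    """Return the numeric entries whose key contains `needle`."""
    return {
        key: value
        for key, value in dump.items()
        if needle in key
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
    }


def collect(stats_dir=STATS_DIR, needle="latency"):
    """Walk stats_dir and gather matching metrics per readable JSON file."""
    results = {}
    if not stats_dir.is_dir():
        return results
    for path in sorted(stats_dir.rglob("*")):
        if not path.is_file():
            continue
        try:
            dump = load_dump(path.read_text())
        except (ValueError, OSError):
            # Skip unreadable or non-JSON files, e.g. a dump still being written.
            continue
        results[str(path)] = select_metrics(dump, needle)
    return results
```

For example, `select_metrics(load_dump(text), "latency")` keeps only the numeric entries whose (hypothetical) key contains "latency", which could then be forwarded to whatever metrics pipeline is in use.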

The patch applies cleanly to the release-3.6 and release-3.7 branches as of this bug's creation.

Version-Release number of selected component (if applicable):
v3.6.x or v3.7.x; it should be trivial to port to master.

How reproducible:
100%

Steps to Reproduce:
N/A

Actual results:


Expected results:


Additional info:

Comment 1 Ben England 2015-09-22 20:55:47 UTC
Richard, 

This is an extremely good idea.  I have had to parse gluster volume profile output, and it is extremely hard to do; JSON would make it much easier.  Also, the io-stats translator can run client-side, so you get client-side latency rather than server-side latency.  It would be great if /usr/sbin/gluster could initiate the profiling so we didn't have to edit a volfile.

Can you provide an attachment with JSON output from the patch so that lazy folks like me can see what it looks like?

thx

-Ben England, Perf. Engr., Red Hat

Comment 2 rwareing 2015-09-22 21:25:55 UTC
Created attachment 1076043
Example output for the dumps (nfsd)

Comment 3 rwareing 2015-09-22 21:34:08 UTC
Added example output.  Also, this is automatically engaged when either of these options is enabled:

diagnostics.latency-measurement
diagnostics.count-fop-hits

and 

diagnostics.stats-dump-interval

...is set to something non-zero.

We run with these enabled 24x7 on all clusters at all times, and as you have noted it's extremely powerful to be able to look at performance from all layers of the stack (FUSE client, gNFSd and bricks).  And with lockless counters (also in this patch) we haven't observed any perf hit.
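
As a rough sketch of how these settings could be applied, the Python helper below only builds the `gluster volume set` command lines and does not run them. It uses the diagnostics.stats-dump-interval name from the bug description; the volume name "myvol", the 60-second interval and the helper name are placeholders, and it assumes a build that includes this patch. It turns on both measurement options, although the comment above says either one is enough.

```python
def build_commands(volume, interval_sec):
    """Return argv lists for enabling periodic io-stats dumps on `volume`."""
    if interval_sec <= 0:
        # A zero interval leaves the periodic dump disabled.
        raise ValueError("interval_sec must be positive")
    settings = [
        ("diagnostics.latency-measurement", "on"),
        ("diagnostics.count-fop-hits", "on"),
        ("diagnostics.stats-dump-interval", str(interval_sec)),
    ]
    return [
        ["gluster", "volume", "set", volume, key, value]
        for key, value in settings
    ]


if __name__ == "__main__":
    # "myvol" and 60 are placeholder values for illustration only.
    for argv in build_commands("myvol", 60):
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a node where the gluster CLI is available.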

Comment 4 Avra Sengupta 2015-09-25 11:23:19 UTC
Cloning this bug to master.

Comment 5 Kaushal 2016-08-23 12:34:51 UTC
This bug is being closed as GlusterFS-3.6 is nearing its End-Of-Life and only important security bugs will be fixed. This bug has been fixed in more recent GlusterFS releases. If you still face this bug with the newer GlusterFS versions, please open a new bug.

