Bug 1261700 - RFE : Feature: Periodic FOP statistics dumps for v3.6.x/v3.7.x
Status: CLOSED NEXTRELEASE
Product: GlusterFS
Classification: Community
Component: core
Version: 3.6.6
Hardware: All
OS: All
Priority: medium
Severity: medium
Assigned To: rwareing
Keywords: FutureFeature, Triaged
Depends On:
Blocks: 1266476
Reported: 2015-09-09 22:13 EDT by rwareing
Modified: 2016-08-23 08:34 EDT (History)

See Also:
Fixed In Version: glusterfs-3.8.0
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Clones: 1266476
Environment:
Last Closed: 2016-08-23 08:34:51 EDT
Type: Bug

Attachments
Patch for stats dump code. (40.27 KB, application/mbox)
2015-09-09 22:13 EDT, rwareing
Example output for the dumps (nfsd) (25.89 KB, text/plain)
2015-09-22 17:25 EDT, rwareing

Description rwareing 2015-09-09 22:13:39 EDT
Created attachment 1071977
Patch for stats dump code.

Description of problem:
Patch to add periodic JSON dumps of FOP latency & hit rate statistics from the io-stats translator.  Dumps are controlled by the diagnostics.stats-dump-interval <dump interval sec> option and stored in /var/lib/glusterd/stats under their respective FUSE, gNFSd or brick instance.

This is immensely useful to reliably ferret out diagnostics & performance metrics from GlusterFS for injection into a robust analytics backend for later analysis or alerting.  It is in heavy use here at Facebook.
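
As a rough illustration of how a consumer might pick up these dumps for an analytics backend, here is a minimal Python sketch. It assumes only what is stated above: the dumps are JSON files somewhere under /var/lib/glusterd/stats. The file names and the JSON schema are not given in this report, so the sketch loads every file that parses as JSON and flattens numeric leaves into dotted keys without assuming any field names. The helper names `load_dumps` and `flatten` are hypothetical and not part of the patch.

```python
import json
from pathlib import Path


def load_dumps(stats_dir):
    """Return {relative_path: parsed_json} for each JSON file under stats_dir.

    Assumption: dump file names and layout are unknown, so every regular
    file is tried and anything that is not valid JSON is skipped.
    """
    root = Path(stats_dir)
    dumps = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            dumps[str(path.relative_to(root))] = json.loads(path.read_text())
        except (OSError, ValueError):
            # Unreadable file or not JSON: ignore it.
            continue
    return dumps


def flatten(value, prefix=""):
    """Yield (dotted_key, number) for every numeric leaf of a parsed dump."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}.{index}" if prefix else str(index))
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        yield prefix, value


if __name__ == "__main__":
    # Directory taken from the description above; adjust for your install.
    for name, dump in load_dumps("/var/lib/glusterd/stats").items():
        for key, number in flatten(dump):
            print(name, key, number)
```

Flattening to generic key/value pairs keeps the sketch independent of the actual dump schema; a real collector would likely select specific fields once the format is known.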

Patches clean onto the release-3.6 or release-3.7 branches as of this bug creation.

Version-Release number of selected component (if applicable):
v3.6.x or v3.7.x, should be trivial to port to master.

How reproducible:
100%

Steps to Reproduce:
N/A

Actual results:


Expected results:


Additional info:
Comment 1 Ben England 2015-09-22 16:55:47 EDT
Richard, 

this is an extremely good idea.  I have had to parse gluster volume profile output and it is extremely hard to do.  JSON would make it much easier. Also, io-stats translator can run client-side so you get client-side latency, not server-side.   Would be great if /usr/sbin/gluster could initiate the profiling so we didn't have to edit a volfile.

Can you provide an attachment with JSON output from the patch so that lazy folks like me can see what it looks like?

thx

-Ben England, Perf. Engr., Red Hat
Comment 2 rwareing 2015-09-22 17:25 EDT
Created attachment 1076043
Example output for the dumps (nfsd)
Comment 3 rwareing 2015-09-22 17:34:08 EDT
Added example output.  Also, this is automatically engaged when either of these options is enabled:

diagnostics.latency-measurement
diagnostics.count-fop-hits

and

diagnostics.stats-dump-interval

...is set to something non-zero.

We run with these enabled 24x7 on all clusters at all times, and as you have noted it's extremely powerful to be able to look at performance from all layers of the stack (FUSE client, gNFSd and bricks).  And with lockless counters (also in this patch) we haven't observed any perf hit.
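
For anyone wanting to try the combination described in this comment, the following Python sketch builds the corresponding `gluster volume set <VOLNAME> <KEY> <VALUE>` invocations. It is an illustration under assumptions, not part of the patch: the helper name `stats_dump_commands` and the volume name `myvol` are hypothetical, the interval option defaults to the name given in the bug description and can be overridden, and the sketch turns on both measurement options even though the comment says either one is enough. It only prints the commands and does not run them.

```python
import shlex


def stats_dump_commands(volume, interval_sec,
                        interval_option="diagnostics.stats-dump-interval"):
    """Build `gluster volume set` argument lists that enable periodic dumps.

    Assumption: interval_option defaults to the option name used in the bug
    description; pass a different name if your build uses another key.
    """
    if interval_sec <= 0:
        # Per the comment above, dumps only engage for a non-zero interval.
        raise ValueError("interval_sec must be greater than zero")
    settings = [
        ("diagnostics.latency-measurement", "on"),
        ("diagnostics.count-fop-hits", "on"),
        (interval_option, str(interval_sec)),
    ]
    return [["gluster", "volume", "set", volume, key, value]
            for key, value in settings]


if __name__ == "__main__":
    # "myvol" is a placeholder volume name.
    for command in stats_dump_commands("myvol", 60):
        print(shlex.join(command))
```

Returning argument lists rather than shell strings makes it straightforward to hand each command to `subprocess.run` later without quoting problems.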
Comment 4 Avra Sengupta 2015-09-25 07:23:19 EDT
Cloning this bug to master.
Comment 5 Kaushal 2016-08-23 08:34:51 EDT
This bug is being closed as GlusterFS-3.6 is nearing its End-Of-Life and only important security bugs will be fixed. This bug has been fixed in more recent GlusterFS releases. If you still face this bug with the newer GlusterFS versions, please open a new bug.
