Bug 969348

Summary: Integrate gluster related stats with PCP
Product: Fedora
Component: pcp
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Neependra Khare <nkhare>
Assignee: Nathan Scott <nathans>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ahatfiel, avishwan, bhubbard, dshaks, fche, kmayilsa, mgoodwin, nathans, rcyriac, sabose, sgowda
Keywords: Reopened
Fixed In Version: pcp-3.8.2-1.el5
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-09-11 19:58:51 UTC

Attachments:
- gluster vol profile parser
- Output from the parser

Description Neependra Khare 2013-05-31 09:30:28 UTC
Description of problem:

Performance stats collected by the following gluster commands are very useful for finding bottlenecks.

$ gluster volume profile <VOLNAME> info
$ gluster volume top <VOLNAME>

It would be good if we could integrate them with PCP.

Comment 1 Nathan Scott 2013-06-07 23:37:38 UTC
Hi Neependra,

Just a friendly reminder - this bug is awaiting input from the gluster folks on the statistics that are available today (as discussed in email).

cheers.

Comment 2 shishir gowda 2013-06-10 06:29:10 UTC
Created attachment 759052 [details]
gluster vol profile parser

Comment 3 shishir gowda 2013-06-10 06:30:12 UTC
Attaching a python script which takes a gluster volume name as an option and returns the profile output as a dictionary.
Before running the script, enable profiling on the volume with:
gluster volume profile <volname> start

All the profiling info from the bricks will be printed as a dictionary.
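
For readers without the attachment handy, a minimal sketch of the approach, assuming the gluster CLI's --xml output mode; the element names (brickName, fop, hits, avgLatency and friends) are guesses inferred from the dictionary fields shown in comment 5, so verify them against your gluster version:

import subprocess
import xml.etree.ElementTree as ET

def profile_info(volume):
    """Parse 'gluster volume profile <volume> info --xml' into a list of
    per-brick dicts; the XML tag names here are assumptions."""
    out = subprocess.check_output(
        ['gluster', 'volume', 'profile', volume, 'info', '--xml'])
    stats = []
    for brick in ET.fromstring(out).iter('brick'):
        fops = [{'name': fop.findtext('name'),
                 'hits': fop.findtext('hits'),
                 'latencyAvg': fop.findtext('avgLatency'),
                 'latencyMin': fop.findtext('minLatency'),
                 'latencyMax': fop.findtext('maxLatency')}
                for fop in brick.iter('fop')]
        stats.append({'brick': brick.findtext('brickName'),
                      'Latency': {'fopStats': fops}})
    return stats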

Comment 4 Neependra Khare 2013-06-10 11:02:43 UTC
Created attachment 759132 [details]
Output from the parser

It's a sample output from the parser that was attached earlier.

Comment 5 Nathan Scott 2013-06-21 04:32:39 UTC
Hi all,

I had a look through the script and have set up a little gluster test environment here to experiment with.

First of all, that script's using python to extract values from the XML output of the gluster command.  Previously PCP supported C and perl data sources, so I wandered off for the last week or so and added python plugin support to make progress here easier.  This went out with the pcp-3.8.1 release a day or two ago, so now we can start to think about the next step - bridging your python script and pmcd.
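
The shape of such a bridge, under the new python support, is roughly as follows - a hedged skeleton only, not pmdagluster itself; the domain number (120) and the single placeholder metric are illustrative assumptions:

import cpmapi as c_api
from pcp.pmapi import pmUnits
from pcp.pmda import PMDA, pmdaMetric

class GlusterPMDA(PMDA):
    def __init__(self, name, domain):
        PMDA.__init__(self, name, domain)
        # one placeholder metric: an instantaneous unsigned count
        self.add_metric(name + '.volume.count', pmdaMetric(
            self.pmid(0, 0), c_api.PM_TYPE_U32, c_api.PM_INDOM_NULL,
            c_api.PM_SEM_INSTANT, pmUnits(0, 0, 1, 0, 0, c_api.PM_COUNT_ONE)))
        self.set_fetch_callback(self.fetch_callback)

    def fetch_callback(self, cluster, item, inst):
        # pmcd calls this for each requested metric value
        if cluster == 0 and item == 0:
            return [1, 1]               # [value, success]
        return [c_api.PM_ERR_PMID, 0]

if __name__ == '__main__':
    GlusterPMDA('gluster', 120).run()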

To do that, we need to first map the gluster data to new PCP metrics (names, metadata, etc).  The data the example script prints out looks like this:

[ {'Latency': {
    'fopStats': [
        {'latencyAvg': '68', 'latencyMin': '55', 'hits': '2', 'name': 'STAT', 'latencyMax': '81'},
        ... ]},
    'brick': '172.17.40.15:/brick/gluster'
  },
  {'Latency': {
    'fopStats': [
        ...
        {'latencyAvg': '396.750000', 'latencyMin': '127', 'hits': '4', 'name': 'READDIRP', 'latencyMax': '839'}]},
    'brick': '172.17.40.16:/brick/gluster'
  }
]

Perhaps if we go for PCP metric names like:
gluster.latency.fileops.{mkdir,open,write,etc}.{avg,min,count,max}

This data is available per-brick, where a brick can be remote:
    'brick': '172.17.40.15:/brick/gluster'
    'brick': '172.17.40.16:/brick/gluster'
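
To make that mapping concrete, a small sketch that flattens one parsed per-brick entry (the dictionary layout shown above) into metric name/value pairs under the proposed naming:

def to_metrics(entry):
    # entry is one element of the comment 3 parser output shown above
    brick = entry['brick']
    for fop in entry['Latency']['fopStats']:
        op = fop['name'].lower()
        for field, suffix in (('latencyAvg', 'avg'), ('latencyMin', 'min'),
                              ('hits', 'count'), ('latencyMax', 'max')):
            yield ('gluster.latency.fileops.%s.%s' % (op, suffix),
                   brick, float(fop[field]))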

Now, PCP would typically be deployed on all machines (we are getting
stats from the remote kernel/hardware/... too), so we need to decide
whether to represent remote hosts' bricks from every pmdagluster, or
to only present data from the host being queried.

It's not clear whether the data can be requested for the local host
only, from the gluster command - can anyone confirm?  (I didn't find
a way looking at the source - lots of RPC calls sprinkled around).
Which raises another issue - if a remote host is down, we must still
respond to requests for data quickly in pmdagluster (network timeout
is a killer - will result in pmcd terminating pmdagluster for poor
behaviour).  As more and more cluster nodes are added, life becomes
more complicated too, possibly.
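
To make the timeout concern concrete, the gluster invocation wants a hard
deadline around it - a minimal sketch, assuming Python 3 and an
illustrative one-second budget:

import subprocess

def gluster_profile_xml(volume, budget=1.0):
    # Bound the CLI call so a hung RPC to a down peer cannot stall the
    # PMDA past pmcd's patience; callers treat None as "no values".
    try:
        return subprocess.run(
            ['gluster', 'volume', 'profile', volume, 'info', '--xml'],
            capture_output=True, timeout=budget, check=True).stdout
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return None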

So, ideally, we'll just extract the local host stats, and then have
a PCP instance domain ("set of values") of local bricks, for each of
the metric names I listed above.  Then we'd rely on the distributed
PCP metric fetching mechanism to collate the data centrally (along
with all the other statistics for each host).
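
Failing a local-only query mode, the fallback is to fetch everything and
discard remote entries - a minimal sketch over the comment 3 parser
output, assuming brick hosts appear as names or addresses this machine
can recognise as its own:

import socket

def local_bricks(profile):
    # collect this host's name plus every address it resolves to
    hostname = socket.gethostname()
    local = {hostname}
    local.update(info[4][0] for info in socket.getaddrinfo(hostname, None))
    # keep only entries whose 'host:/path' brick id matches a local address
    return [entry for entry in profile
            if entry['brick'].split(':', 1)[0] in local]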

Is there a way to get stats for just the local bricks?  (& not just
by getting 'em all and ignoring data from the remote hosts).

thanks!

Comment 6 shishir gowda 2013-06-21 05:16:33 UTC
Hi Nathan,

As for querying the stats, a single query on any node that is part of the gluster cluster will result in stats from across all the nodes. In effect, for a given volume, you can query stats for all bricks from any given node.

Additionally, there is a provision to query stat info from just a given brick (by calling gluster volume profile <volname> <brick>). This again can be run from any node, and does not have to be sent to the node where the brick exists.

The stat representation gluster.latency.fileops.{mkdir,open,write,etc}.{avg,min,count,max} looks good.

Comment 7 Nathan Scott 2013-07-02 04:01:31 UTC
Hi,

I've committed an initial version of pmdagluster which exports the metrics listed below.  I added a few more metrics since we last chatted, after examining the XML output a bit more closely.  The gluster.volume.profile metric can be used to query whether individual volumes have had profiling enabled.

This code is currently in the dev branch of the main pcp git tree.  It requires the latest python modules from the same branch.  Feel free to try it out (there's an INSTALL file in the top level of the pcp sources, with build and installation instructions) and please let me know how it goes - thanks!

gluster.volume.profile
gluster.volume.dist.count
gluster.volume.stripe.count
gluster.volume.replica.count
gluster.brick.read_bytes
gluster.brick.write_bytes
gluster.brick.latency.xattrop.count
gluster.brick.latency.xattrop.avg
gluster.brick.latency.xattrop.max
gluster.brick.latency.xattrop.min
gluster.brick.latency.writev.count
gluster.brick.latency.writev.avg
gluster.brick.latency.writev.max
gluster.brick.latency.writev.min
gluster.brick.latency.unlink.count
gluster.brick.latency.unlink.avg
gluster.brick.latency.unlink.max
gluster.brick.latency.unlink.min
gluster.brick.latency.truncate.count
gluster.brick.latency.truncate.avg
gluster.brick.latency.truncate.max
gluster.brick.latency.truncate.min
gluster.brick.latency.symlink.count
gluster.brick.latency.symlink.avg
gluster.brick.latency.symlink.max
gluster.brick.latency.symlink.min
gluster.brick.latency.statfs.count
gluster.brick.latency.statfs.avg
gluster.brick.latency.statfs.max
gluster.brick.latency.statfs.min
gluster.brick.latency.stat.count
gluster.brick.latency.stat.avg
gluster.brick.latency.stat.max
gluster.brick.latency.stat.min
gluster.brick.latency.setxattr.count
gluster.brick.latency.setxattr.avg
gluster.brick.latency.setxattr.max
gluster.brick.latency.setxattr.min
gluster.brick.latency.setattr.count
gluster.brick.latency.setattr.avg
gluster.brick.latency.setattr.max
gluster.brick.latency.setattr.min
gluster.brick.latency.rmdir.count
gluster.brick.latency.rmdir.avg
gluster.brick.latency.rmdir.max
gluster.brick.latency.rmdir.min
gluster.brick.latency.rename.count
gluster.brick.latency.rename.avg
gluster.brick.latency.rename.max
gluster.brick.latency.rename.min
gluster.brick.latency.removexattr.count
gluster.brick.latency.removexattr.avg
gluster.brick.latency.removexattr.max
gluster.brick.latency.removexattr.min
gluster.brick.latency.readv.count
gluster.brick.latency.readv.avg
gluster.brick.latency.readv.max
gluster.brick.latency.readv.min
gluster.brick.latency.readlink.count
gluster.brick.latency.readlink.avg
gluster.brick.latency.readlink.max
gluster.brick.latency.readlink.min
gluster.brick.latency.readdirp.count
gluster.brick.latency.readdirp.avg
gluster.brick.latency.readdirp.max
gluster.brick.latency.readdirp.min
gluster.brick.latency.readdir.count
gluster.brick.latency.readdir.avg
gluster.brick.latency.readdir.max
gluster.brick.latency.readdir.min
gluster.brick.latency.rchecksum.count
gluster.brick.latency.rchecksum.avg
gluster.brick.latency.rchecksum.max
gluster.brick.latency.rchecksum.min
gluster.brick.latency.opendir.count
gluster.brick.latency.opendir.avg
gluster.brick.latency.opendir.max
gluster.brick.latency.opendir.min
gluster.brick.latency.open.count
gluster.brick.latency.open.avg
gluster.brick.latency.open.max
gluster.brick.latency.open.min
gluster.brick.latency.mknod.count
gluster.brick.latency.mknod.avg
gluster.brick.latency.mknod.max
gluster.brick.latency.mknod.min
gluster.brick.latency.mkdir.count
gluster.brick.latency.mkdir.avg
gluster.brick.latency.mkdir.max
gluster.brick.latency.mkdir.min
gluster.brick.latency.lookup.count
gluster.brick.latency.lookup.avg
gluster.brick.latency.lookup.max
gluster.brick.latency.lookup.min
gluster.brick.latency.lk.count
gluster.brick.latency.lk.avg
gluster.brick.latency.lk.max
gluster.brick.latency.lk.min
gluster.brick.latency.link.count
gluster.brick.latency.link.avg
gluster.brick.latency.link.max
gluster.brick.latency.link.min
gluster.brick.latency.inodelk.count
gluster.brick.latency.inodelk.avg
gluster.brick.latency.inodelk.max
gluster.brick.latency.inodelk.min
gluster.brick.latency.getxattr.count
gluster.brick.latency.getxattr.avg
gluster.brick.latency.getxattr.max
gluster.brick.latency.getxattr.min
gluster.brick.latency.getspec.count
gluster.brick.latency.getspec.avg
gluster.brick.latency.getspec.max
gluster.brick.latency.getspec.min
gluster.brick.latency.fxattrop.count
gluster.brick.latency.fxattrop.avg
gluster.brick.latency.fxattrop.max
gluster.brick.latency.fxattrop.min
gluster.brick.latency.ftruncate.count
gluster.brick.latency.ftruncate.avg
gluster.brick.latency.ftruncate.max
gluster.brick.latency.ftruncate.min
gluster.brick.latency.fsyncdir.count
gluster.brick.latency.fsyncdir.avg
gluster.brick.latency.fsyncdir.max
gluster.brick.latency.fsyncdir.min
gluster.brick.latency.fsync.count
gluster.brick.latency.fsync.avg
gluster.brick.latency.fsync.max
gluster.brick.latency.fsync.min
gluster.brick.latency.fstat.count
gluster.brick.latency.fstat.avg
gluster.brick.latency.fstat.max
gluster.brick.latency.fstat.min
gluster.brick.latency.fsetxattr.count
gluster.brick.latency.fsetxattr.avg
gluster.brick.latency.fsetxattr.max
gluster.brick.latency.fsetxattr.min
gluster.brick.latency.fsetattr.count
gluster.brick.latency.fsetattr.avg
gluster.brick.latency.fsetattr.max
gluster.brick.latency.fsetattr.min
gluster.brick.latency.fremovexattr.count
gluster.brick.latency.fremovexattr.avg
gluster.brick.latency.fremovexattr.max
gluster.brick.latency.fremovexattr.min
gluster.brick.latency.flush.count
gluster.brick.latency.flush.avg
gluster.brick.latency.flush.max
gluster.brick.latency.flush.min
gluster.brick.latency.finodelk.count
gluster.brick.latency.finodelk.avg
gluster.brick.latency.finodelk.max
gluster.brick.latency.finodelk.min
gluster.brick.latency.fgetxattr.count
gluster.brick.latency.fgetxattr.avg
gluster.brick.latency.fgetxattr.max
gluster.brick.latency.fgetxattr.min
gluster.brick.latency.fentrylk.count
gluster.brick.latency.fentrylk.avg
gluster.brick.latency.fentrylk.max
gluster.brick.latency.fentrylk.min
gluster.brick.latency.fallocate.count
gluster.brick.latency.fallocate.avg
gluster.brick.latency.fallocate.max
gluster.brick.latency.fallocate.min
gluster.brick.latency.entrylk.count
gluster.brick.latency.entrylk.avg
gluster.brick.latency.entrylk.max
gluster.brick.latency.entrylk.min
gluster.brick.latency.discard.count
gluster.brick.latency.discard.avg
gluster.brick.latency.discard.max
gluster.brick.latency.discard.min
gluster.brick.latency.create.count
gluster.brick.latency.create.avg
gluster.brick.latency.create.max
gluster.brick.latency.create.min
gluster.brick.latency.access.count
gluster.brick.latency.access.avg
gluster.brick.latency.access.max
gluster.brick.latency.access.min

Comment 8 Nathan Scott 2013-07-03 04:47:25 UTC
Hi guys,

I wrote a man page today as well, which is also included with PCP now.
It describes a mechanism for enabling the per-volume stats using the
PCP tools (and distributed protocol) as well.

cheers.


PMDAGLUSTER(1)							PMDAGLUSTER(1)

NAME
       pmdagluster - Gluster Filesystem PMDA

DESCRIPTION
       pmdagluster is a Performance Metrics Domain Agent (PMDA) which exports
       metric values about mounted gluster filesystems using the gluster(8)
       command.  It reports on volumes and bricks both local and remote to
       the node where pmdagluster is running.

       The gluster filesystem supports fine-grained control over enabling
       statistics on individual volumes, so that the values can be enabled
       or disabled on systems where monitoring is not desired.

       The pmstore(1) command can be used to enable and disable profiling of
       volumes.  Using the individual instances of the gluster.volume.profile
       metric, one can set their values (and associated profiling) either on
       (1) or off (0).  Additionally, pminfo(1) can report on the current
       status of profiling of each volume.

	    # pminfo -f gluster.volume.profile

	    gluster.volume.profile
		inst [0 or "gv0"] value 0
		inst [1 or "gv1"] value 1

	    # pmstore -i "gv0" gluster.volume.profile 1
	    gluster.volume.profile inst [0 or "gv0"] old value=0 new value=1

       Further details on the gluster filesystem can be found at
       http://www.gluster.org

INSTALLATION
       Install the gluster PMDA by using the Install script as root:

	     # cd $PCP_PMDAS_DIR/gluster
	     # ./Install

       To uninstall, do the following as root:

	     # cd $PCP_PMDAS_DIR/gluster
	     # ./Remove

       pmdagluster is  launched	 by  pmcd(1)  and  should  never  be  executed
       directly.  The Install and Remove scripts notify pmcd(1) when the agent
       is installed or removed.

FILES
       $PCP_PMDAS_DIR/gluster/Install
	   installation script for the pmdagluster agent

       $PCP_PMDAS_DIR/gluster/Remove
	   undo installation script for the pmdagluster agent

       $PCP_LOG_DIR/pmcd/gluster.log
	   default log file for error messages from pmdagluster

PCP ENVIRONMENT
       Environment variables with the prefix PCP_ are used to parameterise the
       file  and  directory  names used by PCP. On each installation, the file
       /etc/pcp.conf contains the  local  values  for  these  variables.   The
       $PCP_CONF  variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

SEE ALSO
       pmcd(1), pminfo(1), pmstore(1), and gluster(8)


Performance Co-Pilot		      PCP			PMDAGLUSTER(1)

Comment 9 Fedora Update System 2013-07-31 04:41:21 UTC
pcp-3.8.2-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/pcp-3.8.2-1.fc19

Comment 10 Fedora Update System 2013-07-31 05:16:50 UTC
pcp-3.8.2-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/pcp-3.8.2-1.fc18

Comment 11 Fedora Update System 2013-07-31 05:17:45 UTC
pcp-3.8.2-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/pcp-3.8.2-1.fc17

Comment 12 Fedora Update System 2013-07-31 05:18:45 UTC
pcp-3.8.2-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/pcp-3.8.2-1.el6

Comment 13 Fedora Update System 2013-07-31 05:19:37 UTC
pcp-3.8.2-1.el5 has been submitted as an update for Fedora EPEL 5.
https://admin.fedoraproject.org/updates/pcp-3.8.2-1.el5

Comment 14 Fedora Update System 2013-08-01 20:32:50 UTC
Package pcp-3.8.2-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing pcp-3.8.2-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11023/pcp-3.8.2-1.el6
then log in and leave karma (feedback).

Comment 15 Fedora Update System 2013-08-02 03:47:33 UTC
pcp-3.8.2-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 16 Fedora Update System 2013-08-05 23:28:34 UTC
pcp-3.8.2-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 17 Fedora Update System 2013-08-10 04:00:54 UTC
pcp-3.8.2-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 18 Fedora Update System 2013-08-10 12:32:51 UTC
pcp-3.8.2-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 19 Fedora Update System 2013-08-16 19:52:25 UTC
pcp-3.8.2-1.el5 has been pushed to the Fedora EPEL 5 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 20 Nathan Scott 2013-08-19 02:13:33 UTC
Due to an oversight on my part, this code was not included in the build for pcp-3.8.2 - this is resolved in the dev branch for pcp-3.8.3 and will be released in a bugfix update shortly.

Comment 21 Fedora Admin XMLRPC Client 2013-09-11 19:50:03 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 22 Nathan Scott 2013-09-11 19:58:51 UTC
This was released in pcp-3.8.3 earlier this week.