Bug 969348
Summary:          Integrate gluster related stats with PCP
Product:          [Fedora] Fedora
Component:        pcp
Version:          rawhide
Hardware:         All
OS:               Linux
Status:           CLOSED ERRATA
Severity:         medium
Priority:         unspecified
Keywords:         Reopened
Reporter:         Neependra Khare <nkhare>
Assignee:         Nathan Scott <nathans>
QA Contact:       Fedora Extras Quality Assurance <extras-qa>
CC:               ahatfiel, avishwan, bhubbard, dshaks, fche, kmayilsa, mgoodwin, nathans, rcyriac, sabose, sgowda
Target Milestone: ---
Target Release:   ---
Fixed In Version: pcp-3.8.2-1.el5
Doc Type:         Bug Fix
Type:             Bug
Last Closed:      2013-09-11 19:58:51 UTC
Description (Neependra Khare, 2013-05-31 09:30:28 UTC)
Hi Neependra, just a friendly reminder - this bug is awaiting input from gluster folks on the statistics that are available today (as discussed in email). cheers.

Created attachment 759052 [details]
gluster vol profile parser
Attaching a python script which takes a gluster volume name as an option and gives the profile output as a dictionary. Before running the script, enable profiling on the volume with:

    gluster volume profile <volname> start

All the profiling info from the bricks will be printed out as a dictionary.

Created attachment 759132 [details]
Output from the parser

It's a sample of the output from the parser, which was attached earlier.
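For reference, the parser's core transformation can be sketched as below. This is a hypothetical, simplified reconstruction in python, not the attached script itself; the XML element names are assumptions modelled on the dictionary keys visible in the sample output.

```python
# Sketch of the attached parser's core logic: turn XML like that emitted by
# `gluster volume profile <vol>` into the dictionary shape shown in the
# sample output.  SAMPLE_XML and its element names are assumptions, not a
# verified copy of gluster's actual XML schema.
import xml.etree.ElementTree as ET

SAMPLE_XML = """<volProfile>
  <brick>
    <brickName>172.17.40.15:/brick/gluster</brickName>
    <fopStats>
      <fop>
        <name>STAT</name>
        <hits>2</hits>
        <latencyAvg>68</latencyAvg>
        <latencyMin>55</latencyMin>
        <latencyMax>81</latencyMax>
      </fop>
    </fopStats>
  </brick>
</volProfile>"""

def parse_profile(xml_text):
    """Return a list of per-brick dicts:
    {'brick': <name>, 'Latency': {'fopStats': [<fop dict>, ...]}}"""
    root = ET.fromstring(xml_text)
    bricks = []
    for brick in root.iter('brick'):
        # Each <fop> child becomes one flat dict of tag -> text.
        fops = [{child.tag: child.text for child in fop}
                for fop in brick.iter('fop')]
        bricks.append({'brick': brick.findtext('brickName'),
                       'Latency': {'fopStats': fops}})
    return bricks

print(parse_profile(SAMPLE_XML))
```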
Hi all, I had a look through the script and have set up a little gluster test environment here to experiment with.

First of all, that script uses python to extract values from the XML output of the gluster command. Previously PCP supported C and perl data sources, so I wandered off for the last week or so and added python plugin support to make progress here easier. This went out with the pcp-3.8.1 release a day or two ago, so now we can start to think about the next step - bridging your python script and pmcd.

To do that, we first need to map the gluster data to new PCP metrics (names, metadata, etc). The data the example script prints out looks like this:

    [ {'Latency': {'fopStats': [
          {'latencyAvg': '68', 'latencyMin': '55', 'hits': '2',
           'name': 'STAT', 'latencyMax': '81'},
          ... ]},
       'brick': '172.17.40.15:/brick/gluster'},
      {'Latency': {'fopStats': [
          ...,
          {'latencyAvg': '396.750000', 'latencyMin': '127', 'hits': '4',
           'name': 'READDIRP', 'latencyMax': '839'}]},
       'brick': '172.17.40.16:/brick/gluster'} ]

Perhaps we could go for PCP metric names like:

    gluster.latency.fileops.{mkdir,open,write,etc}.{avg,min,count,max}

This data is available per-brick, where a brick can be remote:

    'brick': '172.17.40.15:/brick/gluster'
    'brick': '172.17.40.16:/brick/gluster'

Now, PCP would typically be deployed on all machines (we are getting stats from the remote kernel/hardware/... too), so we need to decide whether to represent remote hosts' bricks from every pmdagluster, or to only present data from the host being queried. It's not clear whether the data can be requested for the local host only from the gluster command - can anyone confirm? (I didn't find a way looking at the source - lots of RPC calls sprinkled around.) Which raises another issue - if a remote host is down, we must still respond to requests for data quickly in pmdagluster (a network timeout is a killer - it will result in pmcd terminating pmdagluster for poor behaviour).
As more and more cluster nodes are added, life possibly becomes more complicated too. So, ideally, we'll just extract the local host stats, and then have a PCP instance domain ("set of values") of local bricks for each of the metric names I listed above. Then we'd rely on the distributed PCP metric fetching mechanism to collate the data centrally (along with all the other statistics for each host). Is there a way to get stats for just the local bricks? (& not just by getting 'em all and ignoring data from the remote hosts). thanks!

Hi Nathan, as for querying the stats, a single query on any node that is part of the gluster cluster will result in getting stats from across all the nodes. In effect, for a given volume, you can query stats for all bricks from any given node. Additionally, there is a provision to query stat info from just a given brick (calling gluster volume profile <volname> <brick>). This again can be run from any node, and does not have to be sent to the node where the brick exists. The stat representation gluster.latency.fileops.{mkdir,open,write,etc}.{avg,min,count,max} looks good.

Hi, I've committed an initial version of pmdagluster which exports the metrics listed below. I added a few more metrics since we last chatted, after examining the XML output a bit more closely. The gluster.volume.profile metric can be used to query whether individual volumes have had profiling enabled. This code is currently in the dev branch of the main pcp git tree. It requires the latest python modules from the same branch. Feel free to try it out (there's an INSTALL file in the top level of the pcp sources, with build and installation instructions) and please let me know how it goes - thanks!
    gluster.volume.profile
    gluster.volume.dist.count
    gluster.volume.stripe.count
    gluster.volume.replica.count
    gluster.brick.read_bytes
    gluster.brick.write_bytes
    gluster.brick.latency.<fop>.{count,avg,max,min}, for each of these file
    operations: xattrop, writev, unlink, truncate, symlink, statfs, stat,
    setxattr, setattr, rmdir, rename, removexattr, readv, readlink,
    readdirp, readdir, rchecksum, opendir, open, mknod, mkdir, lookup, lk,
    link, inodelk, getxattr, getspec, fxattrop, ftruncate, fsyncdir, fsync,
    fstat, fsetxattr, fsetattr, fremovexattr, flush, finodelk, fgetxattr,
    fentrylk, fallocate, entrylk, discard, create, access

Hi guys, I wrote a man page today as well, which is also included with PCP now. It describes a mechanism for enabling the per-volume stats using the PCP tools (and distributed protocol) as well. cheers.

PMDAGLUSTER(1)

NAME
       pmdagluster - Gluster Filesystem PMDA

DESCRIPTION
       pmdagluster is a Performance Metrics Domain Agent (PMDA) which
       exports metric values about mounted gluster filesystems using the
       gluster(8) command.

       This PMDA exports metrics about volumes and bricks both local and
       remote to the node where pmdagluster is running.

       The gluster filesystem supports fine-grained control over enabling
       statistics on individual volumes, so that collection can be enabled
       or disabled on systems where monitoring is not wanted.

       The pmstore(1) command can be used to enable and disable profiling
       of volumes. Using the individual instances of the
       gluster.volume.profile metric, one can set their values (and
       associated profiling) either on (1) or off (0). Additionally,
       pminfo(1) can report on the current status of profiling of each
       volume.

           # pminfo -f gluster.volume.profile
           gluster.volume.profile
               inst [0 or "gv0"] value 0
               inst [1 or "gv1"] value 1

           # pmstore -i "gv0" gluster.volume.profile 1
           gluster.volume.profile inst [0 or "gv0"] old value=0 new value=1

       Further details on the gluster filesystem can be found at
       http://www.gluster.org

INSTALLATION
       Install the gluster PMDA by using the Install script as root:

           # cd $PCP_PMDAS_DIR/gluster
           # ./Install

       To uninstall, do the following as root:

           # cd $PCP_PMDAS_DIR/gluster
           # ./Remove

       pmdagluster is launched by pmcd(1) and should never be executed
       directly. The Install and Remove scripts notify pmcd(1) when the
       agent is installed or removed.
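As a rough illustration of how the parser's per-brick fopStats map onto the gluster.brick.latency metric names above, here is a plain-python sketch. This is hypothetical code, not taken from pmdagluster; the real agent registers metrics and instances through PCP's python PMDA modules rather than building a flat dictionary.

```python
# Sketch: flatten per-brick fopStats into PCP-style metric names, using the
# brick path as the instance identifier.  Illustration only - pmdagluster
# itself exports these via the pcp python PMDA API and pmcd.
STATS = {'hits': 'count', 'latencyAvg': 'avg',
         'latencyMax': 'max', 'latencyMin': 'min'}

def to_metrics(bricks):
    """bricks: parser output -> {(metric name, brick instance): value}"""
    values = {}
    for entry in bricks:
        instance = entry['brick']
        for fop in entry['Latency']['fopStats']:
            fopname = fop['name'].lower()          # e.g. 'STAT' -> 'stat'
            for key, suffix in STATS.items():
                metric = 'gluster.brick.latency.%s.%s' % (fopname, suffix)
                values[(metric, instance)] = float(fop[key])
    return values

# One brick, one file operation, taken from the sample output above.
sample = [{'brick': '172.17.40.15:/brick/gluster',
           'Latency': {'fopStats': [{'name': 'STAT', 'hits': '2',
                                     'latencyAvg': '68', 'latencyMin': '55',
                                     'latencyMax': '81'}]}}]
print(to_metrics(sample))
```

Fetching these per-host and letting PCP's distributed fetch collate them centrally matches the instance-domain approach discussed above.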
FILES
       $PCP_PMDAS_DIR/gluster/Install
              installation script for the pmdagluster agent
       $PCP_PMDAS_DIR/gluster/Remove
              undo installation script for the pmdagluster agent
       $PCP_LOG_DIR/pmcd/gluster.log
              default log file for error messages from pmdagluster

PCP ENVIRONMENT
       Environment variables with the prefix PCP_ are used to parameterise
       the file and directory names used by PCP. On each installation, the
       file /etc/pcp.conf contains the local values for these variables.
       The $PCP_CONF variable may be used to specify an alternative
       configuration file, as described in pcp.conf(5).

SEE ALSO
       pmcd(1), pminfo(1), pmstore(1), and gluster(8)

pcp-3.8.2-1.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/pcp-3.8.2-1.fc19

pcp-3.8.2-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/pcp-3.8.2-1.fc18

pcp-3.8.2-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/pcp-3.8.2-1.fc17

pcp-3.8.2-1.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/pcp-3.8.2-1.el6

pcp-3.8.2-1.el5 has been submitted as an update for Fedora EPEL 5. https://admin.fedoraproject.org/updates/pcp-3.8.2-1.el5

Package pcp-3.8.2-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
    # su -c 'yum update --enablerepo=epel-testing pcp-3.8.2-1.el6'
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11023/pcp-3.8.2-1.el6
then log in and leave karma (feedback).

pcp-3.8.2-1.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.

pcp-3.8.2-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.8.2-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

pcp-3.8.2-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.

Due to an oversight on my part, this code was accidentally not included in the build for pcp-3.8.2 - this is resolved in the dev branch for pcp-3.8.3 and will be released in a bugfix update shortly.

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

This was released in pcp-3.8.3 earlier this week.