Bug 1791233

Summary: PCP Vector Host Overview Dashboard shows BCC datasources
Product: Red Hat Enterprise Linux 8
Reporter: Jan Kurik <jkurik>
Component: grafana-pcp
Assignee: Andreas Gerstmayr <agerstmayr>
Status: CLOSED DUPLICATE
QA Contact: Jan Kurik <jkurik>
Severity: medium
Docs Contact:
Priority: medium
Version: 8.2
CC: agerstmayr, grafana-maint, jkurik, mgoodwin, nathans
Target Milestone: rc
Keywords: Bugfix, Triaged
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2020-05-05 12:24:53 UTC
Type: Bug

Description Jan Kurik 2020-01-15 09:45:22 UTC
Description of problem:
The default "PCP Vector Host Overview" dashboard from the "grafana-pcp" plugin shows two charts pointing to the BCC datasource. IMO this dashboard should use only the "Vector" datasource. If the BCC datasource is not configured, the "PCP Vector Host Overview" dashboard shows an error in these two charts:

* Chart name: Run queue latency (us)
  Datasource query: bcc.runq.latency
* Chart name: BCC disk latency (us) 
  Datasource query: bcc.disk.all.latency


Version-Release number of selected component (if applicable):
* grafana-6.3.6-1.el8
* grafana-pcp-1.0.5-1.el8
* pcp-5.0.2-2.el8
* Vector: version unknown (I am using the latest Vector container image, netflixoss/vector:latest)

How reproducible:
always

Steps to Reproduce:
1. Install grafana, pcp and grafana-pcp
# sudo yum install -y grafana-pcp pcp pcp-zeroconf
# sudo systemctl restart grafana-server 
# sudo systemctl start pmproxy pmcd

2. Run Vector as a container image
# podman run -d --name vector -p 80:80 netflixoss/vector:latest

3. Enable the PCP plugin (Configuration -> Plugins, then click "Performance Co-Pilot App" and click the "Enable" button)

4. Create a default PCP Vector datasource in the Grafana GUI and configure its URL to point to your pmproxy (http://<hostname>:44322)

5. Open the default "PCP Vector Host Overview" dashboard

Actual results:
The "Run queue latency (us)" and "BCC disk latency (us)" charts show an error reading from the datasource.

Expected results:
The two charts either get relevant data from the Vector datasource or are not present on the default Vector dashboard.

Comment 1 Jan Kurik 2020-01-15 10:49:55 UTC
The same behaviour is also observed using the latest currently available build: grafana-pcp-1.0.5-2.el8

Comment 2 Andreas Gerstmayr 2020-01-15 15:58:42 UTC
The Vector datasource in grafana-pcp is replacing the standalone Vector web application - there is no need to run the Vector container.

These two metrics, bcc.runq.latency and bcc.disk.all.latency, require the BCC PMDA to be installed on the system (by default, the BCC PMDA is not installed).
Unfortunately, there is no proper way to display datasource errors to the user, so currently a blinking "!" sign appears in the top-left corner of the graph; hovering the mouse over it shows the error message ("Cannot find metric bcc.runq.latency. Please check if the PMDA is enabled" in this case).

Should I remove these two graphs, as the required BCC PMDA is not installed by default? Or should I make this error message clearer? Or document somewhere how to see the error message? Unfortunately, there is no proper API to display an error message in a more user-friendly way.
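For illustration, the missing-metric check described above can be sketched in Python. This is not the grafana-pcp implementation (which runs inside Grafana); the helper `missing_metric_errors` and the idea of comparing the panel's metric names against a set of names the datasource could resolve are hypothetical. Only the message text is taken from this bug.

```python
from typing import Iterable, List, Set


def missing_metric_errors(requested: Iterable[str], available: Set[str]) -> List[str]:
    """Return one error message per requested metric that is not available.

    Hypothetical helper: `available` stands in for the metric names the
    datasource could resolve via pmproxy; it is not a real grafana-pcp API.
    """
    errors = []
    for name in requested:
        if name not in available:
            # Message text as quoted in this bug report.
            errors.append(
                f"Cannot find metric {name}. Please check if the PMDA is enabled."
            )
    return errors


if __name__ == "__main__":
    # Metrics used by the two failing panels, on a host without the BCC PMDA
    # (the host metric set below is made-up example data).
    panel_metrics = ["bcc.runq.latency", "bcc.disk.all.latency"]
    host_metrics = {"kernel.all.load", "disk.dev.read"}
    for msg in missing_metric_errors(panel_metrics, host_metrics):
        print(msg)
```

Each panel query that cannot be resolved yields its own message, which matches the per-graph "!" indicator described above.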

Comment 3 Jan Kurik 2020-01-15 16:18:09 UTC
(In reply to Andreas Gerstmayr from comment #2)
> The Vector datasource in grafana-pcp is replacing the standalone Vector web
> application - there is no need to run the Vector container.

Ah, thanks. Good to know.

> These two metrics, bcc.runq.latency and bcc.disk.all.latency require the BCC
> PMDA to be installed on the system (per default the BCC PMDA is not
> installed).
> Unfortunately there is no proper way to display errors from the datasource
> to the user, so now there is a blinking ! sign in the top left corner of the
> graph, and if you point the mouse over it, it shows the error message
> ("Cannot find metric bcc.runq.latency. Please check if the PMDA is enabled"
> in this case).
> 
> Should I remove these two graphs, as the required BCC PMDA is not installed
> by default? Or should I make this error message more clear? Or point it out
> somewhere in the docs how to see the error message? Unfortunately there is
> no proper API to display an error message in a more user friendly way.

I would expect the default configuration to work without errors.
If these metrics are something we would like to have in the dashboard, then documenting the installation of the BCC PMDA on https://grafana-pcp.readthedocs.io/en/latest/datasources/vector.html should be enough, IMO.

Comment 4 Andreas Gerstmayr 2020-01-15 17:20:51 UTC
(In reply to Jan Kurik from comment #3)
> I would expect the default configuration works without errors.
> If these metrics are something we would like to have in the dashboard, then
> documenting installation of BCC PMDA on
> https://grafana-pcp.readthedocs.io/en/latest/datasources/vector.html should
> be enough, IMO.

Okay, I'll move these BCC metrics to a new dashboard which contains multiple BCC metrics, and add a note in this new dashboard that the BCC PMDA is required.

Comment 5 Andreas Gerstmayr 2020-02-03 11:57:51 UTC
Fixed in upstream with:

commit 05a67274542a1a1365cf65ac767144f6f00947fe (HEAD -> master, upstream/master)
Author: Andreas Gerstmayr <agerstmayr>
Date:   Mon Feb 3 12:55:44 2020 +0100

    dashboards: remove BCC metrics from vector host overview (PMDA isn't installed by default)


Will be included in the next release.

Comment 6 Nathan Scott 2020-04-06 04:55:43 UTC
Just another idea on this one, Andreas - perhaps grafana-pcp could check the pmcd.agent.status metric to see if the PMDAs needed for a Vector dashboard are present, and either filter panels from the dashboard(s) and/or provide a very specific message (somehow, not sure where) about the missing PMDA?

Comment 7 Andreas Gerstmayr 2020-04-06 08:58:22 UTC
(In reply to Nathan Scott from comment #6)
> Just another idea on this one, Andreas - perhaps grafana-pcp could check the
> pmcd.agent.status metric to see if the PMDAs needed for a Vector dashboard
> are present, and either filter panels from the dashboard(s) and/or provide a
> very specific message (somehow, not sure where) about the missing PMDA?

Normal dashboards (not scripted) are static, i.e. you can't filter out panels on the fly. Basically it's just a big blob of JSON which gets copied somewhere when the plugin is installed.
I could probably modify the dashboard definition through the Grafana REST API from the plugin, but that sounds quite hacky.
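To make the trade-off concrete, Nathan's suggestion can be sketched as a transformation of the dashboard JSON before it is imported. Everything here is a hypothetical sketch, not grafana-pcp code: it assumes each panel has a `targets` list whose entries carry the metric name in an `expr` field, it assumes a `pmcd.agent.status` value of 0 means the agent is running, and it uses a hand-written table (`PREFIX_TO_PMDA`) mapping metric prefixes to PMDA names. It does not call the Grafana REST API.

```python
import copy
from typing import Any, Dict, Set

# Hypothetical mapping from metric namespace prefix to the PMDA serving it.
# Prefixes not listed here are assumed to be always available.
PREFIX_TO_PMDA = {"bcc": "bcc"}


def running_pmdas(agent_status: Dict[str, int]) -> Set[str]:
    """Names of PMDAs considered running.

    Assumes a mapping of agent name -> pmcd.agent.status value,
    where 0 is taken to mean "running".
    """
    return {name for name, status in agent_status.items() if status == 0}


def panel_is_supported(panel: Dict[str, Any], running: Set[str]) -> bool:
    """True if every metric queried by the panel maps to a running PMDA."""
    for target in panel.get("targets", []):
        prefix = target.get("expr", "").split(".", 1)[0]
        pmda = PREFIX_TO_PMDA.get(prefix)
        if pmda is not None and pmda not in running:
            return False
    return True


def filter_dashboard(dashboard: Dict[str, Any], agent_status: Dict[str, int]) -> Dict[str, Any]:
    """Return a copy of the dashboard without panels whose PMDA is missing."""
    running = running_pmdas(agent_status)
    result = copy.deepcopy(dashboard)
    result["panels"] = [
        p for p in result.get("panels", []) if panel_is_supported(p, running)
    ]
    return result


if __name__ == "__main__":
    # Made-up example dashboard; only the two BCC panel names come from this bug.
    dashboard = {
        "title": "PCP Vector Host Overview",
        "panels": [
            {"title": "Load average", "targets": [{"expr": "kernel.all.load"}]},
            {"title": "Run queue latency (us)", "targets": [{"expr": "bcc.runq.latency"}]},
            {"title": "BCC disk latency (us)", "targets": [{"expr": "bcc.disk.all.latency"}]},
        ],
    }
    # Host where only the linux PMDA is running (no BCC PMDA).
    filtered = filter_dashboard(dashboard, {"linux": 0})
    print([p["title"] for p in filtered["panels"]])  # ['Load average']
```

Panels nested inside rows are not handled. Applying this from within the plugin would still mean rewriting the installed dashboard definition at runtime, which is the part that feels hacky.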

If the graph uses a metric which can't be found, the following message is currently displayed:
"Cannot find metric bcc.runq.latency. Please check if the PMDA is enabled."

Should I reword this message?

Comment 8 Nathan Scott 2020-04-06 22:04:41 UTC
(In reply to Andreas Gerstmayr from comment #7)
> [...]
> If the graph uses a metric which can't be found, the following message is
> displayed currently:
> "Cannot find metric bcc.runq.latency. Please check if the PMDA is enabled."
> 
> Should I reword this message?

Nope, that looks perfect - thanks for the explanation!

Comment 9 Andreas Gerstmayr 2020-05-05 12:24:53 UTC

*** This bug has been marked as a duplicate of bug 1807099 ***