Bug 1631926

Summary: Should not show apiserver and kube-controllers data in etcd grafana page
Product: OpenShift Container Platform Reporter: Junqi Zhao <juzhao>
Component: Monitoring Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.11.0 CC: mloibl, surbania, vwalek, wsun
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:40:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1670700    
Bug Blocks:    
Attachments:
Description | Flags
apiserver and kube-controllers also shown in etcd grafana page | none
take apiserver for example, the data is shown in etcd grafana page | none
cluster-monitoring-config and grafana-dashboard-etcd configmap output | none
apiserver and kube-controllers are still shown in etcd grafana page | none
cluster-monitoring-config and grafana-dashboard-etcd -v3.11.36-1 | none
etcd grafana page | none

Description Junqi Zhao 2018-09-22 02:54:30 UTC
Description of problem:
After enabling etcd monitoring, the etcd grafana page shows apiserver and kube-controllers cluster data in addition to etcd data.

Version-Release number of selected component (if applicable):
ose-cluster-monitoring-operator-v3.11.12-1


How reproducible:
always

Steps to Reproduce:
1. Install cluster monitoring
	
2. On your master, create the Secret/kube-etcd-client-certs that the cluster-monitoring stack expects
********************************************************************************
#!/usr/bin/env bash
set -e
set -x
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail


oc create -f -<<EOF
apiVersion: v1
data:
  etcd-client-ca.crt: "$(cat /etc/origin/master/master.etcd-ca.crt | base64 --wrap=0)"
  etcd-client.crt: "$(cat /etc/origin/master/master.etcd-client.crt | base64 --wrap=0)"
  etcd-client.key: "$(cat /etc/origin/master/master.etcd-client.key | base64 --wrap=0)"
kind: Secret
metadata:
  name: kube-etcd-client-certs
  namespace: openshift-monitoring
type: Opaque
EOF
********************************************************************************
	 Secret/kube-etcd-client-certs is created
3. # oc edit cm cluster-monitoring-config -n openshift-monitoring
Enable etcd monitoring by adding the following to the cluster-monitoring-config configmap:
********************************************************************************
etcd:
  enabled: true
  targets:
    selector:
      openshift.io/component: etcd
      openshift.io/control-plane: "true"
********************************************************************************
FYI: https://github.com/openshift/cluster-monitoring-operator/blob/master/manifests/cluster-monitoring-config.yaml#L22-L27
4. Check etcd grafana page.
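
One caveat with the script in step 2: a failing `cat` inside the here-document does not abort the script, even with `set -e`, because the exit status of a command substitution inside a here-document is discarded. Below is a minimal pre-flight sketch, assuming the certificate paths from step 2. The function name `check_etcd_client_certs` is made up for illustration and is not part of any OpenShift tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all three certificate files read by
# the step 2 script exist and are non-empty.
# The directory defaults to the path used in step 2.
check_etcd_client_certs() {
  local dir="${1:-/etc/origin/master}"
  local f missing=0
  for f in master.etcd-ca.crt master.etcd-client.crt master.etcd-client.key; do
    if [ ! -s "${dir}/${f}" ]; then
      echo "missing or empty: ${dir}/${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Calling `check_etcd_client_certs || exit 1` before the `oc create` line should stop the script early rather than submitting a Secret with empty values.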

Actual results:
apiserver and kube-controllers cluster data are shown in etcd grafana page

Expected results:
Should not show apiserver and kube-controllers data in etcd grafana page

Additional info:

Comment 1 Junqi Zhao 2018-09-22 02:56:05 UTC
Created attachment 1485839 [details]
apiserver and kube-controllers also shown in etcd grafana page

Comment 2 Junqi Zhao 2018-09-22 02:57:02 UTC
Created attachment 1485840 [details]
take apiserver for example, the data is shown in etcd grafana page

Comment 3 Junqi Zhao 2018-09-22 03:01:50 UTC
Created attachment 1485841 [details]
cluster-monitoring-config and grafana-dashboard-etcd configmap output

Comment 4 minden 2018-09-24 15:09:33 UTC
The etcd grafana dashboard determines its data sources based on the `etcd_server_has_leader` [1] metric. Many Golang projects use the global metrics registry and register their metrics in the `init` function of a package. As a result, any other project that imports such a package ends up with the same, faulty registrations.
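
To illustrate why other components end up in the cluster dropdown, here is a rough sketch that emulates a templating query of the form `label_values(etcd_server_has_leader, job)` over a handful of series. The sample series, the use of a `job` label, and the function names `sample_series` and `jobs_exposing` are assumptions made for illustration. Only the metric name and the component names come from this report.

```shell
#!/usr/bin/env bash
# Hypothetical sample series. The values are made up; the component names
# mirror the ones reported in this bug.
sample_series() {
  cat <<'EOF'
etcd_server_has_leader{job="etcd"} 1
etcd_server_has_leader{job="apiserver"} 0
etcd_server_has_leader{job="kube-controllers"} 0
up{job="node-exporter"} 1
EOF
}

# Read series from stdin and print the distinct job label values of every
# series with the given metric name, roughly what a
# label_values(<metric>, job) query would return.
jobs_exposing() {
  local metric="$1"
  grep "^${metric}{" | sed -n 's/.*job="\([^"]*\)".*/\1/p' | sort -u
}

sample_series | jobs_exposing etcd_server_has_leader
```

With this sample data, every job that exposes `etcd_server_has_leader` is listed, not only `etcd`, which matches the behaviour described in this bug.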

In the long run, this will be fixed by the Kubernetes metrics overhaul [2]. As a short-term fix, we can adjust the dashboard upstream (etcd repo) and let the changes trickle down to cluster-monitoring-operator. In particular, we can hide the faulty cluster options.

Impact for customers: The dashboard appears broken when `apiserver` or `kube-controllers` is selected as the cluster. To my knowledge, this does not qualify as a release blocker.


[1] https://github.com/openshift/cluster-monitoring-operator/blob/75f539957f384c084f691d311114227f2a9a38d2/assets/grafana/dashboard-definitions.yaml#L1181

[2] https://github.com/kubernetes/kubernetes/pull/67476#issuecomment-413785762

Comment 5 Matthias Loibl 2018-09-24 15:43:37 UTC
We have now proposed a bug fix with etcd itself: https://github.com/etcd-io/etcd/pull/10116

Comment 6 minden 2018-10-04 08:51:36 UTC
PR [1], which fixes the issue, has been merged into Prometheus Operator. The change will propagate into the cluster-monitoring-operator soon and then make it into the OpenShift 3.11.z release.

Let me know if you need anything else here from my side.

[1] https://github.com/coreos/prometheus-operator/pull/1959/

Comment 7 minden 2018-10-04 09:20:48 UTC
*** Bug 1634680 has been marked as a duplicate of this bug. ***

Comment 9 Junqi Zhao 2018-11-02 03:23:22 UTC
Issue is not fixed; apiserver and kube-controllers data are still shown in the etcd grafana page.

cluster monitoring image version: v3.11.36-1

Comment 10 Junqi Zhao 2018-11-02 03:23:58 UTC
Created attachment 1500351 [details]
apiserver and kube-controllers are still shown in etcd grafana page

Comment 11 Junqi Zhao 2018-11-02 03:26:51 UTC
Created attachment 1500353 [details]
cluster-monitoring-config and grafana-dashboard-etcd -v3.11.36-1

Comment 12 Frederic Branczyk 2018-11-02 09:10:36 UTC
We're not going to fix this in 3.11.z because the required changes are risky to introduce, so we're postponing it to the next non-patch release.

Comment 18 Junqi Zhao 2019-03-29 08:33:23 UTC
Moved back to MODIFIED because bug 1670700 is not fixed.

Comment 20 Junqi Zhao 2019-04-12 09:12:42 UTC
Moved back to MODIFIED because bug 1670700 is not fixed.

Comment 22 Junqi Zhao 2019-04-23 08:10:31 UTC
etcd data is shown in the etcd grafana page
payload: 4.0.0-0.nightly-2019-04-20-175518

Comment 23 Junqi Zhao 2019-04-23 08:10:53 UTC
Created attachment 1557470 [details]
etcd grafana page

Comment 25 errata-xmlrpc 2019-06-04 10:40:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758