Bug 1631926

Summary:

Should not show apiserver and kube-controllers data in etcd grafana page

Product:

OpenShift Container Platform

Reporter:

Junqi Zhao <juzhao>

Component:

Monitoring

Assignee:

Frederic Branczyk <fbranczy>

Status:

CLOSED ERRATA

QA Contact:

Junqi Zhao <juzhao>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

3.11.0

CC:

mloibl, surbania, vwalek, wsun

Target Milestone:

---

Target Release:

4.1.0

Hardware:

Unspecified

OS:

Unspecified

Last Closed:

2019-06-04 10:40:35 UTC

Type:

Bug

Bug Depends On:

1670700

Bug Blocks:

Attachments:

Description	Flags
apiserver and kube-controllers also shown in etcd grafana page	none
take apiserver for example, the data is shown in etcd grafana page	none
cluster-monitoring-config and grafana-dashboard-etcd configmap output	none
apiserver and kube-controllers are still shown in etcd grafana page	none
cluster-monitoring-config and grafana-dashboard-etcd -v3.11.36-1	none
etcd grafana page	none

Description Junqi Zhao 2018-09-22 02:54:30 UTC

Description of problem:
After enabling etcd monitoring, the etcd grafana page shows apiserver and kube-controllers cluster data in addition to the etcd data.

Version-Release number of selected component (if applicable):
ose-cluster-monitoring-operator-v3.11.12-1


How reproducible:
always

Steps to Reproduce:
1. Install cluster monitoring
	
2. On your master, create the Secret/kube-etcd-client-certs that the cluster-monitoring stack expects
********************************************************************************
#!/usr/bin/env bash
set -e
set -x
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail


oc create -f -<<EOF
apiVersion: v1
data:
  etcd-client-ca.crt: "$(cat /etc/origin/master/master.etcd-ca.crt | base64 --wrap=0)"
  etcd-client.crt: "$(cat /etc/origin/master/master.etcd-client.crt | base64 --wrap=0)"
  etcd-client.key: "$(cat /etc/origin/master/master.etcd-client.key | base64 --wrap=0)"
kind: Secret
metadata:
  name: kube-etcd-client-certs
  namespace: openshift-monitoring
type: Opaque
EOF
********************************************************************************
	 Secret/kube-etcd-client-certs is created
3. # oc edit cm cluster-monitoring-config -n openshift-monitoring
Enable etcd monitoring by adding the following to the cluster-monitoring-config configmap:
********************************************************************************
etcd:
  enabled: true
  targets:
  selector:
    openshift.io/component: etcd
    openshift.io/control-plane: "true"
********************************************************************************
FYI: https://github.com/openshift/cluster-monitoring-operator/blob/master/manifests/cluster-monitoring-config.yaml#L22-L27
4. Check etcd grafana page.

Actual results:
apiserver and kube-controllers cluster data are shown in etcd grafana page
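
The same result can be cross-checked outside Grafana by asking Prometheus which jobs expose `etcd_server_has_leader`. The following is a minimal sketch, not part of the original report: `PROM_URL` and `TOKEN` are placeholders for the cluster's Prometheus route and a bearer token, `jobs_from_response` is a hypothetical helper name, and the parsing assumes the compact JSON that the Prometheus HTTP API normally returns.

```shell
#!/usr/bin/env bash
# Sketch only: PROM_URL and TOKEN are placeholders (assumptions),
# not values taken from this bug report.

# Print the distinct "job" label values found in a Prometheus
# /api/v1/query JSON response read from stdin, one per line, sorted.
# Assumes compact JSON (no whitespace between key and value).
jobs_from_response() {
  grep -o '"job":"[^"]*"' | sed -e 's/^"job":"//' -e 's/"$//' | sort -u
}

# Example usage against a live cluster (not run here):
#   curl -sk -G -H "Authorization: Bearer ${TOKEN}" \
#     --data-urlencode 'query=count by (job) (etcd_server_has_leader)' \
#     "${PROM_URL}/api/v1/query" | jobs_from_response
```

If the output lists jobs other than etcd, those are the extra "clusters" the dashboard offers.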

Expected results:
Should not show apiserver and kube-controllers data in etcd grafana page

Additional info:

Comment 1 Junqi Zhao 2018-09-22 02:56:05 UTC

Created attachment 1485839 [details]
apiserver and kube-controllers also shown in etcd grafana page

Comment 2 Junqi Zhao 2018-09-22 02:57:02 UTC

Created attachment 1485840 [details]
take apiserver for example, the data is shown in etcd grafana page

Comment 3 Junqi Zhao 2018-09-22 03:01:50 UTC

Created attachment 1485841 [details]
cluster-monitoring-config and grafana-dashboard-etcd configmap output

Comment 4 minden 2018-09-24 15:09:33 UTC

The etcd grafana dashboard determines its data sources based on the `etcd_server_has_leader` [1] metric. Many Golang projects register their metrics with the global metrics registry in the `init` function of a package. As a result, any other project that imports such a package registers those metrics as well, even though it is not the component they describe.

In the long run this will be fixed with the Kubernetes metrics overhaul [2]. As a short-term fix, we can adjust the dashboard upstream (etcd repo) and trickle the changes down to cluster-monitoring-operator. In particular, we can hide the faulty cluster options.

Impact for customers: one will see a broken dashboard when selecting `apiserver` or `kube-controllers` as a cluster. To my knowledge, this does not qualify as a release blocker.


[1] https://github.com/openshift/cluster-monitoring-operator/blob/75f539957f384c084f691d311114227f2a9a38d2/assets/grafana/dashboard-definitions.yaml#L1181

[2] https://github.com/kubernetes/kubernetes/pull/67476#issuecomment-413785762
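
To see this effect on a single component, one can look for the metric directly in that component's metrics output. This is a minimal sketch under stated assumptions: `exposes_metric` is a hypothetical helper (not part of any tool), and it expects Prometheus text-format exposition on stdin.

```shell
#!/usr/bin/env bash
# Sketch only: exposes_metric is a hypothetical helper name.

# Succeed if the Prometheus text-format exposition on stdin contains at
# least one sample for the metric named in $1 (with or without labels).
# "# HELP" and "# TYPE" comment lines do not count as samples.
exposes_metric() {
  local name="$1"
  grep -Eq "^${name}(\{| )"
}

# Example usage (not run here): check the API server's own endpoint.
#   oc get --raw /metrics | exposes_metric etcd_server_has_leader \
#     && echo "apiserver exposes etcd_server_has_leader"
```

A match on a non-etcd component would be consistent with the registration behaviour described in this comment.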

Comment 5 Matthias Loibl 2018-09-24 15:43:37 UTC

We have now proposed a bug fix to etcd itself: https://github.com/etcd-io/etcd/pull/10116

Comment 6 minden 2018-10-04 08:51:36 UTC

PR [1] to fix the issue has been merged into Prometheus Operator. The change will propagate into the cluster-monitoring-operator soon and then make it into the OpenShift 3.11.z release.

Let me know if you need anything else here from my side.

[1] https://github.com/coreos/prometheus-operator/pull/1959/

Comment 7 minden 2018-10-04 09:20:48 UTC

*** Bug 1634680 has been marked as a duplicate of this bug. ***

Comment 9 Junqi Zhao 2018-11-02 03:23:22 UTC

Issue is not fixed; apiserver and kube-controllers data are still shown in the etcd grafana page.

cluster monitoring image version: v3.11.36-1

Comment 10 Junqi Zhao 2018-11-02 03:23:58 UTC

Created attachment 1500351 [details]
apiserver and kube-controllers are still shown in etcd grafana page

Comment 11 Junqi Zhao 2018-11-02 03:26:51 UTC

Created attachment 1500353 [details]
cluster-monitoring-config and grafana-dashboard-etcd -v3.11.36-1

Comment 12 Frederic Branczyk 2018-11-02 09:10:36 UTC

We're not going to fix this in 3.11.z because the required changes are risky to introduce, so we're postponing it to the next non-patch release.

Comment 18 Junqi Zhao 2019-03-29 08:33:23 UTC

Moved back to MODIFIED; bug 1670700 is not fixed.

Comment 20 Junqi Zhao 2019-04-12 09:12:42 UTC

Moved back to MODIFIED; bug 1670700 is not fixed.

Comment 22 Junqi Zhao 2019-04-23 08:10:31 UTC

etcd data is shown in etcd grafana page
payload: 4.0.0-0.nightly-2019-04-20-175518

Comment 23 Junqi Zhao 2019-04-23 08:10:53 UTC

Created attachment 1557470 [details]
etcd grafana page

Comment 25 errata-xmlrpc 2019-06-04 10:40:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758