Bug 1645417

Summary: [origin] secret "kube-etcd-client-certs" is not created for prometheus-k8s pod

Product: OpenShift Container Platform
Component: Monitoring
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Status: CLOSED ERRATA
Reporter: Junqi Zhao <juzhao>
Assignee: Frederic Branczyk <fbranczy>
QA Contact: Junqi Zhao <juzhao>
CC: lserven
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2019-06-04 10:40:52 UTC

Attachments:
  cluster-monitoring-config configmap (no flags)

Description Junqi Zhao 2018-11-02 08:00:58 UTC
Description of problem:
After deploying cluster monitoring, the prometheus-k8s pod is stuck in ContainerCreating:
# oc -n openshift-monitoring get pod
NAME                                          READY     STATUS              RESTARTS   AGE
cluster-monitoring-operator-bb9c969fd-c2jmj   1/1       Running             0          13m
grafana-867fb88f6-wcr28                       2/2       Running             0          10m
prometheus-k8s-0                              0/4       ContainerCreating   0          9m
prometheus-operator-7b9988b85d-jhr5k          1/1       Running             0          12m

# oc -n openshift-monitoring describe pod prometheus-k8s-0
Events:
  Type     Reason                  Age                From                                Message
  ----     ------                  ----               ----                                -------
  Warning  FailedScheduling        10m (x3 over 10m)  default-scheduler                   pod has unbound PersistentVolumeClaims (repeated 2 times)
  Normal   Scheduled               10m                default-scheduler                   Successfully assigned openshift-monitoring/prometheus-k8s-0 to preserved-juzhao-40-nrr-1
  Normal   SuccessfulAttachVolume  10m                attachdetach-controller             AttachVolume.Attach succeeded for volume "pvc-ce393e74-de70-11e8-bf6a-fa163e8ec639"
  Warning  FailedMount             1m (x12 over 10m)  kubelet, preserved-juzhao-40-nrr-1  MountVolume.SetUp failed for volume "secret-kube-etcd-client-certs" : secrets "kube-etcd-client-certs" not found

# oc -n openshift-monitoring get secret kube-etcd-client-certs
No resources found.
Error from server (NotFound): secrets "kube-etcd-client-certs" not found

Version-Release number of selected component (if applicable):
ose-prometheus-operator-v4.0.0-0.43.0.0


How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring 4.0

Actual results:
secret "kube-etcd-client-certs" is not created

Expected results:
secret "kube-etcd-client-certs" should be created

Additional info:

Comment 1 Junqi Zhao 2018-11-02 08:29:50 UTC
NOTE: The telemeter client has not yet been packaged into the ose-cluster-monitoring-operator image.

Comment 2 Frederic Branczyk 2018-11-02 09:05:10 UTC
What is the content of the `cluster-monitoring-config` configmap in the `openshift-monitoring` namespace?

Comment 3 Junqi Zhao 2018-11-07 02:01:37 UTC
Created attachment 1502796 [details]
cluster-monitoring-config configmap

Used origin images.

Comment 4 Junqi Zhao 2018-11-07 02:04:41 UTC
quay.io/openshift/origin-cluster-monitoring-operator:v4.0
openshift/prometheus:v2.4.2
quay.io/coreos/prometheus-config-reloader:v0.25.0
openshift/oauth-proxy:v1.1.0
quay.io/coreos/kube-rbac-proxy:v0.4.0
quay.io/coreos/prom-label-proxy:v0.1.0
quay.io/coreos/configmap-reload:v0.0.1

Comment 5 Junqi Zhao 2018-11-07 06:14:13 UTC
This blocks installation with origin images.

Comment 6 Frederic Branczyk 2018-11-09 11:01:54 UTC
Strange, this does look like a bug, we will have to investigate. etcd monitoring is not enabled in the configmap so it shouldn't be attempting to mount the secret.
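
For reference, a quick way to confirm whether the configmap carries any etcd section at all (standard oc and grep usage, shown here only as a suggestion):

# oc -n openshift-monitoring get cm cluster-monitoring-config -o yaml | grep -A1 'etcd'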

Comment 7 Junqi Zhao 2018-11-09 11:19:36 UTC
(In reply to Frederic Branczyk from comment #6)
> Strange, this does look like a bug, we will have to investigate. etcd
> monitoring is not enabled in the configmap so it shouldn't be attempting to
> mount the secret.

Is it related to the fact that the grafana-dashboard-etcd configmap is created? From the attachment in Comment 3:

oc -n openshift-monitoring get cm
NAME                                        DATA      AGE
cluster-monitoring-config                   1         11m
grafana-dashboard-etcd                      1         10m
grafana-dashboard-k8s-cluster-rsrc-use      1         10m
grafana-dashboard-k8s-node-rsrc-use         1         10m
grafana-dashboard-k8s-resources-cluster     1         10m
grafana-dashboard-k8s-resources-namespace   1         10m
grafana-dashboard-k8s-resources-pod         1         10m
grafana-dashboards                          1         10m
prometheus-k8s-rulefiles-0                  1         10m
prometheus-serving-certs-ca-bundle          1         10m

Comment 8 Junqi Zhao 2018-11-14 05:44:16 UTC
Workaround: create the kube-etcd-client-certs secret manually; the prometheus-k8s pod will then start up.

# cat kube-etcd-client-certs.yaml 
apiVersion: v1
data:
  etcd-client-ca.crt: ""
  etcd-client.crt: ""
  etcd-client.key: ""
kind: Secret
metadata:
  name: kube-etcd-client-certs
  namespace: openshift-monitoring
type: Opaque
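
For reference, the workaround can be applied and checked with standard oc commands, e.g.:

# oc create -f kube-etcd-client-certs.yaml
# oc -n openshift-monitoring get pod prometheus-k8s-0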

Comment 9 Junqi Zhao 2018-11-14 05:54:56 UTC
Note: The previous issue happens when installing cluster monitoring with openshift-ansible 4.0.

I also installed OCP on libvirt using the Next-Gen installer; there the issue does not happen because, although etcd monitoring is not enabled, the kube-etcd-client-certs secret and the grafana-dashboard-etcd configmap are created:

$ oc -n openshift-monitoring get cm cluster-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    prometheusOperator:
      baseImage: quay.io/coreos/prometheus-operator
      prometheusConfigReloaderBaseImage: quay.io/coreos/prometheus-config-reloader
      configReloaderBaseImage: quay.io/coreos/configmap-reload
    prometheusK8s:
      baseImage: openshift/prometheus
    alertmanagerMain:
      baseImage: openshift/prometheus-alertmanager
    nodeExporter:
      baseImage: openshift/prometheus-node-exporter
    kubeRbacProxy:
      baseImage: quay.io/coreos/kube-rbac-proxy
    kubeStateMetrics:
      baseImage: quay.io/coreos/kube-state-metrics
    grafana:
      baseImage: grafana/grafana
    auth:
      baseImage: openshift/oauth-proxy
kind: ConfigMap
metadata:
  creationTimestamp: 2018-11-14T04:24:10Z
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "6231"
  selfLink: /api/v1/namespaces/openshift-monitoring/configmaps/cluster-monitoring-config
  uid: 2244f71d-e7c5-11e8-83ba-5282253f2bb7


$ oc -n openshift-monitoring get secret kube-etcd-client-certs -oyaml
apiVersion: v1
data:
  etcd-client-ca.crt: ""
  etcd-client.crt: ""
  etcd-client.key: ""
kind: Secret
metadata:
  creationTimestamp: 2018-11-14T04:24:10Z
  name: kube-etcd-client-certs
  namespace: openshift-monitoring
  resourceVersion: "6235"
  selfLink: /api/v1/namespaces/openshift-monitoring/secrets/kube-etcd-client-certs
  uid: 224914ed-e7c5-11e8-83ba-5282253f2bb7
type: Opaque

$ oc -n openshift-monitoring get cm
NAME                                        DATA      AGE
cluster-monitoring-config                   1         1h
grafana-dashboard-etcd                      1         1h
grafana-dashboard-k8s-cluster-rsrc-use      1         1h
grafana-dashboard-k8s-node-rsrc-use         1         1h
grafana-dashboard-k8s-resources-cluster     1         1h
grafana-dashboard-k8s-resources-namespace   1         1h
grafana-dashboard-k8s-resources-pod         1         1h
grafana-dashboards                          1         1h
prometheus-k8s-rulefiles-0                  1         1h
prometheus-serving-certs-ca-bundle          1         1h

Comment 10 lserven 2018-11-14 10:40:43 UTC
I think this is due to the semantics of the configuration file. If no etcd configuration is specified, monitoring etcd defaults to true: https://github.com/openshift/cluster-monitoring-operator/blob/master/pkg/manifests/config.go#L108-L115. We could change the semantics of the defaulting, or change the default config to have `etcd.enabled=false`.
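
As an illustration only, an explicit opt-out in the user config might look something like the sketch below (assuming the `etcd.enabled` key mentioned above is honored under config.yaml; not a verified configuration):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    etcd:
      enabled: false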

Comment 11 Frederic Branczyk 2018-11-14 10:51:23 UTC
Agreed. When no config is given we should default to not monitoring etcd.

Comment 12 Junqi Zhao 2018-12-13 06:04:22 UTC
Now the grafana-dashboard-etcd configmap is not created by default:
$ oc -n openshift-monitoring get cm
NAME                                        DATA      AGE
adapter-config                              1         18m
cluster-monitoring-config                   1         29m
grafana-dashboard-k8s-cluster-rsrc-use      1         28m
grafana-dashboard-k8s-node-rsrc-use         1         28m
grafana-dashboard-k8s-resources-cluster     1         28m
grafana-dashboard-k8s-resources-namespace   1         28m
grafana-dashboard-k8s-resources-pod         1         28m
grafana-dashboards                          1         28m
prometheus-adapter-prometheus-config        1         18m
prometheus-k8s-rulefiles-0                  1         21m
serving-certs-ca-bundle                     1         27m
sharing-config                              3         17m
telemeter-client-serving-certs-ca-bundle    1         18m

$ oc version
oc v4.0.0-alpha.0+9d2874f-759
kubernetes v1.11.0+9d2874f

Used images:
docker.io/grafana/grafana:5.2.4
docker.io/openshift/oauth-proxy:v1.1.0
docker.io/openshift/prometheus-alertmanager:v0.15.2
docker.io/openshift/prometheus-node-exporter:v0.16.0
docker.io/openshift/prometheus:v2.5.0
quay.io/coreos/configmap-reload:v0.0.1
quay.io/coreos/kube-rbac-proxy:v0.4.0
quay.io/coreos/kube-state-metrics:v1.4.0
quay.io/coreos/prom-label-proxy:v0.1.0
quay.io/coreos/prometheus-config-reloader:v0.26.0
quay.io/coreos/prometheus-operator:v0.26.0
quay.io/openshift/origin-configmap-reload:v3.11
quay.io/openshift/origin-telemeter:v4.0
quay.io/surbania/k8s-prometheus-adapter-amd64:326bf3c
quay.io/openshift-release-dev/ocp-v4.0@sha256:4f94db8849ed915994678726680fc39bdb47722d3dd570af47b666b0160602e5

Comment 15 errata-xmlrpc 2019-06-04 10:40:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758