Description of problem:
Cloned from https://jira.coreos.com/browse/MON-579

Created the cluster-monitoring-config ConfigMap to attach PVs, content as below:

apiVersion: v1
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2
          resources:
            requests:
              storage: 2Gi
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2
          resources:
            requests:
              storage: 2Gi
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring

# oc get sc
NAME            PROVISIONER             AGE
gp2 (default)   kubernetes.io/aws-ebs   144m

But the cluster-monitoring-operator pod reports the error "spec.storage.volumeClaimTemplate.metadata.creationTimestamp in body must be of type string: "null"", details below:

# oc -n openshift-monitoring logs cluster-monitoring-operator-89d8d78df-rlbpc | grep "openshift-monitoring/cluster-monitoring-config"
E0219 10:56:33.794706 1 operator.go:244] Syncing "openshift-monitoring/cluster-monitoring-config" failed
E0219 10:56:33.794731 1 operator.go:245] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Prometheus-k8s failed: reconciling Prometheus object failed: updating Prometheus object failed: Prometheus.monitoring.coreos.com "k8s" is invalid: []: Invalid value: map[string]interface {}{"apiVersion":"monitoring.coreos.com/v1", "metadata":map[string]interface {}{"name":"k8s", "namespace":"openshift-monitoring", "resourceVersion":"12632", "generation":1, "uid":"5c5e724b-3421-11e9-a787-0ad8c958fe58", "creationTimestamp":"2019-02-19T08:35:49Z", "labels":map[string]interface {}{"prometheus":"k8s"}}, "spec":map[string]interface {}{"serviceMonitorNamespaceSelector":map[string]interface {}{}, "serviceAccountName":"prometheus-k8s", "image":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba01869048bf44fc5e8c57f0a34369750ce27e3fb0b5eb47c78f42022640154c", "baseImage":"openshift/prometheus", "secrets":[]interface {}{"prometheus-k8s-tls", "prometheus-k8s-proxy", "prometheus-k8s-htpasswd", "kube-rbac-proxy"}, "resources":map[string]interface {}{}, "nodeSelector":map[string]interface {}{"beta.kubernetes.io/os":"linux"}, "ruleSelector":map[string]interface {}{"matchLabels":map[string]interface {}{"prometheus":"k8s", "role":"alert-rules"}}, "version":"v2.5.0", "storage":map[string]interface {}{"volumeClaimTemplate":map[string]interface {}{"spec":map[string]interface {}{"resources":map[string]interface {}{"requests":map[string]interface {}{"storage":"2Gi"}}, "storageClassName":"gp2", "dataSource":interface {}(nil)}, "status":map[string]interface {}{}, "metadata":map[string]interface {}{"creationTimestamp":interface {}(nil)}}}, "containers":[]interface {}{map[string]interface {}{"volumeMounts":[]interface {}{map[string]interface {}{"name":"secret-prometheus-k8s-tls", "mountPath":"/etc/tls/private"}, map[string]interface {}{"name":"secret-prometheus-k8s-proxy", "mountPath":"/etc/proxy/secrets"}, map[string]interface {}{"name":"secret-prometheus-k8s-htpasswd", "mountPath":"/etc/proxy/htpasswd"}}, "name":"prometheus-proxy", "image":"quay.io/openshift/origin-oauth-proxy:latest", "args":[]interface {}{"-provider=openshift", "-https-address=:9091", "-http-address=", "-email-domain=*", "-upstream=http://localhost:9090", "-htpasswd-file=/etc/proxy/htpasswd/auth", "-openshift-service-account=prometheus-k8s", "-openshift-sar={\"resource\": \"namespaces\", \"verb\": \"get\"}", "-openshift-delegate-urls={\"/\": {\"resource\": \"namespaces\", \"verb\": \"get\"}}", "-tls-cert=/etc/tls/private/tls.crt", "-tls-key=/etc/tls/private/tls.key", "-client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token", "-cookie-secret-file=/etc/proxy/secrets/session_secret", "-openshift-ca=/etc/pki/tls/cert.pem", "-openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt", "-skip-auth-regex=^/metrics"}, "ports":[]interface {}{map[string]interface {}{"name":"web", "containerPort":9091}}, "resources":map[string]interface {}{}}, map[string]interface {}{"resources":map[string]interface {}{}, "volumeMounts":[]interface {}{map[string]interface {}{"name":"secret-prometheus-k8s-tls", "mountPath":"/etc/tls/private"}, map[string]interface {}{"name":"secret-kube-rbac-proxy", "mountPath":"/etc/kube-rbac-proxy"}}, "name":"kube-rbac-proxy", "image":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:451274b24916b97e5ba2116dd0775cdb7e1de98d034ac8874b81c1a3b22cf6b1", "args":[]interface {}{"--secure-listen-address=0.0.0.0:9092", "--upstream=http://127.0.0.1:9095", "--config-file=/etc/kube-rbac-proxy/config.yaml", "--tls-cert-file=/etc/tls/private/tls.crt", "--tls-private-key-file=/etc/tls/private/tls.key", "--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256", "--logtostderr=true", "--v=10"}, "ports":[]interface {}{map[string]interface {}{"name":"tenancy", "containerPort":9092}}}, map[string]interface {}{"image":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8675adb4a2a367c9205e3879b986da69400b9187df7ac3f3fbf9882e6a356252", "args":[]interface {}{"--insecure-listen-address=127.0.0.1:9095", "--upstream=http://127.0.0.1:9090", "--label=namespace"}, "resources":map[string]interface {}{}, "name":"prom-label-proxy"}}, "affinity":map[string]interface {}{"podAntiAffinity":map[string]interface {}{"preferredDuringSchedulingIgnoredDuringExecution":[]interface {}{map[string]interface {}{"weight":100, "podAffinityTerm":map[string]interface {}{"labelSelector":map[string]interface {}{"matchExpressions":[]interface {}{map[string]interface {}{"values":[]interface {}{"k8s"}, "key":"prometheus", "operator":"In"}}}, "namespaces":[]interface {}{"openshift-monitoring"}, "topologyKey":"kubernetes.io/hostname"}}}}}, "securityContext":map[string]interface {}{}, "replicas":2, "listenLocal":true, "serviceMonitorSelector":map[string]interface {}{}, "retention":"15d", "alerting":map[string]interface {}{"alertmanagers":[]interface {}{map[string]interface {}{"namespace":"openshift-monitoring", "name":"alertmanager-main", "port":"web", "scheme":"https", "tlsConfig":map[string]interface {}{"caFile":"/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt", "serverName":"alertmanager-main.openshift-monitoring.svc"}, "bearerTokenFile":"/var/run/secrets/kubernetes.io/serviceaccount/token"}}}, "externalUrl":"https://prometheus-k8s-openshift-monitoring.apps.qe-juzhao2.qe.devcluster.openshift.com/", "configMaps":[]interface {}{"serving-certs-ca-bundle", "csr-controller-ca-bundle"}}, "kind":"Prometheus"}: validation failure list: spec.storage.volumeClaimTemplate.metadata.creationTimestamp in body must be of type string: "null"

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-20-194410   True        False         58m     Cluster version is 4.0.0-0.nightly-2019-02-20-194410

configmap-reloader: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:037fa98f23ff812b6861675127d52eea43caa44bb138e7fe41c7199cb8d4d634
prometheus-config-reloader: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0b88f4c0bfc31f15d368619b951b9020853686ce46d36692f62ef437d83b1012
kube-state-metrics: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:36f168dc7fc6ada9af0f2eeb88f394f2e7311340acc25f801830fe509fd93911
prometheus-node-exporter: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:42be8e58f00a54b4f4cbf849203a139c93bebde8cc40e5be84305246be620350
prometheus-alertmanager: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:455855037348f33f9810f7531d52e86450e5c75d9d06531d144abc5ac53c6786
kube-rbac-proxy: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4d229dee301eb7452227fefc2704b30cf58e7a7f85e0c66dd3798b6b64b79728
prometheus-operator: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:50de7804ddd623f1b4e0f57157ce01102db7e68179c5744bac4e92c81714a881
cluster-monitoring-operator: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:534a71a355e3b9c79ef5a192a200730b8641f5e266abe290b6f7c6342210d8a0
telemeter: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9021d3e9ce028fc72301f8e0a40c37e488db658e1500a790c794bfd38903bef1
prom-label-proxy: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:90a29a928beffc938345760f88b6890dccdc6f1a6503f09fea7399469a6ca72a
prometheus: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba51ac66b4c3a46d5445bdfa32f1f04b882498fe5405d88dc78a956742657105
k8s-prometheus-adapter: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ee79721af3078dfbcfaa75e9a47da1526464cf6685a7f4195ea214c840b59e9f
grafana: quay.io/openshift/origin-grafana:latest
oauth-proxy: quay.io/openshift/origin-oauth-proxy:latest

How reproducible:
Always

Steps to Reproduce:
1. See the Description part
2.
3.

Actual results:
Failed to attach PVs for monitoring

Expected results:
Be able to attach PVs for monitoring

Additional info:
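For background, the "null" in this validation failure comes from how an empty metav1.ObjectMeta serializes to JSON: the volumeClaimTemplate's metadata.creationTimestamp is a metav1.Time struct, so `omitempty` has no effect and the zero value marshals as JSON null, which the CRD's OpenAPI schema (expecting a string) then rejects. A minimal Go sketch to illustrate, assuming k8s.io/apimachinery is available as a module dependency:

package main

import (
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// The embedded volumeClaimTemplate carries an empty ObjectMeta. Its
	// CreationTimestamp field is a struct (metav1.Time), so `omitempty`
	// does not apply and the zero value marshals to JSON null.
	var meta metav1.ObjectMeta
	b, _ := json.Marshal(meta)
	fmt.Println(string(b)) // {"creationTimestamp":null}

	// The error above shows the API server rejecting exactly this null:
	// spec.storage.volumeClaimTemplate.metadata.creationTimestamp in body
	// must be of type string: "null"
}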
PVs could be attached, but this fix brings another problem: the kubelet can no longer be scraped from worker nodes, with "x509: certificate signed by unknown authority" reported for the 10250/metrics/cadvisor and 10250/metrics targets on the worker nodes.

As seen below, the alertmanager-main and prometheus-k8s pods are recreated after attaching the PVs and are scheduled to worker nodes.

$ oc -n openshift-monitoring get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-f509119d-398e-11e9-8827-0e9060eacf7c   2Gi        RWO            gp2            62m
alertmanager-main-db-alertmanager-main-1   Bound    pvc-05d73872-398f-11e9-8827-0e9060eacf7c   2Gi        RWO            gp2            62m
alertmanager-main-db-alertmanager-main-2   Bound    pvc-166daa08-398f-11e9-8827-0e9060eacf7c   2Gi        RWO            gp2            61m
prometheus-k8s-db-prometheus-k8s-0         Bound    pvc-d16161d8-398e-11e9-8827-0e9060eacf7c   4Gi        RWO            gp2            63m
prometheus-k8s-db-prometheus-k8s-1         Bound    pvc-d16637ec-398e-11e9-8827-0e9060eacf7c   4Gi        RWO            gp2            63m

$ oc -n openshift-monitoring get pod -o wide | grep -e alertmanager-main -e prometheus-k8s
alertmanager-main-0   3/3   Running   0   116m   10.129.2.34   ip-10-0-174-68.us-east-2.compute.internal    <none>
alertmanager-main-1   3/3   Running   0   115m   10.128.2.11   ip-10-0-143-223.us-east-2.compute.internal   <none>
alertmanager-main-2   3/3   Running   0   115m   10.131.0.93   ip-10-0-146-225.us-east-2.compute.internal   <none>
prometheus-k8s-0      6/6   Running   1   117m   10.128.2.10   ip-10-0-143-223.us-east-2.compute.internal   <none>
prometheus-k8s-1      6/6   Running   1   117m   10.131.0.92   ip-10-0-146-225.us-east-2.compute.internal   <none>

$ oc get node -o wide | grep worker | awk '{print $1" "$3" "$6}'
ip-10-0-143-223.us-east-2.compute.internal worker 10.0.143.223
ip-10-0-146-225.us-east-2.compute.internal worker 10.0.146.225
ip-10-0-174-68.us-east-2.compute.internal worker 10.0.174.68

As seen in the attached picture, "x509: certificate signed by unknown authority" is shown for all the worker nodes.

BTW, since Bug 1678645 is not fixed, the following was used to check targets:

$ prometheus_route=$(oc -n openshift-monitoring get route | grep prometheus-k8s | awk '{print $2}'); curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://${prometheus_route}/targets > page_targets.html

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-25-194625   True        False         6h32m   Cluster version is 4.0.0-0.nightly-2019-02-25-194625

RHCOS build: 47.330
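The same target check can also be scripted against the Prometheus JSON API instead of saving the /targets HTML page. A minimal Go sketch, assuming the route hostname and a prometheus-k8s service-account token are supplied via the hypothetical PROM_ROUTE and PROM_TOKEN environment variables:

package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// Minimal view of the /api/v1/targets response; only the fields printed below.
type targetsResponse struct {
	Data struct {
		ActiveTargets []struct {
			Labels    map[string]string `json:"labels"`
			ScrapeURL string            `json:"scrapeUrl"`
			Health    string            `json:"health"`
			LastError string            `json:"lastError"`
		} `json:"activeTargets"`
	} `json:"data"`
}

func main() {
	route := os.Getenv("PROM_ROUTE") // hypothetical: the prometheus-k8s route hostname
	token := os.Getenv("PROM_TOKEN") // hypothetical: output of `oc sa get-token prometheus-k8s -n openshift-monitoring`

	// Like `curl -k`, skip TLS verification against the router certificate.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	req, err := http.NewRequest("GET", "https://"+route+"/api/v1/targets", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var tr targetsResponse
	if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil {
		panic(err)
	}

	// Print any target that is not healthy, e.g. the kubelet targets failing
	// with "x509: certificate signed by unknown authority".
	for _, t := range tr.Data.ActiveTargets {
		if t.Health != "up" {
			fmt.Printf("%s %s: %s\n", t.Labels["job"], t.ScrapeURL, t.LastError)
		}
	}
}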
Created attachment 1538705 [details] "x509: certificate signed by unknown authority" for worker nodes
Additional info for Comment 3: all targets are UP before attaching PVs for monitoring; there is no "x509: certificate signed by unknown authority" error for the 10250/metrics/cadvisor and 10250/metrics targets on worker nodes.
You should be able to `kubectl port-forward` to the Prometheus pod just fine for testing :) . Looking at the attachment, I find it striking that this only applies to compute nodes. Did this maybe resolve itself after a few minutes? We may just need to wait for the kubelet serving certs CA to be (re-)mounted. Could you share the Prometheus StatefulSet as well as the content of the "openshift-monitoring/kubelet-serving-ca-bundle" and "openshift-config-managed/kubelet-serving-ca" ConfigMaps? Thanks!
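Those two ConfigMaps can also be fetched and compared with client-go rather than `oc`. A minimal sketch, assuming a local kubeconfig, a client-go version with the context-taking Get signature, and that both ConfigMaps store the bundle under the "ca-bundle.crt" key (an assumption):

package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (the same credentials `oc` uses).
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The CA bundle Prometheus mounts to verify kubelet serving certificates ...
	mounted, err := client.CoreV1().ConfigMaps("openshift-monitoring").
		Get(context.TODO(), "kubelet-serving-ca-bundle", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	// ... and the managed source it is copied from.
	source, err := client.CoreV1().ConfigMaps("openshift-config-managed").
		Get(context.TODO(), "kubelet-serving-ca", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// "ca-bundle.crt" is an assumed data key; if the bundles differ, the
	// mounted copy may be stale, which would explain temporary
	// "x509: certificate signed by unknown authority" scrape errors.
	fmt.Println("bundles identical:",
		mounted.Data["ca-bundle.crt"] == source.Data["ca-bundle.crt"])
}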
For what it's worth, I just tested the exact same thing on an origin cluster and was not able to reproduce. I feel like what you saw was unrelated to this bug.
Created attachment 1538804 [details] info for Comment 6
BTW: PVs are already attached to the pods, for example:

volumes:
- name: alertmanager-main-db
  persistentVolumeClaim:
    claimName: alertmanager-main-db-alertmanager-main-0
I agree having Kubernetes apply defaults (phase: Pending in status and creationTimestamp default) is a bit of a distraction, but the functionality works as expected. Should these beauty marks be an issue please file an RFE that we can schedule for later improvement. The TLS issue is distinct from using/provisioning persistence, and is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1683913. Due to all of these facts, I'm moving this concrete issue to modified.
(In reply to Frederic Branczyk from comment #18)
> I agree having Kubernetes apply defaults (phase: Pending in status and
> creationTimestamp default) is a bit of a distraction, but the functionality
> works as expected. Should these beauty marks be an issue please file an RFE
> that we can schedule for later improvement. The TLS issue is distinct from
> using/provisioning persistence, and is being tracked in
> https://bugzilla.redhat.com/show_bug.cgi?id=1683913. Due to all of these
> facts, I'm moving this concrete issue to modified.

Agree, will verify this bug.
For the RFE mentioned in Comment 18, please see bug 1684352.

Since PVs can now be attached for monitoring, closing this issue.

$ for i in $(oc -n openshift-monitoring get pod | grep -e alertmanager-main -e prometheus-k8s | grep -v NAME | awk '{print $1}'); do echo $i; oc -n openshift-monitoring get po $i -oyaml | grep -i claim; done
alertmanager-main-0
    persistentVolumeClaim:
      claimName: alertmanager-main-db-alertmanager-main-0
alertmanager-main-1
    persistentVolumeClaim:
      claimName: alertmanager-main-db-alertmanager-main-1
alertmanager-main-2
    persistentVolumeClaim:
      claimName: alertmanager-main-db-alertmanager-main-2
prometheus-k8s-0
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-0
prometheus-k8s-1
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-1

$ oc -n openshift-monitoring get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-2adefe29-3bd1-11e9-8b6c-0ac2ab4d1ff2   2Gi        RWO            gp2            25m
alertmanager-main-db-alertmanager-main-1   Bound    pvc-3bb52eb9-3bd1-11e9-8b6c-0ac2ab4d1ff2   2Gi        RWO            gp2            24m
alertmanager-main-db-alertmanager-main-2   Bound    pvc-4c427826-3bd1-11e9-8b6c-0ac2ab4d1ff2   2Gi        RWO            gp2            24m
prometheus-k8s-db-prometheus-k8s-0         Bound    pvc-3208ccb8-3bd1-11e9-8b6c-0ac2ab4d1ff2   4Gi        RWO            gp2            24m
prometheus-k8s-db-prometheus-k8s-1         Bound    pvc-3214392e-3bd1-11e9-8b6c-0ac2ab4d1ff2   4Gi        RWO            gp2            24m

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-27-213933   True        False         80m     Cluster version is 4.0.0-0.nightly-2019-02-27-213933
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758
*** Bug 1801023 has been marked as a duplicate of this bug. ***