Description of problem:
On installing a fresh OCP 3.11 cluster on OSP 14, I am seeing that several of the monitoring-related pods are in CrashLoopBackOff:

[openshift@master-0 ~]$ oc get pods --all-namespaces
NAMESPACE              NAME                                               READY   STATUS             RESTARTS   AGE
default                docker-registry-1-znmb9                            1/1     Running            0          1d
default                registry-console-1-g59d5                           1/1     Running            0          1d
default                router-1-t6vtk                                     1/1     Running            0          1d
kube-system            master-api-master-0.openshift.example.com          1/1     Running            0          1d
kube-system            master-controllers-master-0.openshift.example.com  1/1     Running            0          1d
kube-system            master-etcd-master-0.openshift.example.com         1/1     Running            0          1d
openshift-console      console-66549ff897-rd95z                           1/1     Running            0          1d
openshift-infra        bootstrap-autoapprover-0                           1/1     Running            0          1d
openshift-infra        kuryr-cni-ds-bkvfx                                 2/2     Running            0          1d
openshift-infra        kuryr-cni-ds-fk8gg                                 2/2     Running            0          1d
openshift-infra        kuryr-cni-ds-fnb8n                                 2/2     Running            0          1d
openshift-infra        kuryr-cni-ds-nw2th                                 2/2     Running            0          1d
openshift-infra        kuryr-controller-7bdfdf4ddb-cw5gt                  1/1     Running            0          1d
openshift-monitoring   alertmanager-main-0                                3/3     Running            0          1d
openshift-monitoring   alertmanager-main-1                                3/3     Running            0          1d
openshift-monitoring   alertmanager-main-2                                3/3     Running            0          1d
openshift-monitoring   cluster-monitoring-operator-75c6b544dd-r2xxc       1/1     Running            0          1d
openshift-monitoring   grafana-c7d5bc87c-dl69x                            2/2     Running            0          1d
openshift-monitoring   kube-state-metrics-c57bd9dfd-hlghv                 1/3     CrashLoopBackOff   1078       1d
openshift-monitoring   node-exporter-7tbv5                                1/2     CrashLoopBackOff   540        1d
openshift-monitoring   node-exporter-d8rcx                                1/2     CrashLoopBackOff   534        1d
openshift-monitoring   node-exporter-ghxrx                                1/2     CrashLoopBackOff   540        1d
openshift-monitoring   node-exporter-wdv79                                1/2     CrashLoopBackOff   541        1d
openshift-monitoring   prometheus-k8s-0                                   4/4     Running            1          1d
openshift-monitoring   prometheus-k8s-1                                   4/4     Running            1          1d
openshift-monitoring   prometheus-operator-5b47ff445b-nz8sf               1/1     Running            0          1d
openshift-node         sync-lmksf                                         1/1     Running            0          1d
openshift-node         sync-ls6t5                                         1/1     Running            0          1d
openshift-node         sync-mnj4s                                         1/1     Running            0          1d
openshift-node         sync-qvw9z                                         1/1     Running            0          1d
openshift-web-console  webconsole-787f54c7f8-p2lcv                        1/1     Running            0          1d

Version-Release number of selected component (if applicable):
OCP Enterprise 3.11

[openshift@master-0 ~]$ rpm -qa | grep openshift
atomic-openshift-hyperkube-3.11.77-1.git.0.8baa0fb.el7.x86_64
atomic-openshift-clients-3.11.77-1.git.0.8baa0fb.el7.x86_64
atomic-openshift-3.11.77-1.git.0.8baa0fb.el7.x86_64
atomic-openshift-docker-excluder-3.11.77-1.git.0.8baa0fb.el7.noarch
atomic-openshift-node-3.11.77-1.git.0.8baa0fb.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Deploy OCP 3.11
2. oc get pods --all-namespaces

Actual results:
The monitoring pods are in CrashLoopBackOff.

Expected results:
The pods should be in the Running state.

Additional info:
[openshift@master-0 ~]$ oc get ev
LAST SEEN   FIRST SEEN   COUNT   NAME                                                  KIND   SUBOBJECT                               TYPE      REASON    SOURCE                                        MESSAGE
1h          1d           526     kube-state-metrics-c57bd9dfd-hlghv.158047ad2fb3f7f1   Pod    spec.containers{kube-rbac-proxy-main}   Normal    Pulled    kubelet, infra-node-0.openshift.example.com   Container image "registry.reg-aws.openshift.com:443/openshift3/ose-kube-rbac-proxy:v3.11" already present on machine
1h          1d           527     node-exporter-7tbv5.158047ad0b9a2181                  Pod    spec.containers{kube-rbac-proxy}        Normal    Pulled    kubelet, app-node-0.openshift.example.com     Container image "registry.reg-aws.openshift.com:443/openshift3/ose-kube-rbac-proxy:v3.11" already present on machine
45m         1d           533     node-exporter-wdv79.158047ad159af523                  Pod    spec.containers{kube-rbac-proxy}        Normal    Pulled    kubelet, app-node-1.openshift.example.com     Container image "registry.reg-aws.openshift.com:443/openshift3/ose-kube-rbac-proxy:v3.11" already present on machine
39m         1d           533     node-exporter-ghxrx.158047acf3e366bd                  Pod    spec.containers{kube-rbac-proxy}        Normal    Pulled    kubelet, infra-node-0.openshift.example.com   Container image "registry.reg-aws.openshift.com:443/openshift3/ose-kube-rbac-proxy:v3.11" already present on machine
29m         1d           12648   kube-state-metrics-c57bd9dfd-hlghv.158047b45eab46e6   Pod    spec.containers{kube-rbac-proxy-main}   Warning   BackOff   kubelet, infra-node-0.openshift.example.com   Back-off restarting failed container
5m          1d           12309   node-exporter-7tbv5.158047ae3c8dd05b                  Pod    spec.containers{kube-rbac-proxy}        Warning   BackOff   kubelet, app-node-0.openshift.example.com     Back-off restarting failed container
5m          1d           12275   node-exporter-ghxrx.158047ae64407027                  Pod    spec.containers{kube-rbac-proxy}        Warning   BackOff   kubelet, infra-node-0.openshift.example.com   Back-off restarting failed container
4m          1d           12164   node-exporter-d8rcx.158047b0a6da0e9d                  Pod    spec.containers{kube-rbac-proxy}        Warning   BackOff   kubelet, master-0.openshift.example.com       Back-off restarting failed container
4m          1d           12180   kube-state-metrics-c57bd9dfd-hlghv.158047b4e3ccfb49   Pod    spec.containers{kube-rbac-proxy-self}   Warning   BackOff   kubelet, infra-node-0.openshift.example.com   Back-off restarting failed container
6s          1d           535     node-exporter-d8rcx.158047ae3e451bee                  Pod    spec.containers{kube-rbac-proxy}        Normal    Pulled    kubelet, master-0.openshift.example.com       Container image "registry.reg-aws.openshift.com:443/openshift3/ose-kube-rbac-proxy:v3.11" already present on machine
3s          1d           12313   node-exporter-wdv79.158047ae4985686b                  Pod    spec.containers{kube-rbac-proxy}        Warning   BackOff   kubelet, app-node-1.openshift.example.com     Back-off restarting failed container

[openshift@master-0 ~]$ oc logs node-exporter-ghxrx -c kube-rbac-proxy
F0206 20:28:10.780000  125990 main.go:240] failed to configure http2 server: http2: TLSConfig.CipherSuites index 11 contains an HTTP/2-approved cipher suite (0xc02f), but it comes after unapproved cipher suites. With this configuration, clients that don't support previous, approved cipher suites may be given an unapproved one and reject the connection.
goroutine 1 [running]:
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.stacks(0xc420406b00, 0xc4202b4000, 0x163, 0x1b7)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:769 +0xcf
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).output(0x1a80580, 0xc400000003, 0xc4200c26e0, 0x19e36de, 0x7, 0xf0, 0x0)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:720 +0x32d
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).printf(0x1a80580, 0xc400000003, 0x1210047, 0x24, 0xc420457da0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:655 +0x14b
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.Fatalf(0x1210047, 0x24, 0xc420457da0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:1148 +0x67
main.main()
	/go/src/github.com/brancz/kube-rbac-proxy/main.go:240 +0x18fc

[openshift@master-0 ~]$ oc logs kube-state-metrics-c57bd9dfd-hlghv -c kube-rbac-proxy-self
F0206 20:30:04.979538       1 main.go:240] failed to configure http2 server: http2: TLSConfig.CipherSuites index 11 contains an HTTP/2-approved cipher suite (0xc02f), but it comes after unapproved cipher suites. With this configuration, clients that don't support previous, approved cipher suites may be given an unapproved one and reject the connection.
goroutine 1 [running]:
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.stacks(0xc42040f800, 0xc4204ae000, 0x163, 0x1b7)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:769 +0xcf
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).output(0x1a80580, 0xc400000003, 0xc4200c2630, 0x19e36de, 0x7, 0xf0, 0x0)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:720 +0x32d
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).printf(0x1a80580, 0xc400000003, 0x1210047, 0x24, 0xc42014dda0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:655 +0x14b
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.Fatalf(0x1210047, 0x24, 0xc42014dda0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:1148 +0x67
main.main()
	/go/src/github.com/brancz/kube-rbac-proxy/main.go:240 +0x18fc
Could you please share a Pod manifest of one of the pods in question? Thanks!
https://gist.github.com/smalleni/921ee58df1bdb125da1a139656e6a797
Sorry about the inconvenience. What you are seeing is a defect that got in with this pull request: https://github.com/openshift/cluster-monitoring-operator/pull/210. It has already been fixed by https://github.com/openshift/cluster-monitoring-operator/pull/225. We will just need to wait for the next OCP z-stream release.
Tested with v3.11.82; the issue is not fixed:

# oc get pod -n openshift-monitoring
NAME                                           READY   STATUS             RESTARTS   AGE
alertmanager-main-0                            3/3     Running            0          12m
alertmanager-main-1                            3/3     Running            0          12m
alertmanager-main-2                            3/3     Running            0          12m
cluster-monitoring-operator-548fc4f6d4-pmkfh   1/1     Running            0          13m
grafana-69bb9997f5-ppswq                       2/2     Running            0          13m
kube-state-metrics-946b9f84d-s4hzr             1/3     CrashLoopBackOff   14         12m
node-exporter-h7wlb                            1/2     CrashLoopBackOff   7          12m
node-exporter-nztr7                            1/2     CrashLoopBackOff   7          12m
node-exporter-zdnzg                            1/2     CrashLoopBackOff   7          12m
prometheus-k8s-0                               4/4     Running            1          13m
prometheus-k8s-1                               4/4     Running            1          13m
prometheus-operator-55bbdd949b-wq7bt           1/1     Running            0          13m

# oc -n openshift-monitoring logs node-exporter-h7wlb -c kube-rbac-proxy
F0211 08:32:06.230291   75189 main.go:240] failed to configure http2 server: http2: TLSConfig.CipherSuites index 11 contains an HTTP/2-approved cipher suite (0xc02f), but it comes after unapproved cipher suites. With this configuration, clients that don't support previous, approved cipher suites may be given an unapproved one and reject the connection.
goroutine 1 [running]:
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.stacks(0xc4202e6200, 0xc420326000, 0x163, 0x1b7)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:769 +0xcf
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).output(0x1a80580, 0xc400000003, 0xc4200de630, 0x19e36de, 0x7, 0xf0, 0x0)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:720 +0x32d
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).printf(0x1a80580, 0xc400000003, 0x1210047, 0x24, 0xc420319da0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:655 +0x14b
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.Fatalf(0x1210047, 0x24, 0xc420319da0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:1148 +0x67
main.main()
	/go/src/github.com/brancz/kube-rbac-proxy/main.go:240 +0x18fc

****************************************************************************************************************************

# oc -n openshift-monitoring logs kube-state-metrics-946b9f84d-s4hzr -c kube-rbac-proxy-main
F0211 08:33:48.829740       1 main.go:240] failed to configure http2 server: http2: TLSConfig.CipherSuites index 11 contains an HTTP/2-approved cipher suite (0xc02f), but it comes after unapproved cipher suites. With this configuration, clients that don't support previous, approved cipher suites may be given an unapproved one and reject the connection.
goroutine 1 [running]:
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.stacks(0xc420070100, 0xc4205fc000, 0x163, 0x1b7)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:769 +0xcf
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).output(0x1a80580, 0xc400000003, 0xc4200de630, 0x19e36de, 0x7, 0xf0, 0x0)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:720 +0x32d
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).printf(0x1a80580, 0xc400000003, 0x1210047, 0x24, 0xc42032fda0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:655 +0x14b
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.Fatalf(0x1210047, 0x24, 0xc42032fda0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:1148 +0x67
main.main()
	/go/src/github.com/brancz/kube-rbac-proxy/main.go:240 +0x18fc

****************************************************************************************************************************

# oc -n openshift-monitoring logs kube-state-metrics-946b9f84d-s4hzr -c kube-rbac-proxy-self
F0211 08:33:49.430652       1 main.go:240] failed to configure http2 server: http2: TLSConfig.CipherSuites index 11 contains an HTTP/2-approved cipher suite (0xc02f), but it comes after unapproved cipher suites. With this configuration, clients that don't support previous, approved cipher suites may be given an unapproved one and reject the connection.
goroutine 1 [running]:
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.stacks(0xc420411c00, 0xc420436000, 0x163, 0x1b7)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:769 +0xcf
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).output(0x1a80580, 0xc400000003, 0xc4200de630, 0x19e36de, 0x7, 0xf0, 0x0)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:720 +0x32d
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.(*loggingT).printf(0x1a80580, 0xc400000003, 0x1210047, 0x24, 0xc420317da0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:655 +0x14b
github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog.Fatalf(0x1210047, 0x24, 0xc420317da0, 0x1, 0x1)
	/go/src/github.com/brancz/kube-rbac-proxy/vendor/github.com/golang/glog/glog.go:1148 +0x67
main.main()
	/go/src/github.com/brancz/kube-rbac-proxy/main.go:240 +0x18fc
@Junqi, could you share one of the pod manifests? Just to check whether the patch [1] mentioned above ever made it into the images.

[1] https://github.com/openshift/cluster-monitoring-operator/pull/225/files
Looking into this some more, the described patch did not make it into the binary properly due to an issue in our build system. https://github.com/openshift/cluster-monitoring-operator/pull/241 should fix this.
I just validated the fix on OpenShift 3.11. Once it gets another code review, we will go ahead and merge.
https://github.com/openshift/cluster-monitoring-operator/pull/241 is merged. Would you mind taking another look, Junqi?
Tested with ose-cluster-monitoring-operator:v3.11.82-4; the issue is fixed and cluster monitoring works well:

# oc -n openshift-monitoring get po
NAME                                          READY   STATUS    RESTARTS   AGE
alertmanager-main-0                           3/3     Running   0          18m
alertmanager-main-1                           3/3     Running   0          18m
alertmanager-main-2                           3/3     Running   0          17m
cluster-monitoring-operator-98f84d4dd-st5v7   1/1     Running   0          25m
grafana-7fb8d6b4bf-b7nqs                      2/2     Running   0          22m
kube-state-metrics-9bf978578-z6pwz            3/3     Running   0          16m
node-exporter-887fk                           2/2     Running   0          17m
node-exporter-mfdvx                           2/2     Running   0          17m
node-exporter-w6fpx                           2/2     Running   0          17m
prometheus-k8s-0                              4/4     Running   1          21m
prometheus-k8s-1                              4/4     Running   1          20m
prometheus-operator-544d79d996-gmhnb          1/1     Running   0          24m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0326