Created attachment 1475753 [details] prometheus-operator pod in CrashLoopBackOff status

Description of problem:
After deploying cluster monitoring, the prometheus-operator pod is in CrashLoopBackOff status. This currently blocks cluster monitoring installation.

# kubectl -n openshift-monitoring get pod
NAME                                          READY     STATUS             RESTARTS   AGE
cluster-monitoring-operator-9f7578d96-c2m8p   1/1       Running            0          49m
prometheus-operator-9f6cffdb-vrrtf            0/1       CrashLoopBackOff   13         47m

# kubectl -n openshift-monitoring get deploy prometheus-operator -o yaml
status:
  conditions:
  - lastTransitionTime: 2018-08-14T05:31:33Z
    lastUpdateTime: 2018-08-14T05:31:33Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2018-08-14T05:41:34Z
    lastUpdateTime: 2018-08-14T05:41:34Z
    message: ReplicaSet "prometheus-operator-9f6cffdb" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 4
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

The installation log also shows that the ServiceMonitor CRD was not created.

Version-Release number of selected component (if applicable):
ose-prometheus-operator:v3.11.0-0.14.0.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring

Actual results:
prometheus-operator pod is in CrashLoopBackOff status

Expected results:
prometheus-operator pod should be running and ready

Additional info:
# parameters
openshift_cluster_monitoring_operator_install=true
openshift_cluster_monitoring_operator_node_selector={'role': 'node'}
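For anyone re-checking this symptom, the pod listing in the description can be filtered down to the crash-looping pods. The following is only a minimal sketch: the function name `crashlooping_pods` is hypothetical, and it assumes the default `kubectl get pod` table layout in which STATUS is the third column.

```shell
# Hypothetical helper: print the names of pods whose STATUS column
# is CrashLoopBackOff. Reads `kubectl get pod` table output on stdin
# and assumes the default column order NAME READY STATUS RESTARTS AGE.
crashlooping_pods() {
  # NR > 1 skips the header row; $3 is the STATUS column.
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage against a live cluster (assumes kubectl is configured):
#   kubectl -n openshift-monitoring get pod | crashlooping_pods
```

Fed the listing from the description, this would print only the prometheus-operator pod name; an empty result means no pod in the namespace is currently crash-looping.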
Created attachment 1475755 [details] installation log
Could you also share the logs of the Prometheus Operator?
(In reply to Frederic Branczyk from comment #2)
> Could you also share the logs of the Prometheus Operator?

# kubectl logs prometheus-operator-c7dd5cb69-vc85r
standard_init_linux.go:178: exec user process caused "operation not permitted"
It seems to be the same issue as https://github.com/google/metallb/issues/21
# docker ps -a | grep operator
83ea8c727627  6313079d656b  "/usr/bin/operator..."  4 minutes ago  Exited (1) 4 minutes ago  k8s_prometheus-operator_prometheus-operator-c7dd5cb69-vc85r_openshift-monitoring_429902ea-a039-11e8-8c6d-42010af00009_29
68785f345171  registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.11.0-0.14.0  "/usr/bin/pod"  2 hours ago  Up 2 hours  k8s_POD_prometheus-operator-c7dd5cb69-vc85r_openshift-monitoring_429902ea-a039-11e8-8c6d-42010af00009_0

# docker logs 83ea8c727627
standard_init_linux.go:178: exec user process caused "operation not permitted"

# docker version
Client:
 Version:         1.13.1
 API version:     1.26
 Package version: <unknown>
 Go version:      go1.8.3
 Git commit:      774336d/1.13.1
 Built:           Tue Feb 20 13:46:34 2018
 OS/Arch:         linux/amd64

Server:
 Version:         1.13.1
 API version:     1.26 (minimum version 1.12)
 Package version: <unknown>
 Go version:      go1.8.3
 Git commit:      774336d/1.13.1
 Built:           Tue Feb 20 13:46:34 2018
 OS/Arch:         linux/amd64
 Experimental:    false
We just merged https://github.com/openshift/cluster-monitoring-operator/pull/67, so this should be fixed in the next 3.11 build.
The issue is resolved by the fix, but the kube-state-metrics pod, service, deployment, and replicaset are not created; that defect is tracked in Bug 1617695.
Issue is fixed in ose-prometheus-operator-v3.11.0-0.17.0.0

# openshift version
openshift v3.11.0-0.17.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652