Bug 1615732
| Summary: | prometheus-operator ReplicaSet has timed out progressing | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
| Component: | Monitoring | Assignee: | Frederic Branczyk <fbranczy> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.11.0 | CC: | xtian |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-11 07:24:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Created attachment 1475755: installation log
Could you also share the logs of the Prometheus Operator?

(In reply to Frederic Branczyk from comment #2)
> Could you also share the logs of the Prometheus Operator?

    # kubectl logs prometheus-operator-c7dd5cb69-vc85r
    standard_init_linux.go:178: exec user process caused "operation not permitted"

This appears to be the same issue as https://github.com/google/metallb/issues/21

    # docker ps -a | grep operator
    83ea8c727627  6313079d656b  "/usr/bin/operator..."  4 minutes ago  Exited (1) 4 minutes ago  k8s_prometheus-operator_prometheus-operator-c7dd5cb69-vc85r_openshift-monitoring_429902ea-a039-11e8-8c6d-42010af00009_29
    68785f345171  registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.11.0-0.14.0  "/usr/bin/pod"  2 hours ago  Up 2 hours  k8s_POD_prometheus-operator-c7dd5cb69-vc85r_openshift-monitoring_429902ea-a039-11e8-8c6d-42010af00009_0

    # docker logs 83ea8c727627
    standard_init_linux.go:178: exec user process caused "operation not permitted"

    # docker version
    Client:
     Version:         1.13.1
     API version:     1.26
     Package version: <unknown>
     Go version:      go1.8.3
     Git commit:      774336d/1.13.1
     Built:           Tue Feb 20 13:46:34 2018
     OS/Arch:         linux/amd64

    Server:
     Version:         1.13.1
     API version:     1.26 (minimum version 1.12)
     Package version: <unknown>
     Go version:      go1.8.3
     Git commit:      774336d/1.13.1
     Built:           Tue Feb 20 13:46:34 2018
     OS/Arch:         linux/amd64
     Experimental:    false

We just merged https://github.com/openshift/cluster-monitoring-operator/pull/67, so this should be fixed in the next 3.11 build.

The issue is fixed by that change, but the kube-state-metrics pod, service, deployment, and replicaset are not created. That defect is tracked in Bug 1617695.

The issue is fixed in ose-prometheus-operator-v3.11.0-0.17.0.0.

    # openshift version
    openshift v3.11.0-0.17.0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
Created attachment 1475753: prometheus-operator pod in CrashLoopBackOff status

Description of problem:
After deploying cluster monitoring, the prometheus-operator pod is in CrashLoopBackOff status. This currently blocks cluster monitoring installation.

    # kubectl -n openshift-monitoring get pod
    NAME                                          READY  STATUS            RESTARTS  AGE
    cluster-monitoring-operator-9f7578d96-c2m8p   1/1    Running           0         49m
    prometheus-operator-9f6cffdb-vrrtf            0/1    CrashLoopBackOff  13        47m

    # kubectl -n openshift-monitoring get deploy prometheus-operator -o yaml
    status:
      conditions:
      - lastTransitionTime: 2018-08-14T05:31:33Z
        lastUpdateTime: 2018-08-14T05:31:33Z
        message: Deployment does not have minimum availability.
        reason: MinimumReplicasUnavailable
        status: "False"
        type: Available
      - lastTransitionTime: 2018-08-14T05:41:34Z
        lastUpdateTime: 2018-08-14T05:41:34Z
        message: ReplicaSet "prometheus-operator-9f6cffdb" has timed out progressing.
        reason: ProgressDeadlineExceeded
        status: "False"
        type: Progressing
      observedGeneration: 4
      replicas: 1
      unavailableReplicas: 1
      updatedReplicas: 1

The installation log also shows that the ServiceMonitor CRD was not created.

Version-Release number of selected component (if applicable):
ose-prometheus-operator:v3.11.0-0.14.0.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring

Actual results:
The prometheus-operator pod is in CrashLoopBackOff status.

Expected results:
The prometheus-operator pod should be running and ready.

Additional info:

    # parameters
    openshift_cluster_monitoring_operator_install=true
    openshift_cluster_monitoring_operator_node_selector={'role': 'node'}
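
For triage, the Deployment conditions quoted in the description can be checked programmatically. The following is a minimal Python sketch, not part of the original report: the condition values are copied from the `kubectl ... -o yaml` output above, while the helper names `failed_conditions` and `rollout_stalled` are hypothetical. It assumes that a stalled rollout is signalled by a `Progressing` condition with status `"False"` and reason `ProgressDeadlineExceeded`, as seen in this bug.

```python
# Status copied from the `kubectl get deploy prometheus-operator -o yaml`
# output in this report (timestamps omitted for brevity).
DEPLOYMENT_STATUS = {
    "conditions": [
        {
            "type": "Available",
            "status": "False",
            "reason": "MinimumReplicasUnavailable",
            "message": "Deployment does not have minimum availability.",
        },
        {
            "type": "Progressing",
            "status": "False",
            "reason": "ProgressDeadlineExceeded",
            "message": 'ReplicaSet "prometheus-operator-9f6cffdb" has timed out progressing.',
        },
    ],
    "observedGeneration": 4,
    "replicas": 1,
    "unavailableReplicas": 1,
    "updatedReplicas": 1,
}


def failed_conditions(status):
    """Return (type, reason, message) for each condition whose status is not "True".

    Hypothetical helper for illustration only.
    """
    return [
        (c.get("type", ""), c.get("reason", ""), c.get("message", ""))
        for c in status.get("conditions", [])
        if c.get("status") != "True"
    ]


def rollout_stalled(status):
    """Return True if Progressing is "False" with reason ProgressDeadlineExceeded.

    Hypothetical helper for illustration only.
    """
    return any(
        c.get("type") == "Progressing"
        and c.get("status") == "False"
        and c.get("reason") == "ProgressDeadlineExceeded"
        for c in status.get("conditions", [])
    )


if __name__ == "__main__":
    for cond_type, reason, message in failed_conditions(DEPLOYMENT_STATUS):
        print(f"{cond_type}: {reason}: {message}")
    print("rollout stalled:", rollout_stalled(DEPLOYMENT_STATUS))
```

On a live cluster, the same dictionary could presumably be obtained by loading the `status` field from `kubectl -n openshift-monitoring get deploy prometheus-operator -o json`; that step is left out here so the sketch stays self-contained.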