Bug 1615732 - prometheus-operator ReplicaSet has timed out progressing
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assigned To: Frederic Branczyk
QA Contact: Junqi Zhao
Keywords: TestBlocker
Reported: 2018-08-14 02:54 EDT by Junqi Zhao
Modified: 2018-10-11 03:25 EDT
CC List: 1 user

Last Closed: 2018-10-11 03:24:39 EDT
Type: Bug


Attachments
prometheus-operator pod in CrashLoopBackOff status (9.27 KB, text/plain)
2018-08-14 02:54 EDT, Junqi Zhao
installation log (417.49 KB, text/plain)
2018-08-14 02:56 EDT, Junqi Zhao


External Trackers
Tracker                  ID               Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2018:2652   None      None    None     2018-10-11 03:25 EDT

Description Junqi Zhao 2018-08-14 02:54:09 EDT
Created attachment 1475753
prometheus-operator pod in CrashLoopBackOff status

Description of problem:
After deploying cluster monitoring, the prometheus-operator pod is in CrashLoopBackOff status. This currently blocks the cluster monitoring installation.

# kubectl -n openshift-monitoring get pod
NAME                                          READY     STATUS             RESTARTS   AGE
cluster-monitoring-operator-9f7578d96-c2m8p   1/1       Running            0          49m
prometheus-operator-9f6cffdb-vrrtf            0/1       CrashLoopBackOff   13         47m
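A minimal sketch of checking a pod listing like the one above for pods whose containers are not all ready. The helper name `unready_pods` is hypothetical, and the code assumes the whitespace-separated column layout that `kubectl get pod` printed here (NAME, READY, STATUS, RESTARTS, AGE).

```python
from typing import List, Tuple


def unready_pods(table: str) -> List[Tuple[str, str, int]]:
    """Return (name, status, restarts) for each pod whose READY column is incomplete.

    Assumes the `kubectl get pod` layout shown in this report: a header row
    (NAME READY STATUS RESTARTS AGE) followed by whitespace-separated rows.
    """
    rows = [line.split() for line in table.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}  # column name -> position
    result = []
    for row in body:
        ready, total = row[col["READY"]].split("/")  # e.g. "0/1"
        if ready != total:
            result.append((row[col["NAME"]], row[col["STATUS"]], int(row[col["RESTARTS"]])))
    return result
```

Passing only the table portion of the listing above (header plus the two pod rows) returns `[('prometheus-operator-9f6cffdb-vrrtf', 'CrashLoopBackOff', 13)]`.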

# kubectl -n openshift-monitoring get deploy prometheus-operator -o yaml
status:
  conditions:
  - lastTransitionTime: 2018-08-14T05:31:33Z
    lastUpdateTime: 2018-08-14T05:31:33Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2018-08-14T05:41:34Z
    lastUpdateTime: 2018-08-14T05:41:34Z
    message: ReplicaSet "prometheus-operator-9f6cffdb" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 4
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
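The useful signal in this status is in the two conditions. `Available` is `False` with reason `MinimumReplicasUnavailable`, and `Progressing` is `False` with reason `ProgressDeadlineExceeded`. Kubernetes sets the latter once a rollout runs past the Deployment's `spec.progressDeadlineSeconds`. Here that timeout is a symptom of the crashing container rather than its cause. A minimal sketch of detecting it follows. It assumes the Deployment object has already been parsed into a dict, for example from `kubectl get deploy ... -o json`. The function names are hypothetical.

```python
from typing import Any, Dict


def failing_conditions(deployment: Dict[str, Any]) -> Dict[str, str]:
    """Map condition type -> reason for every status condition reported as "False"."""
    conditions = deployment.get("status", {}).get("conditions", [])
    return {c["type"]: c.get("reason", "") for c in conditions if c.get("status") == "False"}


def progress_deadline_exceeded(deployment: Dict[str, Any]) -> bool:
    """True when the Progressing condition failed because the rollout timed out."""
    return failing_conditions(deployment).get("Progressing") == "ProgressDeadlineExceeded"
```

For instance, you can load the output of `kubectl -n openshift-monitoring get deploy prometheus-operator -o json` with `json.load` and pass it to `progress_deadline_exceeded`. For the status shown above, it would return `True`.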

The installation log also showed that the ServiceMonitor CRD was not created.

Version-Release number of selected component (if applicable):
ose-prometheus-operator:v3.11.0-0.14.0.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring

Actual results:
The prometheus-operator pod is in CrashLoopBackOff status.

Expected results:
The prometheus-operator pod should be Running and ready.

Additional info:
# parameters
openshift_cluster_monitoring_operator_install=true
openshift_cluster_monitoring_operator_node_selector={'role': 'node'}
Comment 1 Junqi Zhao 2018-08-14 02:56 EDT
Created attachment 1475755
installation log
Comment 2 Frederic Branczyk 2018-08-14 05:44:26 EDT
Could you also share the logs of the Prometheus Operator?
Comment 3 Junqi Zhao 2018-08-15 00:18:24 EDT
(In reply to Frederic Branczyk from comment #2)
> Could you also share the logs of the Prometheus Operator?

# kubectl logs prometheus-operator-c7dd5cb69-vc85r
standard_init_linux.go:178: exec user process caused "operation not permitted"
Comment 4 Junqi Zhao 2018-08-15 00:19:52 EDT
This appears to be the same issue as
https://github.com/google/metallb/issues/21
Comment 5 Junqi Zhao 2018-08-15 01:24:35 EDT
# docker ps -a | grep operator
83ea8c727627        6313079d656b                                                                                                                                       "/usr/bin/operator..."   4 minutes ago       Exited (1) 4 minutes ago                       k8s_prometheus-operator_prometheus-operator-c7dd5cb69-vc85r_openshift-monitoring_429902ea-a039-11e8-8c6d-42010af00009_29
68785f345171        registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.11.0-0.14.0                                                                               "/usr/bin/pod"           2 hours ago         Up 2 hours                                     k8s_POD_prometheus-operator-c7dd5cb69-vc85r_openshift-monitoring_429902ea-a039-11e8-8c6d-42010af00009_0
# docker logs 83ea8c727627
standard_init_linux.go:178: exec user process caused "operation not permitted"


# docker version
Client:
 Version:         1.13.1
 API version:     1.26
 Package version: <unknown>
 Go version:      go1.8.3
 Git commit:      774336d/1.13.1
 Built:           Tue Feb 20 13:46:34 2018
 OS/Arch:         linux/amd64

Server:
 Version:         1.13.1
 API version:     1.26 (minimum version 1.12)
 Package version: <unknown>
 Go version:      go1.8.3
 Git commit:      774336d/1.13.1
 Built:           Tue Feb 20 13:46:34 2018
 OS/Arch:         linux/amd64
 Experimental:    false
Comment 6 Frederic Branczyk 2018-08-15 04:55:18 EDT
We just merged https://github.com/openshift/cluster-monitoring-operator/pull/67, so this should be fixed in the next 3.11 build.
Comment 7 Junqi Zhao 2018-08-16 00:27:40 EDT
This fix resolves the issue, but the kube-state-metrics pod, service, deployment, and replicaset are not created. That defect is tracked in Bug 1617695.
Comment 8 Junqi Zhao 2018-08-20 01:56:33 EDT
The issue is fixed in ose-prometheus-operator-v3.11.0-0.17.0.0.


# openshift version
openshift v3.11.0-0.17.0
Comment 10 errata-xmlrpc 2018-10-11 03:24:39 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
