Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1615732

Summary:

prometheus-operator ReplicaSet has timed out progressing

Product:

OpenShift Container Platform

Reporter:

Junqi Zhao <juzhao>

Component:

Monitoring

Assignee:

Frederic Branczyk <fbranczy>

Status:

CLOSED ERRATA

QA Contact:

Junqi Zhao <juzhao>

Severity:

high

Priority:

high

Version:

3.11.0

CC:

xtian

Keywords:

TestBlocker

Target Release:

3.11.0

Hardware:

Unspecified

OS:

Unspecified

Last Closed:

2018-10-11 07:24:39 UTC

Type:

Bug

Attachments:

Description	Flags
prometheus-operator pod in CrashLoopBackOff status	none
installation log	none

Description Junqi Zhao 2018-08-14 06:54:09 UTC

Created attachment 1475753 [details]
prometheus-operator pod in CrashLoopBackOff status

Description of problem:
After deploying cluster monitoring, the prometheus-operator pod is in CrashLoopBackOff status. This currently blocks cluster monitoring installation.

# kubectl -n openshift-monitoring get pod
NAME                                          READY     STATUS             RESTARTS   AGE
cluster-monitoring-operator-9f7578d96-c2m8p   1/1       Running            0          49m
prometheus-operator-9f6cffdb-vrrtf            0/1       CrashLoopBackOff   13         47m
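
For anyone scripting this check, the table above can be filtered for crash-looping pods. This is only a rough sketch: the function name `crashlooping_pods` is made up for illustration, and it assumes that no column value contains embedded spaces, which holds for the output in this report but may not hold for every kubectl version.

```python
# Hypothetical helper (not part of kubectl or OpenShift): find pods whose
# STATUS column reads CrashLoopBackOff in `kubectl get pod` table output.

def crashlooping_pods(table_text):
    """Return (name, restarts) for each row whose STATUS is CrashLoopBackOff."""
    lines = [ln for ln in table_text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    name_i = header.index("NAME")
    status_i = header.index("STATUS")
    restarts_i = header.index("RESTARTS")
    found = []
    for ln in lines[1:]:
        # Assumption: whitespace splitting is safe because no field has spaces.
        cols = ln.split()
        if cols[status_i] == "CrashLoopBackOff":
            found.append((cols[name_i], int(cols[restarts_i])))
    return found


# Table copied from the output in this report.
sample = """
NAME                                          READY     STATUS             RESTARTS   AGE
cluster-monitoring-operator-9f7578d96-c2m8p   1/1       Running            0          49m
prometheus-operator-9f6cffdb-vrrtf            0/1       CrashLoopBackOff   13         47m
"""

print(crashlooping_pods(sample))
```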

# kubectl -n openshift-monitoring get deploy prometheus-operator -o yaml
status:
  conditions:
  - lastTransitionTime: 2018-08-14T05:31:33Z
    lastUpdateTime: 2018-08-14T05:31:33Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2018-08-14T05:41:34Z
    lastUpdateTime: 2018-08-14T05:41:34Z
    message: ReplicaSet "prometheus-operator-9f6cffdb" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 4
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
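
The Progressing condition with reason ProgressDeadlineExceeded is what marks this rollout as stalled rather than still in progress. A minimal sketch of the same check on a parsed status block follows; the function name `rollout_stalled` is illustrative, and the dictionary is copied by hand from the output above rather than fetched from a cluster.

```python
# Hypothetical helper: works on a plain dict shaped like the `status` block
# shown above (for example, loaded from `-o json` output).

def rollout_stalled(status):
    """Return True if the Progressing condition reports a deadline timeout."""
    for cond in status.get("conditions", []):
        if (
            cond.get("type") == "Progressing"
            and cond.get("status") == "False"
            and cond.get("reason") == "ProgressDeadlineExceeded"
        ):
            return True
    return False


# Conditions copied by hand from the deployment status in this report.
status = {
    "conditions": [
        {"type": "Available", "status": "False",
         "reason": "MinimumReplicasUnavailable"},
        {"type": "Progressing", "status": "False",
         "reason": "ProgressDeadlineExceeded"},
    ],
    "replicas": 1,
    "unavailableReplicas": 1,
    "updatedReplicas": 1,
}

print(rollout_stalled(status))
```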

The installation log also shows that the ServiceMonitor CRD was not created.

Version-Release number of selected component (if applicable):
ose-prometheus-operator:v3.11.0-0.14.0.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring.

Actual results:
The prometheus-operator pod is in CrashLoopBackOff status.

Expected results:
The prometheus-operator pod should start and run normally.

Additional info:
# parameters
openshift_cluster_monitoring_operator_install=true
openshift_cluster_monitoring_operator_node_selector={'role': 'node'}

Comment 1 Junqi Zhao 2018-08-14 06:56:29 UTC

Created attachment 1475755 [details]
installation log

Comment 2 Frederic Branczyk 2018-08-14 09:44:26 UTC

Could you also share the logs of the Prometheus Operator?

Comment 3 Junqi Zhao 2018-08-15 04:18:24 UTC

(In reply to Frederic Branczyk from comment #2)
> Could you also share the logs of the Prometheus Operator?

# kubectl logs prometheus-operator-c7dd5cb69-vc85r
standard_init_linux.go:178: exec user process caused "operation not permitted"

Comment 4 Junqi Zhao 2018-08-15 04:19:52 UTC

This appears to be the same issue as
https://github.com/google/metallb/issues/21

Comment 5 Junqi Zhao 2018-08-15 05:24:35 UTC

# docker ps -a | grep operator
83ea8c727627        6313079d656b                                                                                                                                       "/usr/bin/operator..."   4 minutes ago       Exited (1) 4 minutes ago                       k8s_prometheus-operator_prometheus-operator-c7dd5cb69-vc85r_openshift-monitoring_429902ea-a039-11e8-8c6d-42010af00009_29
68785f345171        registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.11.0-0.14.0                                                                               "/usr/bin/pod"           2 hours ago         Up 2 hours                                     k8s_POD_prometheus-operator-c7dd5cb69-vc85r_openshift-monitoring_429902ea-a039-11e8-8c6d-42010af00009_0
# docker logs 83ea8c727627
standard_init_linux.go:178: exec user process caused "operation not permitted"


# docker version
Client:
 Version:         1.13.1
 API version:     1.26
 Package version: <unknown>
 Go version:      go1.8.3
 Git commit:      774336d/1.13.1
 Built:           Tue Feb 20 13:46:34 2018
 OS/Arch:         linux/amd64

Server:
 Version:         1.13.1
 API version:     1.26 (minimum version 1.12)
 Package version: <unknown>
 Go version:      go1.8.3
 Git commit:      774336d/1.13.1
 Built:           Tue Feb 20 13:46:34 2018
 OS/Arch:         linux/amd64
 Experimental:    false

Comment 6 Frederic Branczyk 2018-08-15 08:55:18 UTC

We just merged https://github.com/openshift/cluster-monitoring-operator/pull/67, so this should be fixed in the next 3.11 build.

Comment 7 Junqi Zhao 2018-08-16 04:27:40 UTC

The original issue is resolved by the fix, but the kube-state-metrics pod, service, deployment, and replicaset are not created. That defect is tracked in Bug 1617695.

Comment 8 Junqi Zhao 2018-08-20 05:56:33 UTC

The issue is fixed in ose-prometheus-operator-v3.11.0-0.17.0.0.


# openshift version
openshift v3.11.0-0.17.0
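
To tell whether a given build is at or after the one verified here, the version tags can be compared numerically. This is only a sketch: the names `VERIFIED_BUILD`, `build_numbers`, and `at_or_after` are made up for illustration. It also assumes that comparing the numeric fields in order is meaningful for these tags. This report confirms the fix only in 0.17.0.0 and says nothing about builds between 0.14 and 0.17.

```python
import re

# The only build this report confirms as fixed.
VERIFIED_BUILD = "v3.11.0-0.17.0.0"


def build_numbers(tag):
    """Extract every run of digits from a tag as a list of ints."""
    return [int(n) for n in re.findall(r"\d+", tag)]


def at_or_after(tag, baseline=VERIFIED_BUILD):
    """Return True if `tag` compares numerically at or after `baseline`."""
    a, b = build_numbers(tag), build_numbers(baseline)
    # Pad the shorter list with zeros so 0.17.0 and 0.17.0.0 compare equal.
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


print(at_or_after("v3.11.0-0.14.0.0"))  # build from the original report
print(at_or_after("v3.11.0-0.17.0"))    # build from the verification comment
```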

Comment 10 errata-xmlrpc 2018-10-11 07:24:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652