Bug 1484679 - [free-stg] Metrics does not work
Summary: [free-stg] Metrics does not work
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.x
Assignee: Antonio Murdaca
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1499598
Depends On:
Blocks:
Reported: 2017-08-24 06:20 UTC by Junqi Zhao
Modified: 2017-11-09 18:54 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-09 18:54:32 UTC
Target Upstream Version:
Embargoed:



Description Junqi Zhao 2017-08-24 06:20:43 UTC
Description of problem:
Metrics pods are in ImagePullBackOff status, so metrics does not work.
Result of the command "oc get pod -n openshift-infra":

NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-w6ld9   0/1       ImagePullBackOff   0          1d
hawkular-cassandra-2-h6766   0/1       ImagePullBackOff   0          1d
hawkular-metrics-prg6t       0/1       ImagePullBackOff   0          1d
heapster-zn413               0/1       ImagePullBackOff   0          1d

The pod details show that the images are pulled from the registry.reg-aws.openshift.com:443/online/ repo. Is that correct?
image: registry.reg-aws.openshift.com:443/online/metrics-cassandra:v3.7.0-0.104.0
image: registry.reg-aws.openshift.com:443/online/metrics-hawkular-metrics:v3.7.0-0.104.0
image: registry.reg-aws.openshift.com:443/online/metrics-heapster:v3.7.0-0.104.0

Version-Release number of selected component (if applicable):
oc v3.7.0-0.104.0
kubernetes v1.7.0+695f48a16f


How reproducible:
Always

Steps to Reproduce:
1. oc get pod -n openshift-infra

Actual results:
Metrics pods are in ImagePullBackOff status

Expected results:
Metrics pods should be in Running status

Additional info:
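For triage, the pod listing in the description can be checked mechanically. The following is only a minimal sketch in Python: it assumes the plain table layout printed by "oc get pod" as pasted above, and the helper names parse_pod_table and unhealthy are hypothetical, not part of any OpenShift tooling. It works on pasted text and does not contact a cluster.

```python
# Sketch: flag pods that are not healthy in pasted "oc get pod" output.
# Assumption: whitespace-separated columns with a header row, as in this report.

SAMPLE = """\
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-w6ld9   0/1       ImagePullBackOff   0          1d
hawkular-cassandra-2-h6766   0/1       ImagePullBackOff   0          1d
hawkular-metrics-prg6t       0/1       ImagePullBackOff   0          1d
heapster-zn413               0/1       ImagePullBackOff   0          1d
"""


def parse_pod_table(text):
    """Turn the table into a list of dicts keyed by the header columns."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def unhealthy(pods):
    """Return names of pods that are not Running or not fully ready."""
    bad = []
    for pod in pods:
        ready, total = pod["READY"].split("/")
        if pod["STATUS"] != "Running" or ready != total:
            bad.append(pod["NAME"])
    return bad


# Print the names of the pods that need attention.
print(unhealthy(parse_pod_table(SAMPLE)))
```

The check treats a pod as healthy only when STATUS is Running and the READY counts match, so a pod in ImagePullBackOff or CrashLoopBackOff is reported either way.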

Comment 2 Junqi Zhao 2017-10-09 08:15:45 UTC
*** Bug 1499598 has been marked as a duplicate of this bug. ***

Comment 8 Junqi Zhao 2017-10-12 06:30:01 UTC
On the free-int cluster, the metrics pods are running and the metrics diagrams can be viewed from the web console, so I am deleting the [free-int] prefix from the title. Metrics pods in free-stg are still not in running status.
 
Result of the command "oc get pod -n openshift-infra":

NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-89xr2   1/1       Running   4          3h
hawkular-cassandra-2-2gdq1   1/1       Running   4          3h
hawkular-metrics-xh95d       1/1       Running   1          3h
heapster-hvlj8               1/1       Running   1          3h

env:
OpenShift Master:v3.7.0-0.147.0 (online version 3.6.0.38)
Kubernetes Master:v1.7.6+a08f5eeb62

Comment 11 Seth Jennings 2017-10-12 13:57:36 UTC
Sending to Containers to investigate the error from runc in comment 6. It seems to pertain to cgroup settings with respect to prestart hooks.

Possibly related:
https://github.com/opencontainers/runc/pull/1586
https://github.com/opencontainers/runc/pull/1239 (introduced the error case we are seeing)

Comment 12 Antonio Murdaca 2017-10-12 14:31:22 UTC
could this be related to https://bugzilla.redhat.com/show_bug.cgi?id=1459826

Comment 13 Seth Jennings 2017-10-12 15:07:21 UTC
Antonio, it is similar.  Differences are:

- this hawkular-cassandra doesn't have init containers
- the error is "for procHooks" and not "for ready"

Here is that pastebin I sent on #aos for reference
http://pastebin.test.redhat.com/524082

Comment 17 Justin Pierce 2017-10-12 19:45:40 UTC
sjennings says this is because of clusterresourceoverride in master-config that is present on all starter clusters.

https://github.com/openshift/origin/pull/16845 is suggested to disable this. 

Alternatively, we could have used the annotation:
quota.openshift.io/cluster-resource-override-enabled: "false"
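As a rough illustration of the annotation alternative, the Python sketch below builds a merge-patch body and an "oc annotate" argument list for it. It rests on assumptions not stated in this bug: that the annotation is set on the namespace, and that openshift-infra (the namespace used in the commands above) is the target. The function names override_patch and annotate_command are hypothetical. Nothing is executed against a cluster; the script only prints what would be applied.

```python
import json

# Annotation key quoted in the comment above.
OVERRIDE_ANNOTATION = "quota.openshift.io/cluster-resource-override-enabled"


def override_patch(enabled):
    """Build a merge-patch body that sets the override annotation.

    Annotation values are strings, so the boolean is rendered as "true"/"false".
    """
    value = "true" if enabled else "false"
    return {"metadata": {"annotations": {OVERRIDE_ANNOTATION: value}}}


def annotate_command(namespace, enabled):
    """Build an argv list for 'oc annotate' (assumed target: a namespace)."""
    value = "true" if enabled else "false"
    return [
        "oc", "annotate", "namespace", namespace,
        f"{OVERRIDE_ANNOTATION}={value}", "--overwrite",
    ]


# Assumption: openshift-infra is the namespace to exempt from the override.
print(json.dumps(override_patch(False), indent=2))
print(" ".join(annotate_command("openshift-infra", False)))
```

Keeping the value as the string "false" matches the quoted form shown in the comment; an unquoted boolean would not be a valid annotation value.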

Comment 19 Junqi Zhao 2017-10-13 03:33:03 UTC
Metrics pods are in running status now

Result of the command "oc get pod -n openshift-infra":

NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-0z34x   1/1       Running   0          3h
hawkular-cassandra-2-g4svq   1/1       Running   4          3h
hawkular-metrics-x2cjr       1/1       Running   0          3h
heapster-5dq2q               1/1       Running   0          3h

env:
OpenShift Master:v3.7.0-0.143.1 (online version 3.6.0.35)
Kubernetes Master:v1.7.0+80709908fd

Comment 20 Junqi Zhao 2017-10-17 03:29:13 UTC
Metrics pods in free-int are not in running status now. free-stg is fine.
Result of the command "oc get pod -n openshift-infra":

NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-24qck   0/1       CrashLoopBackOff   63         11h
hawkular-cassandra-2-xt6lg   0/1       CrashLoopBackOff   63         11h
hawkular-metrics-k48g8       0/1       CrashLoopBackOff   128        11h
heapster-v0421               0/1       ImagePullBackOff   0          11h

env:
OpenShift Master:v3.7.0-0.147.0 (online version 3.6.0.38)
Kubernetes Master:v1.7.6+a08f5eeb62 

Reopening this defect and adding the [free-int] prefix to the title.

Comment 22 Junqi Zhao 2017-10-17 04:14:02 UTC
Closing this defect; use https://bugzilla.redhat.com/show_bug.cgi?id=1502924 to track the issue on free-int.

