Description of problem:
Metrics pods are in ImagePullBackOff status, so metrics do not work.

Result of the command "oc get pod -n openshift-infra":

    NAME                         READY     STATUS             RESTARTS   AGE
    hawkular-cassandra-1-w6ld9   0/1       ImagePullBackOff   0          1d
    hawkular-cassandra-2-h6766   0/1       ImagePullBackOff   0          1d
    hawkular-metrics-prg6t       0/1       ImagePullBackOff   0          1d
    heapster-zn413               0/1       ImagePullBackOff   0          1d

The pod info shows that the images are pulled from the registry.reg-aws.openshift.com:443/online/ repo. Is that right?

    image: registry.reg-aws.openshift.com:443/online/metrics-cassandra:v3.7.0-0.104.0
    image: registry.reg-aws.openshift.com:443/online/metrics-hawkular-metrics:v3.7.0-0.104.0
    image: registry.reg-aws.openshift.com:443/online/metrics-heapster:v3.7.0-0.104.0

Version-Release number of selected component (if applicable):
oc v3.7.0-0.104.0
kubernetes v1.7.0+695f48a16f

How reproducible:
Always

Steps to Reproduce:
1. oc get pod -n openshift-infra
2.
3.

Actual results:
Metrics pods are in ImagePullBackOff status

Expected results:
Metrics pods should be running

Additional info:
*** Bug 1499598 has been marked as a duplicate of this bug. ***
On the free-int cluster, the metrics pods are running and the metrics diagrams can be viewed from the web console, so the [free-int] prefix has been deleted from the title. Metrics pods in free-stg are still not running.

Result of the command "oc get pod -n openshift-infra":

    NAME                         READY     STATUS    RESTARTS   AGE
    hawkular-cassandra-1-89xr2   1/1       Running   4          3h
    hawkular-cassandra-2-2gdq1   1/1       Running   4          3h
    hawkular-metrics-xh95d       1/1       Running   1          3h
    heapster-hvlj8               1/1       Running   1          3h

env:
OpenShift Master: v3.7.0-0.147.0 (online version 3.6.0.38)
Kubernetes Master: v1.7.6+a08f5eeb62
Sending to Containers to investigate the error from runc in comment 6. It seems to pertain to cgroup settings with respect to prestart hooks.

Possibly related:
https://github.com/opencontainers/runc/pull/1586
https://github.com/opencontainers/runc/pull/1239 (introduced the error case we are seeing)
Could this be related to https://bugzilla.redhat.com/show_bug.cgi?id=1459826 ?
Antonio, it is similar. The differences are:
- this hawkular-cassandra doesn't have init containers
- the error is "for procHooks" and not "for ready"

Here is the pastebin I sent on #aos, for reference: http://pastebin.test.redhat.com/524082
sjennings says this is caused by the clusterresourceoverride setting in master-config, which is present on all starter clusters. https://github.com/openshift/origin/pull/16845 is suggested to disable this. Alternatively, we could have used the annotation:

    quota.openshift.io/cluster-resource-override-enabled: "false"
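For illustration only, here is a minimal Python sketch of how the annotation above could be expressed as a JSON merge patch. It assumes the annotation is set on the namespace metadata, which this comment does not state; the function name override_patch is hypothetical.

```python
import json

# Annotation key quoted in the comment above.
ANNOTATION = "quota.openshift.io/cluster-resource-override-enabled"


def override_patch(enabled: bool = False) -> str:
    """Build a JSON merge patch that sets the override annotation.

    Assumption: the annotation lives under metadata.annotations of the
    namespace; annotation values must be strings, hence "true"/"false".
    """
    value = "true" if enabled else "false"
    patch = {"metadata": {"annotations": {ANNOTATION: value}}}
    return json.dumps(patch, sort_keys=True)


if __name__ == "__main__":
    print(override_patch())
```

The printed string could then be passed to something like oc patch namespace openshift-infra --type=merge -p '<patch>', again assuming the namespace is the right target for this cluster.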
Metrics pods are in running status now.

Result of the command "oc get pod -n openshift-infra":

    NAME                         READY     STATUS    RESTARTS   AGE
    hawkular-cassandra-1-0z34x   1/1       Running   0          3h
    hawkular-cassandra-2-g4svq   1/1       Running   4          3h
    hawkular-metrics-x2cjr       1/1       Running   0          3h
    heapster-5dq2q               1/1       Running   0          3h

env:
OpenShift Master: v3.7.0-0.143.1 (online version 3.6.0.35)
Kubernetes Master: v1.7.0+80709908fd
Metrics pods in free-int are not running now; free-stg is fine.

Result of the command "oc get pod -n openshift-infra":

    NAME                         READY     STATUS             RESTARTS   AGE
    hawkular-cassandra-1-24qck   0/1       CrashLoopBackOff   63         11h
    hawkular-cassandra-2-xt6lg   0/1       CrashLoopBackOff   63         11h
    hawkular-metrics-k48g8       0/1       CrashLoopBackOff   128        11h
    heapster-v0421               0/1       ImagePullBackOff   0          11h

env:
OpenShift Master: v3.7.0-0.147.0 (online version 3.6.0.38)
Kubernetes Master: v1.7.6+a08f5eeb62

Reopening this defect and adding the free-int prefix to the title.
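Since the same status check is repeated by hand throughout this bug, here is a rough Python sketch that picks the non-Running pods out of saved "oc get pod" output. The function name not_running is hypothetical, and the sketch assumes the default column layout shown in this report (a header row containing NAME and STATUS, whitespace-separated columns).

```python
from typing import List, Tuple

# Sample taken from the output pasted in this comment.
SAMPLE = """\
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-24qck   0/1       CrashLoopBackOff   63         11h
hawkular-cassandra-2-xt6lg   0/1       CrashLoopBackOff   63         11h
hawkular-metrics-k48g8       0/1       CrashLoopBackOff   128        11h
heapster-v0421               0/1       ImagePullBackOff   0          11h
"""


def not_running(table: str) -> List[Tuple[str, str]]:
    """Return (pod name, status) for every row whose STATUS is not Running."""
    rows = [line.split() for line in table.strip().splitlines() if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Locate columns by header name rather than by fixed position.
    name_i = header.index("NAME")
    status_i = header.index("STATUS")
    return [(r[name_i], r[status_i]) for r in body if r[status_i] != "Running"]


if __name__ == "__main__":
    for name, status in not_running(SAMPLE):
        print(name, status)
```

Note that this only looks at the STATUS column; a pod that is Running but not ready (for example 0/1 in READY) would not be flagged.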
Closing this defect; use https://bugzilla.redhat.com/show_bug.cgi?id=1502924 to track the issue on free-int.