Description of problem:
Customer identified the issue because Heapster was stuck in a CrashLoopBackOff state. Looking at the logs, Heapster points to hawkular-metrics, and there don't appear to be any issues with Cassandra (at least none evident in the logs). The behavior appears to persist after scaling the components down to 0 and then back up (cassandra, then hawkular-metrics, then heapster).

Version-Release number of selected component (if applicable):
[root@<SYSTEM> ~]# openshift version
openshift v3.2.1.13-1-gc2a90e1
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

[root@<SYSTEM> ~]# oc get rc -o yaml | grep -i image
    image: registry.access.redhat.com/openshift3/metrics-cassandra:3.2.1
    imagePullPolicy: IfNotPresent
    image: registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.2.1
    imagePullPolicy: IfNotPresent
    image: registry.access.redhat.com/openshift3/metrics-heapster:3.2.1
    imagePullPolicy: IfNotPresent

Attaching logs from before scaling down and from after scaling back up shortly.
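For reference, the scale-down/scale-up sequence described above looks roughly like the following. This is a sketch only: the rc names and the 'openshift-infra' namespace are assumptions based on a standard OpenShift 3.2 metrics deployment, so confirm the actual names with 'oc get rc' first.

  # Scale everything down to 0 (heapster first, cassandra last)
  oc scale rc heapster --replicas=0 -n openshift-infra
  oc scale rc hawkular-metrics --replicas=0 -n openshift-infra
  oc scale rc hawkular-cassandra-1 --replicas=0 -n openshift-infra

  # Scale back up in the order noted above: cassandra, then hawkular-metrics, then heapster
  oc scale rc hawkular-cassandra-1 --replicas=1 -n openshift-infra
  oc scale rc hawkular-metrics --replicas=1 -n openshift-infra
  oc scale rc heapster --replicas=1 -n openshift-infra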
From the post_restart files, none of the logs indicate any errors. The Heapster logs are from when Heapster is first starting up, so they have not yet reached any error state. Can you please attach the logs for the failed Heapster pod? You can usually get these with 'oc logs -p $POD_NAME', where -p returns the logs from the previous container instance (which, in a CrashLoopBackOff, would be the one with the error message). If you cannot get the previous logs, then please run 'oc logs -f $POD_NAME'; this will 'follow' the logs as they are written and will capture the full logs up until the pod is restarted. Having access to the events that have occurred during this time would also be very helpful.
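A minimal sketch of the commands above, for convenience ($POD_NAME is a placeholder for the actual Heapster pod from 'oc get pods', and the -n namespace assumes the default 'openshift-infra' metrics project):

  # Logs from the previous (crashed) container instance
  oc logs -p $POD_NAME -n openshift-infra

  # Follow the current container's logs until it is restarted
  oc logs -f $POD_NAME -n openshift-infra

  # Recent events in the metrics namespace
  oc get events -n openshift-infra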
The user has reported that it is now working after fixing an iptables issue and restarting. As this is not something we have been able to reproduce, or determine whether it is caused by something specific in OpenShift Metrics, closing as 'WORKSFORME'.
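For anyone hitting something similar, a hedged sketch of the kind of iptables inspection and restart involved; the exact fix the reporter applied was not captured here, and the service names assume an OpenShift 3.x node on RHEL, so adjust to your environment:

  # Inspect the current iptables rules for anything unexpected
  iptables -L -n

  # Restart the networking stack and then the node service
  systemctl restart iptables
  systemctl restart docker
  systemctl restart atomic-openshift-node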