Description of problem:

Heapster was constantly restarted because the hawkular metrics pod was not ready:

[...]
[qxn7076@ose3adm ops]$ oc logs heapster-vfgt1
Starting Heapster with the following arguments: --source=kubernetes:https://kubernetes.default.svc:443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=WqwkwJMUf0031EE&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=system:master-proxy --stats_resolution=30s
I0512 09:43:49.185681 1 heapster.go:60] heapster --source=kubernetes:https://kubernetes.default.svc:443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=WqwkwJMUf0031EE&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=system:master-proxy --stats_resolution=30s
I0512 09:43:49.190906 1 heapster.go:61] Heapster version 0.18.0
I0512 09:43:49.191397 1 kube_factory.go:168] Using Kubernetes client with master "https://kubernetes.default.svc:443" and version "v1"
I0512 09:43:49.191412 1 kube_factory.go:169] Using kubelet port 10250
I0512 09:43:49.192312 1 driver.go:491] Initialised Hawkular Sink with parameters {_system https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=WqwkwJMUf0031EE&filter=label(container_name:^/system.slice.*|^/user.slice) 0xc20817eea0 }
I0512 09:43:50.592720 1 heapster.go:71] Starting heapster on port 8082
E0512 09:44:08.772517 1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
E0512 09:44:38.796927 1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
E0512 09:45:08.836620 1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
E0512 09:45:38.874711 1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
E0512 09:46:08.895800 1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
[qxn7076@ose3adm ops]
[...]

However, the hawkular metrics pod showed no errors and I had to manually restart it to make the metrics work again.

Version-Release number of selected component (if applicable):
Openshift Enterprise 3.1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
For 3.2 we have mitigated this somewhat by making the time between restarts far longer, but a similar issue can still occur. If Heapster cannot properly connect to Hawkular Metrics after a certain grace period, we consider this an error condition and restart the pod (just as any pod should be restarted if it enters an error state). For 3.2 we have also made this easier to handle by changing how the lifecycle of the pod functions and by having these error messages show up in the events log (there are currently edge cases in OpenShift where the old lifecycle handling did not function properly). Heapster should have automatically connected to Hawkular Metrics once it was properly started, though. Are you sure there weren't any error messages in the Hawkular Metrics logs, and that the state was ready on the Hawkular Metrics status page (e.g. by visiting https://HAWKULAR_METRICS_HOSTNAME/hawkular/metrics in a browser)?
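To illustrate the grace-period behaviour described above, here is a minimal Python sketch. It is not the actual Heapster or OpenShift lifecycle code: the function names `wait_until_ready` and `http_probe`, the 60 second grace period and the 5 second polling interval are all assumptions made for illustration. It also assumes that an HTTP 200 answer from the status URL means "ready", which may not match what the real check looks at.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(probe, grace_period=60.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True or grace_period seconds pass.

    Returns True if the probe succeeded in time, False otherwise.
    The default durations are illustrative, not the values OpenShift uses.
    """
    deadline = clock() + grace_period
    while True:
        if probe():
            return True
        if clock() >= deadline:
            # Grace period exhausted: the caller would treat this as an
            # error condition (for a pod, that means a restart).
            return False
        sleep(interval)


def http_probe(url, timeout=5.0):
    """Hypothetical readiness probe: True if a GET on url returns HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, TLS error, non-2xx answer, etc.
        return False
```

For a manual check, something like `wait_until_ready(lambda: http_probe("https://HAWKULAR_METRICS_HOSTNAME/hawkular/metrics"))` could be run from a machine that trusts the router certificate, with the placeholder hostname replaced by the real one.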
Closing this as it has been fixed in OSE 3.2.