Bug 1338794 - Heapster was constantly restarted because the hawkular metrics pod was not ready
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Matt Wringe
QA Contact: chunchen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-05-23 12:34 UTC by Miheer Salunke
Modified: 2019-10-10 12:08 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-20 14:44:24 UTC
Target Upstream Version:
Embargoed:


Description Miheer Salunke 2016-05-23 12:34:31 UTC
Description of problem:
Heapster was constantly restarted because the hawkular metrics pod was not ready: 

[...]
[qxn7076@ose3adm ops]$ oc logs heapster-vfgt1
Starting Heapster with the following arguments: --source=kubernetes:https://kubernetes.default.svc:443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=WqwkwJMUf0031EE&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=system:master-proxy --stats_resolution=30s
I0512 09:43:49.185681       1 heapster.go:60] heapster --source=kubernetes:https://kubernetes.default.svc:443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=WqwkwJMUf0031EE&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=system:master-proxy --stats_resolution=30s
I0512 09:43:49.190906       1 heapster.go:61] Heapster version 0.18.0
I0512 09:43:49.191397       1 kube_factory.go:168] Using Kubernetes client with master "https://kubernetes.default.svc:443" and version "v1"
I0512 09:43:49.191412       1 kube_factory.go:169] Using kubelet port 10250
I0512 09:43:49.192312       1 driver.go:491] Initialised Hawkular Sink with parameters {_system https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=WqwkwJMUf0031EE&filter=label(container_name:^/system.slice.*|^/user.slice) 0xc20817eea0 }
I0512 09:43:50.592720       1 heapster.go:71] Starting heapster on port 8082
E0512 09:44:08.772517       1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
E0512 09:44:38.796927       1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
E0512 09:45:08.836620       1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
E0512 09:45:38.874711       1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
E0512 09:46:08.895800       1 model_handlers.go:620] unable to get pod list metric: the model is not populated yet
[qxn7076@ose3adm ops]
[...]

However, the hawkular metrics pod showed no errors and I had to manually restart it to make the metrics work again. 

Version-Release number of selected component (if applicable):
OpenShift Enterprise 3.1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Matt Wringe 2016-05-24 13:16:55 UTC
For 3.2 we have mitigated this somewhat by making the time between restarts much longer, but a similar issue can still occur. If Heapster cannot connect to Hawkular Metrics within a certain grace period, we consider this an error condition and restart the pod (just as any pod should be restarted if it enters an error state).

For 3.2 we have also made this easier by changing how the pod lifecycle works and by having these error messages show up in the events log (there are currently edge cases in OpenShift where the old lifecycle handling did not function properly).

Heapster should have connected to Hawkular Metrics automatically once it was properly started, though. Are you sure there weren't any error messages in the Hawkular Metrics logs, and that the state was shown as ready on the Hawkular Metrics status page (e.g. by visiting https://HAWKULAR_METRICS_HOSTNAME/hawkular/metrics in a browser)?
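
To illustrate the grace-period behaviour described above, here is a minimal sketch in Python. It is not the actual Heapster or OpenShift lifecycle code. The function name `wait_until_ready`, its parameters, and the `check` callback are all hypothetical; `check` stands in for whatever probe decides that Hawkular Metrics is reachable.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    grace_period: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it succeeds or the grace period runs out.

    Returns True as soon as `check()` returns True. Returns False once
    `grace_period` seconds have elapsed without success, which the caller
    would treat as the error condition that triggers a restart.
    All names here are hypothetical and for illustration only.
    """
    deadline = clock() + grace_period
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Under this sketch, a longer `grace_period` corresponds to the longer time between restarts mentioned for 3.2: the dependent service gets more time to become ready before the failure is treated as an error.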

Comment 2 Matt Wringe 2016-07-20 14:44:24 UTC
Closing this as it has been fixed in OSE 3.2.

