Bug 1392105 - Hawkular-Metrics having issues communicating with Cassandra
Summary: Hawkular-Metrics having issues communicating with Cassandra
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Matt Wringe
QA Contact: Peng Li
 
Reported: 2016-11-04 20:08 UTC by Eric Jones
Modified: 2019-12-16 07:19 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-11 20:29:26 UTC
Target Upstream Version:


Attachments

Description Eric Jones 2016-11-04 20:08:37 UTC
Description of problem:
The customer identified this issue because Heapster was stuck in a CrashLoopBackOff state. Looking at the logs, Heapster points to Hawkular Metrics, and there do not appear to be any issues with Cassandra (at least none evident in the logs).

The behavior appears to persist after scaling the components down to 0 and then back up (Cassandra, then Hawkular Metrics, then Heapster).

Version-Release number of selected component (if applicable):
[root@<SYSTEM> ~]# openshift version
openshift v3.2.1.13-1-gc2a90e1
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

[root@<SYSTEM> ~]# oc get rc -o yaml |grep -i image
          image: registry.access.redhat.com/openshift3/metrics-cassandra:3.2.1
          imagePullPolicy: IfNotPresent
          image: registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.2.1
          imagePullPolicy: IfNotPresent
          image: registry.access.redhat.com/openshift3/metrics-heapster:3.2.1
          imagePullPolicy: IfNotPresent


I will shortly attach logs from before scaling down and from after scaling back up.

Comment 2 Matt Wringe 2016-11-07 14:55:52 UTC
From the post_restart files, none of the logs indicate any errors.

The attached Heapster logs cover Heapster first starting up, and as such it has not yet reached any error state.

Can you please attach the logs for the failed Heapster pod? You can usually get these with 'oc logs -p $POD_NAME', where -p returns the logs from the previous container instance (which, in a CrashLoopBackOff, would be the one containing the error message).

If you cannot get the previous logs, then please run 'oc logs -f $POD_NAME'; this 'follows' the logs as they are written and gathers the full output up until the pod is restarted.

Access to the events that occurred during this time would also be very helpful.
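The log-gathering steps above can be sketched as a small script. This is only an illustration of the commands mentioned in this comment; the pod name is a placeholder you would replace with the actual failing Heapster pod, and the script assumes an 'oc' client already logged in to the cluster.

```shell
#!/bin/sh
# Placeholder pod name (assumption): pass the real Heapster pod name
# as the first argument, e.g. the one shown by 'oc get pods'.
POD_NAME="${1:-heapster-xxxxx}"

if command -v oc >/dev/null 2>&1; then
    # Logs from the previous container instance; in a CrashLoopBackOff
    # these usually contain the actual error message.
    oc logs -p "$POD_NAME"

    # Fallback if the previous logs are unavailable: follow the live
    # logs until the pod restarts (uncomment to use).
    # oc logs -f "$POD_NAME"

    # Cluster events from the same time window are useful context.
    oc get events
else
    echo "oc client not found; run this on a host with cluster access" >&2
fi
```

Collecting the previous-instance logs and the events together gives both the error itself and the scheduler/kubelet activity surrounding the restarts.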

Comment 6 Matt Wringe 2016-11-11 20:29:26 UTC
The user has reported that it is now working after fixing an iptables issue and restarting.

As this is not something we have been able to reproduce, and we cannot determine whether it was caused by something specific in OpenShift Metrics, closing as WORKSFORME.

