Bug 1460350 - [starter][starter-us-east-1] metrics failing to show in console
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Matt Wringe
QA Contact: Liming Zhou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-06-09 19:37 UTC by Steve Speicher
Modified: 2018-07-26 19:34 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-17 13:40:19 UTC
Target Upstream Version:
Embargoed:


Attachments
oc get pod heapster -o json (9.30 KB, text/plain)
2017-07-21 16:20 UTC, Eric Paris
no flags
console empty metrics graph (51.70 KB, image/png)
2017-07-25 00:39 UTC, Steve Speicher
no flags

Description Steve Speicher 2017-06-09 19:37:05 UTC
While using the web console at https://console.starter-us-east-1.openshift.com/console

Trying to view metrics for a pod, I'm seeing this error in the browser's console:

Response:

{"errorMsg":"Failed to perform operation due to an error: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.54.151:9042 (com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_ONE (1 required but only 0 alive)))"}

Response headers:

HTTP/1.1 500 Internal Server Error
X-Powered-By: Undertow/1
Access-Control-Allow-Headers: origin,accept,content-type,hawkular-tenant,authorization
Server: JBoss-EAP/7
Date: Fri, 09 Jun 2017 19:15:10 GMT
Access-Control-Allow-Origin: https://console.starter-us-east-1.openshift.com
Access-Control-Allow-Credentials: true
Content-Type: application/json
Content-Length: 296
Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE, OPTIONS, HEAD
Access-Control-Max-Age: 259200
Set-Cookie: ebfa7d2b9ec400af3c79e2d068d9ce9b=14842d22ee85f2ac4e90aa05722bbee3; path=/; HttpOnly; Secure

Request headers:

POST /hawkular/metrics/metrics/stats/query HTTP/1.1
Host: metrics.starter-us-east-1.openshift.com
Connection: keep-alive
Content-Length: 150
Pragma: no-cache
Cache-Control: no-cache
Hawkular-Tenant: sspeiche1
Authorization: Bearer 
Origin: https://console.starter-us-east-1.openshift.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Content-Type: application/json
Accept: application/json
DNT: 1
Referer: https://console.starter-us-east-1.openshift.com/console/project/sspeiche1/overview
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
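
For triage, the error body above can be reduced to the three values that matter: the Cassandra consistency level, how many replicas the query required, and how many were alive. The following is a minimal sketch only; the function name `parse_unavailable` and the regular expression are illustrative assumptions based solely on the message text quoted in this report, not on any Hawkular or DataStax API.

```python
import json
import re

# Error body copied from the response quoted in this bug's description.
ERROR_BODY = (
    '{"errorMsg":"Failed to perform operation due to an error: '
    'All host(s) tried for query failed (tried: '
    'hawkular-cassandra/172.30.54.151:9042 '
    '(com.datastax.driver.core.exceptions.UnavailableException: '
    'Not enough replicas available for query at consistency LOCAL_ONE '
    '(1 required but only 0 alive)))"}'
)

# Assumption: the pattern is derived only from the wording of the message
# above; other driver versions may phrase the error differently.
UNAVAILABLE_RE = re.compile(
    r"consistency (?P<level>\w+) "
    r"\((?P<required>\d+) required but only (?P<alive>\d+) alive\)"
)


def parse_unavailable(body):
    """Return consistency level and replica counts, or None if no match."""
    message = json.loads(body).get("errorMsg", "")
    match = UNAVAILABLE_RE.search(message)
    if match is None:
        return None
    return {
        "level": match.group("level"),
        "required": int(match.group("required")),
        "alive": int(match.group("alive")),
    }


if __name__ == "__main__":
    print(parse_unavailable(ERROR_BODY))
```

Read this way, the message says that Hawkular Metrics itself answered the request, but that it could not find a single live Cassandra replica to serve the query.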

Comment 1 Steve Speicher 2017-06-09 19:38:41 UTC
Following up on IRC: it sounds like starter-us-east-1 is suffering from both AWS API rate limiting and etcd pressure.

Comment 2 Matt Wringe 2017-06-20 18:03:03 UTC
I can't seem to access starter-us-east-1 (I always get redirected to starter-us-east-2). Is this still an issue? Or was it resolved when the other issues in the cluster were fixed?

Comment 4 Steve Speicher 2017-07-19 01:26:09 UTC
Now that the cluster has been upgraded to 3.6, I simply get a 503 error when hitting https://metrics.starter-us-east-1.openshift.com/hawkular/metrics

Comment 5 Matt Wringe 2017-07-19 15:41:48 UTC
The 503 error you are seeing for https://metrics.starter-us-east-1.openshift.com/hawkular/metrics is coming from the router, not from Hawkular Metrics. This means either that the route is not properly configured or that the Hawkular Metrics pod is not in the running state.

Can you please check the state of the metrics pods in this cluster?
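
One rough way to tell the two cases apart is to look at the response headers. The sketch below is a heuristic under stated assumptions: the 500 response quoted in the description carried `Server: JBoss-EAP/7` and `X-Powered-By: Undertow/1`, so a response with those markers is treated as coming from Hawkular Metrics, and a 503 without them is assumed to come from the router. The function name `likely_origin` is hypothetical, and a router could in principle be configured to behave differently.

```python
def likely_origin(status, headers):
    """Guess which component produced an HTTP response.

    Assumption: Hawkular Metrics responses carry JBoss/Undertow markers
    (as in the 500 response quoted in this bug), while a bare 503 without
    them was generated by the router in front of the pod.
    """
    markers = " ".join(
        value
        for name, value in headers.items()
        if name.lower() in ("server", "x-powered-by")
    )
    if "JBoss" in markers or "Undertow" in markers:
        return "hawkular-metrics"
    if status == 503:
        return "router"
    return "unknown"


if __name__ == "__main__":
    # Headers taken from the 500 response in the description.
    print(likely_origin(500, {"Server": "JBoss-EAP/7",
                              "X-Powered-By": "Undertow/1"}))
    # A 503 with no application server markers.
    print(likely_origin(503, {}))
```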

Comment 6 Eric Paris 2017-07-21 16:19:50 UTC
The problem is not the router. It is the pods. I have no idea what is wrong with the pods:

# oc get pod -n openshift-infra
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-j5v0k   1/1       Running   0          2d
hawkular-cassandra-2-khd50   1/1       Running   0          2d
hawkular-metrics-rcwjh       0/1       Running   423        2d
heapster-0w9r6               0/1       Running   441        2d

# oc logs -n openshift-infra heapster-0w9r6 | tail
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.

I have no idea how to debug heapster itself.

Comment 7 Eric Paris 2017-07-21 16:20:29 UTC
Created attachment 1302495
oc get pod heapster -o json

Comment 8 Steve Speicher 2017-07-25 00:39:48 UTC
Created attachment 1303937
console empty metrics graph

Comment 11 Matt Wringe 2017-07-25 14:04:07 UTC
(In reply to Eric Paris from comment #6)
> The problem is not the router. It is the pods. I have no idea what is wrong
> with the pods:
> 
> # oc get pod -n openshift-infra
> NAME                         READY     STATUS    RESTARTS   AGE
> hawkular-cassandra-1-j5v0k   1/1       Running   0          2d
> hawkular-cassandra-2-khd50   1/1       Running   0          2d
> hawkular-metrics-rcwjh       0/1       Running   423        2d
> heapster-0w9r6               0/1       Running   441        2d
> 

Who is monitoring this cluster? Obviously, if hawkular-metrics has failed 423 times over the past 2 days, something is really wrong. Why has it been restarting so many times?

> 
> I have no idea how to debug heapster itself.

Heapster is not the problem; it won't start until Hawkular Metrics has started. So we need to fix Hawkular Metrics first.
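
As an illustration of that ordering, the pod listing from comment #6 can be scanned for pods that are not ready and then ranked by start-up dependency. This is a sketch under assumptions: the helper names and the `START_ORDER` tuple are hypothetical, and the order (Cassandra, then Hawkular Metrics, then Heapster) is inferred from this thread rather than taken from any official documentation.

```python
# Pod listing copied from comment #6.
POD_LISTING = """\
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-j5v0k   1/1       Running   0          2d
hawkular-cassandra-2-khd50   1/1       Running   0          2d
hawkular-metrics-rcwjh       0/1       Running   423        2d
heapster-0w9r6               0/1       Running   441        2d
"""

# Assumed dependency order, inferred from this thread: Hawkular Metrics
# needs Cassandra, and Heapster waits for Hawkular Metrics.
START_ORDER = ("hawkular-cassandra", "hawkular-metrics", "heapster")


def unready_pods(listing):
    """Return (name, restarts) for every pod whose READY column is not n/n."""
    result = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        name, ready, _status, restarts, _age = line.split()
        ready_count, total = (int(part) for part in ready.split("/"))
        if ready_count < total:
            result.append((name, int(restarts)))
    return result


def first_to_fix(listing):
    """Pick the unready pod that sits earliest in the assumed start order."""
    def rank(entry):
        for index, prefix in enumerate(START_ORDER):
            if entry[0].startswith(prefix):
                return index
        return len(START_ORDER)

    return min(unready_pods(listing), key=rank, default=None)


if __name__ == "__main__":
    print(unready_pods(POD_LISTING))
    print(first_to_fix(POD_LISTING))
```

On the listing above, both hawkular-metrics and heapster are reported as not ready, and hawkular-metrics is the one selected to look at first.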

Comment 12 Robert Baumgartner 2017-08-10 08:06:27 UTC
I have the same issue on https://console.starter-us-west-2.openshift.com/console, no metrics data...
The log shows a 204 (no data) on the POST to https://metrics.starter-us-west-2.openshift.com/hawkular/metrics/metrics/stats/query with body: {"tags":"descriptor_name:network/tx_rate|network/rx_rate,type:pod,pod_id:d513a826-7d3f-11e7-8490-0a69cdf75e6f","bucketDuration":"1mn","start":"-15mn"}
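
For anyone trying to reproduce the request outside the console, the body above can be assembled as follows. This is only a sketch: the field names and values are copied from the body quoted in this comment, while the function name `build_stats_query` and its parameters are illustrative assumptions. Actually sending it would also need the `Hawkular-Tenant` and `Authorization` headers shown in the description.

```python
import json


def build_stats_query(descriptors, pod_id, bucket_duration="1mn", start="-15mn"):
    """Build the JSON body for POST .../hawkular/metrics/metrics/stats/query.

    The structure mirrors the body quoted in this comment; defaults are the
    values seen there, not documented API defaults.
    """
    tags = ",".join([
        "descriptor_name:" + "|".join(descriptors),
        "type:pod",
        "pod_id:" + pod_id,
    ])
    body = {"tags": tags, "bucketDuration": bucket_duration, "start": start}
    # Compact separators reproduce the exact form seen in the console log.
    return json.dumps(body, separators=(",", ":"))


if __name__ == "__main__":
    print(build_stats_query(
        ["network/tx_rate", "network/rx_rate"],
        "d513a826-7d3f-11e7-8490-0a69cdf75e6f",
    ))
```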

Comment 13 Matt Wringe 2017-08-10 13:56:47 UTC
(In reply to Robert Baumgartner from comment #12)
> I have the same issue on
> https://console.starter-us-west-2.openshift.com/console, no metrics data...
> The log shows a 204 (no data) on the POST to
> https://metrics.starter-us-west-2.openshift.com/hawkular/metrics/metrics/
> stats/query with Body:
> {"tags":"descriptor_name:network/tx_rate|network/rx_rate,type:pod,pod_id:
> d513a826-7d3f-11e7-8490-0a69cdf75e6f","bucketDuration":"1mn","start":"-15mn"}

Metrics failing to show up in the console is a very basic error condition that can have any number of causes. Can you please open a new issue so that we can properly track it there? Otherwise we end up with one Bugzilla covering multiple issues, and it's difficult to keep track of what is happening.

Comment 14 Robert Baumgartner 2017-09-04 11:41:13 UTC
(In reply to Matt Wringe from comment #13)
> (In reply to Robert Baumgartner from comment #12)
> > I have the same issue on
> > https://console.starter-us-west-2.openshift.com/console, no metrics data...
> > The log shows a 204 (no data) on the POST to
> > https://metrics.starter-us-west-2.openshift.com/hawkular/metrics/metrics/
> > stats/query with Body:
> > {"tags":"descriptor_name:network/tx_rate|network/rx_rate,type:pod,pod_id:
> > d513a826-7d3f-11e7-8490-0a69cdf75e6f","bucketDuration":"1mn","start":"-15mn"}
> 
> Metrics failing to show up in the console is a very basic error condition
> that can have any number of causes. Can you please open a new issue so that
> we can properly track it there? Otherwise we end up with one Bugzilla
> covering multiple issues, and it's difficult to keep track of what is
> happening.

done, https://bugzilla.redhat.com/show_bug.cgi?id=1480261

Comment 15 Matt Wringe 2017-10-17 13:40:19 UTC
Closing, as I think this is out of date. If we still have an issue here, please reopen.

