Bug 1480261 - [starter][starter-us-west-2] metrics failing to show in console
Summary: [starter][starter-us-west-2] metrics failing to show in console
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ruben Vargas Palma
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-10 14:21 UTC by Robert Baumgartner
Modified: 2018-09-03 12:33 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:16:17 UTC
Target Upstream Version:
Embargoed:


Attachments
console empty metrics data (60.79 KB, image/png)
2017-08-10 14:21 UTC, Robert Baumgartner
console metrics error (81.41 KB, image/png)
2017-09-04 09:03 UTC, Robert Baumgartner
Metrics are not available (85.28 KB, image/png)
2017-09-04 09:04 UTC, Robert Baumgartner
Overview with no values (54.17 KB, image/png)
2017-09-11 14:38 UTC, Robert Baumgartner
Pod with no values (55.56 KB, image/png)
2017-09-11 14:39 UTC, Robert Baumgartner

Description Robert Baumgartner 2017-08-10 14:21:54 UTC
Created attachment 1311786 [details]
console empty metrics data

When using the web console at https://console.starter-us-west-2.openshift.com/console, no metrics data is shown.
The browser log shows a 204 (No Content) response to the POST to https://metrics.starter-us-west-2.openshift.com/hawkular/metrics/metrics/stats/query with body: {"tags":"descriptor_name:network/tx_rate|network/rx_rate,type:pod,pod_id:d513a826-7d3f-11e7-8490-0a69cdf75e6f","bucketDuration":"1mn","start":"-15mn"}
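
For anyone who wants to reproduce the query outside the browser, here is a minimal sketch that rebuilds the same POST. The URL and body come from the request above. The function name, the tenant and token arguments, and the use of a bearer token in an Authorization header are assumptions for illustration, not something confirmed in this report.

```python
import json
import urllib.request

# Base URL taken from the request quoted in this report.
METRICS_URL = "https://metrics.starter-us-west-2.openshift.com/hawkular/metrics"


def build_stats_query(pod_id, tenant, token,
                      descriptors=("network/tx_rate", "network/rx_rate")):
    """Build (but do not send) the stats query the console issues for a pod.

    tenant and token are hypothetical inputs: the tenant is the project name,
    and the bearer token header is an assumption about how the route
    authenticates.
    """
    tags = "descriptor_name:{},type:pod,pod_id:{}".format(
        "|".join(descriptors), pod_id)
    body = {"tags": tags, "bucketDuration": "1mn", "start": "-15mn"}
    return urllib.request.Request(
        METRICS_URL + "/metrics/stats/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Hawkular-Tenant": tenant,
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. A response whose `status` is 204 carries no body, which matches the empty charts in the console.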

Comment 1 Matt Wringe 2017-08-16 18:57:02 UTC
It looks like the router is bad right now.

It reports that https://metrics.starter-us-west-2.openshift.com/hawkular/metrics/metrics doesn't have any running pods, but `oc get pods` shows that Hawkular Metrics is in the 1/1 state.

Comment 2 Steve Speicher 2017-08-21 01:24:23 UTC
Is anyone looking into the router issue? Metrics are failing for me too, though routes work fine for other applications.

Comment 3 Robert Baumgartner 2017-09-04 09:01:08 UTC
Problem still unsolved...

Comment 4 Robert Baumgartner 2017-09-04 09:03:13 UTC
Created attachment 1321736 [details]
console metrics error

Comment 5 Robert Baumgartner 2017-09-04 09:04:21 UTC
Created attachment 1321737 [details]
Metrics are not available

Comment 6 Robin Williams 2017-09-09 09:19:17 UTC
Metrics came up for a bit after the platform update, but have now died again.

Comment 7 Matt Wringe 2017-09-11 14:25:32 UTC
I don't have access to the west-2 environment (if someone could let me know how to get access, that would be great), so I can't tell whether it's working in the console or not.

Somehow the Cassandra pods got out of sync and were no longer able to communicate with each other. Normally they should resolve this on their own, but for some reason they did not in this instance.

Because the Cassandra pods were out of sync, the Hawkular Metrics pod was unable to connect to them and was failing.

I restarted the pods, and now the metrics pods are all back in the 1/1 state and metrics should soon be flowing again.

We need to figure out why the Cassandra pods were not able to form a cluster when they were brought back online.
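
One way to check whether the Cassandra nodes see each other is `nodetool status`, which lists each node with a two-letter code (for example `UN` for up/normal, `DN` for down/normal). Below is a rough sketch that scans that output for nodes that are not `UN`. How the output is collected (for instance by running the tool inside each Cassandra pod) is an assumption about this deployment, and the function name is hypothetical.

```python
import re

# A node line starts with a status letter (U/D) and a state letter (N/L/J/M),
# followed by whitespace and the node address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def unhealthy_nodes(nodetool_output):
    """Return (address, code) pairs for every node that is not Up/Normal."""
    bad = []
    for line in nodetool_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match and match.group(1) != "UN":
            bad.append((match.group(2), match.group(1)))
    return bad
```

An empty result from every pod would suggest the ring is healthy. A pod that reports its peers as `DN`, or does not list them at all, would be consistent with the out-of-sync state described above.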

Comment 8 Matt Wringe 2017-09-11 14:26:03 UTC
Can you confirm if metrics are working again in the console?

Comment 9 Robin Williams 2017-09-11 14:37:45 UTC
The service is responding now (it was not before), but metrics appear to be showing zero usage.

Comment 10 Robert Baumgartner 2017-09-11 14:38:38 UTC
Created attachment 1324486 [details]
Overview with no values

Comment 11 Robert Baumgartner 2017-09-11 14:39:21 UTC
Created attachment 1324487 [details]
Pod with no values

Comment 12 Robert Baumgartner 2017-09-11 14:39:46 UTC
still no values... :-(

Comment 13 Steve Speicher 2017-09-11 14:49:09 UTC
I'm getting "empty=true" results as well

>>>> Request >>>>
GET /hawkular/metrics/gauges/pod%2F7fef5eb6-92d3-11e7-8dc1-0ab8769191d3%2Fnetwork%2Frx_rate/data?bucketDuration=120000ms&start=1505141133786 HTTP/1.1
Host: metrics.starter-us-west-2.openshift.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Hawkular-Tenant: sspeiche1
Accept: application/json
Origin: https://console.starter-us-west-2.openshift.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
DNT: 1
Referer: https://console.starter-us-west-2.openshift.com/console/project/sspeiche1/browse/pods/java-2-spx4n?tab=metrics
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
<<<< Request <<<<

>>>> Response >>>>
HTTP/1.1 200 OK
Content-Encoding: gzip
X-Powered-By: Undertow/1
Access-Control-Allow-Headers: origin,accept,content-type,hawkular-tenant,authorization
Server: JBoss-EAP/7
Date: Mon, 11 Sep 2017 14:46:35 GMT
Access-Control-Allow-Origin: https://console.starter-us-west-2.openshift.com
Access-Control-Allow-Credentials: true
Content-Type: application/json
Content-Length: 65
Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE, OPTIONS, HEAD
Access-Control-Max-Age: 259200
Cache-control: private

[{"start":1505141133786,"end":1505141253786,"empty":true}]
<<<< Response <<<<
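
A small sketch of the same check in script form, in case it helps others compare results. It builds the gauge data URL the way the request above does (the metric id is URL-encoded, so `/` becomes `%2F`) and reports whether a response contains only empty buckets. The function names are made up for illustration, and the sketch assumes the response is a JSON list of buckets shaped like the one shown above.

```python
import json
from urllib.parse import quote


def gauge_data_url(base_url, metric_id, bucket_duration_ms, start_ms):
    """Build the gauge data URL, percent-encoding the metric id."""
    encoded_id = quote(metric_id, safe="")  # safe="" so "/" is encoded too
    return "{}/gauges/{}/data?bucketDuration={}ms&start={}".format(
        base_url, encoded_id, bucket_duration_ms, start_ms)


def has_no_data(response_text):
    """True if the response has no buckets or every bucket is flagged empty."""
    buckets = json.loads(response_text)
    return all(bucket.get("empty", False) for bucket in buckets)
```

A 200 response for which `has_no_data` returns True means the service is answering but has no samples stored for the requested window.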


@Matt, as a Red Hatter you can register for access to west-2 by using mwringe+us-west-2 (or whatever "+stuff" suffix on your account), or request it through tiered access (see aos-devel on the topic).

Comment 19 Matt Wringe 2017-10-17 13:40:29 UTC
Closing, as I think this is out of date. If we still have an issue here, please re-open.

Comment 20 Robin Williams 2017-10-17 18:43:31 UTC
Please re-open - metrics are still showing all zeros.

Comment 21 Robert Baumgartner 2017-10-18 06:59:21 UTC
Still no Metrics!!!

Comment 22 Robin Williams 2017-10-18 11:54:23 UTC
Metrics are also not working at all on https://metrics.starter-ca-central-1.openshift.com/hawkular/metrics, which is returning 503.

Comment 23 John Sanda 2017-10-18 14:50:24 UTC
Do we know whether or not the memory and heap changes were made to the Cassandra pods?

Comment 24 John Sanda 2018-07-30 18:25:50 UTC
Is this ticket still relevant? I believe all of the clusters mentioned in this ticket are now running later versions.

Comment 25 Steve Speicher 2018-07-30 19:16:17 UTC
appears to be working now

Comment 26 Robert Baumgartner 2018-09-03 12:33:47 UTC
works!

