Bug 1460350
Summary: [starter][starter-us-east-1] metrics failing to show in console
Product: OpenShift Container Platform
Component: Hawkular
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Version: unspecified
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Steve Speicher <sspeiche>
Assignee: Matt Wringe <mwringe>
QA Contact: Liming Zhou <lizhou>
CC: aos-bugs, dmcphers, eparis, jcantril, mhalachev, rbaumgar, sspeiche
Keywords: OnlineStarter
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2017-10-17 13:40:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description
Steve Speicher
2017-06-09 19:37:05 UTC
Following IRC, sounds like starter-us-east-1 is suffering both AWS API rate limiting and etcd pressures.

I can't seem to access starter-us-east-1 (I always get redirected to starter-us-east-2). Is this still an issue? Or was it resolved when the other issues in the cluster were fixed?

Now that it has been upgraded to 3.6, just simply getting 503 error hitting https://metrics.starter-us-east-1.openshift.com/hawkular/metrics

The 503 error you are seeing for https://metrics.starter-us-east-1.openshift.com/hawkular/metrics is coming from the router, not Hawkular Metrics. This either means the route is not properly configured or the Hawkular Metrics pod is not in the running state. Can you please check what the state is of the metric pods in this cluster.

The problem is not the router. It is the pods. I have no idea what is wrong with the pods:

# oc get pod -n openshift-infra
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-j5v0k   1/1       Running   0          2d
hawkular-cassandra-2-khd50   1/1       Running   0          2d
hawkular-metrics-rcwjh       0/1       Running   423        2d
heapster-0w9r6               0/1       Running   441        2d

# oc logs -n openshift-infra heapster-0w9r6 | tail
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.

I have no idea how to debug heapster itself.

Created attachment 1302495 [details]
oc get pod heapster -o json
Created attachment 1303937 [details]
console empty metrics graph
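
As a reading aid for the pod listing in comment #6, here is a minimal Python sketch that scans `oc get pod`-style output for pods that are not fully ready and have restarted many times. The function name `unready_pods` and the `restart_threshold` default are hypothetical choices for illustration and are not part of any OpenShift tooling; the sample text is the listing quoted in this bug. It assumes the five whitespace-separated columns shown there.

```python
# Sample listing copied from the `oc get pod -n openshift-infra` output in this bug.
POD_LISTING = """\
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-j5v0k   1/1       Running   0          2d
hawkular-cassandra-2-khd50   1/1       Running   0          2d
hawkular-metrics-rcwjh       0/1       Running   423        2d
heapster-0w9r6               0/1       Running   441        2d
"""


def unready_pods(listing, restart_threshold=10):
    """Return (name, restarts) for pods that are not fully ready and
    have restarted at least `restart_threshold` times.

    Hypothetical helper: assumes exactly the five columns shown above.
    """
    flagged = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        name, ready, _status, restarts, _age = line.split()
        ready_count, total = (int(part) for part in ready.split("/"))
        if ready_count < total and int(restarts) >= restart_threshold:
            flagged.append((name, int(restarts)))
    return flagged


if __name__ == "__main__":
    for name, restarts in unready_pods(POD_LISTING):
        print(f"{name}: not ready, {restarts} restarts")
```

Against the listing above, this flags hawkular-metrics-rcwjh and heapster-0w9r6, the two pods discussed in the following comments.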
(In reply to Eric Paris from comment #6)
> The problem is not the router. It is the pods. I have no idea what is wrong
> with the pods:
>
> # oc get pod -n openshift-infra
> NAME                         READY     STATUS    RESTARTS   AGE
> hawkular-cassandra-1-j5v0k   1/1       Running   0          2d
> hawkular-cassandra-2-khd50   1/1       Running   0          2d
> hawkular-metrics-rcwjh       0/1       Running   423        2d
> heapster-0w9r6               0/1       Running   441        2d
>

Who is monitoring this cluster? Obviously if hawkular-metrics has failed 423 times over the past 2 days something is really wrong. Why has it been restarting so many times?

>
> I have no idea how to debug heapster itself.

Heapster is not the problem, it won't start until Hawkular Metrics has started. So we need to fix Hawkular Metrics first.

I have same issue on https://console.starter-us-west-2.openshift.com/console, no metrics data...
The log shows a 204(no data) on the POST to https://metrics.starter-us-west-2.openshift.com/hawkular/metrics/metrics/stats/query with Body:
{"tags":"descriptor_name:network/tx_rate|network/rx_rate,type:pod,pod_id:d513a826-7d3f-11e7-8490-0a69cdf75e6f","bucketDuration":"1mn","start":"-15mn"}

(In reply to Robert Baumgartner from comment #12)
> I have same issue on
> https://console.starter-us-west-2.openshift.com/console, no metrics data...
> The log shows a 204(no data) on the POST to
> https://metrics.starter-us-west-2.openshift.com/hawkular/metrics/metrics/
> stats/query with Body:
> {"tags":"descriptor_name:network/tx_rate|network/rx_rate,type:pod,pod_id:
> d513a826-7d3f-11e7-8490-0a69cdf75e6f","bucketDuration":"1mn","start":"-15mn"}

Metrics failing to show up in the console is a very basic error condition for any number of problems. Can you please open a new issue so that we can properly track it there? Otherwise we end up with the situation where one bugzilla ends up covering multiple issues and it's difficult to keep track of what is happening.

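For anyone trying to reproduce the request body quoted in comment #12, the following Python sketch assembles the same JSON from its parts. The helper name `build_stats_query` and its parameters are hypothetical and only mirror the fields visible in the quoted body; it makes no network call and does not claim to cover the full Hawkular Metrics query API.

```python
import json


def build_stats_query(descriptors, pod_id, bucket_duration="1mn", start="-15mn"):
    """Build a stats-query body shaped like the one quoted in comment #12.

    Hypothetical helper: field names are taken from the quoted body only.
    """
    tags = "descriptor_name:{},type:pod,pod_id:{}".format(
        "|".join(descriptors),  # alternative descriptor names are joined with "|"
        pod_id,
    )
    return {"tags": tags, "bucketDuration": bucket_duration, "start": start}


if __name__ == "__main__":
    body = build_stats_query(
        ["network/tx_rate", "network/rx_rate"],
        "d513a826-7d3f-11e7-8490-0a69cdf75e6f",
    )
    # Compact separators reproduce the body without extra whitespace.
    print(json.dumps(body, separators=(",", ":")))
```

A 204 response to such a POST is an HTTP "No Content" status: the request was accepted but no data came back, which matches the empty graphs reported for starter-us-west-2.
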
(In reply to Matt Wringe from comment #13)
> (In reply to Robert Baumgartner from comment #12)
> > I have same issue on
> > https://console.starter-us-west-2.openshift.com/console, no metrics data...
> > The log shows a 204(no data) on the POST to
> > https://metrics.starter-us-west-2.openshift.com/hawkular/metrics/metrics/
> > stats/query with Body:
> > {"tags":"descriptor_name:network/tx_rate|network/rx_rate,type:pod,pod_id:
> > d513a826-7d3f-11e7-8490-0a69cdf75e6f","bucketDuration":"1mn","start":"-15mn"}
>
> Metrics failing to show up in the console is a very basic error condition
> for any number of problems. Can you please open a new issue so that we can
> properly track it there? Otherwise we end up with the situation where one
> bugzilla ends up covering multiple issues and it's difficult to keep track of
> what is happening.

done, https://bugzilla.redhat.com/show_bug.cgi?id=1480261

Closing as I think this is out of date. If we still have an issue here, please re-open.