Bug 1458186
| Summary: | Hawkular metrics rest api responding sporadically | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Miheer Salunke <misalunk> | ||||
| Component: | Hawkular | Assignee: | John Sanda <jsanda> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 3.4.1 | CC: | aos-bugs, jsanda, juzhao, knakayam, miburman, misalunk, mmahut, mruzicka, mwringe, pweil, snegrea, tumeya, zhizhang | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 3.5.z | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: |
Cause:
Extra, unnecessary queries were being performed on each request. The issue was logged upstream as https://issues.jboss.org/browse/HWKMETRICS-711.
Consequence:
The GET /hawkular/metrics/metrics endpoint could fail with timeouts.
Fix:
Only perform the extra queries when explicitly requested. By default, do not execute the extra queries which provide optional data.
Result:
The endpoint is more stable and not as susceptible to timeouts.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-12-07 07:10:26 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1500644 | ||||||
| Bug Blocks: | |||||||
| Attachments: | |||||||
Comment 12
Matt Wringe
2017-06-20 18:07:26 UTC
@Stefan, as you requested, we will collect the logs again, but could you please comment on the error below, which appears in the customer's environment?

~~~
E0508 06:26:09.276956 1 client.go:243] Post https://hawkular-metrics:443/hawkular/metrics/counters/data: dial tcp 172.30.xxx.xx:443: getsockopt: no route to host
~~~

The getsockopt error is a sign that Kubernetes networking cannot communicate between pods. These requests never reach Hawkular Metrics; they are already stopped in the networking layer.

Yeah, so I think the ops team, or someone who has access to the cluster, should investigate the network further. But should we still collect new metrics logs? (I am asking because ops asked us to file this ticket and consult the dev team.)

This issue was originally fixed upstream in HWKMETRICS-625. We will backport it in HWKMETRICS-711 so that it can go into OCP 3.5.1.

According to an e-mail from 2017-08-28, this fix was waiting on Hawkular Metrics 0.23.10. Is that still the case? Thanks.

It looks like OCP 3.5.1 is using 0.23.8, so the update has not yet been applied.

(In reply to Matt Wringe from comment #26)
> It looks like in OCP 3.5.1 we are using 0.23.8, so the update has not yet
> been applied.

Is this still the case? Can you please share the errata for this?

Created pods within projects and let metrics run for 8 hours. During this time, used the oc client to run:

~~~
# for i in {0..99}; do ./shell.sh; done
~~~

shell.sh is a script that curls the metrics REST API; see the attached file.

Checked the metrics logs: there were no exceptions in the pod logs, and the REST API worked well.
~~~
# openshift version
openshift v3.5.5.31.39
kubernetes v1.5.2+43a9be4
etcd 3.1.0
metrics-hawkular-metrics/images/3.5.0-50
metrics-cassandra/images/3.5.0-41
metrics-heapster/images/3.5.0-33
~~~
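Earlier in the thread, the fix was tied to Hawkular Metrics 0.23.10 while OCP 3.5.1 shipped 0.23.8. A plain string comparison would rank 0.23.8 above 0.23.10 (because '8' > '1'), so checking whether a deployed version already carries the fix needs a numeric, per-component comparison. A minimal sketch, assuming GNU coreutils `sort -V` is available; `version_ge` is a hypothetical helper name, and the snippet does not cover how to read the version from a running pod:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when version $1 >= version $2.
# Relies on GNU coreutils `sort -V`, which orders dotted version strings
# numerically per component, so 0.23.8 sorts before 0.23.10.
version_ge() {
  local lowest
  lowest=$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)
  # If $2 is the lowest of the two (or they are equal), then $1 >= $2.
  [ "$lowest" = "$2" ]
}

# Example: does 0.23.8 already include a fix first shipped in 0.23.10?
if version_ge "0.23.8" "0.23.10"; then
  echo "0.23.8 includes the fix"
else
  echo "0.23.8 predates 0.23.10"
fi
```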
Created attachment 1343645
script to curl rest api
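The attached shell.sh is not reproduced in this report. As a hedged sketch only, a script of that kind might look like the following. The function names, the 30-second `--max-time`, the `-k` flag (skips TLS verification, acceptable only on a test cluster), and the use of the `Hawkular-Tenant` and `Authorization: Bearer` headers are all assumptions. Mapping curl's exit code also helps separate the `getsockopt: no route to host` failures discussed above (curl exit code 7, requests that never reach Hawkular Metrics) from genuine request timeouts (exit code 28).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a script like the attached shell.sh (the real attachment is not shown).
# Assumption: base URL, tenant (project) name and bearer token are passed in by the caller.

# Send one GET to /hawkular/metrics/metrics and print "<curl exit code> <HTTP status>".
# curl reports HTTP status 000 when no response was received at all.
query_metrics() {
  local base_url="$1" tenant="$2" token="$3" status rc
  status=$(curl -sk -o /dev/null --max-time 30 -w '%{http_code}' \
    -H "Hawkular-Tenant: ${tenant}" \
    -H "Authorization: Bearer ${token}" \
    "${base_url}/hawkular/metrics/metrics")
  rc=$?
  echo "${rc} ${status}"
}

# Map a curl exit code to a rough failure category.
# 6 = could not resolve host, 7 = failed to connect (includes "no route to host"),
# 28 = operation timed out.
classify_curl_exit() {
  case "$1" in
    0)   echo "ok" ;;
    6|7) echo "network" ;;
    28)  echo "timeout" ;;
    *)   echo "other" ;;
  esac
}

# Count the HTTP status codes that are not 2xx (000 counts as a failure).
count_failures() {
  local failures=0 code
  for code in "$@"; do
    [[ "$code" == 2?? ]] || failures=$((failures + 1))
  done
  echo "$failures"
}
```

For example, `./shell.sh` in the loop could be replaced by `query_metrics "$HAWKULAR_URL" my-project "$(oc whoami -t)"`, where `HAWKULAR_URL` and `my-project` are placeholders; the collected status codes can then be passed to `count_failures`, and any non-zero curl exit codes to `classify_curl_exit`.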
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3389

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.