Bug 1626362
| Summary: | Metrics network graph empty intermittently |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Hawkular |
| Version: | 3.4.1 |
| Status: | CLOSED WORKSFORME |
| Severity: | unspecified |
| Priority: | unspecified |
| Reporter: | Min Woo Park <mpark> |
| Assignee: | Ruben Vargas Palma <rvargasp> |
| QA Contact: | Junqi Zhao <juzhao> |
| CC: | aos-bugs, mpark |
| Target Milestone: | --- |
| Target Release: | 3.4.z |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Doc Type: | If docs needed, set a value |
| Type: | Bug |
| Last Closed: | 2019-04-23 18:16:51 UTC |
Created attachment 1481503 [details]
pod metrics graph screen shot 2
Created attachment 1481504 [details]
heapster pod log
Created attachment 1481505 [details]
hawkular-metrics pod log
Created attachment 1481506 [details]
hawkular-cassandra pod log
Gaps in the graphs mean that there are missing data points. Has the customer looked at graphs for other pods/applications to see if there are any gaps? Unless there is an issue on the heapster side, I doubt the issue is specific to network metrics.

The heapster log has very little info in it, and there are no errors in the hawkular-metrics log. The cassandra log reports lots of ParNew GCs. ParNew is a stop-the-world garbage collection, which means all application threads are paused while it runs. Excessive GC could cause Cassandra to fall behind and then drop some requests.

Can we get the output of:

$ oc -n openshift-infra exec <cassandra pod> nodetool tpstats
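For context on what to look for in that output: the `Dropped` section of `nodetool tpstats` is the quickest confirmation that GC pauses are causing Cassandra to shed requests. A minimal sketch of scanning that section with awk; the sample output and counts below are made up for illustration, not taken from this cluster:

```shell
# Illustrative excerpt of the "Message type / Dropped" section of
# `nodetool tpstats`. Real output would come from:
#   oc -n openshift-infra exec <cassandra pod> nodetool tpstats
tpstats_sample='Message type           Dropped
READ                         0
MUTATION                   142
COUNTER_MUTATION             0
REQUEST_RESPONSE             0'

# Print every message type with a non-zero dropped count
# (skip the header line, then test the second column).
dropped=$(echo "$tpstats_sample" | awk 'NR > 1 && $2 > 0 { print $1 " dropped " $2 " messages" }')
echo "$dropped"
```

A non-zero `MUTATION` dropped count in particular would line up with the gaps in the graphs: writes arriving while the node is paused in GC get discarded, so those data points never reach Hawkular Metrics.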
Created attachment 1481502 [details]
pod metrics graph screen shot 1

Description of problem:
When the customer monitors pod metrics, they intermittently see an empty graph for network only (screenshot attached). The CPU and memory graphs are fine at the same time. Does this mean the pod network had an issue at that time? How can we troubleshoot this issue? Attached the hawkular-metrics, hawkular-cassandra, and heapster pod logs.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info: