Bug 1569096
Summary: | Unbounded growth of tags index prevents graphs from rendering | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | John Sanda <jsanda>
Component: | Hawkular | Assignee: | Ruben Vargas Palma <rvargasp>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Junqi Zhao <juzhao>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 3.9.0 | CC: | aanjarle, aos-bugs, ddelcian, erich, erjones, gsapienz, haowang, jmalde, jolee, openshift-bugs-escalate, pvarma, rekhan, ricferna, rvargasp, schoudha
Target Milestone: | --- | |
Target Release: | 3.11.z | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-07-18 04:03:42 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1519754 | |
Attachments: | | |
Description
John Sanda
2018-04-18 15:24:42 UTC
There is a bit of a brute-force workaround for this. First, scale down heapster and hawkular-metrics. Next, truncate two tables in Cassandra.

# These commands only need to be run from one cassandra pod if there
# are multiple cassandra pods.
$ oc -n openshift-infra exec <cassandra pod> -- cqlsh --ssl -e "truncate table hawkular_metrics.metrics_tags_idx"
$ oc -n openshift-infra exec <cassandra pod> -- cqlsh --ssl -e "truncate table hawkular_metrics.metrics_idx"

Scale hawkular-metrics back up, then scale heapster back up. On restart, heapster will resend tags to repopulate the metrics_tags_idx and metrics_idx tables.

When truncating a table, Cassandra creates a snapshot by default. If anything goes wrong, you can copy the files from the snapshot directory back into the parent directory to effectively revert the changes.

As this issue has not been fixed, we are seeing it appear more and more (attaching a new customer case). If the truncate workaround resolves the problem, I would imagine that it would only be a temporary fix. How often would you guess we would need to rerun the truncate process to keep the cluster happy?

It is hard to say how often it should be run, maybe weekly. Ruben and I had discussed the possibility of providing a script to automate this. I am going to reassign to him. Eric, please discuss with Ruben whether or not providing some automation for this makes sense. Thanks.

Hey John and Ruben, I think that depends on how the automation would be implemented. Are you thinking of some tooling in the pod, or a script that customers could run to modify things as appropriate?

*** Bug 1614084 has been marked as a duplicate of this bug. ***

Hello team, are there any updates on this? The customer (02395295) is checking to see whether this is fixed. Thanks.
Praveen
Associate Manager - Openshift

Created attachment 1640570
metrics pod logs as requested
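For reference, the workaround from the description could be scripted roughly as follows. This is only a sketch under stated assumptions, not the automation discussed in the comments: it assumes heapster and hawkular-metrics run as replication controllers named `heapster` and `hawkular-metrics` and that each runs with one replica, and the names `run`, `truncate_index`, and `reset_tag_indexes` are hypothetical. It prints the commands by default and only executes them when `RUN=1` is set. It does not wait for the scaled-down pods to stop before truncating.

```shell
#!/bin/sh
# Sketch of the truncate workaround. Dry run by default: commands are
# printed (without their original quoting) instead of being executed.
NAMESPACE="${NAMESPACE:-openshift-infra}"

# Execute the command when RUN=1, otherwise just print it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Truncate one hawkular_metrics table from the given cassandra pod.
truncate_index() {
    run oc -n "$NAMESPACE" exec "$1" -- cqlsh --ssl -e "truncate table hawkular_metrics.$2"
}

# Full sequence: scale down, truncate both index tables, scale back up.
reset_tag_indexes() {
    pod="${1:-}"
    if [ -z "$pod" ]; then
        echo "usage: reset_tag_indexes <cassandra pod>" >&2
        return 1
    fi
    # ASSUMPTION: replication controller names and a replica count of 1.
    run oc -n "$NAMESPACE" scale rc heapster --replicas=0 &&
    run oc -n "$NAMESPACE" scale rc hawkular-metrics --replicas=0 &&
    truncate_index "$pod" metrics_tags_idx &&
    truncate_index "$pod" metrics_idx &&
    run oc -n "$NAMESPACE" scale rc hawkular-metrics --replicas=1 &&
    run oc -n "$NAMESPACE" scale rc heapster --replicas=1
}
```

Calling `reset_tag_indexes <cassandra pod>` after sourcing the file prints the six commands in order for review; the same call with `RUN=1` set in the environment runs them and stops at the first failure.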