Bug 1569096 - Unbounded growth of tags index prevents graphs from rendering
Summary: Unbounded growth of tags index prevents graphs from rendering
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Ruben Vargas Palma
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1614084
Depends On:
Blocks: 1519754
 
Reported: 2018-04-18 15:24 UTC by John Sanda
Modified: 2023-03-24 14:03 UTC
CC: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-18 04:03:42 UTC
Target Upstream Version:
Embargoed:


Attachments
metrics pod logs as requested (87.89 KB, text/plain)
2019-11-29 05:38 UTC, amit anjarlekar
no flags


Links
Red Hat Knowledge Base (Solution) 3534961, Last Updated: 2018-07-17 19:56:13 UTC

Internal Links: 1773860

Description John Sanda 2018-04-18 15:24:42 UTC
Description of problem:
A little background is necessary. It is a common design with Cassandra to implement indexes as ordinary tables whose primary purpose is to help with and/or optimize a query or queries. Cassandra does have secondary indexes, but they are generally avoided for reasons outside the scope of this ticket. The "tags index" refers specifically to the metrics_tags_idx table.
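
For context, the shape of such an index table looks roughly like the sketch below. The column names are inferred from the query further down in this description; the exact schema shipped with hawkular-metrics may differ.

-- Rough sketch only; not the exact production schema.
CREATE TABLE hawkular_metrics.metrics_tags_idx (
    tenant_id text,
    tname     text,
    tvalue    text,
    type      tinyint,
    metric    text,
    PRIMARY KEY ((tenant_id, tname), tvalue, type, metric)
);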

The metrics_tags_idx table is used in at least a couple of places; for example, the console overview page queries it to render graphs for deployments.

Heapster collects and reports on a set of metrics for every pod in an OpenShift cluster. Each of those is stored in a separate time series. Each metric also has a set of tags associated with it. When a new pod is deployed in an OpenShift cluster, Heapster sends HTTP requests to Hawkular Metrics to store the tags for the new metrics associated with the pod. Those tags get stored in the metrics_tags_idx table. The metric data points that Heapster collects are stored elsewhere.
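
Those tag updates go through the Hawkular Metrics REST API. A minimal, illustrative request is sketched below; the hostname, metric id, tag names, and payload are placeholders, and the exact request Heapster sends may differ.

$ curl -k -X PUT \
    -H "Content-Type: application/json" \
    -H "Hawkular-Tenant: <tenant id>" \
    -H "Authorization: Bearer <token>" \
    -d '{"type": "pod", "pod_name": "<pod name>"}' \
    https://<hawkular-metrics route>/hawkular/metrics/gauges/<metric id>/tags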

A project in OpenShift can have any number of pods. At any given moment, physical constraints limit how many pods can actually be running. But if you consider all pods over time, including those that have been terminated, the number is effectively unbounded.

All of the data points that get stored have an expiration attached to them via a TTL. Metric data points are never stored indefinitely. When pods are deleted, there is no mechanism in place for removing corresponding tags from the metrics_tags_idx table.
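
To make the asymmetry concrete: data points are written with a TTL and age out on their own, while index rows are written without one. The table and column names below are illustrative only and may not match production.

-- Data points expire on their own (illustrative table/columns):
INSERT INTO hawkular_metrics.data (tenant_id, metric, time, n_value)
VALUES ('<tenant id>', '<metric id>', toTimestamp(now()), 0.42)
USING TTL 604800;   -- gone after 7 days

-- Index rows are written with no TTL, so they accumulate forever
-- (columns as in the sketch above, which may not match production):
INSERT INTO hawkular_metrics.metrics_tags_idx (tenant_id, tname, tvalue, type, metric)
VALUES ('<tenant id>', 'type', 'pod', 0, '<metric id>');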

There is a background job that runs in Hawkular Metrics that is supposed to help with cleaning up index tables; however, that job was not working as intended and can actually cause OutOfMemoryErrors. See bug 1559440 for details.

In one production cluster I recently saw a warning in the hawkular-metrics log that reported this:

/hawkular/metrics/m/stats/query took: 424148 ms, exceeds 10000 ms threshold, tenant-id: <tenant id>

That is over 7 minutes. I did some more investigation to figure out what was going on. I directly ran a Cassandra query that gets executed by the /stats/query endpoint.

$ oc -n openshift-infra exec <cassandra pod> -- cqlsh --ssl -e "select count(*) from hawkular_metrics.metrics_tags_idx where tenant_id = '<tenant id>' and tname = 'type' and tvalue = 'pod'"

count
---------
1044227

There were over 1 million rows for just that one tag, and at the time there were only 44 pods in this particular project.
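
For comparison, the number of pods currently in the project can be counted directly; the namespace below is a placeholder.

$ oc -n <project> get pods --no-headers | wc -l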

The Cassandra driver has paging built in, and we use a page size of 1,000. This means that more than 1,000 round trips to Cassandra are required to get all of the rows. This alone explains the long HTTP response times, which often result in exceptions in the hawkular-metrics log like this:
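
At a page size of 1,000, the 1,044,227 rows above work out to roughly 1,045 sequential round trips for a single tag query. You can get a rough feel for the cost from an interactive cqlsh session; cqlsh's PAGING command only approximates the driver-side paging inside Hawkular Metrics.

$ oc -n openshift-infra exec -it <cassandra pod> -- cqlsh --ssl
cqlsh> PAGING 1000
cqlsh> select metric from hawkular_metrics.metrics_tags_idx where tenant_id = '<tenant id>' and tname = 'type' and tvalue = 'pod';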

ERROR [org.jboss.resteasy.resteasy_jaxrs.i18n] (RxComputationScheduler-2) RESTEASY002020: Unhandled asynchronous exception, sending back 500: org.jboss.resteasy.spi.UnhandledException: RESTEASY003770: Response is committed, can't handle exception

There are some other problems as well. Multiple tags queries, including the one above, get executed for the /stats/query endpoint. The result sets for those queries get fully realized in memory. When dealing with really large result sets, this will cause a lot of heap pressure in the Hawkular Metrics JVM which could result in lots of garbage collection or even possibly an OutOfMemoryError. Excessive GC can seriously degrade performance.

We need to put a proper solution in place for removing rows from the metrics_tags_idx table. This will most likely involve using the Kubernetes watch APIs to get notified when pods and projects are deleted so that we can perform the necessary clean up.
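
For reference, the kind of notifications such a cleanup component would consume can be seen today from the command line; the API server URL below is a placeholder.

# Watch pod lifecycle events across the cluster:
$ oc get pods --all-namespaces --watch

# Or hit the Kubernetes watch API directly:
$ curl -k -H "Authorization: Bearer $(oc whoami -t)" \
    "https://<api server>:8443/api/v1/pods?watch=true"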

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 John Sanda 2018-04-19 14:42:44 UTC
There is a bit of a brute-force workaround for this. First, scale down heapster and hawkular-metrics. Next, truncate a couple of tables in Cassandra.

# These commands only need to be run from one Cassandra pod if there
# are multiple Cassandra pods.
#
$ oc -n openshift-infra exec <cassandra pod> -- cqlsh --ssl -e "truncate table hawkular_metrics.metrics_tags_idx"

$ oc -n openshift-infra exec <cassandra pod> -- cqlsh --ssl -e "truncate table hawkular_metrics.metrics_idx"


Scale hawkular-metrics back up, then scale heapster back up. On restart, Heapster will resend tags to repopulate the metrics_tags_idx and metrics_idx tables. When truncating a table, Cassandra creates a snapshot by default, so if anything goes wrong, you can copy the files from the snapshot directory back into the parent directory to effectively revert the changes.
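
If a restore from that snapshot is ever needed, the general shape of the recovery is sketched below. Directory names and the snapshot name are placeholders; check the actual Cassandra data directory in the pod before copying anything.

# List the snapshots Cassandra took automatically on truncate:
$ oc -n openshift-infra exec <cassandra pod> -- nodetool listsnapshots

# Inside the pod, copy the snapshot files back into the table directory:
$ oc -n openshift-infra rsh <cassandra pod>
$ cd <cassandra data dir>/hawkular_metrics/metrics_tags_idx-<table id>
$ cp snapshots/<snapshot name>/* .
$ exit

# Tell Cassandra to pick up the restored files:
$ oc -n openshift-infra exec <cassandra pod> -- nodetool refresh hawkular_metrics metrics_tags_idx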

Comment 7 Eric Jones 2018-10-03 13:26:27 UTC
As this issue has not been fixed, we are seeing it appear more and more frequently (attaching a new customer case).

If the truncate workaround resolves the problem, I would imagine it would only be a temporary fix. How often would you guess we would need to rerun the truncate process to keep the cluster happy?

Comment 9 John Sanda 2018-10-03 13:56:01 UTC
It is hard to say how often it should be run; maybe weekly. Ruben and I had discussed the possibility of providing a script to automate this. I am going to reassign to him.

Eric, please discuss with Ruben about whether or not providing some automation for this makes sense. Thanks.
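
For the record, the automation under discussion would essentially script the steps from comment 1. A rough sketch is below; the replication controller names, label selector, and replica counts are assumptions and must be verified against the actual deployment before running anything like this.

#!/bin/bash
# Sketch only: automate the truncate workaround from comment 1.
set -euo pipefail

NS=openshift-infra
# Label selector is an assumption; adjust to match the real Cassandra pods.
CASSANDRA_POD=$(oc -n "$NS" get pods -l metrics-infra=hawkular-cassandra \
    -o jsonpath='{.items[0].metadata.name}')

# Stop ingestion and readers before truncating.
oc -n "$NS" scale rc/heapster --replicas=0
oc -n "$NS" scale rc/hawkular-metrics --replicas=0

oc -n "$NS" exec "$CASSANDRA_POD" -- cqlsh --ssl -e \
    "truncate table hawkular_metrics.metrics_tags_idx"
oc -n "$NS" exec "$CASSANDRA_POD" -- cqlsh --ssl -e \
    "truncate table hawkular_metrics.metrics_idx"

# Bring hawkular-metrics back first; heapster then repopulates the index tables.
oc -n "$NS" scale rc/hawkular-metrics --replicas=1
oc -n "$NS" scale rc/heapster --replicas=1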

Comment 10 Eric Jones 2018-10-03 16:00:40 UTC
Hey John and Ruben,

I think that depends on how the automation would be implemented.

Are you thinking some tooling in the pod, or a script that customers could run to modify things as appropriate?

Comment 15 Ruben Vargas Palma 2019-04-10 19:56:06 UTC
*** Bug 1614084 has been marked as a duplicate of this bug. ***

Comment 16 Praveen Varma 2019-07-16 03:40:44 UTC
Hello team, 

Are there any updates on this?

The customer (02395295) is checking to see whether this has been fixed.

Thanks.
Praveen
Associate Manager - Openshift

Comment 20 amit anjarlekar 2019-11-29 05:38:13 UTC
Created attachment 1640570 [details]
metrics pod logs as requested

metrics pod logs as requested

