Bug 1316275 - Cassandra filling up disk space
Summary: Cassandra filling up disk space
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.2.1
Assignee: John Sanda
QA Contact: chunchen
URL:
Whiteboard:
Duplicates: 1331831
Depends On:
Blocks: 1267746
 
Reported: 2016-03-09 20:29 UTC by Matt Wringe
Modified: 2019-11-14 07:34 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-27 15:05:35 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker HWKMETRICS-367 0 Major Closed Reduce gc_grace_seconds for data table 2020-10-27 05:43:58 UTC
Red Hat Product Errata RHBA-2016:1343 0 normal SHIPPED_LIVE Red Hat OpenShift Enterprise 3.2.1.1 bug fix and enhancement update 2016-06-27 19:04:05 UTC

Description Matt Wringe 2016-03-09 20:29:42 UTC
Description of problem: The metrics configuration uses Cassandra as persistent storage, with a configurable duration for how long metrics exist before they are deleted from storage.

If the disk space given to Cassandra is not large enough, it is possible for the data to fill the entire disk.

We need a way to handle the situation where the disk is full or nearly full, so that a user can remove some of the data to free up space and get metrics functioning again.

Comment 4 John Sanda 2016-03-11 04:48:10 UTC
When data is deleted in Cassandra, space is not reclaimed immediately. A deleted cell, i.e., column, is replaced with a tombstone. At some point after the duration specified by gc_grace_seconds has elapsed, the space is reclaimed during the compaction process. The gc_grace_seconds setting is set per table and defaults to 10 days.

When running out of disk space, performing a major compaction can help for tables using size-tiered compaction, which is what we use for the data table. There is a caveat, though: the compaction process needs roughly double the size of the table to do its work. If the disk is close to full and the table is large enough, a major compaction may fail.

There is something else to consider with major compactions. Once you run one, you may wind up having to run major compactions manually going forward instead of relying on Cassandra to perform minor compactions automatically.
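
If a major compaction is still the right call, here is a minimal sketch of triggering one from Python via nodetool. It assumes nodetool is on the PATH inside the Cassandra container and targets the hawkular_metrics data table discussed above; treat it as an illustration rather than a supported procedure.

import subprocess

def major_compact(keyspace="hawkular_metrics", table="data"):
    # "nodetool compact <keyspace> <table>" forces a major compaction; remember
    # it needs roughly 2x the size of the table in free disk space.
    subprocess.run(["nodetool", "compact", keyspace, table], check=True)

if __name__ == "__main__":
    major_compact()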

There is another option to reclaim space. SSTables are immutable: once Cassandra writes an SSTable to disk, it never writes to that SSTable again. We can use this immutability to our advantage. Let's say the data retention in Hawkular Metrics is set to 7 days (which happens to be the system default). If an SSTable's last modification time is more than 7 days ago, then we know it contains only deleted data and can safely be deleted. Note that I am referring only to the data table, which is append-only; we never update existing cells. If we did update existing cells, it would not be safe to purge the files until gc_grace_seconds had elapsed.
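
As a rough illustration, here is a minimal Python sketch that only lists SSTables older than the retention period; nothing is deleted. The data directory path is an assumption and varies by deployment.

import glob
import os
import time

RETENTION_DAYS = 7  # Hawkular Metrics default retention
# Assumed location of the hawkular_metrics data table; adjust for your PV layout.
DATA_TABLE_GLOB = "/cassandra_data/data/hawkular_metrics/data-*/*-Data.db"

cutoff = time.time() - RETENTION_DAYS * 86400
for path in glob.glob(DATA_TABLE_GLOB):
    if os.path.getmtime(path) < cutoff:
        print("SSTable past retention:", path)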

For an almost full disk, I recommend seeing if any SSTables can be safely deleted from disk manually as explained above. That should provide an immediate fix. For the long term, we definitely need to do some profiling so that at a minimum we can provide documentation around sizing.

Comment 30 Matt Wringe 2016-04-08 15:14:02 UTC
PR sent to the docs project to highlight this potential issue: https://github.com/openshift/openshift-docs/pull/1877

Comment 31 Matt Wringe 2016-04-11 18:27:25 UTC
When the PR for the docs is merged, I think we can close this issue.

Metrics are kept in Cassandra based on a time to live value (set via the METRIC_DURATION deployer parameter, defaulting to 7 days). Once this duration is up, the metrics beyond this threshold are deleted to make room for more recent metrics. This way we are not constantly filling up disk space forever, and the disk usage should level off once the duration is met (and assuming a constant OpenShift cluster size).

The problem will still exist though if the persistent volume size is set to be too small for the amount of metrics that need to be stored for this time frame. There is not much we can do in this situation. 

As mentioned in the doc PR, it will have to be up to the admin to manage and monitor disk usage.

We do need the results for the tests that QE are running to give advice on how much disk space Cassandra will require based on varying OpenShift cluster sizes and metric duration values.

Comment 32 John Sanda 2016-04-12 18:33:18 UTC
(In reply to Matt Wringe from comment #31)
> When the PR for the docs is merged, I think we can close this issue.
> 
> Metrics are kept in Cassandra based on a time to live value (set via the
> METRIC_DURATION deployer parameter, defaulting to 7 days). Once this
> duration is up, the metrics beyond this threshold are deleted to make room
> for more recent metrics. This way we are not constantly filling up disk
> space forever, and the disk usage should level off once the duration is met
> (and assuming a constant OpenShift cluster size).
> 
> The problem will still exist though if the persistent volume size is set to
> be too small for the amount of metrics that need to be stored for this time
> frame. There is not much we can do in this situation. 
> 
> As mentioned in the doc PR, it will have to be up to the admin to manage and
> monitor disk usage.
> 
> We do need the results for the tests that QE are running to give advice on
> how much disk space Cassandra will require based on varying OpenShift
> cluster sizes and metric duration values.

I really think we should include the changes for HWKMETRICS-367. It reduces the time Cassandra waits to reclaim disk space from 10 days to 1 day. That is pretty significant.

We could also include https://issues.jboss.org/browse/HWKMETRICS-381. If the default, typical deployment is a single Cassandra node, then we ought to set gc_grace_seconds to zero for all tables. Only when we have multiple nodes does that setting need to be raised above zero.
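
For reference, here is a minimal sketch, assuming the Python cassandra-driver package and direct access to the Cassandra node, of lowering gc_grace_seconds on the data table by hand. A value of 0 is only appropriate for a single-node cluster (HWKMETRICS-381); with multiple nodes a value such as 86400 (1 day, as in HWKMETRICS-367) is the safer choice. This is a manual workaround, not the change shipped in those issues.

from cassandra.cluster import Cluster

def set_gc_grace_seconds(host, seconds):
    # DDL statements cannot be parameterized, so the value is formatted in directly.
    cluster = Cluster([host])
    session = cluster.connect("hawkular_metrics")
    session.execute("ALTER TABLE data WITH gc_grace_seconds = %d" % seconds)
    cluster.shutdown()

if __name__ == "__main__":
    # 0 assumes a single-node deployment; use 86400 (1 day) with multiple nodes.
    set_gc_grace_seconds("127.0.0.1", 0)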

Comment 36 Dan McPherson 2016-05-02 13:47:37 UTC
*** Bug 1331831 has been marked as a duplicate of this bug. ***

Comment 37 Eric Jones 2016-05-31 21:26:48 UTC
Do we have any concrete steps that customers can use to clean up the Cassandra PV should it run out of space?

I know this was discussed previously, but it is not clear whether it was resolved.

Comment 38 John Sanda 2016-06-04 15:45:28 UTC
Assuming all metrics are using the same retention period, you can manually remove SSTables that are older than the retention period. SSTables are commonly referred to as the data files, but an SSTable is actually made up of several component files. For the hawkular_metrics.data table these include:

* Data.db
* Index.db
* Filter.db
* CompressionInfo.db
* Statistics.db
* Digest.adler32
* Summary.db
* TOC.txt

Each component file name is prefixed with a two-letter version string followed by an integer that represents a generation. In this context, version refers to the version of the SSTable format, and the generation is like an increasing revision number. This prefix will be the same for all components of an SSTable. Each SSTable (i.e., each set of component files) will have a different generation. Here is an example of the component files:

la-84-CompressionInfo.db
la-84-Data.db
la-84-Digest.adler32
la-84-Filter.db
la-84-Index.db
la-84-Statistics.db
la-84-Summary.db
la-84-TOC.txt

My apologies for throwing in all these details, but I want to make sure I provide sufficient background. Now let's say that the data retention is 10 days. If an SSTable is older than 10 days, you can safely delete all the component files. If there are any snapshots, you can remove those directories as well.
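
To make the procedure concrete, here is a minimal Python sketch, assuming a 10-day retention and a deployment-specific path to the data table directory. It groups component files by their version-generation prefix and removes an SSTable only when every component is past the retention period; review the dry-run output before deleting anything.

import os
import time
from collections import defaultdict

def purge_expired_sstables(table_dir, retention_days=10, dry_run=True):
    # Group component files by their "<version>-<generation>-" prefix, e.g. "la-84".
    cutoff = time.time() - retention_days * 86400
    sstables = defaultdict(list)
    for name in os.listdir(table_dir):
        parts = name.split("-")
        if len(parts) >= 3:  # e.g. la-84-Data.db
            sstables["-".join(parts[:2])].append(os.path.join(table_dir, name))
    for prefix, files in sstables.items():
        # Only purge when every component of the SSTable is past retention.
        if all(os.path.getmtime(f) < cutoff for f in files):
            print("expired SSTable:", prefix)
            if not dry_run:
                for f in files:
                    os.remove(f)

if __name__ == "__main__":
    # Path to the hawkular_metrics.data table directory is deployment-specific.
    purge_expired_sstables("/cassandra_data/data/hawkular_metrics/data", dry_run=True)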

Comment 39 chunchen 2016-06-08 09:19:14 UTC
According to comment #38, we will manually remove SSTables to reclaim disk space for now, so I am marking this as verified.

Comment 41 errata-xmlrpc 2016-06-27 15:05:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1343

