Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1331831

Summary: How can you constrain the metrics collection service to a set disk space?
Product: OpenShift Container Platform
Component: Hawkular
Version: 3.1.0
Reporter: Miheer Salunke <misalunk>
Assignee: Matt Wringe <mwringe>
QA Contact: chunchen <chunchen>
CC: aos-bugs, wsun
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Keywords: UpcomingRelease
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-05-02 13:47:37 UTC

Description Miheer Salunke 2016-04-29 17:47:20 UTC
Description of problem:

This is regarding a discussion being in http://post-office.corp.redhat.com/archives/openshift-sme/2016-April/msg00788.html

> When running metrics, the Cassandra database expands until we run out of disk space. We are presently just running it as an emptyDir mount. How can we configure the metrics service so that it manages the amount of space that it uses? It doesn't just grow indefinitely, does it?

The way it is set up, metrics are kept for a specific amount of time; by default this is 7 days, but it can be configured using the METRICS_DURATION parameter with the deployer. Once this duration is reached, those metrics are deleted to make room for new metrics.
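As a concrete sketch of how the retention could be set with the deployer: the template file name, hostname, and project below are assumptions for illustration and may differ in your environment; only the METRICS_DURATION parameter comes from the discussion above.

```shell
# Sketch only: metrics-deployer.yaml, the hostname, and the
# openshift-infra project are assumed names, not confirmed values.
# METRICS_DURATION is the retention period in days (default 7).
oc process -f metrics-deployer.yaml \
    -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com,METRICS_DURATION=7 \
    | oc create -n openshift-infra -f -
```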

So the Cassandra database size should essentially level out. We don't have exact database size recommendations at this point; QE is supposed to be running tests right now to determine the size recommendations, but I have yet to see their results.

However, this solution is incomplete because it still doesn't address correctly sizing the metrics collection service: being able to choose how many days of metrics I keep does not help me calculate how much storage I should aim to give the service.

Even if I constrain the number of days that metrics are kept, I can still run out of disk space due to other factors, such as the number of pods/nodes being monitored. The documentation needs to supply estimates of metrics database size for clusters of different sizes so that we can work out broad storage requirements for the service. I want to keep the metrics service available, and running out of space breaks the service.

I think a better solution to the sizing problem would be a setting that constrains the metrics database to an available size; the retention period (days/hours/minutes) would then be varied by the collection process to ensure it does not breach that limit. You could think of this as the service adapting to the space it has available. Another thing to consider would be a procedure for adding additional storage to the service without taking it offline.
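The trade-off described above can be sketched as back-of-envelope arithmetic: given a disk budget and a per-pod growth rate, the maximum retention that fits is the budget divided by the total daily growth. All the numbers below are illustrative assumptions, not measured Cassandra figures.

```shell
# Back-of-envelope sizing sketch. Every rate here is an assumption
# for illustration, not a measured or recommended value.
PODS=1000                 # pods being monitored
MB_PER_POD_PER_DAY=3      # assumed on-disk growth per pod per day
DISK_BUDGET_MB=50000      # space allocated to the Cassandra volume

# Longest retention (in whole days) that fits inside the budget:
MAX_DAYS=$(( DISK_BUDGET_MB / (PODS * MB_PER_POD_PER_DAY) ))
echo "retention that fits the budget: ${MAX_DAYS} days"
```

With these example numbers the service could keep roughly 16 days of metrics before exhausting the volume, which is the kind of calculation the proposed adaptive setting would perform automatically.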

So what should we suggest to customers: a recommended storage size, alerting the user when the volume is about to fill up, setting the number of days via METRICS_DURATION in the deployer template, or simply that admins should anticipate the expected usage and allocate storage accordingly?

Version-Release number of selected component (if applicable):
3.1.1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Dan McPherson 2016-05-02 13:47:37 UTC

*** This bug has been marked as a duplicate of bug 1316275 ***