Bug 1570140 - Hawkular is not clearing out old data even though hawkular.metrics.default-ttl is specified
Status: CLOSED DUPLICATE of bug 1567222
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.z
Assigned To: John Sanda
QA Contact: Junqi Zhao
Depends On: 1567222
Reported: 2018-04-20 13:28 EDT by Luke Stanton
Modified: 2018-06-05 09:16 EDT (History)

Last Closed: 2018-05-30 21:40:48 EDT
Type: Bug


Attachments
Older output from du /cassandra_data/data/hawkular_metrics (30.72 KB, application/x-gzip)
2018-04-20 18:35 EDT, Luke Stanton


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2018:1798
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform 3.7 bug fix update
Last Updated: 2018-06-26 18:41:33 EDT

Description Luke Stanton 2018-04-20 13:28:02 EDT
Description of problem:
Hawkular is not honoring the hawkular.metrics.default-ttl setting. The user set the value and restarted the metrics components to pick up the change, but older data isn't being cleared out and the volume is running out of space.

One of the Hawkular logs has the following recurring error...

2018-04-10 05:01:11,450 ERROR [org.hawkular.metrics.core.service.MetricsServiceImpl] (RxComputationScheduler-4) Failure while trying to apply compression, skipping block: 
  java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/10.131.7.65:9042 
  (com.datastax.driver.core.exceptions.OperationTimedOutException: [hawkular-cassandra/10.131.7.65:9042] Timed out waiting for server response))...

How reproducible:
Consistently reproducible by customer

Steps to Reproduce:
1. Set -Dhawkular.metrics.default-ttl=3 in hawkular-metrics replication controller
2. Restart hawkular-metrics pod
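
One way to confirm that the flag from step 1 actually reached the replication controller is to search its definition for the property. The sketch below is an illustration and not part of the original report: the helper name `extract_default_ttl` is hypothetical, and the `openshift-infra` project and `hawkular-metrics` controller name are taken from this bug and may differ in other deployments.

```shell
# Hypothetical helper: read text on stdin and print the numeric value of the
# first -Dhawkular.metrics.default-ttl=<N> argument found, or nothing if absent.
extract_default_ttl() {
  grep -o -e '-Dhawkular\.metrics\.default-ttl=[0-9][0-9]*' \
    | head -n 1 \
    | cut -d= -f2
}

# Example use against a live cluster (assumes oc is logged in and the
# replication controller is named hawkular-metrics):
#   oc -n openshift-infra get rc hawkular-metrics -o yaml | extract_default_ttl
```

An empty result would suggest the flag is not set on the controller that the running pod was created from, which is worth ruling out before looking at Cassandra itself.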

Actual results:
Metrics data in Cassandra does not appear to get cleaned up per the ttl setting.

Expected results:
Older data should be cleared out automatically per the ttl setting.
Comment 2 John Sanda 2018-04-20 14:02:55 EDT
The customer might be hitting bug 1567222. Can I get the output of `du -h /cassandra_data/data/hawkular_metrics`?
Comment 3 John Sanda 2018-04-20 14:20:21 EDT
Cassandra is getting bogged down with garbage collection, which is very likely the cause of most of the exceptions you are seeing in the logs. I recommend doubling the memory to 4 GB for the Cassandra pod.
Comment 4 Luke Stanton 2018-04-20 18:35 EDT
Created attachment 1424759
Older output from du /cassandra_data/data/hawkular_metrics

I don't know if this is still useful, but the customer had attached this data in an earlier comment.
Comment 5 John Sanda 2018-04-20 19:28:32 EDT
(In reply to Luke Stanton from comment #4)
> Created attachment 1424759
> Older output from du /cassandra_data/data/hawkular_metrics
> 
> I don't know if this is still useful but customer had attached this data in
> an earlier comment.

Definitely useful. It does look like the customer is hitting bug 1567222. As a temporary workaround until the fix is pushed out, run:

$ oc -n openshift-infra exec <cassandra pod> -- nodetool clearsnapshot
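
To gauge how much of the volume snapshots occupy before and after the workaround, the `du` output can be summarized. This is a minimal sketch under stated assumptions: it relies on Cassandra's usual layout, where each table directory holds a `snapshots` subdirectory, it expects `du -k` (kibibyte) output rather than `du -h`, and the helper name `snapshot_usage_kib` is hypothetical.

```shell
# Hypothetical helper: read `du -k` output (size<TAB>path) on stdin and print
# the total KiB used by directories whose path ends in /snapshots.
# Only the per-table snapshots directory is counted, not its children,
# so nested snapshot directories are not double counted.
snapshot_usage_kib() {
  awk -F'\t' '$2 ~ /\/snapshots$/ { total += $1 } END { print total + 0 }'
}

# Example use (assumes oc is logged in; <cassandra pod> is left as in the
# command above and must be replaced with the real pod name):
#   oc -n openshift-infra exec <cassandra pod> -- \
#     du -k /cassandra_data/data/hawkular_metrics | snapshot_usage_kib
```

If the total drops sharply after `nodetool clearsnapshot`, that is consistent with snapshots being the main consumer of space; if it does not, the growth likely lies elsewhere.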
Comment 10 Junqi Zhao 2018-05-30 21:40:48 EDT

*** This bug has been marked as a duplicate of bug 1567222 ***
