Bug 1422271

Summary: Large partitions make Cassandra unstable and cause requests to fail in Hawkular Metrics
Product: OpenShift Container Platform
Reporter: John Sanda <jsanda>
Component: Hawkular
Assignee: Matt Wringe <mwringe>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: aos-bugs, bmorriso, gburges, jforrest, jgoulding, jsanda, lizhou, mmahut, pdwyer, pweil, smunilla, sten, trankin, vlaad, whearn, xtian, zhiwliu, zhizhang
Target Milestone: ---
Keywords: OpsBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1439910, 1439912
Environment:
Last Closed: 2017-08-10 05:18:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1439910, 1439912

Description John Sanda 2017-02-14 22:48:10 UTC
Description of problem:
I have seen logs from a few different environments where there are partitions in the metrics_idx table that are as large as 496 MB and even 700 MB. Cassandra reports warning messages like this:

################
WARN  21:08:41 Writing large partition hawkular_metrics/metrics_idx:ops-health-monitoring:2 (699699190 bytes)
################

Cassandra is letting us know that it is writing a partition to the metrics_idx table that is 700 MB in size. The compaction_large_partition_warning_threshold_mb parameter in cassandra.yaml controls when the warning will be logged. It defaults to 100 MB.
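
For reference, the threshold lives in cassandra.yaml; raising it only changes when the warning is logged, it does not make large partitions any less of a problem. A minimal sketch of the relevant setting (default value shown):

################
# cassandra.yaml: warn when compaction writes a partition larger than this (MB)
compaction_large_partition_warning_threshold_mb: 100
################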

Partitions this large are problematic during compaction and are causing many client requests to Hawkular Metrics to fail.



Comment 61 John Sanda 2017-06-27 13:48:12 UTC
This issue is addressed upstream by https://issues.jboss.org/browse/HWKMETRICS-613. 3.6 builds should have this new functionality.

Comment 63 Junqi Zhao 2017-07-07 09:14:46 UTC
This defect should be verified with stress testing on the LB instance. There is currently another defect, https://bugzilla.redhat.com/show_bug.cgi?id=1468113, so I will verify this defect after BZ #1468113 is fixed.

Comment 64 Junqi Zhao 2017-07-17 03:48:12 UTC
@jsanda,

Tested and attached the hawkular-cassandra pod log. It does log a warning when a partition exceeds the 100 MB large-partition threshold; see the following messages (106891573 bytes ≈ 101.94 MB):
****************************************************************************
2017-07-15 02:44:11,061 BigTableWriter.java:171 - Writing large partition hawkular_metrics/metrics_idx:clusterproject:0 (106891573 bytes to sstable /cassandra_data/data/hawkular_metrics/metrics_idx-7701404068a611e795d11b216051c746/mc-67-big-Data.db) 
WARN  [SharedPool-Worker-20] 2017-07-15 02:44:35,286 NoSpamLogger.java:94 - Unlogged batch covering 12 partitions detected against table [hawkular_metrics.data]. You should use a logged batch for atomicity, or asynchronous writes for performance.
****************************************************************************

But I also see warning messages for an even larger partition (165956826 bytes ≈ 158.27 MB). My question is: if the partition keeps growing, for example to 200 or 300 MB, does the program do nothing except log warning messages indicating that the large-partition threshold has been exceeded?
***************************************************************************
WARN  [CompactionExecutor:491] 2017-07-15 08:20:45,238 BigTableWriter.java:171 - Writing large partition hawkular_metrics/metrics_idx:clusterproject:0 (165956826 bytes to sstable /cassandra_data/data/hawkular_metrics/metrics_idx-7701404068a611e795d11b216051c746/mc-99-big-Data.db) 
WARN  [SharedPool-Worker-1] 2017-07-15 08:21:05,760 NoSpamLogger.java:94 - Unlogged batch covering 16 partitions detected against table [hawkular_metrics.data]. You should use a logged batch for atomicity, or asynchronous writes for performance.
****************************************************************************

Images from ops registry
metrics-hawkular-metrics    v3.6.140            3a5bebd0476a        6 days ago          1.293 GB
metrics-heapster            v3.6.140            5549c67d8607        6 days ago          274.4 MB
metrics-cassandra           v3.6.140            9644ec21e399        6 days ago          573.2 MB

Comment 66 Junqi Zhao 2017-07-18 10:14:57 UTC
Vlaad (vlaad) created 6500 pods and deleted them under one project, and I checked the hawkular-cassandra and hawkular-metrics pod logs.
There were warning messages when a partition was larger than the compaction_large_partition_warning_threshold_mb setting. And for OCP 3.6 we have introduced a background job in hawkular-metrics that cleans up the index tables, removing rows for the deleted pods. This should help prevent those partitions from constantly getting bigger.

Images from ops registry
metrics-hawkular-metrics    v3.6.140            3a5bebd0476a        6 days ago          1.293 GB
metrics-heapster            v3.6.140            5549c67d8607        6 days ago          274.4 MB
metrics-cassandra           v3.6.140            9644ec21e399        6 days ago          573.2 MB

Comment 67 John Sanda 2017-07-18 13:36:01 UTC
(In reply to Junqi Zhao from comment #66)
> Vlaad (vlaad) created 6500 pods and deleted them under one project, and I
> checked the hawkular-cassandra and hawkular-metrics pod logs.
> There were warning messages when a partition was larger than the
> compaction_large_partition_warning_threshold_mb setting. And for OCP 3.6 we
> have introduced a background job in hawkular-metrics that cleans up the
> index tables, removing rows for the deleted pods. This should help prevent
> those partitions from constantly getting bigger.
> 
> Images from ops registry
> metrics-hawkular-metrics    v3.6.140            3a5bebd0476a        6 days
> ago          1.293 GB
> metrics-heapster            v3.6.140            5549c67d8607        6 days
> ago          274.4 MB
> metrics-cassandra           v3.6.140            9644ec21e399        6 days
> ago          573.2 MB

The deletion job runs only once a week by default. It can be scheduled to run more frequently by setting the METRICS_EXPIRATION_JOB_FREQUENCY environment variable, whose value is interpreted in days.
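
For example, to run the job daily, something along these lines should work; this is just a sketch, assuming the stock metrics deployment where the Hawkular Metrics pods are managed by a replication controller named hawkular-metrics in the openshift-infra project:

****************************************************************************
# Set the variable on the Hawkular Metrics replication controller (value in days)
oc set env rc/hawkular-metrics METRICS_EXPIRATION_JOB_FREQUENCY=1 -n openshift-infra
# Restart the pods so they pick up the new environment
oc scale rc/hawkular-metrics --replicas=0 -n openshift-infra
oc scale rc/hawkular-metrics --replicas=1 -n openshift-infra
****************************************************************************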

Comment 68 Matt Wringe 2017-07-18 14:43:22 UTC
(In reply to John Sanda from comment #67)
> (In reply to Junqi Zhao from comment #66)
> > Vlaad (vlaad) created 6500 pods and deleted them under one project, and I
> > checked the hawkular-cassandra and hawkular-metrics pod logs.
> > There were warning messages when a partition was larger than the
> > compaction_large_partition_warning_threshold_mb setting. And for OCP 3.6 we
> > have introduced a background job in hawkular-metrics that cleans up the
> > index tables, removing rows for the deleted pods. This should help prevent
> > those partitions from constantly getting bigger.
> > 
> > Images from ops registry
> > metrics-hawkular-metrics    v3.6.140            3a5bebd0476a        6 days
> > ago          1.293 GB
> > metrics-heapster            v3.6.140            5549c67d8607        6 days
> > ago          274.4 MB
> > metrics-cassandra           v3.6.140            9644ec21e399        6 days
> > ago          573.2 MB
> 
> The deletion job runs only once a week by default. It can be scheduled to
> run more frequently by setting the METRICS_EXPIRATION_JOB_FREQUENCY
> environment variable, whose value is interpreted in days.

Should we lower this default to less than 7 days? Or expose this parameter in ansible?

Comment 69 John Sanda 2017-07-18 15:17:12 UTC
(In reply to Matt Wringe from comment #68)
> (In reply to John Sanda from comment #67)
> > (In reply to Junqi Zhao from comment #66)
> > > Vlaad (vlaad) created 6500 pods and deleted them under one project, and I
> > > checked the hawkular-cassandra and hawkular-metrics pod logs.
> > > There were warning messages when a partition was larger than the
> > > compaction_large_partition_warning_threshold_mb setting. And for OCP 3.6 we
> > > have introduced a background job in hawkular-metrics that cleans up the
> > > index tables, removing rows for the deleted pods. This should help prevent
> > > those partitions from constantly getting bigger.
> > > 
> > > Images from ops registry
> > > metrics-hawkular-metrics    v3.6.140            3a5bebd0476a        6 days
> > > ago          1.293 GB
> > > metrics-heapster            v3.6.140            5549c67d8607        6 days
> > > ago          274.4 MB
> > > metrics-cassandra           v3.6.140            9644ec21e399        6 days
> > > ago          573.2 MB
> > 
> > The deletion job runs only once a week by default. It can be scheduled to
> > run more frequently by setting the METRICS_EXPIRATION_JOB_FREQUENCY
> > environment variable, whose value is interpreted in days.
> 
> Should we lower this default to less than 7 days? Or expose this parameter
> in ansible?

I think that the default of 7 days was based on the default data retention of 7 days. The job will not delete any metric definitions if they have live data points.

To keep things consistent it probably makes sense to expose the setting in ansible. There are one or two other properties that might need to be exposed. I will take a look and create a ticket and PR for the changes.
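
Purely as an illustration of what the ansible exposure could look like, with a hypothetical inventory variable name (the real name would be decided in that ticket/PR):

****************************************************************************
# openshift-ansible inventory (hypothetical variable name, for illustration only)
# run the metrics expiration job every day instead of the default 7 days
openshift_metrics_expiration_job_frequency=1
****************************************************************************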

Comment 71 errata-xmlrpc 2017-08-10 05:18:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716