Bug 1471239

Summary: Cassandra Java heap parameters can be configured incorrectly
Product: OpenShift Container Platform Reporter: John Sanda <jsanda>
Component: Hawkular Assignee: Matt Wringe <mwringe>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.6.0 CC: aos-bugs, jsanda, pdwyer, pweil, snegrea
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 3.7.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 22:01:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description John Sanda 2017-07-14 19:15:33 UTC
Description of problem:
We pass the -Xms, -Xmx, and -Xmn flags to the Cassandra JVM. -Xms sets the initial (minimum) size of the whole heap. -Xmx sets the maximum size of the whole heap. -Xmn sets the size of the new generation. The heap is basically divided into two sections: the new (or young) generation and the old generation. We calculate the value for -Xmn based on the number of CPU cores. There are times, such as when no CPU limits are set, when -Xmn can end up with a value larger than -Xmx. If that happens, the JVM logs a warning at startup like this:

OpenJDK 64-Bit Server VM warning: MaxNewSize (4096000k) is equal to or greater than the entire heap (1048576k).  A new max generation size of 1048512k will be used.

We need to fix this. The JVM will dynamically resize the generations as needed, and this can also be controlled with other flags which we do not set. For the types of workloads we are dealing with, I think we generally want the new generation to be between 1/4 and 1/2 of the total heap.
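
A minimal sketch of the kind of guard this implies, written in Python for illustration only (the actual startup script is not shown in this report). The function name new_gen_mb, the 100 MB per core figure, and the 1/4 default ceiling are assumptions; with 40 cores, 100 MB per core happens to give 4000M, which matches the 4096000k in the warning above.

```python
def new_gen_mb(max_heap_mb: int, cpu_cores: int,
               per_core_mb: int = 100, max_fraction: float = 0.25) -> int:
    """Pick a new-generation size (MB) that can never exceed the heap.

    per_core_mb and max_fraction are illustrative assumptions, not values
    taken from the real script.
    """
    by_cores = per_core_mb * cpu_cores          # core-based estimate, unbounded on its own
    by_heap = int(max_heap_mb * max_fraction)   # ceiling tied to the total heap
    return min(by_cores, by_heap)


# No CPU limit: the pod sees all 40 host cores but only has a 1024M heap.
print(new_gen_mb(1024, 40))   # core-based estimate would be 4000M; capped at 256M
# Small core count: the core-based estimate is already below the ceiling.
print(new_gen_mb(1024, 2))    # 200M
```

Taking the minimum of the two estimates keeps the core-based sizing on small hosts while guaranteeing that -Xmn stays below -Xmx when the core count is large.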

Comment 2 Matt Wringe 2017-07-18 13:33:08 UTC
We have been setting it this way based on the Cassandra recommendations (although we were not taking into account the total memory available).

How serious is this issue?

Comment 3 John Sanda 2017-07-19 22:08:02 UTC
I am not sure how serious it is. We know it does not prevent Cassandra from starting, and I think it is only an issue when there is no CPU limit. Based on some additional research, I think we should set the new generation to 1/4 of the total heap. I think that is a reasonable default.

Comment 4 Matt Wringe 2017-09-29 18:39:59 UTC
We are now setting the heap newsize to be 1/3 of the heapsize.

Comment 5 Junqi Zhao 2017-10-09 03:06:01 UTC
Tested with the current latest metrics-cassandra image, metrics-cassandra:v3.7.0-0.144.0.0, with openshift_metrics_cassandra_limits_memory=3Gi set.

From the metrics-cassandra logs, HEAP_NEWSIZE is 1/3 of MAX_HEAP_SIZE.
# oc logs hawkular-cassandra-1-2lzzf
The MAX_HEAP_SIZE envar is not set. Basing the MAX_HEAP_SIZE on the available memory limit for the pod (3221225472).
The memory limit is between 2 and 4GB. Setting max_heap_size to 1GB.
The MAX_HEAP_SIZE has been set to 1024M
The HEAP_NEWSIZE envar is not set. Setting the HEAP_NEWSIZE to one third the MAX_HEAP_SIZE: 341M


Also tested with the default memory limit (2G). According to the pod log, HEAP_NEWSIZE is now also 1/3 of MAX_HEAP_SIZE.
# oc logs hawkular-cassandra-1-h2f6j
The MAX_HEAP_SIZE envar is not set. Basing the MAX_HEAP_SIZE on the available memory limit for the pod (2000000000).
The memory limit is less than 2GB. Using 1/2 of available memory for the max_heap_size.
The MAX_HEAP_SIZE has been set to 953M
The HEAP_NEWSIZE envar is not set. Setting the HEAP_NEWSIZE to one third the MAX_HEAP_SIZE: 317M

Closing it as VERIFIED.
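
The arithmetic in the two logs above can be reproduced with a short Python sketch. This is only a reconstruction from the log messages, not the image's actual script: the function names, the use of 1024-based units, integer truncation, and the exact boundary handling at 2 GiB and 4 GiB are assumptions, and limits above 4 GiB are left out because no log in this comment covers them.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_heap_mb(limit_bytes: int) -> int:
    """Derive MAX_HEAP_SIZE (in MB) from the pod memory limit, per the logs."""
    if limit_bytes < 2 * GIB:
        # "Using 1/2 of available memory for the max_heap_size."
        return (limit_bytes // 2) // MIB
    if limit_bytes <= 4 * GIB:
        # "The memory limit is between 2 and 4GB. Setting max_heap_size to 1GB."
        return 1024
    # Not covered by the logs in this comment, so deliberately not guessed.
    raise ValueError("memory limits above 4 GiB are outside this sketch")


def heap_newsize_mb(max_heap: int) -> int:
    """HEAP_NEWSIZE is one third of MAX_HEAP_SIZE, truncated to whole MB."""
    return max_heap // 3


for limit in (3221225472, 2000000000):
    heap = max_heap_mb(limit)
    print(f"limit={limit} MAX_HEAP_SIZE={heap}M HEAP_NEWSIZE={heap_newsize_mb(heap)}M")
```

For the two limits tested here, this yields 1024M/341M and 953M/317M, the same values the pod logs report.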

Comment 8 errata-xmlrpc 2017-11-28 22:01:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188