+++ This bug was initially created as a clone of Bug #1457499 +++

Description of problem:
OCP 3.4 ships with Cassandra 3.0.9, which uses Netty 4.0.23. There is a severe bug in Netty 4.0.23, https://github.com/netty/netty/issues/3057, which can lead to OutOfMemoryErrors. This issue was tracked in Cassandra under https://issues.apache.org/jira/browse/CASSANDRA-13114 and https://issues.apache.org/jira/browse/CASSANDRA-13126. We should upgrade to the latest 3.0.x release of Cassandra, which is currently 3.0.13.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Deployed Metrics 3.5.0 with CASSANDRA_VERSION=3.0.12.redhat-1; metrics worked well. Do we only need to verify the CASSANDRA_VERSION, or should we do more testing of Netty?

# oc rsh hawkular-cassandra-1-c418t
sh-4.2$ env | grep -i ver
JBOSS_IMAGE_VERSION=1.3
JAVA_VERSION=1.8.0
CASSANDRA_VERSION=3.0.12.redhat-1

Images from brew registry
# docker images | grep metrics
openshift3/metrics-heapster           v3.5   a2b34520f0fc   2 days ago   318.5 MB
openshift3/metrics-hawkular-metrics   v3.5   9e8f35b1eaf8   2 days ago   1.29 GB
openshift3/metrics-cassandra          v3.5   0452b69291ff   2 days ago   545.1 MB

By the way, CASSANDRA_VERSION=3.0.13.redhat-1 in Metrics 3.6.0.
(In reply to Junqi Zhao from comment #5)
> Deployed Metrics 3.5.0 with CASSANDRA_VERSION=3.0.12.redhat-1; metrics
> worked well. Do we only need to verify the CASSANDRA_VERSION, or should we
> do more testing of Netty?

I believe this is one of those situations where it may be difficult to reproduce the problem encountered.

@jsanda: is it possible to reproduce this issue so that ops can verify that it's been fixed? And can this test also be included in their test suite?

> # oc rsh hawkular-cassandra-1-c418t
> sh-4.2$ env | grep -i ver
> JBOSS_IMAGE_VERSION=1.3
> JAVA_VERSION=1.8.0
> CASSANDRA_VERSION=3.0.12.redhat-1
>
> Images from brew registry
> # docker images | grep metrics
> openshift3/metrics-heapster           v3.5   a2b34520f0fc   2 days ago   318.5 MB
> openshift3/metrics-hawkular-metrics   v3.5   9e8f35b1eaf8   2 days ago   1.29 GB
> openshift3/metrics-cassandra          v3.5   0452b69291ff   2 days ago   545.1 MB
>
> By the way, CASSANDRA_VERSION=3.0.13.redhat-1 in Metrics 3.6.0

Yes, I pushed through a build to downgrade it to 3.0.12, but it looked like it couldn't be started due to another build. I have pushed out another build now.
(In reply to Matt Wringe from comment #6)
> (In reply to Junqi Zhao from comment #5)
> > Deployed Metrics 3.5.0 with CASSANDRA_VERSION=3.0.12.redhat-1; metrics
> > worked well. Do we only need to verify the CASSANDRA_VERSION, or should we
> > do more testing of Netty?
>
> I believe this is one of those situations where it may be difficult to
> reproduce the problem encountered.
>
> @jsanda: is it possible to reproduce this issue so that ops can verify that
> it's been fixed? And can this test also be included in their test suite?

The only thing I know to do is to compare heap dumps from the different versions.

> > # oc rsh hawkular-cassandra-1-c418t
> > sh-4.2$ env | grep -i ver
> > JBOSS_IMAGE_VERSION=1.3
> > JAVA_VERSION=1.8.0
> > CASSANDRA_VERSION=3.0.12.redhat-1
> >
> > Images from brew registry
> > # docker images | grep metrics
> > openshift3/metrics-heapster           v3.5   a2b34520f0fc   2 days ago   318.5 MB
> > openshift3/metrics-hawkular-metrics   v3.5   9e8f35b1eaf8   2 days ago   1.29 GB
> > openshift3/metrics-cassandra          v3.5   0452b69291ff   2 days ago   545.1 MB
> >
> > By the way, CASSANDRA_VERSION=3.0.13.redhat-1 in Metrics 3.6.0
>
> Yes, I pushed through a build to downgrade it to 3.0.12, but it looked like
> it couldn't be started due to another build. I have pushed out another
> build now.
(In reply to John Sanda from comment #7)
> The only thing I know to do is to compare heap dumps from the different
> versions.

Are there simple steps that could be followed here to do that, or is this more complex than a few simple steps can accomplish?
I do not know of a simple way to do it. Heap dumps could be generated with jmap, and then I would manually inspect them with something like MAT [1].

[1] http://www.eclipse.org/mat/
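To make the comparison a bit more systematic than visually inspecting two full dumps in MAT, the class histograms from `jmap -histo <pid>`, taken on each Cassandra version under similar load, can be diffed programmatically. A minimal sketch, assuming the standard `jmap -histo` column layout ("num: #instances #bytes class name"); the function names are illustrative, not part of any existing tooling:

```python
import re

def parse_histo(text):
    """Parse `jmap -histo` output into {class name: total bytes}."""
    sizes = {}
    for line in text.splitlines():
        # Data rows look like: "   1:     123456   7890123  io.netty.buffer.PoolChunk"
        m = re.match(r"\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)", line)
        if m:
            sizes[m.group(3)] = int(m.group(2))
    return sizes

def top_growth(before, after, n=5):
    """Return the n classes whose total retained bytes grew the most
    between two histogram snapshots."""
    a, b = parse_histo(before), parse_histo(after)
    growth = {cls: b.get(cls, 0) - a.get(cls, 0) for cls in set(a) | set(b)}
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

If the objects involved in the upstream Netty leak are accumulating, they should appear near the top of this list on 4.0.23 but not on the fixed version.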
@John,

If we don't find memory leak issues using the MAT tool, do you think we can close this issue?
(In reply to Junqi Zhao from comment #10)
> @John,
>
> If we don't find memory leak issues using the MAT tool, do you think we can
> close this issue?

Yes, I think we can close the ticket. The leak is documented in the upstream Netty ticket, so moving to the fix version is sufficient.
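For anyone wanting an extra runtime check on the Netty side: Netty 4 ships a built-in resource leak detector that logs unreleased pooled buffers. This is a standard Netty system property, not specific to this fix, and it only catches unreleased ByteBufs, which may or may not cover the failure mode in netty#3057; where exactly to set it depends on the Cassandra packaging (e.g. via JVM_OPTS in cassandra-env.sh):

```
# Test environments only: "paranoid" samples every buffer and is expensive.
JVM_OPTS="$JVM_OPTS -Dio.netty.leakDetectionLevel=paranoid"
```

With this enabled, any leaked buffer produces a "LEAK: ... was GC'd before being released" style message in the Cassandra log.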
Based on Comment 5 and Comment 11, please change the status to ON_QA if you want me to close it, or leave the status unchanged if you want to trigger the errata process.
CASSANDRA_VERSION is 3.0.12.redhat-1; did sanity testing, no issue was found.

# oc rsh ${hawkular-cassandra-pod}
sh-4.2$ env | grep -i ver
JBOSS_IMAGE_VERSION=1.0
JAVA_VERSION=1.8.0
CASSANDRA_VERSION=3.0.12.redhat-1

Testing env:
# openshift version
openshift v3.5.5.28
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Images from ops mirror
# docker images | grep metrics
metrics-cassandra          v3.5   45fb2eaea055   14 hours ago   540.4 MB
metrics-hawkular-metrics   v3.5   1f4b0595cd64   18 hours ago   1.27 GB
metrics-heapster           v3.5   af5d9710feec   18 hours ago   317.9 MB
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1640