Description of problem:

Customer had been running metrics for ~5-6 months and then suddenly metrics were no longer available. After looking into the logs they saw the following message:

ERROR <TIME> Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Mutation checksum failure at 27357160 in CommitLog-5-1484574169950.log

After deleting this commit log (they have saved it and I will provide it in another update), Cassandra was able to start up normally and metrics started back up properly.

Version-Release number of selected component (if applicable):

metrics-cassandra:3.2.1
openshift v3.2.1.13-1-gc2a90e1
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1385427

This issue is already resolved in 3.2.1, but the fix requires updated templates, which are only picked up by a new metrics install.

You can either deploy metrics again, which will update the Cassandra template with the fix, or manually run the following command to give Cassandra more time to process its commit log files when it is being shut down:

$ oc patch rc hawkular-cassandra-1 -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":1800}}}}'

If you wish to skip over commit log replay failures in the future, you can also run the following command:

$ oc patch rc hawkular-cassandra-1 -p '{"spec":{"template":{"spec":{"containers":[{"name":"hawkular-cassandra-1", "env": [{"name": "JVM_OPTS", "value":"-Dcassandra.commitlog.ignorereplayerrors=true"}]}]}}}}'

*** This bug has been marked as a duplicate of bug 1385427 ***
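To check whether the grace-period change is already in place before patching, something like the following sketch may help. The RC name and the 1800-second value are taken from the comment above. The function names (grace_patch, current_grace, ensure_grace) are hypothetical helpers, not part of any OpenShift tooling, and the sketch assumes oc is logged in to the project where metrics is deployed. The value is passed as an unquoted integer because terminationGracePeriodSeconds is an integer field in the Kubernetes API.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; RC name comes from the comment above.
RC_NAME="hawkular-cassandra-1"

# Print the JSON patch for a given grace period in seconds (unquoted integer).
grace_patch() {
  printf '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":%d}}}}\n' "$1"
}

# Print the grace period currently set in the RC's pod template.
current_grace() {
  oc get rc "$RC_NAME" -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'
}

# Patch the RC only if the template does not already carry the desired value.
ensure_grace() {
  want="$1"
  have="$(current_grace)"
  if [ "$have" = "$want" ]; then
    echo "terminationGracePeriodSeconds already $want"
  else
    oc patch rc "$RC_NAME" -p "$(grace_patch "$want")"
  fi
}
```

Calling ensure_grace 1800 would then apply the patch only when needed. Note that a replication controller does not update pods that are already running: a pod keeps the grace period it was created with, and the new value applies to pods created from the template afterwards.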