Bug 1417729

Summary: Cassandra error starting up due to "mutation checksum failure" on a commit log
Product: OpenShift Container Platform
Reporter: Eric Jones <erjones>
Component: Hawkular
Assignee: Matt Wringe <mwringe>
Status: CLOSED DUPLICATE
QA Contact: Peng Li <penli>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.2.1
CC: aos-bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-30 23:03:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Eric Jones 2017-01-30 19:07:32 UTC
Description of problem:
The customer had been running metrics for ~5-6 months when metrics suddenly became unavailable.

Looking into the Cassandra logs, they found the following message:

ERROR <TIME> Exiting due to error while processing commit log during
initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
Mutation checksum failure at 27357160 in CommitLog-5-1484574169950.log

After the customer deleted this commit log (they saved a copy, which I will provide in another update), Cassandra started up normally and metrics became available again.
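
The workaround described above (remove the failing commit log segment, keeping a copy) could be sketched roughly as follows. This is only an illustration, not a supported procedure: both function names are hypothetical, the commit log directory is passed in as a parameter because its location inside the Cassandra container is not stated in this report, and any writes that exist only in the removed segment are lost.

```shell
# Hypothetical helper: read Cassandra log text on stdin and print the name of
# the commit log segment named in a "Mutation checksum failure" message.
corrupt_commitlog_name() {
  sed -n 's/.*Mutation checksum failure at [0-9][0-9]* in \(CommitLog-[0-9][0-9]*-[0-9][0-9]*\.log\).*/\1/p'
}

# Hypothetical helper: move the named segment out of the commit log directory
# into a backup directory instead of deleting it outright.
#   $1 = commit log directory (assumption: supplied by the operator)
#   $2 = backup directory (created if missing)
#   $3 = segment file name
quarantine_commitlog() {
  commitlog_dir=$1
  backup_dir=$2
  segment=$3
  mkdir -p "$backup_dir" && mv "$commitlog_dir/$segment" "$backup_dir/"
}
```

For example, piping the error text through `corrupt_commitlog_name` and passing the result to `quarantine_commitlog` with the operator's own directories would keep the segment available for later analysis.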

Version-Release number of selected component (if applicable):
metrics-cassandra:3.2.1
openshift v3.2.1.13-1-gc2a90e1
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

Comment 2 Matt Wringe 2017-01-30 23:03:05 UTC
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1385427

This issue is already resolved in 3.2.1, but it requires that the templates are updated, which would occur during a new metrics install.

You can either deploy metrics again, which will update the Cassandra template to fix this, or manually run the following command to give Cassandra more time to process its commit log files when it's being shut down:

$ oc patch rc hawkular-cassandra-1 -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":"1800"}}}}'
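
Patching a replication controller changes only its pod template; a pod that is already running keeps the grace period it was created with. A minimal sketch for checking the patched value and recreating the pod might look like the following. The function names are hypothetical, and the `openshift-infra` project is an assumption about where metrics is deployed, so override `METRICS_PROJECT` if yours differs.

```shell
# Assumption: metrics is deployed in the openshift-infra project.
METRICS_PROJECT=${METRICS_PROJECT:-openshift-infra}
RC_NAME=${RC_NAME:-hawkular-cassandra-1}

# Print the grace period currently stored in the RC's pod template.
show_grace_period() {
  oc get rc "$RC_NAME" -n "$METRICS_PROJECT" \
    -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'
}

# Scale the RC to zero so the existing Cassandra pod is shut down.
scale_down_cassandra() {
  oc scale rc "$RC_NAME" -n "$METRICS_PROJECT" --replicas=0
}

# Scale the RC back to one so a new pod is created from the patched template.
scale_up_cassandra() {
  oc scale rc "$RC_NAME" -n "$METRICS_PROJECT" --replicas=1
}
```

Waiting for the old pod to terminate fully between `scale_down_cassandra` and `scale_up_cassandra` avoids two Cassandra pods touching the same data at once.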

If you wish to skip over the commit log failures in the future, you can also run the following command:

$ oc patch rc hawkular-cassandra-1 -p '{"spec":{"template":{"spec":{"containers":[{"name":"hawkular-cassandra-1", "env": [{"name": "JVM_OPTS", "value":"-Dcassandra.commitlog.ignorereplayerrors=true"}]}]}}}}'

*** This bug has been marked as a duplicate of bug 1385427 ***