Bug 1522793

Summary:	Hawkular Cassandra SSTables corruptions
Product:	OpenShift Container Platform	Reporter:	Vladislav Walek <vwalek>
Component:	Hawkular	Assignee:	John Sanda <jsanda>
Status:	CLOSED DEFERRED	QA Contact:	Junqi Zhao <juzhao>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	3.5.0	CC:	aos-bugs, erich, javier.ramirez, jcantril, jsanda, vwalek
Target Milestone:	---
Target Release:	3.5.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-06-04 13:59:38 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1531790, 1531795, 1595317
Bug Blocks:

Description Vladislav Walek 2017-12-06 13:18:05 UTC

Description of problem:

Hawkular Cassandra is failing with the error below:

 "Failed to perform operation due to an error: Cassandra failure during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded, 1 failed)"

We think it could be related to SSTable corruption.


Running the scrub tool logs the following warnings:

/opt/apache-cassandra/bin/nodetool scrub

WARN  [CompactionExecutor:4] 2017-12-05 04:15:32,264 OutputHandler.java:52 - Row starting at position 1538453 is unreadable; skipping to next
WARN  [CompactionExecutor:4] 2017-12-05 04:15:32,265 OutputHandler.java:57 - Error reading row (stacktrace follows):
java.io.IOError: java.io.IOException: Unable to read row key from data file
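
To gauge how many rows scrub is skipping, the warnings above can be pulled out of the log. What follows is only a minimal Python sketch, assuming the log lines keep the format quoted in this report; the function name find_unreadable_rows is hypothetical and is not part of Cassandra or Hawkular.

```python
import re

# Matches the scrub warning quoted in this report. The thread name and
# timestamp are captured so skipped rows can be correlated with other logs.
UNREADABLE_ROW = re.compile(
    r"^WARN\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r".*Row starting at position (?P<position>\d+) is unreadable"
)


def find_unreadable_rows(log_lines):
    """Return (timestamp, thread, position) for each unreadable-row warning."""
    hits = []
    for line in log_lines:
        match = UNREADABLE_ROW.match(line)
        if match:
            hits.append(
                (
                    match.group("timestamp"),
                    match.group("thread"),
                    int(match.group("position")),
                )
            )
    return hits


# The two WARN lines from this report, used as sample input.
sample = [
    "WARN  [CompactionExecutor:4] 2017-12-05 04:15:32,264 OutputHandler.java:52 - "
    "Row starting at position 1538453 is unreadable; skipping to next",
    "WARN  [CompactionExecutor:4] 2017-12-05 04:15:32,265 OutputHandler.java:57 - "
    "Error reading row (stacktrace follows):",
]

for timestamp, thread, position in find_unreadable_rows(sample):
    print(timestamp, thread, position)
```

Only the "is unreadable" warning is matched; the follow-up "Error reading row" line and its stack trace are ignored, so each skipped row is counted once.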


A possibly related issue was already reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1404643


Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.5


Comment 10 John Sanda 2018-02-16 22:36:58 UTC

I think the root cause for this is bug 1531790. Updating status to POST since the fix is addressed by bug 1531790.