Description of problem:

Upgraded Infra nodes from 3.6 to 3.7, which resulted in 2 of 3 ES nodes in a crash loop throwing a CorruptStateException:

logging-es-data-master-dd2iwji6-2-pxws2   0/1   Running            0   23m   192.168.1.176   server171
logging-es-data-master-n3hrld40-2-lm5x5   0/1   CrashLoopBackOff   2   1m    192.168.1.186   server180
logging-es-data-master-qunqsokb-2-k5d5p   0/1   CrashLoopBackOff   2   1m    192.168.1.199   server170

[root@server100 ~]# oc logs logging-es-data-master-qunqsokb-2-k5d5p
[2018-03-01 23:20:30,904][INFO ][container.run ] Begin Elasticsearch startup script
[2018-03-01 23:20:30,913][INFO ][container.run ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2018-03-01 23:20:30,915][INFO ][container.run ] Inspecting the maximum RAM available...
[2018-03-01 23:20:30,919][INFO ][container.run ] ES_HEAP_SIZE: '16384m'
[2018-03-01 23:20:30,921][INFO ][container.run ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2018-03-01 23:20:30,923][INFO ][container.run ] Checking if Elasticsearch is ready on https://localhost:9200
Exception in thread "main" ElasticsearchException[failed to read [id:5, legacy:false, file:/elasticsearch/persistent/logging-es/data/logging-es/nodes/0/_state/global-5.st]]; nested: IOException[failed to read [id:5, legacy:false, file:/elasticsearch/persistent/logging-es/data/logging-es/nodes/0/_state/global-5.st]]; nested: CorruptStateException[codec footer mismatch (file truncated?): actual footer=1869505397 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path="/elasticsearch/persistent/logging-es/data/logging-es/nodes/0/_state/global-5.st")))];
Likely root cause: org.elasticsearch.gateway.CorruptStateException: codec footer mismatch (file truncated?): actual footer=1869505397 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path="/elasticsearch/persistent/logging-es/data/logging-es/nodes/0/_state/global-5.st")))
	at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:418)
	at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:330)
	at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:451)
	at org.elasticsearch.gateway.MetaDataStateFormat.read(MetaDataStateFormat.java:177)
	at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:299)
	at org.elasticsearch.gateway.MetaStateService.loadGlobalState(MetaStateService.java:119)
	at org.elasticsearch.gateway.MetaStateService.loadFullState(MetaStateService.java:87)
	at org.elasticsearch.gateway.GatewayMetaState.loadMetaState(GatewayMetaState.java:99)
	at org.elasticsearch.gateway.GatewayMetaState.pre20Upgrade(GatewayMetaState.java:225)
	at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:87)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at <<<guice>>>
	at org.elasticsearch.node.Node.<init>(Node.java:213)
	at org.elasticsearch.node.Node.<init>(Node.java:140)
	at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:143)
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)

Upgrading the logging pods to the 3.7 images resulted in the same issues.

Version-Release number of selected component (if applicable):
3.7.14

How reproducible:
One time

Actual results:
Corrupted logging indexes.

Expected results:

Additional info:
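For reference, the state files the exception points at can be inspected directly. The path and pod names below come from the output above; the container name and the GlusterFS mount location are assumptions and will differ per environment, so treat this as a sketch rather than exact commands:

# On the ES pod that is still running, list the cluster state files
# Elasticsearch tries to load at startup (container name assumed to be "elasticsearch"):
oc exec logging-es-data-master-dd2iwji6-2-pxws2 -c elasticsearch -- \
  ls -l /elasticsearch/persistent/logging-es/data/logging-es/nodes/0/_state/

# For the CrashLoopBackOff pods, check the same path directly on the underlying
# persistent volume (mount point below is hypothetical):
ls -l /mnt/glusterfs/<volume>/logging-es/data/logging-es/nodes/0/_state/

A zero-length or truncated global-*.st file there would be consistent with the "codec footer mismatch (file truncated?)" message in the stack trace.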
Can you please provide additional information about the persistent volumes you are using? Also, please consider running https://raw.githubusercontent.com/openshift/origin-aggregated-logging/master/hack/logging-dump.sh to gather additional details about the environment.
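For anyone following along, a minimal sketch of fetching and running the dump script from a host with oc access and a login that can read the logging project (the exact invocation and output location are assumptions; check the script's own usage text):

# download the dump script referenced above and make it executable
curl -O https://raw.githubusercontent.com/openshift/origin-aggregated-logging/master/hack/logging-dump.sh
chmod +x logging-dump.sh

# run it while logged in to the cluster; it collects logging pod logs and
# cluster state that can then be attached to this bug
./logging-dump.sh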
Possible duplicate: https://bugzilla.redhat.com/show_bug.cgi?id=1379568
Shouldn't this be owned by the GlusterFS team to determine why GlusterFS block storage is showing corruption, when we have not seen this error on AWS EBS storage or local disks?
Created attachment 1405523 [details]
logging-dump.sh output