Bug 1430659
| Summary: | [GSS]AlreadyClosedException Underlying file changed by an external force | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ruben Romero Montes <rromerom> |
| Component: | glusterfs | Assignee: | Susant Kumar Palai <spalai> |
| Status: | CLOSED NOTABUG | QA Contact: | Byreddy <bsrirama> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | ocs-3.11 | CC: | bigon, devops008, dsau, fabio.martinelli, fred.blaise, khiremat, mduasope, nbalacha, nchilaka, nickytd, roberto.di.lallo, spalai, ssaha, vbellur |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | ES-on-gluster-fusemount | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-06 07:54:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Ruben Romero Montes
2017-03-09 09:03:31 UTC
Elasticsearch recommends using local storage with SSDs (use hostPath in Kubernetes) for its back-end storage. We expect to mitigate this in a later release of RHGS this year; until then, RHGS should not be used for backing Elasticsearch. Closing this based on Comment 2 above. Please reopen if you have concerns.

Hi,
Can we please reopen the ticket?
The same error is observed with SOLR instances. What causes that behaviour on GlusterFS? The file-locking mechanism is ultimately translated to the underlying native FS implementation (see the sketch after the stack trace below for the kind of lock-validity check involved).
Thanks,
Niki
Caused by: org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2017-08-05T11:07:08Z, (lock=NativeFSLock(path=/import/data/sitescope/data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],ctime=2017-08-05T11:07:08Z))
at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:179)
at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:108)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
at ....
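For context on how the failure surfaces: the Lucene frame NativeFSLockFactory$NativeFSLock.ensureValid in the traces above rejects the write lock once the metadata of the write.lock file (the creation time recorded in the lock object) no longer matches what was captured when the lock was acquired. Below is a minimal, hypothetical sketch of that kind of check, written against standard java.nio.file APIs; it is not Lucene's actual implementation, only an illustration of why a FUSE mount that reports changed file metadata trips the exception. The class and method names here are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch of a creation-time-validated native file lock,
// modelled loosely on the behaviour visible in the stack traces above.
public class CtimeValidatedLock implements AutoCloseable {
    private final Path lockPath;
    private final FileChannel channel;
    private final FileLock lock;
    private final FileTime creationTimeAtAcquire; // snapshot taken when the lock was obtained

    public CtimeValidatedLock(Path lockPath) throws IOException {
        this.lockPath = lockPath;
        this.channel = FileChannel.open(lockPath,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        this.lock = channel.tryLock();
        if (lock == null) {
            channel.close();
            throw new IOException("Lock held elsewhere: " + lockPath);
        }
        this.creationTimeAtAcquire = Files
                .readAttributes(lockPath, BasicFileAttributes.class)
                .creationTime();
    }

    /** Fails if the lock file's metadata no longer matches the acquire-time snapshot. */
    public void ensureValid() throws IOException {
        if (!channel.isOpen() || !lock.isValid()) {
            throw new IOException("Lock channel closed or lock invalidated: " + lockPath);
        }
        FileTime now = Files
                .readAttributes(lockPath, BasicFileAttributes.class)
                .creationTime();
        if (!now.equals(creationTimeAtAcquire)) {
            // This is the condition reported as
            // "Underlying file changed by an external force" in the traces above.
            throw new IOException("Underlying lock file changed externally: " + lockPath
                    + " (was " + creationTimeAtAcquire + ", now " + now + ")");
        }
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

On a stable local filesystem the creation time stays constant, so a check like ensureValid() keeps passing; if the mount reports a different creation time for the same file, the check fails even though no application process touched it.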
We're also facing the AlreadyClosedException issue with ES_6.0.0/LUCENE_7.0.1 on GlusterFS. Sayan Saha, do you have any news with regards to the mentioned potential mitigation?

net usable_space [9.9gb], net total_space [9.9gb], types [fuse.glusterfs]
[2017-11-16T16:11:04,825][WARN ][o.e.c.a.s.ShardStateAction] [07wReUC] [payments][2] received shard failed for shard id [[payments][2]], allocation id [L7lfY1CETg-ejAXdJupw3A], primary term [0], message [shard failure, reason [lucene commit failed]], failure [AlreadyClosedException[Underlying file changed by an external force at 2017-11-16T16:05:51.128853Z, (lock=NativeFSLock(path=/usr/share/elasticsearch/data/nodes/0/indices/qi1qlBgjRZuXq1_XaQ0BMQ/2/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2017-11-16T16:05:51.128366Z))]]
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2017-11-16T16:05:51.128853Z, (lock=NativeFSLock(path=/usr/share/elasticsearch/data/nodes/0/indices/qi1qlBgjRZuXq1_XaQ0BMQ/2/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2017-11-16T16:05:51.128366Z))
at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:179) ~[lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43) ~[lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:450) ~[lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
at org.apache.lucene.index.SegmentInfos.prepareCommit(SegmentInfos.java:753) ~[lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4742) ~[lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3281) ~[lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3413) ~[lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3378) ~[lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:1970) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:1963) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.index.engine.InternalEngine.syncFlush(InternalEngine.java:1291) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.index.shard.IndexShard.syncFlush(IndexShard.java:984) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.indices.flush.SyncedFlushService.performSyncedFlush(SyncedFlushService.java:423) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.indices.flush.SyncedFlushService.access$1100(SyncedFlushService.java:70) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.indices.flush.SyncedFlushService$SyncedFlushTransportHandler.messageReceived(SyncedFlushService.java:704) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.indices.flush.SyncedFlushService$SyncedFlushTransportHandler.messageReceived(SyncedFlushService.java:700) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:659) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.0.0.jar:6.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

Same issue here with ES 6.2.4 and GlusterFS as on OpenShift 3.7.2:
[2018-05-18T13:24:21,132][WARN ][o.e.c.a.s.ShardStateAction] [2vD3p_e] [.watcher-history-7-2018.05.18][0] received shard failed for shard id [[.watcher-history-7-2018.05.18][0]], allocation id [OFiTF5akRgqDVxQStTrdaA], primary term [0], message [shard failure, reason [already closed by tragic event on the index writer]], failure [AlreadyClosedException[Underlying file changed by an external force at 2018-05-18T12:54:54.515332Z, (lock=NativeFSLock(path=/var/lib/elasticsearch/nodes/0/indices/WtkYIOY2Sc6Xl7_Ra6uKGA/0/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-05-18T12:54:54.515789Z))]]
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2018-05-18T12:54:54.515332Z, (lock=NativeFSLock(path=/var/lib/elasticsearch/nodes/0/indices/WtkYIOY2Sc6Xl7_Ra6uKGA/0/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-05-18T12:54:54.515789Z))
at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:191) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.codecs.lucene70.Lucene70NormsConsumer.<init>(Lucene70NormsConsumer.java:45) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.codecs.lucene70.Lucene70NormsFormat.normsConsumer(Lucene70NormsFormat.java:85) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.DefaultIndexingChain.writeNorms(DefaultIndexingChain.java:288) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:125) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:452) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:556) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:672) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:453) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:268) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:258) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:104) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.elasticsearch.index.engine.InternalEngine$ExternalSearcherManager.refreshIfNeeded(InternalEngine.java:300) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.index.engine.InternalEngine$ExternalSearcherManager.refreshIfNeeded(InternalEngine.java:275) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1402) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1384) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.index.shard.IndexShard.refresh(IndexShard.java:861) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:698) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.index.IndexService.access$400(IndexService.java:97) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:902) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.index.IndexService$BaseAsyncTask.run(IndexService.java:812) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
[2018-05-18T13:24:25,100][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [elasticsearch[2vD3p_e][generic][T#4]]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:68) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:60) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.xpack.watcher.WatcherService.loadWatches(WatcherService.java:223) ~[?:?]
at org.elasticsearch.xpack.watcher.WatcherService.reload(WatcherService.java:188) ~[?:?]
at org.elasticsearch.xpack.watcher.WatcherLifeCycleService.lambda$checkAndSetAllocationIds$9(WatcherLifeCycleService.java:263) ~[?:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) ~[elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:235) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:69) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
... 8 more
Any update for this issue? I met the same error with Elasticsearch 6.0.0. GlusterFS is used to store the Elasticsearch data.

Errors:
[2018-06-27T05:48:31,798][WARN ][o.e.c.r.a.AllocationService] [sBJEWnS] failing shard [failed shard, shard [logstash-2018.06.27][4], node[sBJEWnS1R1qL7blJL9Wl0Q], [P], recovery_source[existing recovery], s[INITIALIZING], a[id=CSql7_M8QnmfnylNrN-LFQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2018-06-27T05:48:30.059Z], failed_attempts[1], delayed=false, details[failed shard on node [sBJEWnS1R1qL7blJL9Wl0Q]: shard failure, reason [already closed by tragic event on the index writer], failure AlreadyClosedException[Underlying file changed by an external force at 2018-06-27T05:47:51.755Z, (lock=NativeFSLock(path=/data/nodes/0/indices/H9SdQlUeQluw6slOsBG7AQ/4/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-06-27T05:47:23.576Z))]], allocation_status[no_valid_shard_copy]], message [shard failure, reason [lucene commit failed]], failure [AlreadyClosedException[Underlying file changed by an external force at 2018-06-27T05:47:51.755Z, (lock=NativeFSLock(path=/data/nodes/0/indices/H9SdQlUeQluw6slOsBG7AQ/4/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-06-27T05:47:23.576Z))]], markAsStale [true]]
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2018-06-27T05:47:51.755Z, (lock=NativeFSLock(path=/data/nodes/0/indices/H9SdQlUeQluw6slOsBG7AQ/4/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-06-27T05:47:23.576Z))

Same here with GlusterFS 6.0 and OpenShift 3.11.
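As a quick way to check whether a given mount reports stable metadata for a lock file (the precondition all of the reports above keep hitting), here is a small, hypothetical Java probe that creates a file and periodically re-reads its creation time; any change observed without the file being rewritten would reproduce the condition Lucene flags. The probe path and polling interval are illustrative assumptions, not part of this bug.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

// Hypothetical probe: watch whether a file's creation time stays stable on a mount.
public class LockFileMetadataProbe {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Illustrative path on the mount under test; adjust to your volume.
        Path probe = Paths.get(args.length > 0 ? args[0] : "/mnt/glusterfs/probe-write.lock");
        Files.write(probe, new byte[0],
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);

        FileTime initial = creationTime(probe);
        System.out.println("Initial creationTime: " + initial);

        for (int i = 0; i < 60; i++) {          // check once a minute for an hour
            Thread.sleep(60_000L);
            FileTime current = creationTime(probe);
            if (!current.equals(initial)) {
                System.out.println("creationTime changed: " + initial + " -> " + current
                        + " (the condition Lucene reports as an external change)");
                return;
            }
        }
        System.out.println("creationTime remained stable.");
    }

    private static FileTime creationTime(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class).creationTime();
    }
}
```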