Bug 1250571 - Remote query - Unable to reopen IndexReader exceptions in clustered environment
Summary: Remote query - Unable to reopen IndexReader exceptions in clustered environment
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.5.0,6.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Tristan Tarrant
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-08-05 13:26 UTC by Matej Čimbora
Modified: 2015-10-05 16:55 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-11 13:45:04 UTC
Type: Bug
Embargoed:



Description Matej Čimbora 2015-08-05 13:26:05 UTC
Reproduced with jdg-6.5 & jdg-6.6 branches.

When performing queries on a cache whose index is stored within JDG, using multiple clustered server instances, the following exception messages appear in the client log:

WARN: ISPN004005: Error received from the server: org.hibernate.search.SearchException: Unable to reopen IndexReader
Exception in thread "Thread-6" org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for message id[513] returned server error (status=0x85): org.hibernate.search.SearchException: Unable to reopen IndexReader
	at org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:336)
	at org.infinispan.client.hotrod.impl.protocol.Codec20.readPartialHeader(Codec20.java:126)
	at org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:112)
	at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)
	at org.infinispan.client.hotrod.impl.operations.QueryOperation.executeOperation(QueryOperation.java:57)
	at org.infinispan.client.hotrod.impl.operations.QueryOperation.executeOperation(QueryOperation.java:24)
	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:52)
	at org.infinispan.client.hotrod.impl.query.RemoteQuery.executeQuery(RemoteQuery.java:72)
	at org.infinispan.client.hotrod.impl.query.RemoteQuery.list(RemoteQuery.java:62)
	at org.jboss.as.quickstarts.datagrid.hotrod.query.ReproducerManager$QueryThread.run(ReproducerManager.java:160)

The issue does not seem to occur when using a single server instance.

Created a reproducer based on the remote-query quickstart, available at https://github.com/mcimbora/jboss-jdg-quickstarts/commits/jdg-6.5.x.

Steps to reproduce:
1. Build the remote-query quickstart using the aforementioned reproducer branch.
2. Add the following configuration snippet to the default "clustered.xml" configuration of the JDG server:

<distributed-cache name="dist_lucene" mode="SYNC" start="EAGER">
    <locking acquire-timeout="30000"/>
    <indexing index="LOCAL">
        <property name="default.directory_provider">infinispan</property>
        <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
        <!--<property name="default.exclusive_index_use">true</property>-->
        <property name="default.metadata_cachename">lucene_metadata_repl"</property>
        <property name="default.data_cachename">lucene_data_dist</property>
        <property name="default.locking_cachename">lucene_locking_repl</property>
        <property name="hibernate.search.lucene_version">LUCENE_36</property>
    </indexing>
</distributed-cache>
<replicated-cache name="lucene_metadata_repl" mode="SYNC">
    <locking acquire-timeout="30000"/>
</replicated-cache>
<replicated-cache name="lucene_locking_repl" mode="SYNC">
    <locking acquire-timeout="30000"/>
</replicated-cache>
<distributed-cache name="lucene_data_dist" mode="SYNC">
    <locking acquire-timeout="30000"/>
</distributed-cache>
3. Start 2 JDG server instances using the bin/clustered.sh script.
4. Start the ReproducerManager.
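
For reference, a minimal sketch of the kind of query loop the reproducer runs (this is not the actual ReproducerManager code; the host, port, cache name and the Person entity are assumptions based on the remote-query quickstart):

import java.util.List;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.Search;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
import org.infinispan.client.hotrod.marshall.ProtoStreamMarshaller;
import org.infinispan.query.dsl.Query;
import org.infinispan.query.dsl.QueryFactory;

public class QueryLoopSketch {

    // Stand-in for the quickstart's Protobuf-mapped entity; the real entity is
    // registered with the ProtoStream serialization context (omitted here).
    public static class Person {
        public String name;
    }

    public static void main(String[] args) {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer().host("127.0.0.1").port(11222);   // assumed server address
        builder.marshaller(new ProtoStreamMarshaller());
        RemoteCacheManager rcm = new RemoteCacheManager(builder.build());
        // The quickstart also registers the .proto schema and entity marshallers via
        // ProtoStreamMarshaller.getSerializationContext(rcm); omitted for brevity.

        RemoteCache<Integer, Person> cache = rcm.getCache("dist_lucene");
        QueryFactory qf = Search.getQueryFactory(cache);

        // Query repeatedly while entries are written from other threads/clients;
        // with two clustered server instances this is where the
        // "Unable to reopen IndexReader" errors were observed.
        for (int i = 0; i < 1000; i++) {
            Query query = qf.from(Person.class)
                            .having("name").eq("John")
                            .toBuilder().build();
            List<?> results = query.list();
        }
        rcm.stop();
    }
}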

Comment 3 Sanne Grinovero 2015-08-11 12:18:57 UTC
This exception is only the client-side one; we'd really need to know what happened on the server side.

Sorry, it doesn't ring a bell. The only wild guess I can make is that the index got corrupted, which in turn could be caused by Infinispan having lost some data, as would happen when multiple nodes shut down.

In such cases you can have it rebuild the index using the MassIndexer, invoked via JMX:
 - https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/6.5/html/Administration_and_Configuration_Guide/MassIndexer.html
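
For illustration, a hedged sketch of triggering the MassIndexer over JMX from a Java client; the JMX service URL, the "jboss.infinispan" domain, the cache-manager name ("clustered") and the cache name ("dist_lucene") are assumptions, so verify the exact ObjectName with JConsole against your running server:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MassIndexerJmxSketch {

    public static void main(String[] args) throws Exception {
        // Assumed management endpoint of a default local JDG 6.x server; the
        // remoting-jmx protocol needs the server's jboss-client jar on the classpath.
        JMXServiceURL url = new JMXServiceURL("service:jmx:remoting-jmx://localhost:9999");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Assumed ObjectName layout of the MassIndexer MBean for the indexed cache.
            ObjectName massIndexer = new ObjectName(
                    "jboss.infinispan:type=Query,manager=\"clustered\","
                    + "cache=\"dist_lucene\",component=MassIndexer");
            // Rebuilds the whole index from the cache contents.
            mbsc.invoke(massIndexer, "start", new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }
}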

I had a quick look at the reproducer; I'm wondering how you know whether it's reproducing the same problem?

Comment 4 Matej Čimbora 2015-08-11 12:27:58 UTC
Thanks Sanne. Originally I was hitting the issue in RadarGun tests, and I tried to simulate my test by reusing the quickstart (which produces the same exception). Not sure whether that answers your question; if not, could you elaborate? I'll try to get some server-side logs and update the issue.

Comment 5 Matej Čimbora 2015-08-11 13:45:04 UTC
OK, problem solved. The server-side log indicated a problem with the metadata cache; double-checking the server configuration revealed that the value of the default.metadata_cachename property contained an additional quotation mark.

15:10:43,374 DEBUG [org.infinispan.server.hotrod.CacheDecodeContext] (HotRodServerWorker-2) Exception caught: io.netty.handler.codec.DecoderException: org.infinispan.server.hotrod.HotRodException: org.hibernate.search.SearchException: Unable to reopen IndexReader
	at io.netty.handler.codec.ReplayingDecoder.callDecode(ReplayingDecoder.java:417) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:141) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at org.infinispan.server.hotrod.HotRodDecoder.org$infinispan$server$core$transport$StatsChannelHandler$$super$channelRead(HotRodDecoder.scala:31) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.server.core.transport.StatsChannelHandler$class.channelRead(StatsChannelHandler.scala:32) [infinispan-server-core-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.server.hotrod.HotRodDecoder.channelRead(HotRodDecoder.scala:31) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:341) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:327) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:116) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:494) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:461) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_25]
Caused by: org.infinispan.server.hotrod.HotRodException: org.hibernate.search.SearchException: Unable to reopen IndexReader
	at org.infinispan.server.hotrod.CacheDecodeContext.createServerException(CacheDecodeContext.scala:98) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.server.hotrod.HotRodDecoder.decode(HotRodDecoder.scala:55) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at io.netty.handler.codec.ReplayingDecoder.callDecode(ReplayingDecoder.java:362) [netty-all-4.0.18.Final-redhat-1.jar:4.0.18.Final-redhat-1]
	... 14 more
Caused by: org.hibernate.search.SearchException: Unable to reopen IndexReader
	at org.hibernate.search.indexes.impl.SharingBufferReaderProvider$PerDirectoryLatestReader.refreshAndGet(SharingBufferReaderProvider.java:254)
	at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.openIndexReader(SharingBufferReaderProvider.java:85)
	at org.hibernate.search.reader.impl.MultiReaderFactory.openReader(MultiReaderFactory.java:53)
	at org.hibernate.search.query.engine.impl.HSQueryImpl.buildSearcher(HSQueryImpl.java:635)
	at org.hibernate.search.query.engine.impl.HSQueryImpl.buildSearcher(HSQueryImpl.java:535)
	at org.hibernate.search.query.engine.impl.HSQueryImpl.queryEntityInfos(HSQueryImpl.java:265)
	at org.infinispan.query.impl.CacheQueryImpl.list(CacheQueryImpl.java:161)
	at org.infinispan.query.remote.QueryFacadeImpl.executeIndexedQuery(QueryFacadeImpl.java:181) [infinispan-remote-query-server-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.query.remote.QueryFacadeImpl.query(QueryFacadeImpl.java:85) [infinispan-remote-query-server-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.server.hotrod.Decoder2x$.customReadKey(Decoder2x.scala:350) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.server.hotrod.HotRodDecoder.customDecodeKey(HotRodDecoder.scala:196) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.server.hotrod.HotRodDecoder.org$infinispan$server$hotrod$HotRodDecoder$$decodeKey(HotRodDecoder.scala:106) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.server.hotrod.HotRodDecoder$$anonfun$decode$1.apply$mcV$sp(HotRodDecoder.scala:48) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.server.hotrod.HotRodDecoder.wrapSecurity(HotRodDecoder.scala:208) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.server.hotrod.HotRodDecoder.decode(HotRodDecoder.scala:45) [infinispan-server-hotrod-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	... 15 more
Caused by: java.io.FileNotFoundException: Error loading metadata for index file: segments_dx|M|org.infinispan.query.remote.indexing.ProtobufValueWrapper
	at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:142) [infinispan-lucene-directory-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.infinispan.lucene.impl.DirectoryLuceneV3.openInput(DirectoryLuceneV3.java:133) [infinispan-lucene-directory-6.4.0.DR1-redhat-1.jar:6.4.0.DR1-redhat-1]
	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:265)
	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:363)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:709)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:554)
	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
	at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:483)
	at org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:891)
	at org.apache.lucene.index.DirectoryReader.doOpenNoWriter(DirectoryReader.java:455)
	at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:434)
	at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
	at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
	at org.hibernate.search.indexes.impl.SharingBufferReaderProvider$PerDirectoryLatestReader.refreshAndGet(SharingBufferReaderProvider.java:251)
	... 29 more
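
For reference, the corrected property (without the stray quotation mark in the value) is:

<property name="default.metadata_cachename">lucene_metadata_repl</property>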

Comment 6 Alan Field 2015-08-11 15:05:24 UTC
I'm not sure this isn't a bug! Isn't this invalid XML?

<property name="default.metadata_cachename">lucene_metadata_repl"</property>

And shouldn't an error get thrown when the configuration is read?

Comment 7 Matej Čimbora 2015-08-11 15:14:40 UTC
Well, '"' is a valid XML character unless it is used within an attribute value.

Comment 8 Alan Field 2015-08-11 15:27:36 UTC
Sure Matej, but a single '"' character in a property value doesn't seem correct. A single ''' would need to be escaped as an apostrophe entity to be valid. I'm just wondering if this should fail fast with a better error message, because it's not the easiest thing to figure out from the exception you got!

Comment 9 Matej Čimbora 2015-08-12 11:23:11 UTC
Alan, that makes sense. I did some more investigation: if the cache (lucene_metadata_repl") is not among the pre-configured caches, each of the servers creates it as 'local', which explains why the problem happens only in a clustered environment (the metadata cannot be shared across the cluster). Sanne, is it technically possible to check whether the caches customized via properties (e.g. default.metadata_cachename) are defined in the configuration and fail fast if they are not? Does that make sense to you?
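
A hedged sketch of the kind of fail-fast check being asked for (not actual Infinispan/JDG code; the class and method names are illustrative only):

import org.infinispan.manager.EmbeddedCacheManager;

public final class IndexCacheSanityCheck {

    // Fails at startup if a cache referenced by an indexing property (e.g.
    // default.metadata_cachename) has no explicit configuration, instead of letting
    // each server silently create it with the default (local) configuration.
    public static void requireDefinedCache(EmbeddedCacheManager cacheManager,
                                           String propertyName,
                                           String cacheName) {
        if (cacheManager.getCacheConfiguration(cacheName) == null) {
            throw new IllegalStateException("Cache '" + cacheName + "' referenced by property '"
                    + propertyName + "' is not defined in the server configuration");
        }
    }
}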

Comment 10 Sanne Grinovero 2015-10-05 16:55:43 UTC
Hi Matej,
sorry for the delay.
I agree it would be nice to improve on this, but technically having a non-defined cache could be valid, as the cache would inherit a (potentially valid) configuration from the default cache.

In last week's Infinispan team meeting we discussed removing the default cache (in a future major version); in that case this configuration would have failed, because an explicit cache configuration would need to be defined for each cache that is used. Would that help, do you agree?

