Bug 1262027 - Unable to commit volatile index ERRORs in a clustered setup on Windows
Summary: Unable to commit volatile index ERRORs in a clustered setup on Windows
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Enterprise Portal Platform 6
Classification: JBoss
Component: Portal
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Martin Weiler
QA Contact: Tomas Kyjovsky
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-09-10 15:54 UTC by Martin Weiler
Modified: 2019-08-15 05:23 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-27 20:27:30 UTC
Type: Support Patch
Embargoed:



Links
System        ID        Private  Priority  Status  Summary  Last Updated
EXO WCM JIRA  JCR-2406  0        None      None    None     Never

Description Martin Weiler 2015-09-10 15:54:14 UTC
Description of problem:
Running multiple clustered JPP 6.2.0 nodes on Windows 2012 leads to the following error on the first node after the second node has come up.

Everything is fine while instance #2 starts up, and this instance is able to retrieve the index data from the coordinator (node #1):

---------------
08:49:46,353 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Setting index OFFLINE (repository/repository_pc-system)
08:49:46,369 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Retrieving index from coordinator (repository/repository_pc-system)...
08:49:46,493 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Setting index ONLINE (repository/repository_pc-system)
---------------


We can see the corresponding activity logged on node #1:

---------------
08:49:46,369 INFO  [exo.jcr.component.core.MultiIndex] (Incoming-1,172.24.44.249:56201) Setting index OFFLINE (repository/repository_pc-system)
08:49:46,400 INFO  [exo.jcr.component.core.MultiIndex] (Incoming-1,172.24.44.249:56201) Setting index ONLINE (repository/repository_pc-system)
---------------


However, about ten seconds later (08:49:56), node #1 fails to write the new IndexInfos to disk:

---------------
08:49:56,491 ERROR [exo.jcr.component.core.MultiIndex] (MultiIndex Flush Timer) Unable to commit volatile index: java.io.IOException: Cannot delete C:\tmp\TESTCLUSTER\1\jboss-portal-6.2\standalone\data\gatein\jcr\lucene\portal-system_portal\indexes
	at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296) [lucene-core-3.5.0.jar:3.5.0 1204988 - simon - 2011-11-22 14:46:51]
	at org.exoplatform.services.jcr.impl.core.query.lucene.IndexInfos$2.run(IndexInfos.java:197) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedExceptionAction(SecurityHelper.java:310) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedIOExceptionAction(SecurityHelper.java:57) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.IndexInfos.write(IndexInfos.java:166) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$9.run(MultiIndex.java:1692) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$9.run(MultiIndex.java:1662) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedExceptionAction(SecurityHelper.java:310) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedIOExceptionAction(SecurityHelper.java:57) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.flush(MultiIndex.java:1661) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.checkFlush(MultiIndex.java:2250) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.access$2100(MultiIndex.java:107) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$10.run(MultiIndex.java:1798) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at java.util.TimerThread.mainLoop(Timer.java:555) [rt.jar:1.8.0_45]
	at java.util.TimerThread.run(Timer.java:505) [rt.jar:1.8.0_45]
---------------


The node is not able to recover from this error.
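
The top frame of the trace is Lucene's FSDirectory.deleteFile, which reports a failed File.delete() as an IOException with the message "Cannot delete <file>". The following is a minimal diagnostic sketch, not the eXo JCR or Lucene code: it assumes (this report does not confirm it) that the `indexes` file may still be held open by another reader when the flush timer runs. A file opened through java.io.FileInputStream generally cannot be deleted on Windows while the stream is open, whereas on Linux the delete succeeds, which would be consistent with the scenario working on Linux. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical probe class; not part of eXo JCR or Lucene.
public class DeleteWhileOpenProbe {

    /** Same style of check as behind the logged message: a failed File.delete() becomes an IOException. */
    public static void deleteOrThrow(File file) throws IOException {
        if (!file.delete()) {
            throw new IOException("Cannot delete " + file);
        }
    }

    /** Creates an empty temporary file that stands in for the "indexes" file. */
    public static File newProbeFile() {
        try {
            return File.createTempFile("indexes", ".probe");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the file could be deleted while a FileInputStream on it was still open. */
    public static boolean deletableWhileOpen(File file) {
        try (FileInputStream in = new FileInputStream(file)) {
            try {
                deleteOrThrow(file);
                return true;
            } catch (IOException e) {
                return false; // the open handle blocked the delete
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File probe = newProbeFile();
        boolean deleted = deletableWhileOpen(probe);
        System.out.println("os.name=" + System.getProperty("os.name")
                + ", deleted while open: " + deleted);
        if (!deleted) {
            deleteOrThrow(probe); // the stream is closed by now, so cleanup should succeed
        }
    }
}
```

Running the probe on both platforms shows whether the delete-while-open behaviour differs in the environment at hand; it does not by itself prove that this is what happens inside IndexInfos.write.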

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Copy the server directories:
xcopy /E /I standalone standalone1
xcopy /E /I standalone standalone2
xcopy /E /I standalone standalone3

2. Start the H2 db:
java -cp modules\system\layers\base\com\h2database\h2\main\h2-1.3.168.redhat-4.jar  org.h2.tools.Server

3. Start the three instances (wait until each one has started up successfully):
.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone1" -D"jboss.node.name=node1" -D"jboss.socket.binding.port-offset=100" -D"gatein.jgroups.udp.bind_port=56201"

.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone2" -D"jboss.node.name=node2" -D"jboss.socket.binding.port-offset=200" -D"gatein.jgroups.udp.bind_port=56202"

.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone3" -D"jboss.node.name=node3" -D"jboss.socket.binding.port-offset=300" -D"gatein.jgroups.udp.bind_port=56203"
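
The three start commands above differ only in the node index n: the base directory is standalone<n>, the node name is node<n>, the port offset is n * 100, and the JGroups UDP bind port is 56200 + n. As a small sketch of that pattern (the class and method names are hypothetical; all flags and values are copied from the commands above), the command line for any node can be generated like this:

```java
// Hypothetical helper; it only reproduces the command pattern used in step 3.
public class NodeCommand {

    /** Builds the standalone.bat command line for cluster node n (1, 2, 3, ...). */
    public static String startCommand(int n) {
        return String.format(
                ".\\bin\\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4"
                        + " -D\"jboss.server.base.dir=standalone%1$d\""
                        + " -D\"jboss.node.name=node%1$d\""
                        + " -D\"jboss.socket.binding.port-offset=%2$d\""
                        + " -D\"gatein.jgroups.udp.bind_port=%3$d\"",
                n, n * 100, 56200 + n);
    }

    public static void main(String[] args) {
        // Print the commands for the three nodes used in this report.
        for (int n = 1; n <= 3; n++) {
            System.out.println(startCommand(n));
        }
    }
}
```

Each printed line is meant to be started in its own console, waiting for the previous node to finish booting as described in step 3.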

Actual results:
Errors when node #2 or #3 is starting up

Expected results:
No errors

Additional info:
The same scenario works fine in a Linux environment

Comment 2 Martin Weiler 2015-09-11 13:18:48 UTC
I tried to use different implementations for the Lucene FSDirectory class, but to no avail:

set "JAVA_OPTS=%JAVA_OPTS% -Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory"

and

set "JAVA_OPTS=%JAVA_OPTS% -Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.NIOFSDirectory"

