Bug 1262027

Summary: Unable to commit volatile index ERRORs in a clustered setup on Windows
Product: [JBoss] JBoss Enterprise Portal Platform 6
Reporter: Martin Weiler <mweiler>
Component: Portal
Assignee: Martin Weiler <mweiler>
Status: CLOSED CURRENTRELEASE
QA Contact: Tomas Kyjovsky <tkyjovsk>
Severity: high
Priority: high
Version: 6.2.0
CC: cobrien, epp-bugs
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2015-10-27 20:27:30 UTC
Type: Support Patch

Description Martin Weiler 2015-09-10 15:54:14 UTC
Description of problem:
Running multiple clustered JPP 6.2.0 nodes on Windows 2012 leads to the following error on the first node after the second node has come up.

Everything works until instance #2 starts up and retrieves the index data from the coordinator (node #1):

---------------
08:49:46,353 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Setting index OFFLINE (repository/repository_pc-system)
08:49:46,369 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Retrieving index from coordinator (repository/repository_pc-system)...
08:49:46,493 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Setting index ONLINE (repository/repository_pc-system)
---------------


We can see the corresponding activity logged on node #1:

---------------
08:49:46,369 INFO  [exo.jcr.component.core.MultiIndex] (Incoming-1,172.24.44.249:56201) Setting index OFFLINE (repository/repository_pc-system)
08:49:46,400 INFO  [exo.jcr.component.core.MultiIndex] (Incoming-1,172.24.44.249:56201) Setting index ONLINE (repository/repository_pc-system)
---------------


However, about ten seconds later, node #1 fails to write the new IndexInfos to disk:

---------------
08:49:56,491 ERROR [exo.jcr.component.core.MultiIndex] (MultiIndex Flush Timer) Unable to commit volatile index: java.io.IOException: Cannot delete C:\tmp\TESTCLUSTER\1\jboss-portal-6.2\standalone\data\gatein\jcr\lucene\portal-system_portal\indexes
	at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296) [lucene-core-3.5.0.jar:3.5.0 1204988 - simon - 2011-11-22 14:46:51]
	at org.exoplatform.services.jcr.impl.core.query.lucene.IndexInfos$2.run(IndexInfos.java:197) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedExceptionAction(SecurityHelper.java:310) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedIOExceptionAction(SecurityHelper.java:57) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.IndexInfos.write(IndexInfos.java:166) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$9.run(MultiIndex.java:1692) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$9.run(MultiIndex.java:1662) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedExceptionAction(SecurityHelper.java:310) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedIOExceptionAction(SecurityHelper.java:57) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.flush(MultiIndex.java:1661) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.checkFlush(MultiIndex.java:2250) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.access$2100(MultiIndex.java:107) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$10.run(MultiIndex.java:1798) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at java.util.TimerThread.mainLoop(Timer.java:555) [rt.jar:1.8.0_45]
	at java.util.TimerThread.run(Timer.java:505) [rt.jar:1.8.0_45]
---------------


The node is not able to recover from this error.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Copy the server directories:
xcopy /E /I standalone standalone1
xcopy /E /I standalone standalone2
xcopy /E /I standalone standalone3

2. Start the H2 db:
java -cp modules\system\layers\base\com\h2database\h2\main\h2-1.3.168.redhat-4.jar  org.h2.tools.Server

3. Start the three instances (wait until each one has started up successfully):
.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone1" -D"jboss.node.name=node1" -D"jboss.socket.binding.port-offset=100" -D"gatein.jgroups.udp.bind_port=56201"

.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone2" -D"jboss.node.name=node2" -D"jboss.socket.binding.port-offset=200" -D"gatein.jgroups.udp.bind_port=56202"

.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone3" -D"jboss.node.name=node3" -D"jboss.socket.binding.port-offset=300" -D"gatein.jgroups.udp.bind_port=56203"

Actual results:
Errors are logged when node #2 or #3 starts up

Expected results:
No errors

Additional info:
The same scenario works without errors in a Linux environment.
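Not confirmed as the root cause in this report, but one platform difference consistent with "works on Linux, fails on Windows" is file-deletion semantics. The error message matches Lucene 3.x FSDirectory.deleteFile, which throws IOException("Cannot delete " + file) when java.io.File.delete() returns false. On Windows, a file normally cannot be deleted while another handle (for example a still-open index reader) has it open, whereas on Linux the unlink succeeds. The following is a minimal, hypothetical standalone check (the class name OpenHandleDeleteCheck is made up for illustration and is not part of JPP, eXo JCR or Lucene) that can be run on each OS to compare the behaviour:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/**
 * Hypothetical diagnostic (not part of JPP, eXo JCR or Lucene): checks whether
 * java.io.File.delete() succeeds while a stream still holds the file open.
 */
public class OpenHandleDeleteCheck {

    /** Returns true if the file was deleted while a FileInputStream on it was still open. */
    static boolean deleteWhileOpen() {
        try {
            File file = File.createTempFile("indexinfos-", ".tmp");
            try (FileInputStream in = new FileInputStream(file)) {
                // Same primitive that Lucene 3.x FSDirectory.deleteFile uses:
                // it throws "Cannot delete <file>" when File.delete() returns false.
                return file.delete();
            } finally {
                // The stream is already closed here, so cleanup works on every platform.
                Files.deleteIfExists(file.toPath());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("os.name=" + System.getProperty("os.name")
                + ", deleted while open=" + deleteWhileOpen());
    }
}
```

On Windows the delete would be expected to fail while the stream is open, and on Linux to succeed. If that matches what is seen here, it would point at some component on node #1 still holding index files open when the IndexInfos update tries to remove them.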

Comment 2 Martin Weiler 2015-09-11 13:18:48 UTC
I tried different implementations of the Lucene FSDirectory class, but neither helped:

set "JAVA_OPTS=%JAVA_OPTS% -Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory"

and

set "JAVA_OPTS=%JAVA_OPTS% -Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.NIOFSDirectory"