Bug 1262027 - Unable to commit volatile index ERRORs in a clustered setup on Windows
Status: CLOSED CURRENTRELEASE
Product: JBoss Enterprise Portal Platform 6
Classification: JBoss
Component: Portal
Version: 6.2.0
Hardware: Unspecified, OS: Unspecified
Priority: high, Severity: high
Assigned To: Martin Weiler
QA Contact: Tomas Kyjovsky
Reported: 2015-09-10 11:54 EDT by Martin Weiler
Modified: 2015-10-27 16:27 EDT
Doc Type: Bug Fix
Last Closed: 2015-10-27 16:27:30 EDT
Type: Support Patch
External Trackers: EXO WCM JIRA JCR-2406

Description Martin Weiler 2015-09-10 11:54:14 EDT
Description of problem:
Running multiple clustered JPP 6.2.0 nodes on Windows Server 2012 leads to the following error on the first node after the second node has started.

Everything works until instance #2 has started and has retrieved the index data from the coordinator (node #1):

---------------
08:49:46,353 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Setting index OFFLINE (repository/repository_pc-system)
08:49:46,369 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Retrieving index from coordinator (repository/repository_pc-system)...
08:49:46,493 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Setting index ONLINE (repository/repository_pc-system)
---------------


We can see the corresponding activity logged on node #1:

---------------
08:49:46,369 INFO  [exo.jcr.component.core.MultiIndex] (Incoming-1,172.24.44.249:56201) Setting index OFFLINE (repository/repository_pc-system)
08:49:46,400 INFO  [exo.jcr.component.core.MultiIndex] (Incoming-1,172.24.44.249:56201) Setting index ONLINE (repository/repository_pc-system)
---------------


However, about ten seconds later, node #1 fails to write the updated index information (IndexInfos) to disk:

---------------
08:49:56,491 ERROR [exo.jcr.component.core.MultiIndex] (MultiIndex Flush Timer) Unable to commit volatile index: java.io.IOException: Cannot delete C:\tmp\TESTCLUSTER\1\jboss-portal-6.2\standalone\data\gatein\jcr\lucene\portal-system_portal\indexes
	at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296) [lucene-core-3.5.0.jar:3.5.0 1204988 - simon - 2011-11-22 14:46:51]
	at org.exoplatform.services.jcr.impl.core.query.lucene.IndexInfos$2.run(IndexInfos.java:197) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedExceptionAction(SecurityHelper.java:310) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedIOExceptionAction(SecurityHelper.java:57) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.IndexInfos.write(IndexInfos.java:166) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$9.run(MultiIndex.java:1692) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$9.run(MultiIndex.java:1662) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedExceptionAction(SecurityHelper.java:310) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedIOExceptionAction(SecurityHelper.java:57) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.flush(MultiIndex.java:1661) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.checkFlush(MultiIndex.java:2250) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.access$2100(MultiIndex.java:107) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$10.run(MultiIndex.java:1798) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at java.util.TimerThread.mainLoop(Timer.java:555) [rt.jar:1.8.0_45]
	at java.util.TimerThread.run(Timer.java:505) [rt.jar:1.8.0_45]
---------------


The node is not able to recover from this error.
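The IOException comes from Lucene's FSDirectory.deleteFile, which (judging from the "Cannot delete <file>" message) throws when java.io.File.delete() returns false; File.delete() itself gives no reason for the failure. One plausible explanation for the Windows-only behavior, not confirmed in this report, is that a handle to the `indexes` file is still open when IndexInfos tries to replace it: Windows typically refuses to delete a file that java.io streams still hold open, while Linux lets the file be unlinked. The stand-alone sketch below (the class and method names are hypothetical, not part of eXo JCR or Lucene) only illustrates that platform difference on a temporary file it creates itself; it does not touch the portal index.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;

/**
 * Hypothetical demo, not part of eXo JCR or Lucene. It works only on a
 * temporary file it creates, never on the portal's index directory.
 */
public class OpenFileDeleteDemo {

    /**
     * Creates a temporary file, keeps a read handle open on it, and calls
     * File.delete(), the same boolean-returning call whose failure is turned
     * into "Cannot delete <file>". Returns true if the delete succeeded while
     * the handle was still open.
     */
    static boolean deleteSucceedsWhileOpen() throws IOException {
        File tmp = File.createTempFile("indexes-demo", ".tmp");
        boolean deleted;
        try (FileInputStream in = new FileInputStream(tmp)) {
            // The file is still open through 'in' at this point.
            deleted = tmp.delete();
        }
        // The handle is closed now; remove the file if the delete above was refused.
        Files.deleteIfExists(tmp.toPath());
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        boolean deleted = deleteSucceedsWhileOpen();
        System.out.println("os.name=" + System.getProperty("os.name")
                + ", delete while open succeeded: " + deleted);
    }
}
```

Running `java OpenFileDeleteDemo` on one of the Windows nodes is expected to report `false`, and on Linux `true`, which would be consistent with the scenario working on Linux; it does not by itself prove which handle, if any, keeps the `indexes` file open on node #1.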

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Copy the server directories (xcopy /E /I copies all subdirectories and treats the target as a directory):
xcopy standalone standalone1 /E /I
xcopy standalone standalone2 /E /I
xcopy standalone standalone3 /E /I

2. Start the H2 database:
java -cp modules\system\layers\base\com\h2database\h2\main\h2-1.3.168.redhat-4.jar  org.h2.tools.Server

3. Start the three instances (wait until each one has started up successfully):
.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone1" -D"jboss.node.name=node1" -D"jboss.socket.binding.port-offset=100" -D"gatein.jgroups.udp.bind_port=56201"

.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone2" -D"jboss.node.name=node2" -D"jboss.socket.binding.port-offset=200" -D"gatein.jgroups.udp.bind_port=56202"

.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone3" -D"jboss.node.name=node3" -D"jboss.socket.binding.port-offset=300" -D"gatein.jgroups.udp.bind_port=56203"

Actual results:
Errors as soon as node #2 or node #3 starts up.

Expected results:
No errors

Additional info:
The same scenario works fine in a Linux environment.
Comment 2 Martin Weiler 2015-09-11 09:18:48 EDT
I tried different implementations of the Lucene FSDirectory class, to no avail:

set "JAVA_OPTS=%JAVA_OPTS% -Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory"

and

set "JAVA_OPTS=%JAVA_OPTS% -Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.NIOFSDirectory"
