1262027 – Unable to commit volatile index ERRORs in a clustered setup on Windows

Summary: Unable to commit volatile index ERRORs in a clustered setup on Windows

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Enterprise Portal Platform 6
Classification:	JBoss
Component:	Portal
Sub Component:
Version:	6.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Martin Weiler
QA Contact:	Tomas Kyjovsky
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-09-10 15:54 UTC by Martin Weiler
Modified:	2019-08-15 05:23 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2015-10-27 20:27:30 UTC
Type:	Support Patch
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
EXO WCM JIRA	JCR-2406	0	None	None	None	Never

Description Martin Weiler 2015-09-10 15:54:14 UTC

Description of problem:
Running multiple clustered JPP 6.2.0 nodes on Windows 2012 leads to the following error on the first node after the second node has come up.

Everything is fine until instance #2 is started. That instance is able to retrieve the index data from the coordinator (node #1):

---------------
08:49:46,353 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Setting index OFFLINE (repository/repository_pc-system)
08:49:46,369 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Retrieving index from coordinator (repository/repository_pc-system)...
08:49:46,493 INFO  [exo.jcr.component.core.MultiIndex] (MSC service thread 1-4) Setting index ONLINE (repository/repository_pc-system)
---------------


We can see the corresponding activity logged on node #1:

---------------
08:49:46,369 INFO  [exo.jcr.component.core.MultiIndex] (Incoming-1,172.24.44.249:56201) Setting index OFFLINE (repository/repository_pc-system)
08:49:46,400 INFO  [exo.jcr.component.core.MultiIndex] (Incoming-1,172.24.44.249:56201) Setting index ONLINE (repository/repository_pc-system)
---------------
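
The timestamps in the two excerpts above can be compared directly. The following is a minimal sketch (the class and method names are hypothetical, not part of JPP or eXo JCR) that computes how long node #1 kept the index OFFLINE and how long node #2 needed between "Retrieving index from coordinator" and ONLINE. It assumes both timestamps fall on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

final class LogTiming {
    // Timestamp layout used by the server log lines above, e.g. 08:49:46,369
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps from the same day.
    static long millisBetween(String start, String end) {
        return Duration.between(
            LocalTime.parse(start, LOG_TIME),
            LocalTime.parse(end, LOG_TIME)).toMillis();
    }

    public static void main(String[] args) {
        // Node #1: index OFFLINE -> ONLINE
        System.out.println(millisBetween("08:49:46,369", "08:49:46,400")); // 31
        // Node #2: "Retrieving index from coordinator" -> ONLINE
        System.out.println(millisBetween("08:49:46,369", "08:49:46,493")); // 124
    }
}
```

Both windows are well under a second, so the index transfer itself appears to complete normally.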


However, about ten seconds later, node #1 fails to write the new IndexInfos to disk:

---------------
08:49:56,491 ERROR [exo.jcr.component.core.MultiIndex] (MultiIndex Flush Timer) Unable to commit volatile index: java.io.IOException: Cannot delete C:\tmp\TESTCLUSTER\1\jboss-portal-6.2\standalone\data\gatein\jcr\lucene\portal-system_portal\indexes
	at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296) [lucene-core-3.5.0.jar:3.5.0 1204988 - simon - 2011-11-22 14:46:51]
	at org.exoplatform.services.jcr.impl.core.query.lucene.IndexInfos$2.run(IndexInfos.java:197) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedExceptionAction(SecurityHelper.java:310) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedIOExceptionAction(SecurityHelper.java:57) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.IndexInfos.write(IndexInfos.java:166) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$9.run(MultiIndex.java:1692) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$9.run(MultiIndex.java:1662) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedExceptionAction(SecurityHelper.java:310) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.commons.utils.SecurityHelper.doPrivilegedIOExceptionAction(SecurityHelper.java:57) [exo.kernel.commons-2.4.11-GA-redhat-1.jar:2.4.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.flush(MultiIndex.java:1661) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.checkFlush(MultiIndex.java:2250) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex.access$2100(MultiIndex.java:107) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at org.exoplatform.services.jcr.impl.core.query.lucene.MultiIndex$10.run(MultiIndex.java:1798) [exo.jcr.component.core-1.15.11-GA-redhat-1.jar:1.15.11-GA-redhat-1]
	at java.util.TimerThread.mainLoop(Timer.java:555) [rt.jar:1.8.0_45]
	at java.util.TimerThread.run(Timer.java:505) [rt.jar:1.8.0_45]
---------------


The node is not able to recover from this error.
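
The top frame of the stack trace is `FSDirectory.deleteFile`, which reports "Cannot delete" when the file cannot be removed. One possible explanation, not confirmed anywhere in this report, is that something still holds the `indexes` file open: on Windows a file opened through `java.io` streams typically cannot be deleted until the handle is closed, whereas Linux allows an open file to be unlinked. The sketch below uses only the JDK to show that difference. The class name and the temporary file are hypothetical, and the report does not identify which handle, if any, keeps the real file open.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;

final class DeleteWhileOpen {
    // Tries to delete 'file' while a java.io read handle on it is still open.
    // Returns whatever File.delete() reports at that moment.
    static boolean deleteWhileOpen(File file) throws IOException {
        try (FileInputStream in = new FileInputStream(file)) {
            return file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real index file; created in the temp directory.
        File file = Files.createTempFile("indexes", ".tmp").toFile();
        boolean deleted = deleteWhileOpen(file);
        System.out.println("deleted while open: " + deleted);
        if (!deleted) {
            // The stream is closed at this point, so a second attempt can succeed.
            System.out.println("deleted after close: " + file.delete());
        }
    }
}
```

If the first line prints `false` on the affected Windows host and `true` on Linux, that would be consistent with the platform difference described under "Additional info", but it would not by itself prove the cause.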

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Copy the server directories:
xcopy /E /I standalone standalone1
xcopy /E /I standalone standalone2
xcopy /E /I standalone standalone3

2. Start the H2 db:
java -cp modules\system\layers\base\com\h2database\h2\main\h2-1.3.168.redhat-4.jar org.h2.tools.Server

3. Start the three instances (wait until each one has started up successfully):
.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone1" -D"jboss.node.name=node1" -D"jboss.socket.binding.port-offset=100" -D"gatein.jgroups.udp.bind_port=56201"

.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone2" -D"jboss.node.name=node2" -D"jboss.socket.binding.port-offset=200" -D"gatein.jgroups.udp.bind_port=56202"

.\bin\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4 -D"jboss.server.base.dir=standalone3" -D"jboss.node.name=node3" -D"jboss.socket.binding.port-offset=300" -D"gatein.jgroups.udp.bind_port=56203"
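
The three start commands differ only in the node index: the server base directory and node name carry the index, the port offset is 100 times the index, and the JGroups bind port is 56200 plus the index. A minimal sketch that generates the same command lines is shown here. The class and method names are hypothetical, and extending the pattern beyond three nodes is an assumption.

```java
final class NodeCommand {
    // Builds the standalone.bat command line for node n, following the
    // pattern of the three commands listed in step 3.
    static String forNode(int n) {
        return String.format(
            ".\\bin\\standalone.bat -c standalone-ha.xml -b 127.0.0.1 -u 230.0.0.4"
                + " -D\"jboss.server.base.dir=standalone%d\""
                + " -D\"jboss.node.name=node%d\""
                + " -D\"jboss.socket.binding.port-offset=%d\""
                + " -D\"gatein.jgroups.udp.bind_port=%d\"",
            n, n, n * 100, 56200 + n);
    }

    public static void main(String[] args) {
        // Print the commands for nodes 1 to 3; start each node only after
        // the previous one has come up, as noted in step 3.
        for (int n = 1; n <= 3; n++) {
            System.out.println(forNode(n));
        }
    }
}
```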

Actual results:
Errors when node #2 or #3 is starting up

Expected results:
No errors

Additional info:
The same scenario works fine in a Linux environment

Comment 2 Martin Weiler 2015-09-11 13:18:48 UTC

I tried to use different implementations for the Lucene FSDirectory class, but to no avail:

set "JAVA_OPTS=%JAVA_OPTS% -Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory"

and

set "JAVA_OPTS=%JAVA_OPTS% -Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.NIOFSDirectory"

Comment 8 Christopher O'Brien 2015-10-27 20:27:30 UTC

https://access.redhat.com/jbossnetwork/restricted/softwareDetail.html?softwareId=40551&product=jbportal&version=6.2.0&downloadType=patches
