Description of problem:
Varying ctime across gluster nodes causes problems with the Solr lock file; the same issue has been reported by customers running Elasticsearch. It leads to:

AlreadyClosedException: Underlying file changed by an external force

2018-05-24 23:19:15.467 ERROR (coreLoadExecutor-6-thread-1) [   x:names] o.a.s.u.SolrIndexWriter Error closing IndexWriter
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2018-05-24T23:19:15.345421Z, (lock=NativeFSLock(path=/opt/solr/server/solr/mycores/names/data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-05-24T23:19:15.34573Z))

A consistent-time xlator was proposed here, but the work appears to have stalled:
https://bugzilla.redhat.com/show_bug.cgi?id=1318493

The same issue with Elasticsearch is reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1430659

This is also similar to the tar issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1058526

Version-Release number of selected component (if applicable):
RHGS 3.3.latest
CNS 3.9

How reproducible:
Happens fairly consistently using Solr with one server.

Steps to Reproduce:
1. Deploy a Solr pod backed by a 1x3 replica gluster volume in OpenShift

Actual results:
AlreadyClosedException: Underlying file changed by an external force

Expected results:
Timestamps are handled consistently across the bricks making up a volume.

Additional info:
I think there are two options:
1) Finish the xlator work to address this class of problem, or
2) Provide more comprehensive documentation around workloads, documenting the known use cases that lead to problems which will not be addressed.
2b) For OpenShift, I believe block storage also avoids the ctime issue, so we should look at certifying Solr for customer use on block PVs.
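For illustration, here is a minimal sketch of the validity check that trips here, written in Python rather than Lucene's actual Java code (the class and method names are hypothetical stand-ins): the lock remembers the timestamp observed at acquisition and treats any later mismatch as external modification. On a replicated gluster volume without consistent ctime, a stat served by a brick with a skewed clock can return a different ctime even though nothing touched write.lock.

```python
import os
import tempfile
import time


class AlreadyClosedException(Exception):
    """Stand-in for org.apache.lucene.store.AlreadyClosedException."""


class NativeFSLockSketch:
    """Hypothetical simplification of Lucene's NativeFSLock validity check:
    remember the ctime seen when the lock was acquired and refuse to
    proceed if it ever changes."""

    def __init__(self, path):
        self.path = path
        # Timestamp recorded at lock-acquisition time.
        self.creation_ctime = os.stat(path).st_ctime

    def ensure_valid(self):
        # On gluster, a reply served by a brick with a drifting clock can
        # report a different ctime with no real modification having happened.
        if os.stat(self.path).st_ctime != self.creation_ctime:
            raise AlreadyClosedException(
                "Underlying file changed by an external force")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, "write.lock")
        open(lock_path, "w").close()
        lock = NativeFSLockSketch(lock_path)
        lock.ensure_valid()   # ctime unchanged: lock still valid
        time.sleep(1.1)
        os.utime(lock_path)   # metadata touch bumps ctime, mimicking how a
                              # differing brick timestamp appears to the client
        try:
            lock.ensure_valid()
        except AlreadyClosedException as e:
            print("raised:", e)
```

The point of the sketch is that no writer needs to touch the index for the exception to fire; a bare metadata-timestamp change is enough, which is exactly what inconsistent ctime across bricks produces.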
We have implemented the 'ctime' xlator work upstream, and the best path for us is to validate the feature upstream with a Solr-based testbed and then, once everything is confirmed fixed, work toward getting it downstream. Note that the feature needed some extra fields sent on the wire and hence requires protocol version changes, so the recommendation is to wait for the next major RHGS release. Marking the issue as POST so we can start evaluating the fixes upstream.
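For anyone setting up the upstream validation, the feature is toggled per volume. This is a sketch only: 'myvol' is a placeholder volume name, and the option name follows the upstream ctime feature work, so verify it against the gluster release you are actually testing.

```shell
# Enable consistent-time handling on the volume (upstream ctime feature).
gluster volume set myvol features.ctime on

# Confirm the setting took effect.
gluster volume get myvol features.ctime
```

With the option on, timestamps are generated client-side and stored in an xattr, so all bricks serve the same ctime regardless of their local clocks.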
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3249