Bug 1583225

Summary: [GSS] ctime sync issues with Solr
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Matthew Robson <mrobson>
Component: glusterfs
Assignee: Kotresh HR <khiremat>
Status: CLOSED ERRATA
QA Contact: Bala Konda Reddy M <bmekala>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.3
CC: amukherj, bkunal, bmekala, khiremat, lbailey, mrobson, nchilaka, rhs-bugs, sheggodu, storage-qa-internal, vbellur
Target Milestone: ---
Keywords: Rebase
Target Release: RHGS 3.5.0
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-6.0-1
Doc Type: Enhancement
Doc Text:
Red Hat Gluster Storage now provides the option of storing file time attributes as an extended attribute. This avoids the consistency issues that occurred in replicated volumes when the back-end file system's time attributes were used.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-30 12:19:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1298724    
Bug Blocks: 1696806    

Description Matthew Robson 2018-05-28 13:44:35 UTC
Description of problem:

Differing ctime values across the gluster nodes cause problems with the lock file in Solr. Customers using Elasticsearch have reported the same issue.

This leads to: AlreadyClosedException: Underlying file changed by an external force

2018-05-24 23:19:15.467 ERROR (coreLoadExecutor-6-thread-1) [   x:names] o.a.s.u.SolrIndexWriter Error closing IndexWriter
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2018-05-24T23:19:15.345421Z, (lock=NativeFSLock(path=/opt/solr/server/solr/mycores/names/data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-05-24T23:19:15.34573Z)) 
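
To confirm that the replicas really disagree on timestamps, a minimal diagnostic sketch in Python follows. It assumes the same file can be read directly under each brick's back-end directory. The function name `timestamp_spread` and the brick layout in the demo are hypothetical and are not part of Gluster or Solr; the demo uses temporary directories as stand-ins for bricks.

```python
import os
import tempfile


def timestamp_spread(paths):
    """Return (ctime_spread_ns, mtime_spread_ns) across copies of one file.

    A non-zero spread means the copies would report different times
    depending on which one answers a stat() call.
    """
    stats = [os.stat(p) for p in paths]
    ctimes = [s.st_ctime_ns for s in stats]
    mtimes = [s.st_mtime_ns for s in stats]
    return max(ctimes) - min(ctimes), max(mtimes) - min(mtimes)


if __name__ == "__main__":
    # Stand-in for three brick directories (hypothetical layout).
    with tempfile.TemporaryDirectory() as root:
        copies = []
        for i in range(3):
            brick = os.path.join(root, f"brick{i}")
            os.makedirs(brick)
            path = os.path.join(brick, "write.lock")
            open(path, "w").close()
            # Give each copy a different mtime to mimic unsynchronized bricks.
            seconds_ns = 1_000_000_000 * (i + 1)
            os.utime(path, ns=(seconds_ns, seconds_ns))
            copies.append(path)

        ctime_spread, mtime_spread = timestamp_spread(copies)
        print(f"ctime spread: {ctime_spread} ns, mtime spread: {mtime_spread} ns")
```

On a real deployment the list passed to `timestamp_spread` would be the path of `write.lock` under each brick, collected on the respective nodes. Any non-zero spread is consistent with the behaviour described in this report.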

A consistent time xlator was proposed here, but the work seems to have stalled: https://bugzilla.redhat.com/show_bug.cgi?id=1318493

The same issue with Elasticsearch is reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1430659

This is also similar to the tar issue: https://bugzilla.redhat.com/show_bug.cgi?id=1058526


Version-Release number of selected component (if applicable):

RHGS 3.3.latest
CNS 3.9


How reproducible:

Happens pretty consistently using Solr with 1 server.


Steps to Reproduce:
1. Deploy Solr pod backed by a 1x3 replica gluster volume in OpenShift

Actual results:

AlreadyClosedException: Underlying file changed by an external force

Expected results:

Properly handle time synchronization across the bricks making up a volume.


Additional info:

I think there are two options:

1) Finish off the xlator work to address this type of problem

Or

2) Provide more comprehensive documentation around workloads, including the known use cases that lead to problems which will not be addressed

2b) For OpenShift, I believe block storage also solves the ctime issues, so we should look to certify Solr for customer use on block PVs.

Comment 5 Amar Tumballi 2018-10-30 06:30:43 UTC
We have implemented the 'ctime' based xlator work upstream. The best next step for us is to validate the feature upstream with a Solr based testbed and then, if everything is fixed, work towards getting it downstream.

Note that the feature needs some extra fields sent on the wire and hence requires protocol version changes, so the recommendation is to wait for the next major RHGS release!

Marking the issue as POST, so we can start evaluating the fixes upstream!

Comment 22 errata-xmlrpc 2019-10-30 12:19:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249