Description of problem:
Varying ctime across gluster nodes causes problems with the Solr lock file; the same issue has been reported by customers running Elasticsearch. It leads to:

AlreadyClosedException: Underlying file changed by an external force

2018-05-24 23:19:15.467 ERROR (coreLoadExecutor-6-thread-1) [   x:names] o.a.s.u.SolrIndexWriter Error closing IndexWriter
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2018-05-24T23:19:15.345421Z, (lock=NativeFSLock(path=/opt/solr/server/solr/mycores/names/data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-05-24T23:19:15.34573Z))

A consistent-time xlator was proposed here, but the work appears to have stalled:
https://bugzilla.redhat.com/show_bug.cgi?id=1318493

The same issue with Elasticsearch is reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1430659

This is also similar to the tar issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1058526

Version-Release number of selected component (if applicable):
RHGS 3.3.latest
CNS 3.9

How reproducible:
Happens fairly consistently using Solr with one server.

Steps to Reproduce:
1. Deploy a Solr pod backed by a 1x3 replica gluster volume in OpenShift

Actual results:
AlreadyClosedException: Underlying file changed by an external force

Expected results:
Timestamps are handled consistently across the bricks making up a volume.

Additional info:
I think there are two options:
1) Finish the xlator work to address this class of problem, or
2) Provide more comprehensive documentation around workloads, documenting the known use cases that lead to problems which will not be addressed.
2b) For OpenShift, I believe block storage also avoids the ctime issue, so we should look at certifying Solr for customer use on block PVs.
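For illustration, here is a minimal sketch of the validity check that trips here, written in Python rather than Lucene's actual Java code (the class and method names are hypothetical stand-ins): the lock remembers the timestamp observed at acquisition and treats any later mismatch as external modification. On a replicated gluster volume without consistent ctime, a stat served by a brick with a skewed clock can return a different ctime even though nothing touched write.lock.

```python
import os
import tempfile
import time


class AlreadyClosedException(Exception):
    """Stand-in for org.apache.lucene.store.AlreadyClosedException."""


class NativeFSLockSketch:
    """Hypothetical simplification of Lucene's NativeFSLock validity check:
    remember the ctime seen when the lock was acquired and refuse to
    proceed if it ever changes."""

    def __init__(self, path):
        self.path = path
        # Timestamp recorded at lock-acquisition time.
        self.creation_ctime = os.stat(path).st_ctime

    def ensure_valid(self):
        # On gluster, a reply served by a brick with a drifting clock can
        # report a different ctime with no real modification having happened.
        if os.stat(self.path).st_ctime != self.creation_ctime:
            raise AlreadyClosedException(
                "Underlying file changed by an external force")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, "write.lock")
        open(lock_path, "w").close()
        lock = NativeFSLockSketch(lock_path)
        lock.ensure_valid()   # ctime unchanged: lock still valid
        time.sleep(1.1)
        os.utime(lock_path)   # metadata touch bumps ctime, mimicking how a
                              # differing brick timestamp appears to the client
        try:
            lock.ensure_valid()
        except AlreadyClosedException as e:
            print("raised:", e)
```

The point of the sketch is that no writer needs to touch the index for the exception to fire; a bare metadata-timestamp change is enough, which is exactly what inconsistent ctime across bricks produces.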
We have implemented the 'ctime' xlator work upstream, and the best path for us is to validate the feature upstream with a Solr-based testbed and then, once everything is confirmed fixed, work toward getting it downstream. Note that the feature needed some extra fields sent on the wire and hence requires protocol version changes, so the recommendation is to wait for the next major RHGS release. Marking the issue as POST so we can start evaluating the fixes upstream.
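For anyone setting up the upstream validation, the feature is toggled per volume. This is a sketch only: 'myvol' is a placeholder volume name, and the option name follows the upstream ctime feature work, so verify it against the gluster release you are actually testing.

```shell
# Enable consistent-time handling on the volume (upstream ctime feature).
gluster volume set myvol features.ctime on

# Confirm the setting took effect.
gluster volume get myvol features.ctime
```

With the option on, timestamps are generated client-side and stored in an xattr, so all bricks serve the same ctime regardless of their local clocks.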
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3249