Bug 1426548

Summary: Openshift Logging ElasticSearch FSLocks when using GlusterFS storage backend
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Takeshi Larsson <tlarsson>
Component: CNS-deployment
Assignee: Michael Adam <madam>
Status: CLOSED WONTFIX
QA Contact: krishnaram Karthick <kramdoss>
Severity: medium
Priority: medium
Version: cns-3.4
CC: akhakhar, anli, annair, aos-bugs, bchilds, bkunal, dmoessne, hchiramm, jarrpa, jnordell, madam, me, myllynen, pdwyer, pprakash, rcyriac, rhs-bugs, rreddy, rtalur, ssaha, tkimura, vinug
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-24 12:50:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1543779, 1573420, 1622458, 1641915, 1642792

Description Takeshi Larsson 2017-02-24 09:22:50 UTC
Description of problem:
When deploying logging using the 3.4.0 images on OpenShift 3.4.1.5 with GlusterFS volumes: as soon as the Elasticsearch containers have managed to initialize with their peers (if they have any), they attempt to create some shards.

Log: https://paste.fedoraproject.org/paste/kKClaBFb0xj86Hml-bHI0V5M1UNdIGYhyRLivL9gydE=

This attempt appears to trigger a filesystem lock (FSLock) failure, after which nothing works.

The Elasticsearch logging deployment works with the ~3.3.1 image when the GlusterFS volume has the following options set:
performance.write-behind off
performance.quick-read off
performance.readdir-ahead off

The same options above were set on the volume backing the PV for the 3.4.0 logging deployment.
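For reference, a minimal sketch of how those options would be applied with the gluster CLI (the volume name logging-es is a placeholder, not the actual volume in this environment):

# Run on a Gluster node; "logging-es" is a hypothetical volume name
gluster volume set logging-es performance.write-behind off
gluster volume set logging-es performance.quick-read off
gluster volume set logging-es performance.readdir-ahead off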

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a PV for the es-logging container on a GlusterFS backend (see the PV sketch after this list)
2. Set the performance options for the logging volume in GlusterFS as specified above (the configuration proven to work for 3.3.1)
3. Deploy logging on the cluster, making sure to configure it to use the PV created in step 1
4. Check the Elasticsearch pod logs. Trying to access Kibana fails: Kibana reports that it cannot contact the Elasticsearch cluster
5. Attempt to curl the logging es service on port 9000; it responds with "Searchguard not initialized.."
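As referenced in step 1, a minimal sketch of such a PV, assuming a hypothetical Gluster volume logging-es and a pre-created endpoints object glusterfs-cluster (all names and the capacity are placeholders):

# Create the PV from stdin; adjust names/size for the actual environment
oc create -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-es-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  glusterfs:
    endpoints: glusterfs-cluster
    path: logging-es
    readOnly: false
EOF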



Actual results:
FSLock failures on write.


Expected results:
Logging deploys successfully.

Additional info:

Comment 1 Takeshi Larsson 2017-02-24 09:28:58 UTC
Another person experiencing the same issue:

https://forums.rancher.com/t/glusterfs-and-elasticsearch/2293

Comment 2 Takeshi Larsson 2017-02-27 13:28:05 UTC
Just found out that the 3.4 docs now even specify the following: "Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage"

So I guess there is not much we can do except run with local storage in that case?

Comment 3 Sayan Saha 2017-02-27 19:44:37 UTC
Yes. Local storage is the way to go for now. We'll re-test RHGS capabilities once we have iSCSI support for RWO workloads in a few months' time.

Comment 5 Takayoshi Kimura 2017-02-28 00:24:53 UTC
This happens because ES compares the ctime to check that the lock file is unchanged:

https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/store/NativeFSLockFactory.java

GlusterFS returns the ctime from one of the multiple backend bricks, so the reported ctime varies between calls:

https://bugzilla.redhat.com/show_bug.cgi?id=1318493

As a result, ES believes the lock file has been changed by someone else.
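A quick way to observe this from the brick side, assuming placeholder brick paths and lock-file location (stat's %Z prints the ctime as seconds since the epoch):

# Run on each replica brick host and compare the values;
# if the bricks disagree, the client-visible ctime can flip between calls
stat -c '%Z %n' /bricks/brick1/logging-es/nodes/0/node.lock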

Comment 6 Sayan Saha 2017-02-28 12:31:50 UTC
Note that ES recommends using local or Direct Attached Storage with SSDs for backing storage.

Comment 9 Jeff Cantrill 2017-03-10 19:46:34 UTC
Moving to target 3.6 based on #8

Comment 26 Rubin Simons 2019-01-24 12:41:15 UTC
This should not be CLOSED WONTFIX, imho. The GlusterFS project appears to be working on this, as evident here:

1. https://github.com/gluster/glusterfs/issues/208
2. https://github.com/gluster/glusterfs/issues/517
3. https://bugzilla.redhat.com/show_bug.cgi?id=1318493