Bug 1426548 - Openshift Logging ElasticSearch FSLocks when using GlusterFS storage backend
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: CNS-deployment
Version: cns-3.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Michael Adam
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1543779 1573420 1622458 OCS-3.11.1-devel-triage-done 1642792
 
Reported: 2017-02-24 09:22 UTC by Takeshi Larsson
Modified: 2021-12-10 14:56 UTC
CC: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-24 12:50:07 UTC
Embargoed:


Attachments: none


Links
Red Hat Bugzilla 1318493 (high, CLOSED): Introduce ctime-xlator to return correct (client-side set) ctime (last updated 2021-02-22 00:41:40 UTC)

Description Takeshi Larsson 2017-02-24 09:22:50 UTC
Description of problem:
When deploying logging using the 3.4.0 images on OpenShift 3.4.1.5 with GlusterFS volumes, as soon as an Elasticsearch container has initialized with its peers (if it has any), it attempts to create some shards.

Log: https://paste.fedoraproject.org/paste/kKClaBFb0xj86Hml-bHI0V5M1UNdIGYhyRLivL9gydE=

This attempt seems to lock the FS and then nothing works.

The Elasticsearch logging deployment works with the 3.3.1 image when the GlusterFS volume has the following options set:
performance.write-behind off
performance.quick-read off
performance.readdir-ahead off

The same options above were set on the PV for the 3.4.0 logging deployment; a sketch of applying them follows.
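
A minimal sketch of applying these options with the gluster CLI, assuming a volume named logging-es-vol (a placeholder; substitute the volume actually backing the PV):

# Disable the performance translators that the 3.3.1 deployment needed off.
gluster volume set logging-es-vol performance.write-behind off
gluster volume set logging-es-vol performance.quick-read off
gluster volume set logging-es-vol performance.readdir-ahead off
# Confirm the options are in effect.
gluster volume info logging-es-vol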

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a PV for the es-logging container on the GlusterFS backend.
2. Set the performance options for the logging volume in GlusterFS as specified above, which are proven to work for 3.3.1.
3. Deploy logging on the cluster, making sure to configure it to use the PV created for it.
4. Look at the pod logs for Elasticsearch. Try to access Kibana; Kibana will report that it cannot contact the Elasticsearch cluster.
5. Attempt to curl the logging es service on port 9000; it will respond with "Searchguard not initialized.." (a sketch of this check follows the list).
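
A minimal sketch of the step 5 check, assuming the service is reachable as logging-es inside the cluster (hostname and scheme are assumptions; port 9000 is taken from the step above, though the standard ES HTTP port is 9200):

# Query the logging ES service; hostname and scheme are placeholders.
# Per this report, the response is "Searchguard not initialized.."
curl -k https://logging-es:9000/_cluster/health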



Actual results:
FSLocks on write


Expected results:
To deploy logging successfully.

Additional info:

Comment 1 Takeshi Larsson 2017-02-24 09:28:58 UTC
Another person experiencing the same issue..

https://forums.rancher.com/t/glusterfs-and-elasticsearch/2293

Comment 2 Takeshi Larsson 2017-02-27 13:28:05 UTC
Hi, adding some context: I just found out that the 3.4 docs now even specify the following: "Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage"

So I guess there is not much we can do except run with local storage in that case?

Comment 3 Sayan Saha 2017-02-27 19:44:37 UTC
Yes. Local storage is the way to go for now. We'll re-test RHGS capabilities once we have iSCSI support for RWO workloads in a few months' time.

Comment 5 Takayoshi Kimura 2017-02-28 00:24:53 UTC
This happens because ES compares the ctime to check that the lock file is unchanged:

https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/store/NativeFSLockFactory.java

GlusterFS returns the ctime from one of the multiple backend bricks, so the ctime varies:

https://bugzilla.redhat.com/show_bug.cgi?id=1318493

As a result, ES believes the file has been changed by someone else.
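
A rough way to observe the varying ctime from a client, assuming a FUSE mount at /mnt/es-data and the default Elasticsearch node lock path (both placeholders):

# Repeatedly stat the lock file; on a replicated volume the ctime (%Z)
# can differ between calls depending on which brick answers the lookup.
for i in 1 2 3 4 5; do
  stat -c '%Z %n' /mnt/es-data/nodes/0/node.lock
done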

Comment 6 Sayan Saha 2017-02-28 12:31:50 UTC
Note that ES recommends using local or direct-attached storage with SSDs for backing storage.

Comment 9 Jeff Cantrill 2017-03-10 19:46:34 UTC
Moving to target 3.6 based on comment #8.

Comment 26 Rubin Simons 2019-01-24 12:41:15 UTC
This should not be CLOSED WONTFIX, IMHO. GlusterFS appears to be working on this, as is evident here:

1. https://github.com/gluster/glusterfs/issues/208
2. https://github.com/gluster/glusterfs/issues/517
3. https://bugzilla.redhat.com/show_bug.cgi?id=1318493

