Description of problem:
When deploying logging using the 3.4.0 images on OpenShift 3.4.1.5 with GlusterFS volumes, the Elasticsearch containers, as soon as they have initialized with their peers (if any), attempt to create some shards. Log: https://paste.fedoraproject.org/paste/kKClaBFb0xj86Hml-bHI0V5M1UNdIGYhyRLivL9gydE= This attempt seems to lock the filesystem, and then nothing works. The Elasticsearch logging deployment works with the 3.3.1 image when the GlusterFS volume has the following options set:
performance.write-behind off
performance.quick-read off
performance.readdir-ahead off
The same options were set on the PV for the 3.4.0 logging deployment.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Create a PV for the es-logging container on a GlusterFS backend.
2. Set the performance options above on the GlusterFS logging volume, as proven working for 3.3.1.
3. Deploy logging on the cluster, configuring it to use the PV created for it.
4. Check the pod logs for Elasticsearch. Try to access Kibana; Kibana reports that it cannot contact the Elasticsearch cluster.
5. curl the logging ES service on port 9000; it responds with "Searchguard not initialized..".

Actual results:
FS locks on write.

Expected results:
Logging deploys successfully.

Additional info:
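For reference, applying the options above on the Gluster side would look roughly like this (the volume name logging-es-vol is hypothetical; substitute the actual volume backing the PV):

gluster volume set logging-es-vol performance.write-behind off
gluster volume set logging-es-vol performance.quick-read off
gluster volume set logging-es-vol performance.readdir-ahead off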
Another person experiencing the same issue: https://forums.rancher.com/t/glusterfs-and-elasticsearch/2293
Hi, adding some context: I just found out that the 3.4 docs now even specify the following: "Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage." So I guess there is not much we can do except run with local storage in that case?
Yes. Local storage is the way to go for now. We'll re-test RHGS capabilities once we have iSCSI support for RWO workloads in a few months' time.
This happens because ES compares the ctime to check that the lock file is unchanged: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/store/NativeFSLockFactory.java
In GlusterFS, stat() returns the ctime from one of the multiple backend bricks, so the ctime varies between calls: https://bugzilla.redhat.com/show_bug.cgi?id=1318493
As a result, ES believes the file has been changed by someone else.
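A simplified Python sketch of the kind of check Lucene performs (the real NativeFSLockFactory is Java and more involved; the class name and file path below are illustrative only):

import os

class CtimeValidatedLock:
    # Remember the lock file's ctime at acquisition time, then require it
    # to be unchanged before every subsequent use of the lock.
    def __init__(self, path):
        self.path = path
        self.ctime_at_acquire = os.stat(path).st_ctime

    def ensure_valid(self):
        # On GlusterFS, stat() may be answered by different replica bricks
        # whose ctimes disagree (see BZ#1318493), so this check can fail
        # even though nothing touched the lock file.
        if os.stat(self.path).st_ctime != self.ctime_at_acquire:
            raise IOError("lock file changed by someone else; lock invalid")

lock = CtimeValidatedLock("/elasticsearch/persistent/write.lock")  # hypothetical path
lock.ensure_valid()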
Note that ES recommends using local or Direct Attached Storage with SSDs for backing storage.
Moving to target 3.6 based on #8
This should not be closed WONTFIX, IMHO. The GlusterFS project appears to be working on this, as evident here:
1. https://github.com/gluster/glusterfs/issues/208
2. https://github.com/gluster/glusterfs/issues/517
3. https://bugzilla.redhat.com/show_bug.cgi?id=1318493