Bug 1390050

Summary: Elasticsearch get CorruptIndexException errors when running with GlusterFS persistent storage
Product: [Community] GlusterFS Reporter: Raghavendra G <rgowdapp>
Component: unclassified Assignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: medium    
Version: mainline CC: aos-bugs, bchilds, bkozdemb, bmchugh, bugs, csaba, jnordell, kcao22003, kdhananj, kramdoss, nbalacha, pkarampu, rgowdapp, rhs-bugs, rwheeler, storage-qa-internal, vbellur
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: GLUSTERFS_METADATA_INCONSISTENCY
Fixed In Version: glusterfs-5.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1379568 Environment:
Last Closed: 2018-10-23 15:06:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1286970    

Comment 1 Worker Ant 2016-10-31 05:33:54 UTC
REVIEW: http://review.gluster.org/15757 (performance/write-behind: Add more fops for checking dependency with cached writes) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 2 Worker Ant 2016-10-31 11:16:42 UTC
REVIEW: http://review.gluster.org/15757 (performance/write-behind: Add more fops for checking dependency with cached writes) posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 3 Pranith Kumar K 2016-10-31 15:46:46 UTC
With this fix, there are no corruptions with just write-behind

Comment 4 Worker Ant 2016-11-01 06:30:09 UTC
REVIEW: http://review.gluster.org/15759 (performance/quick-read: Use generation numbers to avoid updating the cache with stale data) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 5 Worker Ant 2016-11-01 11:15:11 UTC
REVIEW: http://review.gluster.org/15759 (performance/quick-read: Use generation numbers to avoid updating the cache with stale data) posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 6 Worker Ant 2016-12-12 13:05:49 UTC
COMMIT: http://review.gluster.org/15757 committed in master by Raghavendra G (rgowdapp) 
------
commit 9c769c6ee1d125b6bab513073767b628b60abeeb
Author: Raghavendra G <rgowdapp>
Date:   Mon Oct 31 10:49:09 2016 +0530

    performance/write-behind: Add more fops for checking dependency with
    cached writes
    
    Fops like readdirp, link, fallocate, discard, zerofill return iatt of
    files in their responses. This iatt can be cached by md-cache. Hence
    it is important that write-behind maintains relative ordering of these
    fops with cached writes. Failure to do so can result in md-cache
    storing stale iatts and returning them to applications.
    
    Change-Id: Icfe12ad807e42fe9e52a9f63e47ce63f511c6946
    BUG: 1390050
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/15757
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
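
The ordering requirement described in this commit message can be illustrated with a toy model. This is only a sketch in Python, not the write-behind code: the class names `Backend` and `ToyWriteBehind`, the `order_iatt_fops` switch, and the per-file flush are all assumptions made for illustration. The model acknowledges writes while they are still cached, and shows how a fop that returns an iatt sees a stale size unless it is ordered after the cached writes.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical stand-in for the brick: holds the authoritative file sizes."""
    sizes: dict = field(default_factory=dict)

    def write(self, gfid, offset, data):
        end = offset + len(data)
        self.sizes[gfid] = max(self.sizes.get(gfid, 0), end)

    def stat(self, gfid):
        # The "iatt" here is reduced to a gfid and a size.
        return {"gfid": gfid, "size": self.sizes.get(gfid, 0)}


class ToyWriteBehind:
    # Fops named in the commit message as returning iatts in their responses.
    IATT_FOPS = {"readdirp", "link", "fallocate", "discard", "zerofill"}

    def __init__(self, backend, order_iatt_fops):
        self.backend = backend
        self.order_iatt_fops = order_iatt_fops  # assumption: toggles the fix
        self.pending = {}  # gfid -> list of (offset, data) not yet sent

    def writev(self, gfid, offset, data):
        # The write is acknowledged to the caller while it is still cached.
        self.pending.setdefault(gfid, []).append((offset, data))

    def flush(self, gfid):
        for offset, data in self.pending.pop(gfid, []):
            self.backend.write(gfid, offset, data)

    def iatt_fop(self, name, gfid):
        if name not in self.IATT_FOPS:
            raise ValueError(f"{name} is not modelled as an iatt-returning fop")
        if self.order_iatt_fops:
            # Keep the fop ordered after the cached writes it depends on.
            self.flush(gfid)
        return self.backend.stat(gfid)


def size_seen_by_md_cache(order_iatt_fops):
    wb = ToyWriteBehind(Backend(), order_iatt_fops)
    wb.writev("gfid-1", 0, b"hello")
    return wb.iatt_fop("readdirp", "gfid-1")["size"]
```

In this model, `size_seen_by_md_cache(False)` returns the pre-write size and `size_seen_by_md_cache(True)` returns the size after the cached write, which is the difference a metadata cache above write-behind would observe.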

Comment 7 Krutika Dhananjay 2017-01-23 09:53:13 UTC
Here's another fix in readdir-ahead - https://review.gluster.org/16419

Comment 8 Worker Ant 2017-01-24 11:17:20 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#6) for review on master by Krutika Dhananjay (kdhananj)

Comment 9 Worker Ant 2017-01-25 06:51:43 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#7) for review on master by Krutika Dhananjay (kdhananj)

Comment 10 Worker Ant 2017-01-26 02:57:56 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate dentries if they're modified while in cache) posted (#8) for review on master by Krutika Dhananjay (kdhananj)

Comment 11 Worker Ant 2017-01-31 08:57:34 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#9) for review on master by Krutika Dhananjay (kdhananj)

Comment 12 Worker Ant 2017-01-31 10:19:57 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#10) for review on master by Krutika Dhananjay (kdhananj)

Comment 13 Worker Ant 2017-02-01 09:49:48 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#11) for review on master by Krutika Dhananjay (kdhananj)

Comment 14 Worker Ant 2017-02-02 06:35:21 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#12) for review on master by Krutika Dhananjay (kdhananj)

Comment 15 Raghavendra G 2017-02-10 04:42:47 UTC
*** Bug 1286970 has been marked as a duplicate of this bug. ***

Comment 16 Worker Ant 2017-02-10 04:47:49 UTC
REVIEW: https://review.gluster.org/16591 (cluster/dht: Use int8 instead of string to pass DHT_IATT_IN_XDATA_KEY) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

Comment 17 Worker Ant 2017-02-13 06:48:21 UTC
COMMIT: https://review.gluster.org/16591 committed in master by Raghavendra G (rgowdapp) 
------
commit c6304c339104b0655473ee928659fdc4fa7cb2d9
Author: Krutika Dhananjay <kdhananj>
Date:   Thu Feb 9 21:12:17 2017 +0530

    cluster/dht: Use int8 instead of string to pass DHT_IATT_IN_XDATA_KEY
    
    It is sufficient to pass an int value as opposed to a "yes" against the
    DHT_IATT_IN_XDATA_KEY key since all posix cares about is whether the
    key is present in the dict or not. Also note that this patch does not
    violate backward compatibility since the handling of the key in posix
    remains untouched.
    
    Change-Id: I2f881494a257488709c8c1d2002f2d124ddcc089
    BUG: 1390050
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: https://review.gluster.org/16591
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

Comment 18 Worker Ant 2017-02-17 10:19:31 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#13) for review on master by Krutika Dhananjay (kdhananj)

Comment 19 Worker Ant 2017-02-20 11:07:44 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#14) for review on master by Krutika Dhananjay (kdhananj)

Comment 20 Worker Ant 2017-02-21 09:26:52 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#15) for review on master by Krutika Dhananjay (kdhananj)

Comment 21 Worker Ant 2017-02-27 12:01:03 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#16) for review on master by Krutika Dhananjay (kdhananj)

Comment 22 Worker Ant 2017-02-28 11:00:44 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#17) for review on master by Krutika Dhananjay (kdhananj)

Comment 23 Worker Ant 2017-03-02 10:15:49 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#18) for review on master by Krutika Dhananjay (kdhananj)

Comment 24 Shyamsundar 2017-03-06 17:32:36 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 25 Shyamsundar 2017-05-30 18:35:29 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 26 Worker Ant 2017-08-03 08:18:02 UTC
REVIEW: https://review.gluster.org/15759 (performance/quick-read: Use generation numbers to avoid updating the cache with stale data) posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 27 Worker Ant 2017-08-03 09:06:27 UTC
REVIEW: https://review.gluster.org/15759 (performance/quick-read: Use generation numbers to avoid updating the cache with stale data) posted (#4) for review on master by Raghavendra G (rgowdapp)

Comment 28 Raghavendra G 2017-08-04 04:48:51 UTC
Some patches on this bz are still open and relevant, hence re-opening the bz.

Comment 29 Worker Ant 2017-08-04 04:53:38 UTC
REVIEW: https://review.gluster.org/15759 (performance/quick-read: Use generation numbers to avoid updating the cache with stale data) posted (#5) for review on master by Raghavendra G (rgowdapp)

Comment 30 Worker Ant 2018-05-25 03:37:08 UTC
REVIEW: https://review.gluster.org/15759 (performance/quick-read: Use generation numbers to avoid updating the cache with stale data) posted (#6) for review on master by Raghavendra G

Comment 31 Worker Ant 2018-05-28 09:59:18 UTC
COMMIT: https://review.gluster.org/15759 committed in master by "Raghavendra G" <rgowdapp> with a commit message- performance/quick-read: Use generation numbers to avoid updating the cache with stale data

Thanks to Pranith for the example. Following is the race we are trying
to solve with this patch.

1) We have a file with content 'abc'
2) A lookup and a writev that replaces 'abc' with 'def' arrive. The
   lookup fetches 'abc' but has yet to update the cache, and then
   immediately the writev is wound, which zeros out the cache. Now
   lookup_cbk updates the buffer with 'abc' even though on disk it is
   'def'. The writev then completes and returns to the application.
3) The application does a readv, which is served from quick-read as
   'abc'.

Change-Id: I9a9cab9c99652aa6d17230a4fe4dc034ec502b1b
BUG: 1390050
Updates: bz#1390050
Signed-off-by: Raghavendra G <rgowdapp>
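
The race in this commit message can be replayed in a small model. The following is a sketch in Python under stated assumptions, not the quick-read implementation: `ToyQuickRead`, `replay_race`, and the single per-cache generation counter are hypothetical names and simplifications. The idea shown is that a lookup records the generation when it is wound, every write bumps the generation, and a lookup reply carrying an older generation is not allowed to repopulate the cache.

```python
class ToyQuickRead:
    def __init__(self, use_generation):
        self.use_generation = use_generation  # assumption: toggles the fix
        self.generation = 0   # bumped by every write that invalidates the cache
        self.cached = None    # cached file content, or None if nothing is cached

    def lookup_wind(self):
        # Remember which generation was current when the lookup was sent.
        return self.generation

    def writev(self):
        # A write invalidates the cache and starts a new generation.
        self.generation += 1
        self.cached = None

    def lookup_cbk(self, data, generation_at_wind):
        if self.use_generation and generation_at_wind != self.generation:
            # The reply predates a write, so its data may be stale: drop it.
            return
        self.cached = data

    def readv(self, on_disk):
        # Serve from the cache when populated, otherwise fall through to disk.
        return self.cached if self.cached is not None else on_disk


def replay_race(use_generation):
    disk = b"abc"
    qr = ToyQuickRead(use_generation)
    gen = qr.lookup_wind()
    fetched = disk               # the lookup fetches 'abc' from the brick
    qr.writev()                  # the writev is wound and zeros out the cache
    disk = b"def"                # on disk the content is now 'def'
    qr.lookup_cbk(fetched, gen)  # the late lookup reply arrives
    return qr.readv(disk)
```

Without the generation check, `replay_race(False)` returns the stale `b"abc"`; with it, `replay_race(True)` returns `b"def"` because the late reply is discarded and the read falls through to disk.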

Comment 32 Worker Ant 2018-07-20 14:06:47 UTC
REVIEW: https://review.gluster.org/16419 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#20) for review on master by Raghavendra G

Comment 33 Worker Ant 2018-07-28 09:28:14 UTC
COMMIT: https://review.gluster.org/16419 committed in master by "Raghavendra G" <rgowdapp> with a commit message- performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache

PROBLEM:

Entries that are readdirp'd ahead can undergo modification in terms
of writes, truncates which could modify their iatts. When a readdir
is finally wound at offset corresponding to these entries, the iatts
that are returned to the application come from readdir-ahead's cache,
which are stale by now. This problem gets further aggravated when caching
translators/modules cache and continue to serve this stale information.

FIX:

Whenever a dentry undergoes modification, in the cbk of the modification fop,
a "dirty" flag (default 0) is set in its inode ctx. When it's time for
readdir-ahead to serve these entries, it will read the inode ctx and check
if the entry is "dirty", and if it is, set the entry's attrs to all zeroes,
as an indicator to fuse, md-cache etc not to cache these attributes.

Also there is one tiny race between the entry creation and a readdirp on its
parent dir, which could cause the inode-ctx setting and inode ctx reading to
happen on two different inode objects. To prevent this, fuse-bridge is made to
drop entries for which dentry->inode is not the same as linked inode,
in readdirp cbk.

Change-Id: If7396507632b5268442ca580473d5155fee9cbef
BUG: 1390050
Updates: bz#1390050
Signed-off-by: Krutika Dhananjay <kdhananj>
Signed-off-by: Raghavendra G <rgowdapp>
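
The "dirty" flag approach in this commit message can be sketched as follows. This is a toy Python model under assumptions, not the readdir-ahead code: `ToyReaddirAhead`, `ZERO_ATTRS`, and the dictionary used as an inode ctx are hypothetical. It shows prefetched entries being served with zeroed attributes once the file behind them has been modified.

```python
ZERO_ATTRS = {"size": 0, "mtime": 0}  # assumption: attrs reduced to two fields


class ToyReaddirAhead:
    def __init__(self):
        self.prefetched = {}  # name -> (inode id, attrs captured at prefetch)
        self.dirty = {}       # inode id -> flag, standing in for the inode ctx

    def prefetch(self, name, inode, attrs):
        self.prefetched[name] = (inode, dict(attrs))
        self.dirty[inode] = False  # the flag defaults to 0

    def modification_cbk(self, inode):
        # Called from the cbk of a write, truncate, or similar fop.
        self.dirty[inode] = True

    def readdirp(self):
        listing = {}
        for name, (inode, attrs) in self.prefetched.items():
            if self.dirty.get(inode):
                # Zeroed attrs signal upper layers not to cache them.
                listing[name] = dict(ZERO_ATTRS)
            else:
                listing[name] = dict(attrs)
        return listing


def demo_dirty_flag():
    ra = ToyReaddirAhead()
    ra.prefetch("file-1", "inode-1", {"size": 3, "mtime": 100})
    ra.prefetch("file-2", "inode-2", {"size": 7, "mtime": 100})
    ra.modification_cbk("inode-1")  # file-1 is written while in the cache
    return ra.readdirp()
```

In this model the modified entry comes back with all-zero attributes while the untouched entry keeps its prefetched values. Comment 35 in this thread shows a listing in which zeroed attributes of this kind were visible to `ls`.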

Comment 34 Worker Ant 2018-08-03 13:43:05 UTC
REVIEW: https://review.gluster.org/20634 (Revert "performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache") posted (#1) for review on master by Raghavendra G

Comment 35 Worker Ant 2018-08-03 22:18:43 UTC
COMMIT: https://review.gluster.org/20634 committed in master by "Shyamsundar Ranganathan" <srangana> with a commit message- Revert "performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache"

This reverts commit 7131de81f72dda0ef685ed60d0887c6e14289b8c.

With the latest master, I created a single brick volume and some files
inside it.

[root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
again"; ls -l /mnt/fuse1
umount: /mnt/fuse1: not mounted
total 0
----------. 0 root root 0 Jan  1  1970 file-1
----------. 0 root root 0 Jan  1  1970 file-2
----------. 0 root root 0 Jan  1  1970 file-3
----------. 0 root root 0 Jan  1  1970 file-4
----------. 0 root root 0 Jan  1  1970 file-5
d---------. 0 root root 0 Jan  1  1970 subdir
Trying again
total 3
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
d---------. 0 root root  0 Jan  1  1970 subdir
[root@rhgs313-6 ~]#

The conversation can be followed on gluster-devel in the thread with
subject: tests/bugs/distribute/bug-1122443.t - spurious failure. git bisect
pointed to this patch as the culprit.

Change-Id: I1eb46f6c196f44fde8ce991840a0e724e6f50862
Signed-off-by: Raghavendra G <rgowdapp>
Updates: bz#1390050

Comment 36 Worker Ant 2018-08-04 18:47:51 UTC
REVIEW: https://review.gluster.org/20639 (performance/readdir-ahead: Invalidate cached dentries if they're modified while in cache) posted (#1) for review on master by Raghavendra G

Comment 37 kcao22003 2018-08-12 21:58:02 UTC
Hello,

According to this bug report, this issue was already fixed in Gluster version glusterfs-3.11.0. However, we are currently using version Gluster 4.1.0 and still having the same issue. 

Does anyone run into this same problem with Gluster version above 3.11.0?

Comment 38 Raghavendra G 2018-08-13 02:00:10 UTC
(In reply to kcao22003 from comment #37)
> Hello,
> 
> According to this bug report, this issue was already fixed in Gluster
> version glusterfs-3.11.0. However, we are currently using version Gluster
> 4.1.0 and still having the same issue. 
> 
> Does anyone run into this same problem with Gluster version above 3.11.0?

We later found some new issues with respect to this use-case. Currently we are working on fixes. Is it possible to share your testcase (with complete instructions to set up elasticsearch and the test case that fails) so that it'll help us to validate our fixes against your usecase?

Comment 39 Worker Ant 2018-08-18 07:29:25 UTC
COMMIT: https://review.gluster.org/20639 committed in master by "Raghavendra G" <rgowdapp> with a commit message- performance/readdir-ahead: keep stats of cached dentries in sync with modifications

PROBLEM:

Stats of dentries that are readdirp'd ahead can become stale due to
fops like writes, truncate etc that modify the file pointed by
dentries. When a readdir is finally wound at offset corresponding to
these entries, the iatts that are returned to the application come
from readdir-ahead's cache, which are stale by now. This problem gets
further aggravated when caching translators/modules cache and continue
to serve this stale information.

FIX:

* Store the iatt in context of the inode pointed by dentry.
* Whenever the inode pointed by dentry undergoes modification, in cbk
  of modification fop, update the iatt stored in inode-ctx to reflect
  the modification.
* When serving a readdirp response to the application, update iatts of
  dentries with the iatts stored in the context of inodes pointed by
  these dentries.
* Some fops don't have valid iatts in their responses. For eg., write
  response whose data is still cached in write-behind will have zeroed
  out stat. In this case keep only ia_type and ia_gfid and reset rest
  of the iatt members to zero.
  - fuse-bridge in this case just sends "entry" information back to
    kernel and attr is not sent.
  - gfapi sets entry->inode to NULL and zeroes out the entire stat
* There is one tiny race between the entry creation and a readdirp on
  its parent dir, which could cause the inode-ctx setting and inode
  ctx reading to happen on two different inode objects. To prevent
  this, when entry->inode is not equal to linked_inode,
  - fuse-bridge is made to send only "entry" information without
    attributes
  - gfapi sets entry->inode to NULL and zeroes out the entire stat.

Change-Id: Ia27ff49a61922e88c73a1547ad8aacc9968a69df
BUG: 1390050
Updates: bz#1390050
Signed-off-by: Krutika Dhananjay <kdhananj>
Signed-off-by: Raghavendra G <rgowdapp>
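
The revised fix in this commit message keeps a stat per inode in sync with modifications instead of zeroing the whole entry. The following Python sketch models that under assumptions: `ToyStatSync`, the four-field iatt dictionary, and the use of `None` to represent a response without a valid iatt are hypothetical choices, and the fuse-bridge and gfapi handling is not modelled.

```python
class ToyStatSync:
    def __init__(self):
        self.prefetched = []  # (name, gfid) pairs in directory order
        self.inode_ctx = {}   # gfid -> latest known iatt for that inode

    def prefetch(self, name, iatt):
        # Store the iatt in the context of the inode pointed to by the dentry.
        self.prefetched.append((name, iatt["ia_gfid"]))
        self.inode_ctx[iatt["ia_gfid"]] = dict(iatt)

    def modification_cbk(self, gfid, postbuf):
        old = self.inode_ctx.get(gfid)
        if old is None:
            return
        if postbuf is None:
            # No valid iatt in the response (for example a write still cached
            # in write-behind): keep only ia_type and ia_gfid, zero the rest.
            self.inode_ctx[gfid] = {
                "ia_gfid": gfid,
                "ia_type": old["ia_type"],
                "ia_size": 0,
                "ia_mtime": 0,
            }
        else:
            self.inode_ctx[gfid] = dict(postbuf)

    def readdirp(self):
        # Serve entries with the iatts stored in the inode contexts.
        return [(name, dict(self.inode_ctx[gfid])) for name, gfid in self.prefetched]


def demo_stat_sync():
    ra = ToyStatSync()
    ra.prefetch("file-1", {"ia_gfid": "g1", "ia_type": "regular", "ia_size": 3, "ia_mtime": 100})
    ra.prefetch("file-2", {"ia_gfid": "g2", "ia_type": "regular", "ia_size": 7, "ia_mtime": 100})
    # file-1 is modified and the response carries a valid post-op iatt.
    ra.modification_cbk("g1", {"ia_gfid": "g1", "ia_type": "regular", "ia_size": 33, "ia_mtime": 200})
    # file-2 is modified but the response carries no valid iatt.
    ra.modification_cbk("g2", None)
    return dict(ra.readdirp())
```

In this model `file-1` is listed with its updated size, while `file-2` retains only its type and gfid, which is the case where the commit message says attributes are not passed on to the kernel or to the gfapi caller.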

Comment 40 kcao22003 2018-09-30 00:51:16 UTC
(In reply to Raghavendra G from comment #38)
> (In reply to kcao22003 from comment #37)
> > Hello,
> > 
> > According to this bug report, this issue was already fixed in Gluster
> > version glusterfs-3.11.0. However, we are currently using version Gluster
> > 4.1.0 and still having the same issue. 
> > 
> > Does anyone run into this same problem with Gluster version above 3.11.0?
> 
> We found some new issues later wrt to this use-case. Currently we are
> working on fixes. Is it possible to share your testcase (with complete
> instructions to setup elasticsearch and the test case that fails) so that
> it'll help us to validate our fixes wrt your usecase?

We are using GlusterFS as a storage service for our Kubernetes cluster: https://github.com/gluster/gluster-kubernetes. For Elasticsearch, we launch the ES cluster as a Kubernetes StatefulSet and deploy it using a Helm chart: https://github.com/clockworksoul/helm-elasticsearch. We still run into this issue with the latest versions of GlusterFS (4.1.4) and Elasticsearch (v6.2.3).

Comment 41 Shyamsundar 2018-10-23 15:06:00 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/