Bug 1566303

Summary: Removing directories from multiple clients throws ESTALE errors
Product: [Community] GlusterFS
Reporter: Raghavendra G <rgowdapp>
Component: md-cache
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, rgowdapp, rhinduja, rhs-bugs, storage-qa-internal, tdesala
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-v4.1.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1546717
Clones: 1571593 (view as bug list)
Environment:
Last Closed: 2018-06-20 18:04:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1546717
Bug Blocks: 1503137, 1571593

Comment 1 Raghavendra G 2018-04-12 02:49:36 UTC
+++ This bug was initially created as a clone of Bug #1546717 +++

Description of problem:
=======================
Removing empty directories from multiple clients throws ESTALE errors as below,

rm: cannot remove ‘1’: Stale file handle
rm: cannot remove ‘103’: Stale file handle
rm: cannot remove ‘107’: Stale file handle
rm: cannot remove ‘11’: Stale file handle
rm: cannot remove ‘113’: Stale file handle
rm: cannot remove ‘117’: Stale file handle
rm: cannot remove ‘12’: Stale file handle
rm: cannot remove ‘123’: Stale file handle
rm: cannot remove ‘127’: Stale file handle
rm: cannot remove ‘129’: Stale file handle
rm: cannot remove ‘133’: Stale file handle

Note: despite the errors, rm -rf removed all the directories successfully; nothing was left behind.

Version-Release number of selected component (if applicable):
3.12.2-4.el7rhgs.x86_64

How reproducible:
always

Steps to Reproduce:
===================
1) Create an x3 (replica 3) volume and start it.
2) Mount it on multiple clients.
3) Create a few empty directories:
for i in {1..1000};do mkdir $i;done
4) From multiple clients, run rm -rf * concurrently.

Actual results:
===============
rm -rf throws ESTALE errors

Expected results:
=================
NO ESTALE errors

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-02-19 07:03:57 EST ---

This bug is automatically being proposed for the release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Prasad Desala on 2018-02-19 07:11:57 EST ---

Volume Name: distrepx3
Type: Distributed-Replicate
Volume ID: 10b7e4d9-9be9-497f-bc67-4e339b071848
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 3 = 15
Transport-type: tcp
Bricks:
Brick1: 10.70.42.167:/bricks/brick6/b6
Brick2: 10.70.42.177:/bricks/brick6/b6
Brick3: 10.70.42.173:/bricks/brick6/b6
Brick4: 10.70.42.176:/bricks/brick6/b6
Brick5: 10.70.42.169:/bricks/brick6/b6
Brick6: 10.70.42.166:/bricks/brick6/b6
Brick7: 10.70.42.167:/bricks/brick7/b7
Brick8: 10.70.42.177:/bricks/brick7/b7
Brick9: 10.70.42.173:/bricks/brick7/b7
Brick10: 10.70.42.176:/bricks/brick7/b7
Brick11: 10.70.42.169:/bricks/brick7/b7
Brick12: 10.70.42.166:/bricks/brick7/b7
Brick13: 10.70.42.176:/bricks/brick8/b8
Brick14: 10.70.42.169:/bricks/brick8/b8
Brick15: 10.70.42.166:/bricks/brick8/b8
Options Reconfigured:
nfs.disable: on
performance.client-io-threads: off
transport.address-family: inet
cluster.brick-multiplex: enable

Clients:
10.70.42.191 --> mount -t glusterfs 10.70.42.167:/distrepx3 /mnt/distrepx3
10.70.41.254 --> mount -t glusterfs 10.70.42.167:/distrepx3 /mnt/distrepx3
10.70.42.64 --> mount -t glusterfs 10.70.42.167:/distrepx3 /mnt/distrepx3
10.70.42.21 --> mount -t glusterfs 10.70.42.167:/distrepx3 /mnt/distrepx3

--- Additional comment from Prasad Desala on 2018-02-19 07:42:09 EST ---

I'm not hitting this issue on a pure distribute volume.

--- Additional comment from Prasad Desala on 2018-02-20 05:13:45 EST ---

Seeing this issue even with directories that have files in them; changing the bug summary accordingly.

--- Additional comment from Raghavendra G on 2018-02-22 03:21:56 EST ---

This looks like a regression caused by [1]. Note that the patch reverted by [1] was a fix for bz 1245065, and this issue looks very similar to bz 1245065.

[1] https://review.gluster.org/18463

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-02-22 03:51:46 EST ---

This bug report has Keywords: Regression or TestBlocker.

Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release.

Please resolve ASAP.

--- Additional comment from Raghavendra G on 2018-02-22 05:22:10 EST ---

> Prasad Desala 
> Keywords: Regression

Please note that no functionality is broken: in this bug the directory structure is removed completely, so nothing is actually lost. However, if we are able to reproduce bz 1245065, where part of the directory structure remains, then I think the issue is serious enough to be considered a blocker.

--- Additional comment from Raghavendra G on 2018-02-27 08:28:31 EST ---

Prasad,

Can you capture fusedump and strace of rm (from all clients) when you hit this bug?

# strace -ff -T -p <pid-of-rm> -o <path-where-you-want-strace-output-saved>

To capture a fuse-dump, you have to mount glusterfs with the --dump-fuse option:

# glusterfs --volfile-server=<volfile-server> --volfile-id=<volfile-id> --dump-fuse=<path-to-where-fuse-dump-binary-file-has-to-be-stored> /mnt/glusterfs

Please attach fuse-dump and strace (from all clients) to the bug.

regards,
Raghavendra

--- Additional comment from Prasad Desala on 2018-03-01 01:21:57 EST ---

Reproduced this issue again and captured fusedump, strace of rm from all the clients.

--- Additional comment from Raghavendra G on 2018-03-02 09:14:51 EST ---

From the fuse-dump:

2018-03-01T11:25:53.579099719+05:30 "GLUSTER\xf5" RMDIR {Len:44 Opcode:11 Unique:3615 Nodeid:140513143433312 Uid:0 Gid:0 Pid:8667 Padding:0} 110 
2018-03-01T11:25:53.762465001+05:30 "GLUSTER\xf5" {Len:16 Error:-116 Unique:3615} 
2018-03-01T11:25:53.763381283+05:30 "GLUSTER\xf5" LOOKUP {Len:44 Opcode:1 Unique:3616 Nodeid:140513143433312 Uid:0 Gid:0 Pid:8667 Padding:0} 110 
2018-03-01T11:25:53.763599918+05:30 "GLUSTER\xf5" {Len:144 Error:0 Unique:3616} {Nodeid:140513144416608 Generation:0 EntryValid:1 AttrValid:1 EntryValidNsec:0 AttrValidNsec:0 Attr:{Ino:13658219387318354837 Size:4096 Blocks:8 Atime:1519883211 Mtime:1519883211 Ctime:1519883211 Atimensec:351637895 Mtimensec:351637895 Ctimensec:403637612 Mode:16877 Nlink:2 Uid:0 Gid:0 Rdev:0 Blksize:131072 Padding:0}} 
2018-03-01T11:25:53.763714221+05:30 "GLUSTER\xf5" RMDIR {Len:44 Opcode:11 Unique:3617 Nodeid:140513143433312 Uid:0 Gid:0 Pid:8667 Padding:0} 110 
2018-03-01T11:25:53.933181928+05:30 "GLUSTER\xf5" {Len:16 Error:-116 Unique:3617} 

Note that after RMDIR (unique:3615) failed with ESTALE, the LOOKUP on the same path (unique:3616), issued by the VFS retry logic, returned success. Because of this, another RMDIR (unique:3617) was attempted, which again failed with ESTALE. The failure of the second RMDIR forced VFS to give up and fail the rmdir with ESTALE.

Note that since the first RMDIR failed with ESTALE, the lookup (unique:3616) should have returned ENOENT, but it did not. I think this is the bug. Had the lookup returned ENOENT, rmdir would have failed with ENOENT and rm would have ignored it. I suspect the lookup succeeded because of a stale cache in md-cache: this client's md-cache would not have witnessed the RMDIR that actually removed the directory (it was issued from another client), so most likely it keeps the dentry alive.
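
[Editorial illustration] To make the retry flow above easier to follow, here is a minimal user-space model in C of the pattern the kernel VFS applies. The real retry happens inside the kernel, and the directory name and retry count here are made up; the sketch only mirrors the rmdir -> ESTALE -> lookup -> retry -> give-up sequence visible in the dump.

/* Illustrative model of the VFS ESTALE retry seen in the fuse dump:
 * rmdir fails with ESTALE, a fresh lookup (stat) still "succeeds"
 * because of the stale md-cache entry, so the operation is retried
 * once and then the ESTALE error is returned to rm. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int remove_dir_with_retry(const char *path)
{
    struct stat st;

    for (int attempt = 0; attempt < 2; attempt++) {
        if (rmdir(path) == 0)
            return 0;                      /* removed cleanly */

        if (errno != ESTALE)
            return -errno;                 /* some other failure */

        /* RMDIR returned ESTALE: re-lookup the path. If the entry is
         * really gone, lookup should fail with ENOENT and the caller
         * (rm) would silently ignore the missing directory. */
        if (stat(path, &st) != 0 && errno == ENOENT)
            return -ENOENT;

        /* Lookup "succeeded" (served from a stale cache), so the
         * RMDIR is retried once more; a second ESTALE ends the loop. */
    }
    return -ESTALE;                        /* what rm finally reports */
}

int main(void)
{
    int ret = remove_dir_with_retry("117"); /* example directory name */
    if (ret < 0)
        fprintf(stderr, "rm: cannot remove '117': %s\n", strerror(-ret));
    return 0;
}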

I am going to repeat the test with md-cache turned off.

--- Additional comment from Raghavendra G on 2018-03-02 09:39:57 EST ---

With md-cache turned off, ESTALE errors are no longer seen and rm completes successfully.

The fix: md-cache should purge its cache for an inode if any fop on that inode returns an ESTALE error.
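
[Editorial illustration] As a sketch of that rule only (not the actual md-cache code; the real change is the patch at https://review.gluster.org/19926, see comments 2 and 3), a self-contained C model of a metadata cache that drops an entry whenever an operation on it returns ESTALE or ENOENT could look like the following. All names and structures here are hypothetical.

/* Hypothetical model of "purge the cached entry when a fop returns
 * ESTALE/ENOENT", the rule the md-cache fix implements. Not actual
 * GlusterFS code. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SIZE 64

struct md_entry {
    char path[256];
    bool valid;        /* cached attributes are usable */
};

static struct md_entry cache[CACHE_SIZE];

/* Look up a cached entry by path; NULL if not cached. */
static struct md_entry *cache_find(const char *path)
{
    for (int i = 0; i < CACHE_SIZE; i++)
        if (cache[i].valid && strcmp(cache[i].path, path) == 0)
            return &cache[i];
    return NULL;
}

/* The fix in principle: on ESTALE/ENOENT from any fop, invalidate the
 * cached entry so the next lookup is forced to go to the backend and
 * can return ENOENT instead of a stale "success". */
static void cache_handle_fop_error(const char *path, int op_errno)
{
    if (op_errno == ESTALE || op_errno == ENOENT) {
        struct md_entry *e = cache_find(path);
        if (e) {
            e->valid = false;
            printf("purged stale cache entry for %s (errno=%d)\n",
                   path, op_errno);
        }
    }
}

int main(void)
{
    /* Seed the cache with one entry, then simulate an RMDIR that
     * failed with ESTALE because another client already removed it. */
    snprintf(cache[0].path, sizeof(cache[0].path), "117");
    cache[0].valid = true;

    cache_handle_fop_error("117", ESTALE);

    printf("lookup served from cache after purge? %s\n",
           cache_find("117") ? "yes (bug)" : "no (correct)");
    return 0;
}

The real md-cache keys its cache on the inode rather than on a path string and does the purge in its fop callbacks; the sketch only demonstrates the purge-on-error rule.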

Comment 2 Worker Ant 2018-04-23 14:00:07 UTC
REVIEW: https://review.gluster.org/19926 (performance/md-cache: purge cache on ENOENT/ESTALE errors) posted (#1) for review on master by Raghavendra G

Comment 3 Worker Ant 2018-04-25 07:02:23 UTC
COMMIT: https://review.gluster.org/19926 committed in master by "Poornima G" <pgurusid> with commit message: performance/md-cache: purge cache on ENOENT/ESTALE errors

Otherwise, the next lookup could be served from the cache and succeed, which is wrong. This can break the retry logic of VFS when it receives an ESTALE.

Change-Id: Iad8e564d666aa4172823343f19a60c11e4416ef6
Signed-off-by: Raghavendra G <rgowdapp>
Fixes: bz#1566303

Comment 4 Shyamsundar 2018-06-20 18:04:29 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/