Description of problem:
=======================
Removing empty directories from multiple clients throws ESTALE errors as below:

rm: cannot remove ‘1’: Stale file handle
rm: cannot remove ‘103’: Stale file handle
rm: cannot remove ‘107’: Stale file handle
rm: cannot remove ‘11’: Stale file handle
rm: cannot remove ‘113’: Stale file handle
rm: cannot remove ‘117’: Stale file handle
rm: cannot remove ‘12’: Stale file handle
rm: cannot remove ‘123’: Stale file handle
rm: cannot remove ‘127’: Stale file handle
rm: cannot remove ‘129’: Stale file handle
rm: cannot remove ‘133’: Stale file handle

Note: despite these errors, rm -rf removed all the directories successfully.

Version-Release number of selected component (if applicable):
3.12.2-4.el7rhgs.x86_64

How reproducible:
always

Steps to Reproduce:
===================
1) Create an x3 volume and start it.
2) Mount it on multiple clients.
3) Create a few empty directories:
   for i in {1..1000}; do mkdir $i; done
4) From multiple clients, run rm -rf *

Actual results:
===============
rm -rf throws ESTALE errors

Expected results:
=================
No ESTALE errors
From fuse-dump:

2018-03-01T11:25:53.579099719+05:30 "GLUSTER\xf5" RMDIR {Len:44 Opcode:11 Unique:3615 Nodeid:140513143433312 Uid:0 Gid:0 Pid:8667 Padding:0} 110
2018-03-01T11:25:53.762465001+05:30 "GLUSTER\xf5" {Len:16 Error:-116 Unique:3615}
2018-03-01T11:25:53.763381283+05:30 "GLUSTER\xf5" LOOKUP {Len:44 Opcode:1 Unique:3616 Nodeid:140513143433312 Uid:0 Gid:0 Pid:8667 Padding:0} 110
2018-03-01T11:25:53.763599918+05:30 "GLUSTER\xf5" {Len:144 Error:0 Unique:3616} {Nodeid:140513144416608 Generation:0 EntryValid:1 AttrValid:1 EntryValidNsec:0 AttrValidNsec:0 Attr:{Ino:13658219387318354837 Size:4096 Blocks:8 Atime:1519883211 Mtime:1519883211 Ctime:1519883211 Atimensec:351637895 Mtimensec:351637895 Ctimensec:403637612 Mode:16877 Nlink:2 Uid:0 Gid:0 Rdev:0 Blksize:131072 Padding:0}}
2018-03-01T11:25:53.763714221+05:30 "GLUSTER\xf5" RMDIR {Len:44 Opcode:11 Unique:3617 Nodeid:140513143433312 Uid:0 Gid:0 Pid:8667 Padding:0} 110
2018-03-01T11:25:53.933181928+05:30 "GLUSTER\xf5" {Len:16 Error:-116 Unique:3617}

Note that after RMDIR (Unique:3615) failed with ESTALE (Error:-116), the LOOKUP on the same path (Unique:3616), done by the VFS retry logic, returned success. Because of this, another RMDIR (Unique:3617) was attempted, which again failed with ESTALE. The failure of the second RMDIR forced the VFS to give up, failing the rmdir with an ESTALE error.

Since the first RMDIR failed with ESTALE, the lookup (Unique:3616) should've returned ENOENT, but it doesn't. I think this is the bug. Had the lookup returned ENOENT, rmdir would've failed with ENOENT and the rm command would've ignored it.

I suspect the lookup succeeded due to a stale cache in md-cache. Since md-cache on this client wouldn't have witnessed the RMDIR issued from the other client, it most likely keeps the dentry alive. I am going to repeat the test with md-cache turned off.
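The retry sequence above can be sketched as a small simulation (a hypothetical, simplified Python model, not GlusterFS code; all names here are invented for illustration): a client-side metadata cache that never saw the remote RMDIR keeps serving the stale dentry, so the VFS-style "ESTALE, re-lookup, retry" loop ends in ESTALE again.

```python
import errno

class MdCache:
    """Toy stand-in for md-cache (illustrative only): caches lookup
    results per path and never invalidates them."""
    def __init__(self):
        self.entries = {}                 # path -> cached inode number

    def lookup(self, path, backend):
        if path in self.entries:          # stale hit: remote RMDIR was never seen
            return self.entries[path]
        if path not in backend:
            raise OSError(errno.ENOENT, "No such file or directory", path)
        self.entries[path] = backend[path]
        return backend[path]

backend = {"1": 101}                      # server-side view: path -> inode
cache = MdCache()
cache.lookup("1", backend)                # client A caches the dentry
del backend["1"]                          # client B removes the directory

def rmdir_on_server(path):
    if path not in backend:               # inode is gone on the server
        raise OSError(errno.ESTALE, "Stale file handle", path)
    del backend[path]

def vfs_rmdir(path):
    """Mimic the kernel VFS retry: on ESTALE, re-lookup and try once more."""
    try:
        rmdir_on_server(path)             # first RMDIR  -> ESTALE (cf. Unique:3615)
    except OSError as e:
        if e.errno != errno.ESTALE:
            raise
        cache.lookup(path, backend)       # LOOKUP "succeeds" off stale cache (cf. 3616)
        rmdir_on_server(path)             # second RMDIR -> ESTALE; VFS gives up (cf. 3617)

try:
    vfs_rmdir("1")
except OSError as e:
    print(errno.errorcode[e.errno])       # prints ESTALE
```

With a correct cache, the re-lookup would have raised ENOENT instead, which rm silently tolerates.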
With md-cache turned off, ESTALE errors are no longer seen and rm completes successfully. The fix is for md-cache to purge its cache for an inode whenever any fop on that inode returns an ESTALE error.
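The fix idea can be shown on the same kind of toy model (hypothetical Python for illustration, not the actual md-cache C change): once the cached entry is purged on ESTALE, the VFS's retry lookup reaches the server, gets ENOENT, and rm -rf ignores it.

```python
import errno

cache = {"1": 101}       # stale md-cache view: path -> inode (dir already gone remotely)
backend = {}             # server view: directory "1" was removed by another client

def lookup(path):
    if path in cache:                    # without the fix this stale hit would win
        return cache[path]
    if path not in backend:
        raise OSError(errno.ENOENT, "No such file or directory", path)
    return backend[path]

def rmdir(path):
    if path not in backend:
        cache.pop(path, None)            # the fix: purge the cached entry on ESTALE
        raise OSError(errno.ESTALE, "Stale file handle", path)
    del backend[path]

def vfs_rmdir(path):
    """Kernel-style retry: on ESTALE, re-lookup and try once more."""
    try:
        rmdir(path)                      # RMDIR -> ESTALE, but cache gets purged
    except OSError as e:
        if e.errno != errno.ESTALE:
            raise
        lookup(path)                     # cache miss -> server says ENOENT
        rmdir(path)                      # never reached

try:
    vfs_rmdir("1")
except OSError as e:
    print(errno.errorcode[e.errno])      # prints ENOENT: rm -rf ignores this
```

This matches the expected behavior described above: the retried lookup fails with ENOENT instead of succeeding against a stale dentry.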
https://review.gluster.org/19926
Verified this BZ on glusterfs version 3.12.2-9.el7rhgs.x86_64. Followed the same steps as in the description, on data sets having a) deep directories without files and b) deep directories with files. The rm -rf command didn't throw any ESTALE errors. Hence, moving this BZ to Verified.
*** Bug 1577796 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607