Description of problem:
Directory deletion fails when one of the replicate subvolumes becomes read-only and then read-write again. Consider a case where cluster.quorum-type is set to fixed and cluster.quorum-count is set to 2 on an n x 2 replicate volume. When one of the nodes goes down, the corresponding replica subvolume goes read-only; when the node is back online, the replica subvolume becomes read-write again. If an `rm -rf' was executed while the replica subvolume was read-only, it would delete the directories from the subvolumes that were online. However, even after the node is back up and the subvolume becomes read-write, deletion of these directories still fails.

rm -rf on NFS mount:
rm: cannot remove `7/linux-3.13.3/arch/sparc/include/asm': Stale file handle
rm: cannot remove `7/linux-3.13.3/arch/sparc/include/uapi/asm': Stale file handle

On FUSE:
[root@rafr-4]# rm -rf 11
rm: cannot remove `11/linux-3.13.3/arch/ia64/include/asm': Directory not empty
rm: cannot remove `11/linux-3.13.3/arch/arm/include/asm': Directory not empty
rm: cannot remove `11/linux-3.13.3/arch/arm64/include/asm': Directory not empty

Version-Release number of selected component (if applicable):
glusterfs 3.4afr2.2

How reproducible:
Always

Steps to Reproduce:
1. Create a 2x2 replicate setup.
2. Set the following volume options: cluster.quorum-type fixed, cluster.quorum-count 2 (see the command sketch after these steps).
3. Create a large amount of data on the client, and run rm -rf on the data created.
4. Bring down the network interface on one of the nodes (ifdown).
5. After a considerable amount of time, bring the interface back up (ifup).
6. Cancel the rm -rf and run rm -rf again on the data. The rm -rf fails.

Actual results:
Directory deletion fails.
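For reference, a minimal sketch of the setup in steps 1-2, assuming a volume named testvol with bricks on hosts rhs-1 through rhs-4 (the volume name, hostnames, and brick paths are illustrative, not from this bug):

# gluster volume create testvol replica 2 rhs-1:/bricks/b0 rhs-2:/bricks/b0 rhs-3:/bricks/b1 rhs-4:/bricks/b1
# gluster volume start testvol
# gluster volume set testvol cluster.quorum-type fixed
# gluster volume set testvol cluster.quorum-count 2

With quorum-count set to 2 on a replica-2 volume, losing either brick of a replica pair drops that subvolume below quorum, turning it read-only, which is the precondition for the failure described above.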
Please find sosreports here: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1065332/
Sachidananda, could you change the permissions on the sosreports so I can access them? Thanks in advance, Krutika
Krutika Dhananjay, I've changed the permissions. You're welcome, Sachidananda.
I am able to recreate this bug consistently even with a glusterfs untar on the mount point as the method of creating data on a 2x2 volume.

ROOT CAUSE ANALYSIS:

What rm -rf does in a nutshell:
------------------------------
As part of rm -rf on the mount point, readdirs are first performed starting from the root (call it STEP-0), then regular files under the directories are unlinked (STEP-1), and finally rmdir is performed on the directories themselves (STEP-2).

How DHT does rmdir:
------------------
DHT performs an rmdir by first winding the RMDIR FOP on all but the hashed subvolume of the concerned directory. Once that is done, the RMDIR is finally wound on the hashed subvolume.

Observations:
------------
What Pranith and I observed was that there were a few directories (for instance /glusterfs-3.5qa2/contrib/libexecinfo, /glusterfs-3.5qa2/contrib/rbtree, etc.) whose cached subvolume happened to be the replicate xlator that was not in quorum. In this case, dht_rmdir() on these was failing with EROFS (as expected). Despite seeing this error, DHT still goes ahead after STEP-1 and winds an RMDIR on the hashed subvolume. The result: the directory is removed from the hashed subvolume but is still present on the remaining DHT subvolumes. (A simplified sketch of this sequencing follows this comment.)

Now, after bringing the downed brick back up (that is, after quorum is restored), when rm -rf is attempted again, READDIRPs are issued on the directories as part of STEP-0. The way dht_readdirp() works is by taking into account only those directory entries whose hashed subvolume is the same as the subvolume on which the current readdirp was performed. In this example, READDIRP on the parent of the directories libexecinfo and rbtree (i.e. /glusterfs-3.5qa2/contrib) returned no entries (barring . and ..) from the hashed subvolume, and the names 'libexecinfo' and 'rbtree' from the cached subvolumes. Since these entries were found on the cached subvolumes alone, dht_readdirp() ignores them and treats the parent directory as empty. This causes a subsequent RMDIR on the parent to eventually fail with ENOTEMPTY.

I will try the same test case on an NFS mount point and update the bug with the RCA.
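To make the sequencing concrete, here is a minimal, self-contained C simulation of the flaw described above. This is NOT the actual glusterfs xlator code; all names here (subvol_t, wind_rmdir, etc.) are hypothetical stand-ins for illustration only.

/* Simulates DHT's two-phase rmdir against a hashed subvolume and a
 * cached subvolume whose replica has lost quorum (hence read-only). */
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define NSUBVOLS 2

typedef struct {
    const char *name;
    int read_only;   /* replica lost quorum -> returns EROFS */
    int has_dir;     /* does this subvolume still hold the directory? */
} subvol_t;

/* Pretend to wind an RMDIR FOP to one subvolume. */
static int wind_rmdir(subvol_t *sv)
{
    if (sv->read_only)
        return -EROFS;       /* quorum enforcement on the replica */
    sv->has_dir = 0;
    return 0;
}

int main(void)
{
    /* subvols[0] is the hashed subvolume for this directory;
     * subvols[1] is the cached subvolume that is below quorum. */
    subvol_t subvols[NSUBVOLS] = {
        { "replicate-0 (hashed)", 0, 1 },
        { "replicate-1 (cached)", 1, 1 },
    };
    int hashed = 0, op_errno = 0, i;

    /* Phase 1: wind RMDIR on all but the hashed subvolume. */
    for (i = 0; i < NSUBVOLS; i++) {
        if (i == hashed)
            continue;
        int ret = wind_rmdir(&subvols[i]);
        if (ret < 0)
            op_errno = -ret;   /* EROFS is recorded here ... */
    }

    /* Phase 2 (the bug): RMDIR is wound on the hashed subvolume even
     * though phase 1 failed. The fix is to abort here and propagate
     * op_errno instead, leaving the layout consistent. */
    wind_rmdir(&subvols[hashed]);

    for (i = 0; i < NSUBVOLS; i++)
        printf("%s: directory %s\n", subvols[i].name,
               subvols[i].has_dir ? "still present" : "removed");
    printf("recorded op_errno: %s\n", strerror(op_errno));
    return 0;
}

Running this prints the inconsistent end state: the directory is gone from the hashed subvolume but remains on the cached one. Since dht_readdirp() only reports an entry from its hashed subvolume, the leftover entry is invisible to readdir yet still blocks the parent's RMDIR, matching the ENOTEMPTY seen above.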
Two updates:
1. I tried the same test case on an NFS mount 3 times, with the same result: the same ENOTEMPTY error as on the FUSE mount. The root cause of this behavior is the same as the one described in comment #5.
2. It turns out Susant had already sent a patch for dht_rmdir() in April which fixes this issue, and it is currently under review: http://review.gluster.org/#/c/7460/. I applied this patch and ran the test again, and everything worked fine.
Assigning the bug to Susant / the DHT component as per https://bugzilla.redhat.com/show_bug.cgi?id=1065332#c6
Triage-update: Need to refresh http://review.gluster.org/#/c/7460/ and test.
This should have been fixed by http://review.gluster.org/#/c/14060/. We will need to retest on RHGS 3.1.3 and confirm.
The reported issue is no longer seen in the 3.1.3 build. I tried the test mentioned in the steps to reproduce a couple of times; rm -rf deletes all directories as expected.