Description of problem:
Directory deletion fails when one of the replicate subvolumes becomes read-only and then read-write again. Consider a case where cluster.quorum-type is set to fixed and cluster.quorum-count is set to 2 on an n x 2 replicate volume. When one of the nodes goes down, the corresponding replica subvolume goes read-only; when the node is back online, the replica subvolume becomes read-write again. If an `rm -rf' was executed while the replica subvolume was read-only, it would delete the directories from the subvolumes that were online. However, even after the node is back up and the subvolume becomes read-write, deletion of these directories still fails.

rm -rf on NFS mount:
rm: cannot remove `7/linux-3.13.3/arch/sparc/include/asm': Stale file handle
rm: cannot remove `7/linux-3.13.3/arch/sparc/include/uapi/asm': Stale file handle

On FUSE:
[root@rafr-4]# rm -rf 11
rm: cannot remove `11/linux-3.13.3/arch/ia64/include/asm': Directory not empty
rm: cannot remove `11/linux-3.13.3/arch/arm/include/asm': Directory not empty
rm: cannot remove `11/linux-3.13.3/arch/arm64/include/asm': Directory not empty

Version-Release number of selected component (if applicable):
glusterfs 3.4afr2.2

How reproducible:
Always

Steps to Reproduce:
1. Create a 2x2 replicate setup.
2. Set the following volume options: cluster.quorum-type fixed, cluster.quorum-count 2 (see the command sketch after these steps).
3. Create a large amount of data on the client, and run rm -rf on the data created.
4. Bring down the network interface on one of the nodes (ifdown).
5. After a considerable amount of time, bring the interface back up (ifup).
6. Cancel the rm -rf and run rm -rf again on the data. The rm -rf fails.

Actual results:
Directory deletion fails.
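For reference, a minimal sketch of the setup in steps 1-2, assuming a volume named testvol with bricks on hosts rhs-1 through rhs-4 (the volume name, hostnames, and brick paths are illustrative, not from this bug):

# gluster volume create testvol replica 2 rhs-1:/bricks/b0 rhs-2:/bricks/b0 rhs-3:/bricks/b1 rhs-4:/bricks/b1
# gluster volume start testvol
# gluster volume set testvol cluster.quorum-type fixed
# gluster volume set testvol cluster.quorum-count 2

With quorum-count set to 2 on a replica-2 volume, losing either brick of a replica pair drops that subvolume below quorum, turning it read-only, which is the precondition for the failure described above.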
Please find sosreports here: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1065332/
Sachidananda, could you change the permissions on the sosreports so I can access them? Thanks in advance, Krutika
Krutika Dhananjay, I've changed the permissions. You're welcome, Sachidananda.
I am able to recreate this bug consistently even with a glusterfs untar on the mount point as the method of creating data on a 2x2 volume.

ROOT CAUSE ANALYSIS:

What rm -rf does in a nutshell:
------------------------------
As part of rm -rf on the mount point, readdirs are first performed starting from the root (call it STEP-0), then regular files under the directories are unlinked (STEP-1), and finally rmdir is performed on the directories themselves (STEP-2).

How DHT does rmdir:
------------------
DHT performs an rmdir by first winding the RMDIR FOP on all but the hashed subvolume of the concerned directory. Once that is done, the RMDIR is finally wound on the hashed subvolume.

Observations:
------------
What Pranith and I observed was that there were a few directories (for instance /glusterfs-3.5qa2/contrib/libexecinfo, /glusterfs-3.5qa2/contrib/rbtree, etc.) whose cached subvolume happened to be the replicate xlator that was not in quorum. In this case, dht_rmdir() on these was failing with EROFS (as expected). Despite seeing this error, DHT still goes ahead after STEP-1 and winds an RMDIR on the hashed subvolume. The result: the directory is removed from the hashed subvolume but is still present on the remaining DHT subvolumes. (A simplified sketch of this sequencing follows this comment.)

Now, after bringing the downed brick back up (that is, after quorum is restored), when rm -rf is attempted again, READDIRPs are issued on the directories as part of STEP-0. The way dht_readdirp() works is by taking into account only those directory entries whose hashed subvolume is the same as the subvolume on which the current readdirp was performed. In this example, READDIRP on the parent of the directories libexecinfo and rbtree (i.e. /glusterfs-3.5qa2/contrib) returned no entries (barring . and ..) from the hashed subvolume, and the names 'libexecinfo' and 'rbtree' from the cached subvolumes. Since these entries were found on the cached subvolumes alone, dht_readdirp() ignores them and treats the parent directory as empty. This causes a subsequent RMDIR on the parent to eventually fail with ENOTEMPTY.

I will try the same test case on an NFS mount point and update the bug with the RCA.
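To make the sequencing concrete, here is a minimal, self-contained C simulation of the flaw described above. This is NOT the actual glusterfs xlator code; all names here (subvol_t, wind_rmdir, etc.) are hypothetical stand-ins for illustration only.

/* Simulates DHT's two-phase rmdir against a hashed subvolume and a
 * cached subvolume whose replica has lost quorum (hence read-only). */
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define NSUBVOLS 2

typedef struct {
    const char *name;
    int read_only;   /* replica lost quorum -> returns EROFS */
    int has_dir;     /* does this subvolume still hold the directory? */
} subvol_t;

/* Pretend to wind an RMDIR FOP to one subvolume. */
static int wind_rmdir(subvol_t *sv)
{
    if (sv->read_only)
        return -EROFS;       /* quorum enforcement on the replica */
    sv->has_dir = 0;
    return 0;
}

int main(void)
{
    /* subvols[0] is the hashed subvolume for this directory;
     * subvols[1] is the cached subvolume that is below quorum. */
    subvol_t subvols[NSUBVOLS] = {
        { "replicate-0 (hashed)", 0, 1 },
        { "replicate-1 (cached)", 1, 1 },
    };
    int hashed = 0, op_errno = 0, i;

    /* Phase 1: wind RMDIR on all but the hashed subvolume. */
    for (i = 0; i < NSUBVOLS; i++) {
        if (i == hashed)
            continue;
        int ret = wind_rmdir(&subvols[i]);
        if (ret < 0)
            op_errno = -ret;   /* EROFS is recorded here ... */
    }

    /* Phase 2 (the bug): RMDIR is wound on the hashed subvolume even
     * though phase 1 failed. The fix is to abort here and propagate
     * op_errno instead, leaving the layout consistent. */
    wind_rmdir(&subvols[hashed]);

    for (i = 0; i < NSUBVOLS; i++)
        printf("%s: directory %s\n", subvols[i].name,
               subvols[i].has_dir ? "still present" : "removed");
    printf("recorded op_errno: %s\n", strerror(op_errno));
    return 0;
}

Running this prints the inconsistent end state: the directory is gone from the hashed subvolume but remains on the cached one. Since dht_readdirp() only reports an entry from its hashed subvolume, the leftover entry is invisible to readdir yet still blocks the parent's RMDIR, matching the ENOTEMPTY seen above.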
Two updates:
1. I tried the same test case on an NFS mount 3 times, with the same result: the same ENOTEMPTY error as on the FUSE mount. The root cause of this behavior is the same as the one described in comment #5.
2. It turns out Susant had already sent a patch for dht_rmdir() in April which fixes this issue, and it is currently under review: http://review.gluster.org/#/c/7460/. I applied this patch and ran the test again, and everything worked fine.
Assigning the bug to Susant / the DHT component as per https://bugzilla.redhat.com/show_bug.cgi?id=1065332#c6
Triage-update: Need to refresh http://review.gluster.org/#/c/7460/ and test.
This should have been fixed by http://review.gluster.org/#/c/14060/. We will need to retest on RHGS 3.1.3 and confirm.
The reported issue is no longer seen in the 3.1.3 build. I tried the test mentioned in the steps to reproduce a couple of times; rm -rf deletes all directories as expected.