Bug 1347529

Summary: rm -rf on a dir gives a "directory not empty" (ENOTEMPTY) error
Product: [Community] GlusterFS
Reporter: Nithya Balachandran <nbalacha>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 3.8.0
CC: bkunal, bugs, mselvaga, nbalacha, storage-qa-internal
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.8.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1330032
Environment:
Last Closed: 2016-07-08 14:43:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1329514, 1330032
Bug Blocks: 1331933

Description Nithya Balachandran 2016-06-17 07:40:59 UTC
+++ This bug was initially created as a clone of Bug #1330032 +++

+++ This bug was initially created as a clone of Bug #1329514 +++

Description of problem:
2x2 distributed replicate volume.

1) The customer was using the FUSE client. When they deleted a directory (which looked empty when "ls" was run on the mount point), they got ENOTEMPTY errors.

2) Installed the glusterfs-debuginfo package and attached to the glusterfs client process via gdb.

3) With gdb attached and breakpoints set, we found that the directory being removed was not actually empty on all of the nodes.

The directory being removed (named "Dir45") was empty on one of the distribute subvolumes. The other subvolume contained an empty subdirectory (named "bucket7").

"ls" on Dir45 did not show bucket7, but the backend had that subdirectory. When an explicit "ls" was done on the subdirectory, distribute healed it and "ls" on its parent directory (i.e. Dir45) started showing the entry.

Version-Release number of selected component (if applicable):
3.1.1

--- Additional comment from Nithya Balachandran on 2016-04-25 06:31:34 EDT ---

Steps to reproduce the issue:

1. Create a 2x2 dist-rep volume.
2. Set cluster.quorum-type to auto.
3. NFS mount the volume
4. mkdir -p dir1/dir2/dir3/dir4
5. Kill the first brick process for the non-hashed subvol for dir4
6. Try to delete dir4 : rmdir dir1/dir2/dir3/dir4

Expected results:
rmdir fails with EROFS. The directory should exist on all the bricks.

Actual results:
rmdir fails with EROFS but the directory is deleted from the hashed subvol.




RCA:


In dht_rmdir_cbk:
                if (op_ret == -1) {
                        if ((op_errno != ENOENT) && (op_errno != ESTALE)) {
                                local->op_errno = op_errno;
                                local->op_ret = -1;

                                if (op_errno != EACCES)
                                        local->need_selfheal = 1;  <-- local->need_selfheal is set to 1 as op_errno is EROFS.
                        }

                        gf_uuid_unparse(local->loc.gfid, gfid);

                        gf_msg_debug (this->name, op_errno,
                                      "rmdir on %s for %s failed."
                                      "(gfid = %s)",
                                      prev->this->name, local->loc.path,
                                      gfid);
                        goto unlock;
                }


However, local->fop_succeeded is still 0: with only 2 dht subvols there is a single non-hashed subvol, and its rmdir failed with EROFS, so no rmdir has succeeded yet.
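
For context, a minimal sketch (assumed from the flow above, not a verbatim excerpt) of where the success branch of the same callback would record a successful rmdir on a non-hashed subvol:

                /* Sketch only: the success branch of dht_rmdir_cbk is
                 * assumed to mark that rmdir succeeded on at least one
                 * non-hashed subvol. With the single non-hashed subvol
                 * failing with EROFS, this branch never runs and
                 * fop_succeeded stays 0. */
                if (op_ret == 0) {
                        local->fop_succeeded = 1;
                }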


                } else if (this_call_cnt) {
                        /* If non-hashed subvol's have responded, proceed */

---> No check is performed here to see if the fop succeeded. rmdir is wound to the hashed subvol and succeeds.

                        local->need_selfheal = 0;
                        STACK_WIND (frame, dht_rmdir_hashed_subvol_cbk,
                                    local->hashed_subvol,
                                    local->hashed_subvol->fops->rmdir,
                                    &local->loc, local->flags, NULL);
                } else if (!this_call_cnt) {


Now dir4 exists on the non-hashed subvol but not the hashed subvol.
ls dir1/dir2/dir3 shows no entries but rmdir dir1/dir2/dir3 returns ENOTEMPTY.
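
For illustration, here is a minimal sketch of the kind of guard the fix adds before winding the rmdir to the hashed subvol. This is an assumption based on the RCA above, not the exact committed patch (see http://review.gluster.org/14060 for the real change); DHT_STACK_UNWIND is assumed to be the usual DHT unwind macro.

                } else if (this_call_cnt) {
                        /* All non-hashed subvols have responded. */
                        if (local->fop_succeeded) {
                                /* rmdir succeeded on at least one
                                 * non-hashed subvol, so it is safe to
                                 * remove the directory from the hashed
                                 * subvol as well. */
                                local->need_selfheal = 0;
                                STACK_WIND (frame, dht_rmdir_hashed_subvol_cbk,
                                            local->hashed_subvol,
                                            local->hashed_subvol->fops->rmdir,
                                            &local->loc, local->flags, NULL);
                        } else {
                                /* Every non-hashed rmdir failed (EROFS
                                 * here); do not delete the directory
                                 * from the hashed subvol, just return
                                 * the saved error to the caller. */
                                DHT_STACK_UNWIND (rmdir, frame,
                                                  local->op_ret,
                                                  local->op_errno,
                                                  NULL, NULL, NULL);
                        }
                } else if (!this_call_cnt) {

The actual change committed via http://review.gluster.org/14060 also fixes the dht_selfheal_restore error codes overwriting the rmdir error codes, as noted in the commit message below.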

--- Additional comment from Vijay Bellur on 2016-04-25 06:39:46 EDT ---

REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Vijay Bellur on 2016-04-27 12:59:50 EDT ---

REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#2) for review on master by N Balachandran (nbalacha)

--- Additional comment from Vijay Bellur on 2016-04-28 06:05:02 EDT ---

REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#3) for review on master by N Balachandran (nbalacha)

--- Additional comment from Vijay Bellur on 2016-05-02 05:17:10 EDT ---

COMMIT: http://review.gluster.org/14060 committed in master by Raghavendra G (rgowdapp) 
------
commit 78c1c6002f0b11afa997a14f8378c04f257ea1c5
Author: N Balachandran <nbalacha>
Date:   Mon Apr 25 16:02:10 2016 +0530

    cluster/dht: Handle rmdir failure correctly
    
    DHT did not handle rmdir failures on non-hashed subvols
    correctly in a 2x2 dist-rep volume, causing the
    directory to be deleted from the hashed subvol.
    Also fixed an issue where the dht_selfheal_restore
    errcodes were overwriting the rmdir error codes.
    
    Change-Id: If2c6f8dc8ee72e3e6a7e04a04c2108243faca468
    BUG: 1330032
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: http://review.gluster.org/14060
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>

Comment 1 Vijay Bellur 2016-06-17 07:46:07 UTC
REVIEW: http://review.gluster.org/14751 (cluster/dht: Handle rmdir failure correctly) posted (#1) for review on release-3.8 by N Balachandran (nbalacha)

Comment 2 Vijay Bellur 2016-06-19 05:48:49 UTC
COMMIT: http://review.gluster.org/14751 committed in release-3.8 by Raghavendra G (rgowdapp) 
------
commit 89b5867fc7dc20ccb3af512d2723b1021d852381
Author: N Balachandran <nbalacha>
Date:   Fri Jun 17 13:03:14 2016 +0530

    cluster/dht: Handle rmdir failure correctly
    
    DHT did not handle rmdir failures on non-hashed subvols
    correctly in a 2x2 dist-rep volume, causing the
    directory to be deleted from the hashed subvol.
    Also fixed an issue where the dht_selfheal_restore
    errcodes were overwriting the rmdir error codes.
    
    > Change-Id: If2c6f8dc8ee72e3e6a7e04a04c2108243faca468
    > BUG: 1330032
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: http://review.gluster.org/14060
    > Smoke: Gluster Build System <jenkins.com>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.com>
    > Reviewed-by: Raghavendra G <rgowdapp>
    (cherry picked from commit 78c1c6002f0b11afa997a14f8378c04f257ea1c5)
    
    Change-Id: Id3f7c8fd515586d09f1f29c2eceddfee2ef8ec55
    BUG: 1347529
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: http://review.gluster.org/14751
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

Comment 3 Niels de Vos 2016-07-08 14:43:30 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.1, please open a new bug report.

glusterfs-3.8.1 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.packaging/156
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user