Bug 1331933 - rm -rf on a dir gives a "directory not empty" (ENOTEMPTY) error
Summary: rm -rf on a dir gives a "directory not empty" (ENOTEMPTY) error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.7.11
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On: 1329514 1330032 1347529
Blocks:
 
Reported: 2016-04-30 05:56 UTC by Raghavendra G
Modified: 2016-06-28 12:16 UTC
CC List: 4 users

Fixed In Version: glusterfs-3.7.12
Doc Type: Bug Fix
Doc Text:
Clone Of: 1330032
Environment:
Last Closed: 2016-06-28 11:43:59 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Raghavendra G 2016-04-30 05:56:27 UTC
+++ This bug was initially created as a clone of Bug #1330032 +++

+++ This bug was initially created as a clone of Bug #1329514 +++

Description of problem:
2x2 distributed replicate volume.

1) The customer was using the FUSE client. When they deleted a directory (which looked empty when "ls" was run on the mount point), they got ENOTEMPTY errors.

2) Installed the glusterfs-debuginfo package and attached gdb to the glusterfs client process.

3) With gdb attached and breakpoints set, it was found that the directory being removed was not empty on all of the nodes.

The directory being removed ("Dir45") was empty on one of the distribute subvolumes. The other subvolume contained an empty subdirectory named "bucket7".

"ls" on Dir45 did not show bucket7, even though the subdirectory existed on the backend. When an explicit "ls" was done on the subdirectory, distribute healed it and "ls" on the parent directory (Dir45) started showing the entry.

Version-Release number of selected component (if applicable):
3.1.1

--- Additional comment from Nithya Balachandran on 2016-04-25 06:31:34 EDT ---

Steps to reproduce the issue:

1. Create a 2x2 dist-rep volume.
2. Set cluster.quorum-type to auto.
3. NFS-mount the volume.
4. mkdir -p dir1/dir2/dir3/dir4
5. Kill the first brick process of the non-hashed subvol for dir4.
6. Try to delete dir4: rmdir dir1/dir2/dir3/dir4

Expected results:
rmdir fails with EROFS. The directory should exist on all the bricks.

Actual results:
rmdir fails with EROFS but the directory is deleted from the hashed subvol.




RCA:


In dht_rmdir_cbk:
                if (op_ret == -1) {
                        if ((op_errno != ENOENT) && (op_errno != ESTALE)) {
                                local->op_errno = op_errno;
                                local->op_ret = -1;

                                if (op_errno != EACCES)
                                        local->need_selfheal = 1;  <-- local->need_selfheal is set to 1 as op_errno is EROFS.
                        }

                        gf_uuid_unparse(local->loc.gfid, gfid);

                        gf_msg_debug (this->name, op_errno,
                                      "rmdir on %s for %s failed."
                                      "(gfid = %s)",
                                      prev->this->name, local->loc.path,
                                      gfid);
                        goto unlock;
                }


However, local->fop_succeeded is still 0: with only 2 subvols, the lone non-hashed subvol is the one that failed, so no rmdir has succeeded yet.


                } else if (this_call_cnt) {
                        /* If non-hashed subvol's have responded, proceed */

---> No check is performed here to see if the fop succeeded. rmdir is wound to the hashed subvol and succeeds.

                        local->need_selfheal = 0;
                        STACK_WIND (frame, dht_rmdir_hashed_subvol_cbk,
                                    local->hashed_subvol,
                                    local->hashed_subvol->fops->rmdir,
                                    &local->loc, local->flags, NULL);
                } else if (!this_call_cnt) {


Now dir4 exists on the non-hashed subvol but not the hashed subvol.
ls dir1/dir2/dir3 shows no entries but rmdir dir1/dir2/dir3 returns ENOTEMPTY.
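
Illustration only (this is a standalone model, not GlusterFS source, and not the patch itself): the sketch below captures the aggregation above and the missing guard, i.e. do not wind rmdir to the hashed subvol when a non-hashed subvol reported a hard failure and no rmdir succeeded. The struct and function names (rmdir_state, nonhashed_reply, wind_to_hashed) are invented for this sketch.

/* Standalone model; compiles on its own and is NOT dht code. */
#include <errno.h>
#include <stdio.h>

struct rmdir_state {
        int op_ret;        /* aggregated result: 0 until a hard failure is seen */
        int op_errno;      /* first hard errno (ENOENT/ESTALE are ignored)      */
        int fop_succeeded; /* set once any non-hashed rmdir succeeds            */
};

/* Record one non-hashed subvol reply, mirroring the callback logic quoted above. */
static void nonhashed_reply (struct rmdir_state *st, int op_ret, int op_errno)
{
        if (op_ret == -1) {
                if ((op_errno != ENOENT) && (op_errno != ESTALE)) {
                        st->op_ret = -1;
                        st->op_errno = op_errno; /* e.g. EROFS when quorum is lost */
                }
        } else {
                st->fop_succeeded = 1;
        }
}

/* The missing check: wind to the hashed subvol only if no hard failure was
 * recorded, or at least one non-hashed rmdir actually succeeded. */
static int wind_to_hashed (const struct rmdir_state *st)
{
        return !((st->op_ret == -1) && !st->fop_succeeded);
}

int main (void)
{
        struct rmdir_state st = {0};

        nonhashed_reply (&st, -1, EROFS); /* the only non-hashed subvol fails */

        if (wind_to_hashed (&st))
                printf ("wind rmdir to the hashed subvol\n");
        else
                printf ("fail rmdir with errno %d; dir4 stays on all subvols\n",
                        st.op_errno);
        return 0;
}

With such a guard the reproducer above leaves dir4 on both subvols, and rmdir simply fails with the EROFS returned by the non-hashed subvol.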

--- Additional comment from Vijay Bellur on 2016-04-25 06:39:46 EDT ---

REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Vijay Bellur on 2016-04-27 12:59:50 EDT ---

REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#2) for review on master by N Balachandran (nbalacha)

--- Additional comment from Vijay Bellur on 2016-04-28 06:05:02 EDT ---

REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#3) for review on master by N Balachandran (nbalacha)

Comment 1 Vijay Bellur 2016-04-30 05:57:12 UTC
REVIEW: http://review.gluster.org/14123 (cluster/dht: Handle rmdir failure correctly) posted (#1) for review on release-3.7 by Raghavendra G (rgowdapp)

Comment 2 Vijay Bellur 2016-05-03 04:40:43 UTC
COMMIT: http://review.gluster.org/14123 committed in release-3.7 by Raghavendra G (rgowdapp) 
------
commit 534ca9864907a677551e7aff57e5c021d035303d
Author: N Balachandran <nbalacha>
Date:   Mon Apr 25 16:02:10 2016 +0530

    cluster/dht: Handle rmdir failure correctly
    
    DHT did not handle rmdir failures on non-hashed subvols
    correctly in a 2x2 dist-rep volume, causing the
    directory to be deleted from the hashed subvol.
    Also fixed an issue where the dht_selfheal_restore
    errcodes were overwriting the rmdir error codes.
    
    Change-Id: If2c6f8dc8ee72e3e6a7e04a04c2108243faca468
    BUG: 1331933
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: http://review.gluster.org/14123
    Smoke: Gluster Build System <jenkins.com>
    Tested-by: Raghavendra G <rgowdapp>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
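
The commit message also mentions dht_selfheal_restore error codes overwriting the rmdir error codes. The fix for that class of problem amounts to saving the fop's own result before the restore step and reporting the saved result afterwards. A minimal standalone sketch, assuming an invented restore_dir() helper and fop_local struct (neither is a dht internal):

/* Standalone sketch; not GlusterFS code. */
#include <errno.h>
#include <stdio.h>

struct fop_local {
        int op_ret;
        int op_errno;
};

/* Stand-in for a restore/self-heal step that fails with its own errno. */
static int restore_dir (int *restore_errno)
{
        *restore_errno = EIO; /* unrelated to the original rmdir failure */
        return -1;
}

int main (void)
{
        /* The rmdir itself already failed with ENOTEMPTY. */
        struct fop_local local = { .op_ret = -1, .op_errno = ENOTEMPTY };

        /* Save the fop result before running the restore path ... */
        int saved_ret   = local.op_ret;
        int saved_errno = local.op_errno;

        int restore_errno = 0;
        if (restore_dir (&restore_errno) == -1) {
                /* ... and put it back so the caller sees the rmdir error,
                 * not the restore error. */
                local.op_ret   = saved_ret;
                local.op_errno = saved_errno;
        }

        printf ("rmdir unwinds with errno %d (ENOTEMPTY), not %d (EIO)\n",
                local.op_errno, restore_errno);
        return 0;
}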

Comment 3 Kaushal 2016-06-28 12:16:09 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

