Bug 1330032 - rm -rf to a dir gives directory not empty(ENOTEMPTY) error
Summary: rm -rf to a dir gives directory not empty(ENOTEMPTY) error
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact:
Depends On: 1329514
Blocks: 1331933 1347529
Reported: 2016-04-25 10:19 UTC by Nithya Balachandran
Modified: 2017-03-27 18:12 UTC

Fixed In Version: glusterfs-3.9.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1329514
: 1331933 1347529
Last Closed: 2017-03-27 18:12:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:


Description Nithya Balachandran 2016-04-25 10:19:49 UTC
+++ This bug was initially created as a clone of Bug #1329514 +++

Description of problem:
2x2 distributed replicate volume.

1) The customer was using the FUSE client. When they deleted a directory that looked empty (according to "ls" on the mount point), they got ENOTEMPTY errors.

2) We installed the glusterfs-debuginfo package and attached gdb to the glusterfs client process.

3) With gdb attached and breakpoints set, we found that the directory being removed was not empty on every subvolume.

The directory being removed (named "Dir45") was empty on one distribute subvolume. The other subvolume had an empty subdirectory inside it (named "bucket7").

"ls" of Dir45 did not show bucket7, even though the subdirectory existed on the backend. When "ls" was run explicitly on the subdirectory, distribute healed it, and "ls" on its parent directory (i.e. Dir45) started showing the entry.
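The mismatch described above (an entry missing from the directory listing but reachable by name) can be checked from a client mount by comparing a readdir() scan of the parent with a direct stat() of the suspected entry. The following is only a minimal sketch using plain POSIX calls; the helper names listed_in_parent and found_by_lookup are hypothetical and nothing here is GlusterFS-specific.

```c
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `name` shows up in a readdir() scan of
 * `parent`, 0 if it does not, and -1 if `parent` cannot be opened. */
static int listed_in_parent(const char *parent, const char *name)
{
    DIR *dir = opendir(parent);
    struct dirent *ent;
    int found = 0;

    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(dir);
    return found;
}

/* Hypothetical helper: returns 1 if a lookup by name (stat) finds a
 * directory at `path`, 0 otherwise. */
static int found_by_lookup(const char *path)
{
    struct stat sb;

    return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}
```

On a healthy directory both helpers agree. In the situation reported here, listed_in_parent would return 0 for "bucket7" while found_by_lookup on the full path would return 1. Because the report notes that an explicit lookup triggers the heal, the listing check should be run first.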

Version-Release number of selected component (if applicable):

Comment 1 Nithya Balachandran 2016-04-25 10:31:34 UTC
Steps to reproduce the issue:

1. Create a 2x2 dist-rep volume.
2. Set cluster.quorum-type to auto.
3. NFS-mount the volume.
4. mkdir -p dir1/dir2/dir3/dir4
5. Kill the first brick process of the non-hashed subvol for dir4.
6. Try to delete dir4: rmdir dir1/dir2/dir3/dir4

Expected results:
rmdir fails with EROFS. The directory should exist on all the bricks.

Actual results:
rmdir fails with EROFS but the directory is deleted from the hashed subvol.


In dht_rmdir_cbk:
                if (op_ret == -1) {
                        if ((op_errno != ENOENT) && (op_errno != ESTALE)) {
                                local->op_errno = op_errno;
                                local->op_ret = -1;

                                if (op_errno != EACCES)
                                        local->need_selfheal = 1;  /* <-- set to 1 here because op_errno is EROFS */

                        gf_uuid_unparse(local->loc.gfid, gfid);

                        gf_msg_debug (this->name, op_errno,
                                      "rmdir on %s for %s failed."
                                      "(gfid = %s)",
                                      prev->this->name, local->loc.path,
                                      gfid);
                        goto unlock;

However, local->fop_succeeded is still 0 as there are only 2 subvols.

                } else if (this_call_cnt) {
                        /* If non-hashed subvol's have responded, proceed */

---> No check is performed here to see if the fop succeeded. rmdir is wound to the hashed subvol and succeeds.

                        local->need_selfheal = 0;
                        STACK_WIND (frame, dht_rmdir_hashed_subvol_cbk,
                                    &local->loc, local->flags, NULL);
                } else if (!this_call_cnt) {

Now dir4 exists on the non-hashed subvol but not on the hashed subvol.
As a result, ls dir1/dir2/dir3 shows no entries, but rmdir dir1/dir2/dir3 returns ENOTEMPTY.
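The decision described above can be modelled in isolation. The sketch below is not the committed patch and does not use GlusterFS types; struct rmdir_state and the three functions are hypothetical stand-ins that assume the same error handling as the excerpt (ENOENT and ESTALE are ignored, anything else marks the operation as failed). It contrasts winding rmdir to the hashed subvol unconditionally with winding only when no non-hashed subvol reported a real failure.

```c
#include <errno.h>

/* Hypothetical, simplified stand-in for the per-call state kept in `local`. */
struct rmdir_state {
    int op_ret;        /* -1 once a non-hashed subvol reports a real failure */
    int op_errno;      /* errno of that failure */
    int need_selfheal; /* set when the directory may need to be restored */
};

/* Record one reply from a non-hashed subvol, following the excerpt:
 * ENOENT and ESTALE are not treated as failures, and EACCES does not
 * request a self-heal. */
static void record_reply(struct rmdir_state *st, int op_ret, int op_errno)
{
    if (op_ret != -1)
        return;

    if (op_errno != ENOENT && op_errno != ESTALE) {
        st->op_ret = -1;
        st->op_errno = op_errno;
        if (op_errno != EACCES)
            st->need_selfheal = 1;
    }
}

/* Behaviour described in this report: once the non-hashed subvols have
 * replied, rmdir is wound to the hashed subvol no matter what they said. */
static int should_wind_to_hashed_flawed(const struct rmdir_state *st)
{
    (void)st; /* the recorded failure is never consulted */
    return 1;
}

/* Assumed corrected behaviour: wind to the hashed subvol only if no
 * non-hashed subvol recorded a real failure. */
static int should_wind_to_hashed_checked(const struct rmdir_state *st)
{
    return st->op_ret != -1;
}
```

With a single non-hashed subvol replying EROFS, as in the reproduction steps, the flawed check still returns 1 (so dir4 is removed from the hashed subvol), while the checked variant returns 0 and leaves the directory in place on every subvol.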

Comment 2 Vijay Bellur 2016-04-25 10:39:46 UTC
REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#1) for review on master by N Balachandran (nbalacha@redhat.com)

Comment 3 Vijay Bellur 2016-04-27 16:59:50 UTC
REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#2) for review on master by N Balachandran (nbalacha@redhat.com)

Comment 4 Vijay Bellur 2016-04-28 10:05:02 UTC
REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#3) for review on master by N Balachandran (nbalacha@redhat.com)

Comment 5 Vijay Bellur 2016-05-02 09:17:10 UTC
COMMIT: http://review.gluster.org/14060 committed in master by Raghavendra G (rgowdapp@redhat.com) 
commit 78c1c6002f0b11afa997a14f8378c04f257ea1c5
Author: N Balachandran <nbalacha@redhat.com>
Date:   Mon Apr 25 16:02:10 2016 +0530

    cluster/dht: Handle rmdir failure correctly
    DHT did not handle rmdir failures on non-hashed subvols
    correctly in a 2x2 dist-rep volume, causing the
    directory to be deleted from the hashed subvol.
    Also fixed an issue where the dht_selfheal_restore
    errcodes were overwriting the rmdir error codes.
    Change-Id: If2c6f8dc8ee72e3e6a7e04a04c2108243faca468
    BUG: 1330032
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
    Reviewed-on: http://review.gluster.org/14060
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>

Comment 6 Shyamsundar 2017-03-27 18:12:32 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/
