+++ This bug was initially created as a clone of Bug #1330032 +++
+++ This bug was initially created as a clone of Bug #1329514 +++

Description of problem:

2x2 distributed-replicate volume.

1) The customer was using the FUSE client. When they deleted a directory (which looked empty when "ls" was run on the mount point) they got ENOTEMPTY errors.

2) Installed the glusterfs-debuginfo package, then attached to the glusterfs client process via gdb.

3) With gdb attached and breakpoints set, found that the directory being removed was not empty on all the nodes. The directory being removed ("Dir45") was empty on one of the distribute subvolumes. The other subvolume had an empty subdirectory inside it (named "bucket7"). "ls" of Dir45 did not show bucket7, but the backend had that subdirectory. When "ls" was run on the subdirectory explicitly, distribute healed it and "ls" on its parent directory (i.e. Dir45) started showing that entry.

Version-Release number of selected component (if applicable):
3.1.1

--- Additional comment from Nithya Balachandran on 2016-04-25 06:31:34 EDT ---

Steps to reproduce the issue:
1. Create a 2x2 dist-rep volume.
2. Set cluster.quorum-type to auto.
3. NFS mount the volume.
4. mkdir -p dir1/dir2/dir3/dir4
5. Kill the first brick process for the non-hashed subvol for dir4.
6. Try to delete dir4: rmdir dir1/dir2/dir3/dir4

Expected results:
rmdir fails with EROFS. The directory should exist on all the bricks.

Actual results:
rmdir fails with EROFS, but the directory is deleted from the hashed subvol.

RCA:
In dht_rmdir_cbk:

        if (op_ret == -1) {
                if ((op_errno != ENOENT) && (op_errno != ESTALE)) {
                        local->op_errno = op_errno;
                        local->op_ret = -1;

                        if (op_errno != EACCES)
                                local->need_selfheal = 1;  <-- local->need_selfheal
                                                               is set to 1 as
                                                               op_errno is EROFS.
                }

                gf_uuid_unparse(local->loc.gfid, gfid);
                gf_msg_debug (this->name, op_errno,
                              "rmdir on %s for %s failed."
                              "(gfid = %s)",
                              prev->this->name, local->loc.path, gfid);
                goto unlock;
        }

However, local->fop_succeeded is still 0 as there are only 2 subvols.

        } else if (this_call_cnt) {
                /* If non-hashed subvol's have responded, proceed */
                                           ---> No check is performed here to
                                                see if the fop succeeded.
                                                rmdir is wound to the hashed
                                                subvol and succeeds.
                local->need_selfheal = 0;
                STACK_WIND (frame, dht_rmdir_hashed_subvol_cbk,
                            local->hashed_subvol,
                            local->hashed_subvol->fops->rmdir,
                            &local->loc, local->flags, NULL);
        } else if (!this_call_cnt) {

Now dir4 exists on the non-hashed subvol but not on the hashed subvol. "ls dir1/dir2/dir3" shows no entries, but "rmdir dir1/dir2/dir3" returns ENOTEMPTY.

--- Additional comment from Vijay Bellur on 2016-04-25 06:39:46 EDT ---

REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Vijay Bellur on 2016-04-27 12:59:50 EDT ---

REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#2) for review on master by N Balachandran (nbalacha)

--- Additional comment from Vijay Bellur on 2016-04-28 06:05:02 EDT ---

REVIEW: http://review.gluster.org/14060 (cluster/dht: Handle rmdir failure correctly) posted (#3) for review on master by N Balachandran (nbalacha)
REVIEW: http://review.gluster.org/14123 (cluster/dht: Handle rmdir failure correctly) posted (#1) for review on release-3.7 by Raghavendra G (rgowdapp)
COMMIT: http://review.gluster.org/14123 committed in release-3.7 by Raghavendra G (rgowdapp)

------

commit 534ca9864907a677551e7aff57e5c021d035303d
Author: N Balachandran <nbalacha>
Date:   Mon Apr 25 16:02:10 2016 +0530

    cluster/dht: Handle rmdir failure correctly

    DHT did not handle rmdir failures on non-hashed subvols correctly
    in a 2x2 dist-rep volume, causing the directory to be deleted from
    the hashed subvol. Also fixed an issue where the
    dht_selfheal_restore errcodes were overwriting the rmdir error
    codes.

    Change-Id: If2c6f8dc8ee72e3e6a7e04a04c2108243faca468
    BUG: 1331933
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: http://review.gluster.org/14123
    Smoke: Gluster Build System <jenkins.com>
    Tested-by: Raghavendra G <rgowdapp>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user