Directory removal and self-heal operations could occur simultaneously. This caused several problems: directories that had been intentionally removed were healed back, clients hit access errors, and parent directories that appeared empty could not be deleted because stale entries remained on some bricks. The rmdir and self-heal operations now block each other while executing, so these problems no longer occur.
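For illustration only, the "block while executing" behaviour can be sketched with a plain shell advisory lock; this is not GlusterFS code, and the lock file and directory paths are made up for the example. Any two processes that guard their directory operation with the same exclusive lock cannot interleave, which is the kind of serialization the fix introduces between rmdir and self-heal.

LOCK=/var/lock/dir-op.lock        # hypothetical lock file, illustration only
(
    flock -x 9                    # wait here if another process holds the lock
    rm -rf /mnt/rep/A11           # guarded operation; a concurrent process using
                                  # the same pattern runs strictly before or after
) 9>"$LOCK"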
Description of problem:
=======================
In a distribute-replicate volume, when 'rm -rf *' is performed from multiple mounts, directories are removed from some sub-volumes but not from others.
Because of this, "rm -rf <directory>" fails with "Directory not empty", even though "ls -l <directory>" on the mount shows the directory as empty (a way to confirm the mismatch on the bricks is sketched below).
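The mismatch can be confirmed by listing the directory through a mount and then directly on each brick's backend path, as done in the brick listings under "Actual results". A minimal sketch, assuming the volume is mounted at /mnt/rep, that the brick hosts are reachable over ssh, and reusing the brick paths from this report:

# List the directory as the client sees it (it appears empty)
ls -la /mnt/rep/A11

# List the same directory directly on every brick; stale entries such as B19
# show up only on some replica pairs
for host in rhs-client11 rhs-client12 rhs-client13 rhs-client14; do
    echo "== $host =="
    ssh "$host" 'ls -la /rhs/device0/rep_brick*/A11'
done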
Version-Release number of selected component (if applicable):
==============================================================
glusterfs 3.6.0.22 built on Jun 23 2014 10:33:07
How reproducible:
====================
Often
Steps to Reproduce:
====================
1. Create distribute-replicate volume. Start the volume.
2. Create 2 FUSE mounts and 2 NFS mounts, or 4 FUSE mounts.
3. Create directories ( mkdir -p A{1..1000}/B{1..20}/C{1..20} )
4. From all the mount points, execute "rm -rf *" in parallel (a scripted outline of these steps follows).
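A scripted outline of the reproduction, assuming the four servers and brick paths from this report, a volume named rep, and illustrative client mount points (run the gluster commands on one of the server nodes):

#!/bin/bash
# Hypothetical reproduction outline; adjust hostnames, brick paths and mount
# points to match your environment.

# 1. Create and start a 2x2 distribute-replicate volume
gluster volume create rep replica 2 \
    rhs-client11:/rhs/device0/rep_brick1 rhs-client12:/rhs/device0/rep_brick2 \
    rhs-client13:/rhs/device0/rep_brick3 rhs-client14:/rhs/device0/rep_brick4
gluster volume start rep

# 2. Mount the volume several times (here: two FUSE and two NFS mounts)
mkdir -p /mnt/fuse1 /mnt/fuse2 /mnt/nfs1 /mnt/nfs2
mount -t glusterfs rhs-client11:/rep /mnt/fuse1
mount -t glusterfs rhs-client12:/rep /mnt/fuse2
mount -t nfs -o vers=3 rhs-client11:/rep /mnt/nfs1
mount -t nfs -o vers=3 rhs-client12:/rep /mnt/nfs2

# 3. Create the directory tree from one mount
( cd /mnt/fuse1 && mkdir -p A{1..1000}/B{1..20}/C{1..20} )

# 4. Remove everything from all mount points in parallel
for m in /mnt/fuse1 /mnt/fuse2 /mnt/nfs1 /mnt/nfs2; do
    ( cd "$m" && rm -rf * ) &
done
wait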
Actual results:
====================
root@dj [Jul-02-2014- 1:14:38] >rm -rf *
rm: cannot remove `A11': Directory not empty
rm: cannot remove `A111': Directory not empty
rm: cannot remove `A137': Directory not empty
rm: cannot remove `A151/B18': Directory not empty
rm: cannot remove `A153': Directory not empty
rm: cannot remove `A163': Directory not empty
rm: cannot remove `A204': Directory not empty
rm: cannot remove `A480/B16': Directory not empty
On sub-volume1:
===================
brick1:
~~~~~~~~~
root@rhs-client11 [Jul-02-2014-14:40:48] >ls -l /rhs/device0/rep_brick1/A11
total 0
drwxr-xr-x 3 root root 15 Jul 2 12:32 B19
root@rhs-client11 [Jul-02-2014-14:40:50] >
brick2:
~~~~~~~~~
root@rhs-client12 [Jul-02-2014-14:40:48] >ls -l /rhs/device0/rep_brick2/A11
total 0
drwxr-xr-x 3 root root 15 Jul 2 12:32 B19
root@rhs-client12 [Jul-02-2014-14:40:50] >
On sub-volume2:
====================
brick3:
~~~~~~~
root@rhs-client13 [Jul-02-2014-14:40:48] >ls -l /rhs/device0/rep_brick3/A11
total 0
root@rhs-client13 [Jul-02-2014-14:40:50] >
brick4:
~~~~~~~~
root@rhs-client14 [Jul-02-2014-14:40:48] >ls -l /rhs/device0/rep_brick4/A11
total 0
root@rhs-client14 [Jul-02-2014-14:40:50] >
root@rhs-client14 [Jul-02-2014-14:40:51] >
Expected results:
==================
The directories should be removed from all the sub-volumes.
Additional info:
==================
root@mia [Jul-02-2014-14:42:57] >gluster v info rep
Volume Name: rep
Type: Distributed-Replicate
Volume ID: d8d69cec-8bdd-4c9d-b5f5-972b36716b0b
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/device0/rep_brick1
Brick2: rhs-client12:/rhs/device0/rep_brick2
Brick3: rhs-client13:/rhs/device0/rep_brick3
Brick4: rhs-client14:/rhs/device0/rep_brick4
Options Reconfigured:
features.uss: disable
server.statedump-path: /var/run/gluster/statedumps
features.barrier: disable
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
root@mia [Jul-02-2014-14:43:01] >
root@mia [Jul-02-2014-14:43:02] >gluster v status rep
Status of volume: rep
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device0/rep_brick1 49154 Y 2890
Brick rhs-client12:/rhs/device0/rep_brick2 49154 Y 5472
Brick rhs-client13:/rhs/device0/rep_brick3 49153 Y 2869
Brick rhs-client14:/rhs/device0/rep_brick4 49153 Y 5433
NFS Server on localhost 2049 Y 32441
Self-heal Daemon on localhost N/A Y 27961
NFS Server on rhs-client13 2049 Y 20245
Self-heal Daemon on rhs-client13 N/A Y 2858
NFS Server on 10.70.36.35 2049 Y 20399
Self-heal Daemon on 10.70.36.35 N/A Y 2885
NFS Server on rhs-client12 2049 Y 11226
Self-heal Daemon on rhs-client12 N/A Y 5494
NFS Server on rhs-client14 2049 Y 11211
Self-heal Daemon on rhs-client14 N/A Y 5455
Task Status of Volume rep
------------------------------------------------------------------------------
There are no active volume tasks
root@mia [Jul-02-2014-14:43:05] >
This bug was accidentally moved from POST to MODIFIED by an error in automation; please contact mmccune with any questions.
Comment 22 krishnaram Karthick
2016-05-12 10:03:10 UTC
Verified the fix in build glusterfs-3.7.9-4 on both NFS and FUSE mounts separately. The issue reported in this bug was not seen, i.e., 'Directory not empty' errors did not occur.
Tests that were run to validate the fix:
1) parallel rm -rf from different mount points on the same directory with lots of sub-dirs
2) rm -rf + lookups from different mount points on the same directory with lots of sub-dirs (a scripted outline of these runs follows)
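A sketch of how these verification runs might be scripted, assuming the volume is mounted at four illustrative client mount points /mnt/c1 through /mnt/c4; tree sizes and paths are examples, not the exact test data:

# Run 1: parallel rm -rf of the same tree from every mount
( cd /mnt/c1 && mkdir -p A{1..500}/B{1..20}/C{1..10} )
for m in /mnt/c1 /mnt/c2 /mnt/c3 /mnt/c4; do
    ( cd "$m" && rm -rf A* ) &
done
wait

# Run 2: rm -rf from one mount while the other mounts keep issuing lookups
( cd /mnt/c1 && mkdir -p A{1..500}/B{1..20}/C{1..10} )
( cd /mnt/c1 && rm -rf A* ) &
RM_PID=$!
for m in /mnt/c2 /mnt/c3 /mnt/c4; do
    ( while kill -0 "$RM_PID" 2>/dev/null; do ls -lR "$m" >/dev/null 2>&1; done ) &
done
wait

# Pass criterion: no "Directory not empty" errors and no stale directories left on any brick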
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1240