Description of problem:
=======================
While running the geo-replication automation (snapshot + geo-rep), which does the following in sequence (a minimal sketch of this loop is included after the report):

1. Create a geo-rep session between master and slave.
2. For each fop in {create, chmod, chown, chgrp, symlink, hardlink, truncate, rename, rm -rf}:
   2.a. Run the fop on the master.
   2.b. Let the sync happen to the slave.
   2.c. Check that the number of files is equal via "find . | wc -l" on master and slave.
   2.d. Once the counts match, calculate the arequal checksum.
   2.e. Move on to the next fop.

After the rm, the slave count does not match the master and errors are reported as "Directory not empty":

[2017-06-01 14:28:02.448498] W [resource(slave):733:entry_ops] <top>: Recursive remove 9d197476-ed88-4bf4-8060-414d3a481599 => .gfid/59b8b057-fb3d-4d90-9fbc-8ef205dc1101/level05failed: Directory not empty
[2017-06-01 14:28:02.449425] W [syncdutils(slave):506:errno_wrap] <top>: reached maximum retries (['9d197476-ed88-4bf4-8060-414d3a481599', '.gfid/59b8b057-fb3d-4d90-9fbc-8ef205dc1101/level05', '.gfid/59b8b057-fb3d-4d90-9fbc-8ef205dc1101/level05'])...[Errno 39] Directory not empty: '.gfid/59b8b057-fb3d-4d90-9fbc-8ef205dc1101/level05/level15'
[2017-06-01 14:28:02.449795] W [resource(slave):733:entry_ops] <top>: Recursive remove 9d197476-ed88-4bf4-8060-414d3a481599 => .gfid/59b8b057-fb3d-4d90-9fbc-8ef205dc1101/level05failed: Directory not empty
[2017-06-01 14:28:55.316672] W [syncdutils(slave):506:errno_wrap] <top>: reached maximum retries (['59b8b057-fb3d-4d90-9fbc-8ef205dc1101', '.gfid/00000000-0000-0000-0000-000000000001/thread1', '.gfid/00000000-0000-0000-0000-000000000001/thread1'])...[Errno 39] Directory not empty: '.gfid/00000000-0000-0000-0000-000000000001/thread1/level05/level15'
[2017-06-01 14:28:55.317033] W [resource(slave):733:entry_ops] <top>: Recursive remove 59b8b057-fb3d-4d90-9fbc-8ef205dc1101 => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty
[2017-06-01 14:28:55.331442] W [syncdutils(slave):506:errno_wrap] <top>: reached maximum retries (['59b8b057-fb3d-4d90-9fbc-8ef205dc1101', '.gfid/00000000-0000-0000-0000-000000000001/thread1', '.gfid/00000000-0000-0000-0000-000000000001/thread1'])...[Errno 39] Directory not empty: '.gfid/00000000-0000-0000-0000-000000000001/thread1/level05/level15'
[2017-06-01 14:28:55.331787] W [resource(slave):733:entry_ops] <top>: Recursive remove 59b8b057-fb3d-4d90-9fbc-8ef205dc1101 => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty

Directory structure on the slave is:

[root@dhcp42-10 slave]# ls -lR
.:
total 4
drwxr-xr-x. 3 root root 4096 Jun 1 19:58 thread1

./thread1:
total 4
drwxr-xr-x. 3 root root 4096 Jun 1 19:57 level05

./thread1/level05:
total 4
drwx-wxr-x. 2 42131 16284 4096 Jun 1 19:57 level15

./thread1/level05/level15:
total 0
[root@dhcp42-10 slave]#

Running ls on the absolute path and then removing the directory resolves the issue.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-18.4.el7rhgs.x86_64

How reproducible:
=================
Rare; seen once in the whole of 3.2.0 and once again in 3.2.0_async, out of more than 30 executions of this test case in total.
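For context, here is a minimal sketch of the per-fop verification loop described above. It assumes the master and slave volumes are mounted at /mnt/master and /mnt/slave and that the arequal-checksum utility is available; run_fop is a hypothetical placeholder for the harness step that performs the fop, not the actual automation code:

MASTER=/mnt/master      # assumed master volume mount point
SLAVE=/mnt/slave        # assumed slave volume mount point
for fop in create chmod chown chgrp symlink hardlink truncate rename "rm -rf"; do
    run_fop "$fop" "$MASTER"        # 2.a: hypothetical harness step that runs the fop on the master
    # 2.b/2.c: wait for geo-rep to sync until the entry counts match on master and slave
    while [ "$(find "$MASTER" | wc -l)" != "$(find "$SLAVE" | wc -l)" ]; do
        sleep 10
    done
    # 2.d: compare arequal checksums once the counts match
    arequal-checksum -p "$MASTER"
    arequal-checksum -p "$SLAVE"
done

In the failing run, the loop never gets past the "rm -rf" iteration because the slave retains the empty level05/level15 directory tree shown in the listing above.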
After discussing this with Rahul, I am moving this to 3.3.0-beyond. Rahul will try to reproduce this during the regression cycles after enabling debug logs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0658