Description of problem:
In a scenario where the master and slave are in sync, when "rm -rf" is performed on the master volume (Fuse/NFS), the slave logs the errors below and fails to remove the entries from the slave volume. Geo-rep continues to retry the removal, and after a while the files/directories do get removed.

[2015-06-24 17:10:10.844609] W [resource(slave):692:entry_ops] <top>: Recursive remove 270bb38f-fd2e-4cad-af38-200beb35fd68 => .gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/tune-profilesfailed: Directory not empty
[2015-06-24 17:10:10.857244] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['270bb38f-fd2e-4cad-af38-200beb35fd68', '.gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/tune-profiles', '.gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/tune-profiles'])...[Errno 39] Directory not empty: '.gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/tune-profiles'
[2015-06-24 17:10:10.857528] W [resource(slave):692:entry_ops] <top>: Recursive remove 270bb38f-fd2e-4cad-af38-200beb35fd68 => .gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/tune-profilesfailed: Directory not empty
[2015-06-24 17:10:13.361917] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['270bb38f-fd2e-4cad-af38-200beb35fd68', '.gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/tune-profiles', '.gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/tune-profiles'])...[Errno 39] Directory not empty: '.gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/tune-profiles'
[2015-06-24 17:10:13.362207] W [resource(slave):692:entry_ops] <top>: Recursive remove 270bb38f-fd2e-4cad-af38-200beb35fd68 => .gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/tune-profilesfailed: Directory not empty
[2015-06-24 17:10:18.390331] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 667, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/e32ea6ee-9f46-46f2-8816-51648960fc0f/alternatives'
[2015-06-24 17:10:18.398015] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-06-24 17:10:18.398405] I [syncdutils(slave):220:finalize] <top>: exiting.
Other Errors logged are:
=========================
grep "OSError" /var/log/glusterfs/geo-replication-slaves/9c0db153-6b18-4b92-bcbd-8448fba042ce\:gluster%3A%2F%2F127.0.0.1%3Aslave.log
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/00546903-6a61-4ede-a703-7a00a5f3b22f/X11/fontpath.d'
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/72fc70a8-ecad-4f2e-80a6-605ab1d5681e/redhat-lsb'
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 117] Structure needs cleaning
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/547f2de5-7971-4323-837e-6ecf308a36c9/cluster/cman-notify.d'
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 117] Structure needs cleaning
OSError: [Errno 117] Structure needs cleaning: '.gfid/53c7d4b5-a4cb-4b77-bac8-d9476b77dec1/rhsm/pluginconf.d'

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.1-5.el6rhs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create the master cluster with 4 nodes
2. Create the slave cluster with 2 nodes
3. Create and start the master volume (4x2)
4. Create and start the slave volume (2x2)
5. Create and start the meta volume (1x3)
6. Set up password-less ssh from node1 of the master to node1 of the slave
7. Create a geo-rep session between the master and the slave
8. Configure the session with use_meta_volume true
9. Start the geo-rep session (example commands for steps 7-9 are sketched below)
10. Mount the master and slave volumes on a client (Fuse & NFS)
11. From the Fuse mount of the master volume, create data. I used:
    for i in {1..10}; do cp -rf /etc etc.$i ; done
    for i in {1..100}; do dd if=/dev/zero of=$i bs=10M count=1 ; done
    for i in {1..10}; do cp -rf /etc r$i ; done
12. From the NFS mount of the master volume, create data. I used:
    for i in {11..20}; do cp -rf /etc arm.$i ; done
    for i in {1..200}; do dd if=/dev/zero of=nfs.$i bs=1M count=1 ; done
13. Wait for the files to sync to the slave. Mount the slave volume and check arequal, "ls -lRT | wc", etc.
14. Once the files are synced successfully, run "rm -rf arm.*" from the Fuse mount and "rm -rf r*".

After a while you should start seeing a lot of errors in the master and slave log files.

Master log file location:
=========================
/var/log/glusterfs/geo-replication/master/

Slave log file location:
========================
/var/log/glusterfs/geo-replication-slaves/
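For steps 7-9, a minimal sketch of the gluster CLI invocations, assuming a master volume named "master", a slave volume named "slave", and a slave node "slavenode1" (all hypothetical names; the exact syntax may vary by release):

    # Run on a master node; push-pem distributes the ssh keys prepared in step 6
    gluster volume geo-replication master slavenode1::slave create push-pem

    # Point the session at the shared meta volume created in step 5
    gluster volume geo-replication master slavenode1::slave config use_meta_volume true

    # Start the session
    gluster volume geo-replication master slavenode1::slave start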
One of the main causes of the "Directory not empty" error on the slave is a race between the changelogs that are replayed on the slave.

E.g., a volume has 2 subvolumes, and there is a single directory dir1 with a single file file1 hashing to subvol2:
  - the changelog for subvol1 has: rmdir(dir1)
  - the changelog for subvol2 has: unlink(file1) followed by rmdir(dir1)

If the changelog for subvol1 is replayed before the one for subvol2, the slave tries to remove dir1 while file1 still exists, hence the "Directory not empty" error on the slave.
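A local shell sketch of the same ordering (illustration only, not the actual gsyncd replay code; dir1 and file1 are the hypothetical names from the example above):

    mkdir dir1 && touch dir1/file1   # state on the slave before the replay

    # changelog of subvol1 replayed first: it only records rmdir(dir1)
    rmdir dir1                       # fails with "Directory not empty" (errno 39)

    # changelog of subvol2 replayed later: unlink(file1) followed by rmdir(dir1)
    rm dir1/file1
    rmdir dir1                       # now succeeds

This matches the observed behaviour: the slave worker keeps retrying, and the directories are eventually removed once the other subvolume's changelog has been replayed.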
The doc text has been edited. Please sign off so that it can be included in Known Issues.
Included the edited text.
For the record: hitting this bug with build glusterfs-3.7.5-15.el7rhgs.x86_64 (3.1.2).
*** This bug has been marked as a duplicate of bug 1310194 ***