Description of problem:

If a lot of data creation and deletion is happening, the geo-rep log file fills with many "failed to sync" messages, like this:

[2013-07-11 11:02:15.113795] W [master(/bricks/brick3):837:regjob] _GMaster: failed to sync .gfid/bf06b56b-94ae-4617-9d9e-1d8618ee246e
[2013-07-11 11:02:15.116051] W [master(/bricks/brick3):837:regjob] _GMaster: failed to sync .gfid/cd342722-2e99-4372-9257-2a2e80a241f1
[2013-07-11 11:02:15.118213] W [master(/bricks/brick3):837:regjob] _GMaster: failed to sync .gfid/b4581b84-e9d9-419a-9b56-b77903526505

There are also a few tracebacks like:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2013-07-11 11:02:16.35012] E [repce(/bricks/brick3):188:__call__] RepceClient: call 3272:140181893072640:1373520735.2 (entry_ops) failed on peer with OSError
[2013-07-11 11:02:16.35907] E [syncdutils(/bricks/brick3):206:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 133, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 510, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1060, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 525, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 928, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 908, in process
    self.process_change(change)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 899, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
OSError: [Errno 11] Resource temporarily unavailable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This results in that particular session going faulty for some time.

Version-Release number of selected component (if applicable): 3.4.0.12rhs.beta3-1.el6rhs.x86_64

How reproducible:
Observed it once.

Steps to Reproduce:
1. Create and start a geo-rep relationship between master and slave.
2. On the master, create and remove files in a loop, overnight. For example:
   while : ; do ./crefi -n 100 --multi -b 10 -d 10 --random --max=500K --min=10 <MNT_PNT>; sleep 500; rm -rf <MNT_PNT>/*; done

Actual results:
The logs have a lot of "failed to sync" messages, and one of the sessions stops syncing.

Expected results:
Even if there are some failures, it should revive itself quickly and start syncing again.

Additional info:
The slave logs had entries like this:

[2013-07-11 06:02:00.031158] W [fuse-bridge.c:2334:fuse_create_cbk] 0-glusterfs-fuse: 139561: <gfid:00000000-0000-0000-0000-00000000000d>/c29369ac-db3a-4a33-8ade-973820d01f15 => -1 (No such file or directory)
[2013-07-11 06:02:00.031363] W [defaults.c:1291:default_release] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta3/xlator/cluster/distribute.so(dht_create+0x390) [0x7f7071f4e740] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta3/xlator/cluster/distribute.so(dht_local_wipe+0xa7) [0x7f7071f38f67] (-->/usr/lib64/libglusterfs.so.0(fd_unref+0x13b) [0x3956a3928b]))) 0-fuse: xlator does not implement release_cbk
[2013-07-11 06:02:00.074124] W [fuse-bridge.c:2334:fuse_create_cbk] 0-glusterfs-fuse: 139564: <gfid:00000000-0000-0000-0000-00000000000d>/c3953efa-9dc5-44dc-ad07-506a6355acbb => -1 (No such file or directory)
[2013-07-11 06:02:00.074335] W [defaults.c:1291:default_release] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta3/xlator/cluster/distribute.so(dht_create+0x390) [0x7f7071f4e740] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta3/xlator/cluster/distribute.so(dht_local_wipe+0xa7) [0x7f7071f38f67] (-->/usr/lib64/libglusterfs.so.0(fd_unref+0x13b) [0x3956a3928b]))) 0-fuse: xlator does not implement release_cbk
[2013-07-11 06:02:00.080759] W [fuse-bridge.c:2334:fuse_create_cbk] 0-glusterfs-fuse: 139567: <gfid:00000000-0000-0000-0000-00000000000d>/c3f97a3c-856a-43bc-8ca2-012a4d82a258 => -1 (No such file or directory)
[2013-07-11 06:02:00.080970] W [defaults.c:1291:default_release] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta3/xlator/cluster/distribute.so(dht_create+0x390) [0x7f7071f4e740] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta3/xlator/cluster/distribute.so(dht_local_wipe+0xa7) [0x7f7071f38f67] (-->/usr/lib64/libglusterfs.so.0(fd_unref+0x13b) [0x3956a3928b]))) 0-fuse: xlator does not implement release_cbk
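The "revive itself quickly" expectation above amounts to treating the EAGAIN (Errno 11) from the traceback as transient and retrying the entry operation with backoff, instead of letting the session go faulty. A minimal sketch of that idea in plain Python (the `call_with_retry` helper and `flaky` example are hypothetical, not part of the gsyncd API):

```python
import errno
import time

def call_with_retry(op, attempts=5, delay=1.0):
    """Retry a callable that may fail transiently with EAGAIN.

    Re-raises immediately on any other OSError, and gives up
    after 'attempts' tries. Uses exponential backoff between tries.
    """
    for i in range(attempts):
        try:
            return op()
        except OSError as e:
            if e.errno != errno.EAGAIN or i == attempts - 1:
                raise
            time.sleep(delay * (2 ** i))  # back off before retrying

# Demo: an operation that fails twice with EAGAIN, then succeeds.
state = {"calls": 0}

def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise OSError(errno.EAGAIN, "Resource temporarily unavailable")
    return "synced"

result = call_with_retry(flaky, delay=0.01)
```

With this pattern the two transient failures are absorbed and `result` ends up as "synced" on the third attempt, rather than the whole session restarting.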
There are many of these entries in the gsyncd auxiliary mount client logs:

583aaa20-e2e9-4e78-ac0f-83cf5ee31d75:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2013-07-12 12:12:24.444442] W [fuse-bridge.c:2334:fuse_create_cbk] 0-glusterfs-fuse: 659827: <gfid:00000000-0000-0000-0000-00000000000d>/aebb1cad-ec43-4532-a9d2-de24671c65b5 => -1 (No such file or directory)
583aaa20-e2e9-4e78-ac0f-83cf5ee31d75:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2013-07-12 12:12:32.621215] W [fuse-bridge.c:2334:fuse_create_cbk] 0-glusterfs-fuse: 660472: <gfid:00000000-0000-0000-0000-00000000000d>/aebb1cad-ec43-4532-a9d2-de24671c65b5 => -1 (No such file or directory)
583aaa20-e2e9-4e78-ac0f-83cf5ee31d75:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2013-07-12 12:12:46.993507] W [fuse-bridge.c:2334:fuse_create_cbk] 0-glusterfs-fuse: 662023: <gfid:00000000-0000-0000-0000-00000000000d>/aebb1cad-ec43-4532-a9d2-de24671c65b5 => -1 (No such file or directory)
583aaa20-e2e9-4e78-ac0f-83cf5ee31d75:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2013-07-12 12:13:08.500603] W [fuse-bridge.c:2334:fuse_create_cbk] 0-glusterfs-fuse: 664403: <gfid:00000000-0000-0000-0000-00000000000d>/aebb1cad-ec43-4532-a9d2-de24671c65b5 => -1 (No such file or directory)
583aaa20-e2e9-4e78-ac0f-83cf5ee31d75:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2013-07-12 12:13:33.706942] W [fuse-bridge.c:2334:fuse_create_cbk] 0-glusterfs-fuse: 667261: <gfid:00000000-0000-0000-0000-00000000000d>/aebb1cad-ec43-4532-a9d2-de24671c65b5 => -1 (No such file or directory)

-------------------------------------------------------------------------

These come from fuse_create_cbk(), indicating that a create failed because of the missing parent gfid '00000000-0000-0000-0000-00000000000d'. Shouldn't this be the root gfid (0x1) instead of the virtual gfid (0xd)?
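For reference, the two gfids contrasted in the question above can be written out literally; this is plain Python, nothing gluster-specific, just to make the 0x1 vs. 0xd values concrete:

```python
import uuid

# The fixed root gfid of a gluster volume (0x1) vs. the virtual
# gfid (0xd) that appears as the parent in the log entries above.
ROOT_GFID = uuid.UUID(int=0x1)
VIRTUAL_GFID = uuid.UUID(int=0xd)

print(ROOT_GFID)     # 00000000-0000-0000-0000-000000000001
print(VIRTUAL_GFID)  # 00000000-0000-0000-0000-00000000000d
```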
*** Bug 983572 has been marked as a duplicate of this bug. ***
Verified on glusterfs-3.4.0.17rhs-1.el6rhs.x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html