Description of problem:
The gsyncd process crashed while removing files on the master. This happened while removing files from the master after add-brick, rebalance and remove-brick had been performed on the master volume.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.39rhs-1

How reproducible:
Didn't try to reproduce.

Steps to Reproduce (a hedged command sketch follows the client logs below):
1. Create and start a geo-rep relationship between master and slave.
2. Put some data on the master and let it sync to the slave.
3. Add nodes to the cluster and add bricks to the volume.
4. Start creating data on the master and, in parallel, start a rebalance.
5. Let the data sync and the rebalance complete.
6. Check the geo-rep status.
7. Start creating data on the master and, in parallel, start a remove-brick of the bricks that were added.
8. Let the data sync and the remove-brick complete.
9. Check the geo-rep status.
10. Wait for some time.
11. Start removing files on the master.
12. Check the geo-rep status.

Actual results:
The geo-rep status of the active replica nodes went to faulty, and removing files gave "Directory not empty" errors.

Expected results:
Removal of files should happen properly.

Additional info:

Backtrace in the geo-rep log:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-06 17:08:29.881779] E [repce(/bricks/brick1):188:__call__] RepceClient: call 31582:139962570012416:1383737909.84 (entry_ops) failed on peer with OSError
[2013-11-06 17:08:29.882508] E [syncdutils(/bricks/brick1):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 535, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1134, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 437, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 858, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 815, in process
    if self.process_change(change, done, retry):
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 780, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
OSError: [Errno 61] No data available
[2013-11-06 17:08:29.885715] I [syncdutils(/bricks/brick1):159:finalize] <top>: exiting.
[2013-11-06 17:08:29.896702] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-11-06 17:08:39.910705] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

While doing "rm -rf" on the master mount point, some removals failed with errors:
rm: cannot remove `/mnt/master/level08/level18/level28/level38/level48/level58/level68/level78/level88/level98': Directory not empty
rm: cannot remove `/mnt/master/level09/level19/level29/level39/level49/level59/level69/level79/level89': Directory not empty

Corresponding client logs:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-06 11:28:18.902560] I [client.c:2103:client_rpc_notify] 7-master-client-3: disconnected from 10.70.43.158:49154. Client process will keep trying to connect to glusterd until brick's port is available.
[2013-11-06 11:28:18.902601] E [afr-common.c:3919:afr_notify] 7-master-replicate-1: All subvolumes are down. Going off line until atleast one of them comes back up.
[2013-11-06 11:38:32.425362] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-0: remote operation failed: Directory not empty
[2013-11-06 11:38:32.425698] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-1: remote operation failed: Directory not empty
[2013-11-06 11:38:44.159290] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-2: remote operation failed: Directory not empty
[2013-11-06 11:38:44.159412] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed: Directory not empty
[2013-11-06 11:39:12.312085] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-2: remote operation failed: Directory not empty
[2013-11-06 11:39:12.312149] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed: Directory not empty
[2013-11-06 11:39:12.315267] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 8-master-client-3: remote operation failed: File exists. Path: /level03/level13/level23/level33/level43/level53/level63/level73/level83
[2013-11-06 11:39:12.315313] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 8-master-client-2: remote operation failed: File exists. Path: /level03/level13/level23/level33/level43/level53/level63/level73/level83
[2013-11-06 11:39:33.039777] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-2: remote operation failed: Directory not empty
[2013-11-06 11:39:33.040168] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed: Directory not empty
[2013-11-06 11:40:03.002733] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 8-master-client-3: remote operation failed:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
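For reference, a minimal shell sketch of the reproduction flow described in the steps above. The volume name "master", slave spec "slave-host::slavevol", node names and brick paths are placeholders, not the ones used in this report, and the exact geo-replication setup syntax may differ on this build; treat this as an outline rather than the exact commands that were run.

# Hedged outline of the reproduction steps; names and paths are placeholders.

# Steps 1-2: geo-rep session between master and slave, initial data synced.
gluster volume geo-replication master slave-host::slavevol start
gluster volume geo-replication master slave-host::slavevol status

# Step 3: add bricks from the newly added nodes to the master volume.
gluster volume add-brick master node3:/bricks/brick1 node4:/bricks/brick1

# Steps 4-6: create data on the mount while rebalancing, then check status.
gluster volume rebalance master start
gluster volume rebalance master status
gluster volume geo-replication master slave-host::slavevol status

# Steps 7-9: create data while removing the added bricks, then check status.
gluster volume remove-brick master node3:/bricks/brick1 node4:/bricks/brick1 start
gluster volume remove-brick master node3:/bricks/brick1 node4:/bricks/brick1 status
gluster volume remove-brick master node3:/bricks/brick1 node4:/bricks/brick1 commit

# Steps 11-12: remove files from the master mount point and re-check status.
rm -rf /mnt/master/*
gluster volume geo-replication master slave-host::slavevol status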
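A side note on the traceback above: errno 61 on Linux is ENODATA ("No data available"). One common source of this error in a brick/gsyncd context is a getxattr() on an extended attribute that is absent, though this report does not confirm that is the case here. A quick way to check the errno mapping:

# errno 61 maps to ENODATA ("No data available") on Linux.
python -c 'import errno, os; print("%s: %s" % (errno.errorcode[61], os.strerror(61)))'
# expected output: ENODATA: No data available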
Again, the backtrace is the same as in bug 1028343, which is now fixed in the .42rhs build. Can this be tested?
Considering bug 1028343 is VERIFIED, moving this bug to ON_QA.
This bug was mainly for gsyncd crashing with "No data available". On build glusterfs-3.4.0.44rhs-1 the gsyncd crash no longer happens, but the rm failure is still there. Hence, moving this bug to VERIFIED and tracking the remaining issue in Bug 1030438.
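For reference, a hedged sketch of the kind of checks that separate the two symptoms during verification; the volume name, slave spec, mount point and log path below are assumptions, not taken from this report.

# 1. gsyncd crash: the session should stay out of the faulty state and the
#    geo-rep log (path is an assumption) should show no new OSError tracebacks.
gluster volume geo-replication master slave-host::slavevol status
grep -c OSError /var/log/glusterfs/geo-replication/master/*.log

# 2. rm failure: deleting the tree from the master mount point should not
#    report "Directory not empty" (this is the part tracked in Bug 1030438).
rm -rf /mnt/master/level0* 2>&1 | grep "Directory not empty"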
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html