Description of problem:
gsyncd crashed in syncdutils.py while removing a file. I have observed this crash many times, while removing different files.

Python back trace
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-16 15:20:54.420363] I [master(/bricks/master_brick1):451:crawlwrap] _GMaster: 20 crawls, 0 turns
[2014-01-16 15:21:37.910284] E [syncdutils(/bricks/master_brick1):240:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1157, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 476, in crawlwrap
    time.sleep(self.sleep_interval)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 331, in <lambda>
    def set_term_handler(hook=lambda *a: finalize(*a, **{'exval': 1})):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 184, in finalize
    shutil.rmtree(gconf.ssh_ctl_dir)
  File "/usr/lib64/python2.6/shutil.py", line 217, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/usr/lib64/python2.6/shutil.py", line 215, in rmtree
    os.remove(fullname)
OSError: [Errno 2] No such file or directory: '/tmp/gsyncd-aux-ssh-8CWIhl/061fc87d252b63093ab9bfb765588973.sock'
[2014-01-16 15:21:37.911117] E [syncdutils(/bricks/master_brick1):223:log_raise_exception] <top>: connection to peer is broken
[2014-01-16 15:21:37.917700] E [resource(/bricks/master_brick1):204:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-8CWIhl/061fc87d252b63093ab9bfb765588973.sock root.43.174 /nonexistent/gsyncd --session-owner 47fa81ef-44a3-4fb6-b58e-cb4a81fa5b44 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2014-01-16 15:21:37.918075] E [resource(/bricks/master_brick1):207:logerr] Popen: ssh> [2014-01-15 12:33:49.858181] I [socket.c:3505:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-01-16 15:21:37.918354] E [resource(/bricks/master_brick1):207:logerr] Popen: ssh> [2014-01-15 12:33:49.858259] I [socket.c:3520:socket_init] 0-glusterfs: using system polling thread
[2014-01-16 15:21:37.918692] E [resource(/bricks/master_brick1):207:logerr] Popen: ssh> [2014-01-15 12:33:49.859676] I [socket.c:3505:socket_init] 0-glusterfs: SSL support is NOT enabled
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.57rhs-1

How reproducible:
Doesn't happen every time.

Steps to Reproduce:
Exact steps are not known.
1. Create and start a geo-rep relationship between master and slave.
2. Start creating files on master and slave.
3. Check the geo-rep logs.

Actual results:
gsyncd crashed while removing some file.

Expected results:
gsyncd should never crash.

Additional info:
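In the traceback above, shutil.rmtree(gconf.ssh_ctl_dir) raises OSError with errno 2 (ENOENT) because the .sock file disappears between the directory listing and os.remove. A minimal sketch of one way to tolerate that race follows. It is an illustration only, not necessarily what the upstream patch does; rmtree_tolerant and ignore_missing are hypothetical names that do not exist in gsyncd.

```python
import errno
import os
import shutil
import tempfile


def rmtree_tolerant(path):
    """Remove a directory tree, ignoring entries that are already gone.

    Hypothetical helper for illustration; not part of gsyncd.
    """

    def ignore_missing(func, failed_path, exc_info):
        # exc_info is the (type, value, traceback) triple from sys.exc_info().
        exc = exc_info[1]
        if isinstance(exc, OSError) and exc.errno == errno.ENOENT:
            return  # the entry vanished on its own; nothing left to remove
        raise exc  # any other failure is still reported

    shutil.rmtree(path, onerror=ignore_missing)


if __name__ == "__main__":
    # Stand-in for the ssh control directory and its socket file.
    ctl_dir = tempfile.mkdtemp(prefix="gsyncd-aux-ssh-")
    open(os.path.join(ctl_dir, "example.sock"), "w").close()
    rmtree_tolerant(ctl_dir)
    rmtree_tolerant(ctl_dir)  # second call: directory already gone, no error
```

Only ENOENT is swallowed here, so permission problems or other I/O errors during cleanup would still surface in the log.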
Upstream patch sent: http://review.gluster.org/#/c/9792/
Upstream patch is merged.
Tried file-removal cases while also killing the worker, using build glusterfs-3.7.1-9.el6rhs.x86_64. Did not see this crash. Moving this bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html