Bug 1198101 - dist-geo-rep : gsyncd crashed in syncdutils.py while removing a file.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Aravinda VK
QA Contact:
URL:
Whiteboard:
Depends On: 1054154
Blocks:
 
Reported: 2015-03-03 11:55 UTC by Aravinda VK
Modified: 2015-05-14 17:35 UTC (History)

Fixed In Version: glusterfs-3.7dev-0.821.git0934432.el6
Clone Of: 1054154
Environment:
Last Closed: 2015-05-14 17:26:50 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Aravinda VK 2015-03-03 11:55:02 UTC
+++ This bug was initially created as a clone of Bug #1054154 +++

Description of problem: gsyncd crashed in syncdutils.py while removing a file. I have observed this crash many times, while removing different files.

Python backtrace:
 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-16 15:20:54.420363] I [master(/bricks/master_brick1):451:crawlwrap] _GMaster: 20 crawls, 0 turns
[2014-01-16 15:21:37.910284] E [syncdutils(/bricks/master_brick1):240:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1157, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 476, in crawlwrap
    time.sleep(self.sleep_interval)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 331, in <lambda>
    def set_term_handler(hook=lambda *a: finalize(*a, **{'exval': 1})):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 184, in finalize
    shutil.rmtree(gconf.ssh_ctl_dir)
  File "/usr/lib64/python2.6/shutil.py", line 217, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/usr/lib64/python2.6/shutil.py", line 215, in rmtree
    os.remove(fullname)
OSError: [Errno 2] No such file or directory: '/tmp/gsyncd-aux-ssh-8CWIhl/061fc87d252b63093ab9bfb765588973.sock'
[2014-01-16 15:21:37.911117] E [syncdutils(/bricks/master_brick1):223:log_raise_exception] <top>: connection to peer is broken
[2014-01-16 15:21:37.917700] E [resource(/bricks/master_brick1):204:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-8CWIhl/061fc87d252b63093ab9bfb765588973.sock root.43.174 /nonexistent/gsyncd --session-owner 47fa81ef-44a3-4fb6-b58e-cb4a81fa5b44 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2014-01-16 15:21:37.918075] E [resource(/bricks/master_brick1):207:logerr] Popen: ssh> [2014-01-15 12:33:49.858181] I [socket.c:3505:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-01-16 15:21:37.918354] E [resource(/bricks/master_brick1):207:logerr] Popen: ssh> [2014-01-15 12:33:49.858259] I [socket.c:3520:socket_init] 0-glusterfs: using system polling thread
[2014-01-16 15:21:37.918692] E [resource(/bricks/master_brick1):207:logerr] Popen: ssh> [2014-01-15 12:33:49.859676] I [socket.c:3505:socket_init] 0-glusterfs: SSL support is NOT enabled


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
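The crash boils down to a race in cleanup: `finalize()` calls `shutil.rmtree` on the aux SSH control directory, and `os.remove` inside `rmtree` raises `OSError` with errno ENOENT because the control socket has already been removed by another actor. A minimal sketch of the underlying errno (the socket name here is hypothetical, for illustration only):

```python
import errno
import os
import tempfile

# Simulate the race seen in the traceback above: cleanup tries to
# os.remove() a control socket that has already been deleted.
d = tempfile.mkdtemp(prefix="gsyncd-aux-ssh-")
sock = os.path.join(d, "ctl.sock")  # hypothetical name for illustration
open(sock, "w").close()

os.remove(sock)          # "someone else" removes the file first
caught = None
try:
    os.remove(sock)      # the cleanup path then fails with ENOENT
except OSError as e:
    caught = e.errno     # errno.ENOENT, i.e. "No such file or directory"
os.rmdir(d)
```

Since the file is already gone, failing the whole worker on this error serves no purpose; ENOENT during cleanup is benign.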


How reproducible: Does not happen every time.


Steps to Reproduce (exact steps not known):
1. Create and start a geo-rep relationship between master and slave.
2. Start creating files on master and slave.
3. Check the geo-rep logs.

Actual results: gsyncd crashed while removing a file.


Expected results: gsyncd should never crash.

Comment 1 Anand Avati 2015-03-03 11:58:41 UTC
REVIEW: http://review.gluster.org/9792 (geo-rep: Handle ENOENT during cleanup) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 2 Anand Avati 2015-03-04 16:46:50 UTC
REVIEW: http://review.gluster.org/9792 (geo-rep: Handle ENOENT during cleanup) posted (#2) for review on master by Aravinda VK (avishwan)

Comment 3 Anand Avati 2015-03-06 02:27:01 UTC
COMMIT: http://review.gluster.org/9792 committed in master by Vijay Bellur (vbellur) 
------
commit 2452d284b38061981d7fbd7e5a7bd15808d13c21
Author: Aravinda VK <avishwan>
Date:   Tue Mar 3 17:22:30 2015 +0530

    geo-rep: Handle ENOENT during cleanup
    
    shutil.rmtree was failing to remove a file if the file did not
    exist. Added an error handling function to ignore ENOENT if a
    file/dir is not present.
    
    BUG: 1198101
    Change-Id: I1796db2642f81d9e2b5e52c6be34b4ad6f1c9786
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/9792
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Prashanth Pai <ppai>
    Reviewed-by: Venky Shankar <vshankar>
    Reviewed-by: Kotresh HR <khiremat>
    Reviewed-by: Saravanakumar Arumugam <sarumuga>
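The approach described in the commit message can be sketched as follows. This is a hedged illustration, not the actual patch (see review 9792 for that): the helper name `rmtree_ignore_enoent` is hypothetical, and it uses `shutil.rmtree`'s `onerror` callback to swallow ENOENT while re-raising everything else.

```python
import errno
import os
import shutil
import tempfile

def rmtree_ignore_enoent(path):
    """Remove a directory tree, ignoring ENOENT for entries that
    another process has already deleted (hypothetical helper)."""
    def handle_rm_error(func, fpath, exc_info):
        # exc_info[1] is the exception raised by func(fpath)
        err = exc_info[1]
        if isinstance(err, OSError) and err.errno == errno.ENOENT:
            return  # already gone; nothing left to clean up
        raise err

    try:
        shutil.rmtree(path, onerror=handle_rm_error)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise

# Usage: safe even if the directory or its contents vanish mid-removal.
d = tempfile.mkdtemp(prefix="gsyncd-aux-ssh-")
open(os.path.join(d, "ctl.sock"), "w").close()
rmtree_ignore_enoent(d)
rmtree_ignore_enoent(d)  # second call: path already gone, no crash
```

With this pattern, a concurrently deleted control socket no longer takes down the whole worker during `finalize()`.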

Comment 4 Niels de Vos 2015-05-14 17:26:50 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
