Bug 1054154 - dist-geo-rep : gsyncd crashed in syncdutils.py while removing a file.
Summary: dist-geo-rep : gsyncd crashed in syncdutils.py while removing a file.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: Aravinda VK
QA Contact: Rahul Hinduja
URL:
Whiteboard: usability
Depends On:
Blocks: 1198101 1202842 1223636
 
Reported: 2014-01-16 11:03 UTC by Vijaykumar Koppad
Modified: 2015-07-29 04:33 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.7.0-2.el6rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned to: 1198101
Environment:
Last Closed: 2015-07-29 04:33:28 UTC
Embargoed:




Links
Red Hat Product Errata RHSA-2015:1495 (normal priority, SHIPPED_LIVE): Important: Red Hat Gluster Storage 3.1 update. Last updated: 2015-07-29 08:26:26 UTC.

Description Vijaykumar Koppad 2014-01-16 11:03:03 UTC
Description of problem: gsyncd crashed in syncdutils.py while removing a file. I have observed this crash many times, while removing different files.

Python backtrace:
 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-16 15:20:54.420363] I [master(/bricks/master_brick1):451:crawlwrap] _GMaster: 20 crawls, 0 turns
[2014-01-16 15:21:37.910284] E [syncdutils(/bricks/master_brick1):240:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1157, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 476, in crawlwrap
    time.sleep(self.sleep_interval)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 331, in <lambda>
    def set_term_handler(hook=lambda *a: finalize(*a, **{'exval': 1})):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 184, in finalize
    shutil.rmtree(gconf.ssh_ctl_dir)
  File "/usr/lib64/python2.6/shutil.py", line 217, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/usr/lib64/python2.6/shutil.py", line 215, in rmtree
    os.remove(fullname)
OSError: [Errno 2] No such file or directory: '/tmp/gsyncd-aux-ssh-8CWIhl/061fc87d252b63093ab9bfb765588973.sock'
[2014-01-16 15:21:37.911117] E [syncdutils(/bricks/master_brick1):223:log_raise_exception] <top>: connection to peer is broken
[2014-01-16 15:21:37.917700] E [resource(/bricks/master_brick1):204:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-8CWIhl/061fc87d252b63093ab9bfb765588973.sock root.43.174 /nonexistent/gsyncd --session-owner 47fa81ef-44a3-4fb6-b58e-cb4a81fa5b44 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2014-01-16 15:21:37.918075] E [resource(/bricks/master_brick1):207:logerr] Popen: ssh> [2014-01-15 12:33:49.858181] I [socket.c:3505:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-01-16 15:21:37.918354] E [resource(/bricks/master_brick1):207:logerr] Popen: ssh> [2014-01-15 12:33:49.858259] I [socket.c:3520:socket_init] 0-glusterfs: using system polling thread
[2014-01-16 15:21:37.918692] E [resource(/bricks/master_brick1):207:logerr] Popen: ssh> [2014-01-15 12:33:49.859676] I [socket.c:3505:socket_init] 0-glusterfs: SSL support is NOT enabled


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
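The failing call is shutil.rmtree(gconf.ssh_ctl_dir) inside finalize(), which the traceback shows running from the SIGTERM handler (set_term_handler's hook fired while crawlwrap() was in time.sleep()). The .sock file being removed is the ssh ControlMaster control socket (the -S path in the Popen command above), which the ssh process presumably unlinks itself during teardown, so the entry can vanish between rmtree()'s directory listing and its os.remove() call; rmtree()'s default error handler then re-raises the resulting ENOENT. A minimal sketch that simulates the race deterministically; the file names here are cosmetic, and the interception point applies to Python 2, where rmtree() lists entries with os.listdir() (Python 3.5+ uses os.scandir() instead):

import os
import shutil
import tempfile

ctl_dir = tempfile.mkdtemp(prefix="gsyncd-aux-ssh-")
open(os.path.join(ctl_dir, "dummy.sock"), "w").close()

_real_listdir = os.listdir

def listdir_then_unlink(path):
    names = _real_listdir(path)
    # Simulate another process (here, the ssh ControlMaster) removing
    # the socket after rmtree() has listed it but before os.remove().
    for name in names:
        os.unlink(os.path.join(path, name))
    return names

os.listdir = listdir_then_unlink
try:
    shutil.rmtree(ctl_dir)  # on Python 2.6: OSError: [Errno 2] No such file or directory
except OSError as e:
    print("reproduced: %s" % e)
finally:
    os.listdir = _real_listdir
    if os.path.isdir(ctl_dir):
        os.rmdir(ctl_dir)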


Version-Release number of selected component (if applicable): glusterfs-3.4.0.57rhs-1


How reproducible: Does not happen every time.


Steps to Reproduce:
Exact steps are not known; the crash is intermittent.
1. Create and start a geo-rep session between the master and the slave.
2. Start creating files on the master and the slave.
3. Check the geo-rep logs.

Actual results: gsyncd crashed while removing a file.


Expected results: gsyncd should never crash. 
 

Additional info:

Comment 3 Aravinda VK 2015-03-03 12:01:33 UTC
Upstream patch sent: http://review.gluster.org/#/c/9792/
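
The patch itself is not quoted here. For illustration only, the general shape of a fix for this class of crash is to make the control-directory cleanup tolerant of entries that disappear concurrently; a sketch under that assumption (the helper name rmtree_ignore_missing is hypothetical, not taken from the patch):

import errno
import shutil

def rmtree_ignore_missing(path):
    """shutil.rmtree() that skips entries which vanish concurrently."""
    def onerror(func, fullname, exc_info):
        err = exc_info[1]
        # Ignore "No such file or directory": another process (here, the
        # ssh ControlMaster) already removed this entry.
        if not (isinstance(err, OSError) and err.errno == errno.ENOENT):
            raise err
    shutil.rmtree(path, onerror=onerror)

Handling ENOENT per entry via rmtree()'s onerror callback is preferable to a blanket try/except around the whole call, which would stop removing the remaining entries at the first missing file.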

Comment 4 Aravinda VK 2015-03-06 03:20:29 UTC
Upstream patch is merged.

Comment 9 Rahul Hinduja 2015-07-17 12:32:06 UTC
Tried the remove cases, along with killing the worker, on build glusterfs-3.7.1-9.el6rhs.x86_64.

Did not see this crash. Moving this bug to the verified state.

Comment 11 errata-xmlrpc 2015-07-29 04:33:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

