Bug 1010327 - Dist-geo-rep : session status is defunct after syncdutils.py errors in log
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.0
Assigned To: Aravinda VK
QA Contact: Rahul Hinduja
Depends On:
Blocks: 1202842 1223636
Reported: 2013-09-20 10:07 EDT by Rachana Patel
Modified: 2015-07-29 00:29 EDT (History)

See Also:
Fixed In Version: glusterfs-3.7.0-2.el6rhs
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-29 00:29:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Rachana Patel 2013-09-20 10:07:41 EDT
Description of problem:
Dist-geo-rep: after a remove-brick commit operation, one geo-replication instance gets killed and syncdutils.py errors are found in the log. The geo-replication session is defunct after that.

Version-Release number of selected component (if applicable):
3.4.0.33rhs-1.el6rhs.x86_64

How reproducible:
Haven't tried.

Steps to Reproduce:
1. Create and start a dist-rep volume and mount it. Start creating data on the master volume from the mount point (a setup sketch follows the mount output below).

Mount point:
mount | grep remove_xsync
10.70.35.179:/remove_xsync on /mnt/remove_xsync type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
10.70.35.179:/remove_xsync on /mnt/remove_xsync_nfs type nfs (rw,addr=10.70.35.179)
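
For reference, a minimal sketch of how such a volume can be created and mounted. The brick paths below are illustrative assumptions (only brick3/x3 appears later in this report); the actual layout may have differed:

gluster volume create remove_xsync replica 2 10.70.35.179:/rhs/brick1/x1 10.70.35.235:/rhs/brick1/x1 10.70.35.179:/rhs/brick2/x2 10.70.35.235:/rhs/brick2/x2 10.70.35.179:/rhs/brick3/x3 10.70.35.235:/rhs/brick3/x3
gluster volume start remove_xsync
mount -t glusterfs 10.70.35.179:/remove_xsync /mnt/remove_xsync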

2. Create and start a geo-replication session between the master and slave volumes, for example:
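
(A sketch assuming the distributed geo-replication CLI of this release, a pre-created slave volume, and password-less SSH to the slave node; exact options may differ on this build:)

gluster volume geo-replication remove_xsync 10.70.37.195::remove_xsync create push-pem
gluster volume geo-replication remove_xsync 10.70.37.195::remove_xsync start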

3. Remove brick(s) from the master volume with the start option:

--> gluster volume remove-brick remove_xsync 10.70.35.179:/rhs/brick3/x3 10.70.35.235:/rhs/brick3/x3 start

4. Once remove-brick is completed, perform the commit operation:
 gluster volume remove-brick remove_xsync 10.70.35.179:/rhs/brick3/x3 10.70.35.235:/rhs/brick3/x3 status
 gluster volume remove-brick remove_xsync 10.70.35.179:/rhs/brick3/x3 10.70.35.235:/rhs/brick3/x3 commit

[root@old5 ~]# gluster v info remove_change
 
Volume Name: remove_change
Type: Distributed-Replicate
Volume ID: eb500199-37d4-4cb9-96ed-ae5bc1bf2498
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.179:/rhs/brick3/c1
Brick2: 10.70.35.235:/rhs/brick3/c1
Brick3: 10.70.35.179:/rhs/brick3/c2
Brick4: 10.70.35.235:/rhs/brick3/c2
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on

5. After some time, the status was defunct and the log contained the traceback below.
[root@old6 ~]# gluster vol geo remove_xsync status
	NODE                           MASTER          SLAVE                               HEALTH     UPTIME         
---------------------------------------------------------------------------------------------------------
old6.lab.eng.blr.redhat.com    remove_xsync    ssh://10.70.37.195::remove_xsync    defunct    N/A            
old5.lab.eng.blr.redhat.com    remove_xsync    ssh://10.70.37.195::remove_xsync    Stable     16:11:35   


Log snippet:
[2013-09-16 14:58:43.673831] E [syncdutils(monitor):207:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 233, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 203, in wmon
    cpid, _ = self.monitor(w, argv, cpids)
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 161, in monitor
    self.terminate()
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 89, in terminate
    set_term_handler(lambda *a: set_term_handler())
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 299, in set_term_handler
    signal(SIGTERM, hook)
ValueError: signal only works in main thread
[2013-09-16 14:58:44.734586] E [syncdutils(monitor):207:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 233, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 203, in wmon
    cpid, _ = self.monitor(w, argv, cpids)
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 161, in monitor
    self.terminate()
  File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 89, in terminate
    set_term_handler(lambda *a: set_term_handler())
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 299, in set_term_handler
    signal(SIGTERM, hook)
ValueError: signal only works in main thread
[2013-09-16 14:58:47.82674] I [syncdutils(monitor):159:finalize] <top>: exiting.
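
The ValueError above is standard CPython behavior: signal.signal() may only be called from the main thread, and here set_term_handler() ends up being invoked from a monitor worker thread (spawned via twrap()). Below is a minimal standalone sketch of the failure mode, plus one conventional workaround (having the worker re-send the signal so it is handled in the main thread). This illustrates the mechanism only; it is not the actual geo-replication fix:

import os
import signal
import threading

def on_term(signum, frame):
    print("main thread: got SIGTERM, shutting down")

# Installing the handler from the main thread is fine.
signal.signal(signal.SIGTERM, on_term)

def worker():
    # Re-installing a handler from a worker thread reproduces the bug:
    # ValueError: signal only works in main thread
    try:
        signal.signal(signal.SIGTERM, on_term)
    except ValueError as e:
        print("worker thread: FAIL: %s" % e)
    # Workaround: send the signal instead of re-registering the handler;
    # CPython delivers it to the handler installed by the main thread.
    os.kill(os.getpid(), signal.SIGTERM)

t = threading.Thread(target=worker)
t.start()
t.join()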

Actual results:
The status was defunct and the log contained a traceback.

Expected results:
The log should not contain a traceback. If the process was killed for some reason, the log should have an entry recording that. As it stands, there is no way to determine the reason behind the defunct status.

Additional info:
Comment 9 Rahul Hinduja 2015-07-16 08:29:10 EDT
Verified with build: glusterfs-3.7.1-10.el6rhs.x86_64

We now have an additional step to stop the geo-rep session before the commit (see the command below). Did not observe the status going to a defunct state. Also, the similar bugs 1002991 and 1044420 have been moved to verified.
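
For reference, the stop step referred to above (a sketch, using the volume names from this report):

gluster volume geo-replication remove_xsync 10.70.37.195::remove_xsync stop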

Moving this bug to the verified state too. Will create or reopen the bug with proper steps to reproduce in case we hit it again.
Comment 12 errata-xmlrpc 2015-07-29 00:29:08 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
