Bug 999825 - Dist-geo-rep : worker process dies and started again frequently
Summary: Dist-geo-rep : worker process dies and started again frequently
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
high
medium
Target Milestone: ---
Assignee: Venky Shankar
QA Contact: amainkar
URL:
Whiteboard:
Depends On:
Blocks: 1003803
 
Reported: 2013-08-22 08:37 UTC by Rachana Patel
Modified: 2015-04-20 11:58 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.4.0.24rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1003803 (view as bug list)
Environment:
Last Closed: 2013-09-23 22:38:51 UTC
Target Upstream Version:



Description Rachana Patel 2013-08-22 08:37:54 UTC

Comment 1 Rachana Patel 2013-08-22 08:41:50 UTC

Description of problem:
Dist-geo-rep: the worker process dies and is restarted frequently

Version-Release number of selected component (if applicable):
3.4.0.20rhs-2.el6_4.x86_64

How reproducible:
always

Steps to Reproduce:
1. Master cluster: 5 nodes; volume master1 (3x2), mounted over FUSE on a client; created data on the mount:
[root@rhs-client22 nufa]# df -h /mnt/master1
Filesystem            Size  Used Avail Use% Mounted on
10.70.37.128:master1  150G  126G   25G  84% /mnt/master1

2. Created a geo-replication session between the master and the slave cluster.


3. Checked the status after some time; the gsyncd worker kept getting restarted:

[root@DVM1 nufa]# gluster volume geo master1 10.70.37.219::slave1 status
NODE                           MASTER     SLAVE                   HEALTH    UPTIME         
---------------------------------------------------------------------------------------
DVM1.lab.eng.blr.redhat.com    master1    10.70.37.219::slave1    Stable    00:03:45       
DVM2.lab.eng.blr.redhat.com    master1    10.70.37.219::slave1    Stable    01:48:11       
DVM5.lab.eng.blr.redhat.com    master1    10.70.37.219::slave1    Stable    00:20:17       
DVM4.lab.eng.blr.redhat.com    master1    10.70.37.219::slave1    faulty    N/A            
DVM6.lab.eng.blr.redhat.com    master1    10.70.37.219::slave1    Stable    01:48:11     
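
The restarts show up in this table as a HEALTH flip to faulty and as UPTIME values that keep resetting. As a minimal sketch, assuming the same gluster CLI invocation as above is available on the node (the parsing follows the table layout shown here), one way to watch a session for this flapping:

import subprocess
import time

# Same volume/slave pair as in the status output above.
STATUS_CMD = ["gluster", "volume", "geo-replication",
              "master1", "10.70.37.219::slave1", "status"]

def poll_status():
    """Parse the status table into {node: (health, uptime)}."""
    out = subprocess.check_output(STATUS_CMD).decode()
    state = {}
    for line in out.splitlines():
        parts = line.split()
        # Data rows have five columns: NODE MASTER SLAVE HEALTH UPTIME
        if len(parts) == 5 and parts[1] == "master1":
            node, _master, _slave, health, uptime = parts
            state[node] = (health, uptime)
    return state

previous = poll_status()
while True:
    time.sleep(60)
    current = poll_status()
    for node, (health, uptime) in current.items():
        old_health = previous.get(node, (None, None))[0]
        if health != old_health:
            print("%s: %s -> %s (uptime %s)" % (node, old_health, health, uptime))
    previous = current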

Log snippet:

[2013-08-22 06:01:17.362781] I [monitor(monitor):81:set_state] Monitor: new state: Stable
[2013-08-22 06:04:34.478760] I [master(/rhs/brick1):878:crawl] _GMaster: processing xsync changelog /var/run/gluster/master1/ssh%3A%2F%
2Froot%4010.70.37.219%3Agluster%3A%2F%2F127.0.0.1%3Aslave1/85acebcd7c65ee7c4550f76de44279a9/xsync/XSYNC-CHANGELOG.1377131419
[2013-08-22 06:04:49.218578] E [syncdutils(/rhs/brick1):206:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 133, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 513, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1059, in service_loop
    g1.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 369, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 880, in crawl
    self.process([self.fname()], done)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 734, in process
    if self.process_change(change, done, retry):
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 696, in process_change
    entries.append(edct(ty, stat=st, entry=en, gfid=gfid, link=os.readlink(en)))
OSError: [Errno 2] No such file or directory: '.gfid/572fabcb-e34f-4d09-889e-c2e99b0765ac/sbin-ip6tables-save.x86_64'
[2013-08-22 06:04:49.221683] I [syncdutils(/rhs/brick1):158:finalize] <top>: exiting.
[2013-08-22 06:04:49.236047] I [monitor(monitor):81:set_state] Monitor: new state: faulty



Actual results:
A gsyncd worker crashes with the OSError traceback above; the monitor marks the session faulty and restarts the worker, and the cycle repeats frequently (nodes flip between Stable and faulty, and the UPTIME keeps resetting).

Expected results:
The worker should tolerate entries that disappear between the xsync crawl and processing, and the session should stay Stable.

Additional info:
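The traceback shows the worker being killed by an unhandled OSError from os.readlink(): the symlink recorded in the xsync changelog was already gone from the brick by the time the entry was processed. A minimal sketch of the kind of guard that would let the worker skip such vanished entries instead of dying (illustrative helper names only, not the actual gsyncd code or the shipped fix):

import errno
import os

def safe_readlink(path):
    """Return the symlink target, or None if the entry vanished meanwhile."""
    try:
        return os.readlink(path)
    except OSError as exc:
        if exc.errno in (errno.ENOENT, errno.ESTALE):
            # Deleted (or went stale) between the crawl and processing: skip it.
            return None
        raise

def collect_symlink_entries(paths):
    """Build (path, target) pairs, dropping entries that no longer exist."""
    entries = []
    for path in paths:
        target = safe_readlink(path)
        if target is not None:
            entries.append((path, target))
    return entries

if __name__ == "__main__":
    import tempfile
    workdir = tempfile.mkdtemp()
    live = os.path.join(workdir, "live-link")
    os.symlink("/tmp", live)
    missing = os.path.join(workdir, "already-deleted")
    # Only the existing symlink is returned; the missing one is skipped quietly.
    print(collect_symlink_entries([live, missing]))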

Comment 4 Amar Tumballi 2013-08-27 10:18:02 UTC
https://code.engineering.redhat.com/gerrit/#/c/12027

Comment 5 Amar Tumballi 2013-08-27 10:29:34 UTC
https://code.engineering.redhat.com/gerrit/#/c/12029

Comment 6 Rachana Patel 2013-09-08 16:16:20 UTC
Not able to reproduce with 3.4.0.32rhs-1.el6_4.x86_64; hence marking as verified.

Comment 7 Scott Haines 2013-09-23 22:38:51 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html


