Description of problem:
Dist-geo-rep: the gsyncd worker process dies and is restarted frequently.

Version-Release number of selected component (if applicable):
3.4.0.20rhs-2.el6_4.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Master cluster: 5 nodes; volume master1 (3x2) mounted as FUSE on the client, and data created on the mount:

[root@rhs-client22 nufa]# df -h /mnt/master1
Filesystem            Size  Used Avail Use% Mounted on
10.70.37.128:master1  150G  126G   25G  84% /mnt/master1

2. Created a geo-replication session between the master and slave clusters.
3. Checked status after some time; the gsyncd worker keeps getting restarted:

[root@DVM1 nufa]# gluster volume geo master1 10.70.37.219::slave1 status
NODE                         MASTER     SLAVE                   HEALTH    UPTIME
---------------------------------------------------------------------------------------
DVM1.lab.eng.blr.redhat.com  master1    10.70.37.219::slave1    Stable    00:03:45
DVM2.lab.eng.blr.redhat.com  master1    10.70.37.219::slave1    Stable    01:48:11
DVM5.lab.eng.blr.redhat.com  master1    10.70.37.219::slave1    Stable    00:20:17
DVM4.lab.eng.blr.redhat.com  master1    10.70.37.219::slave1    faulty    N/A
DVM6.lab.eng.blr.redhat.com  master1    10.70.37.219::slave1    Stable    01:48:11

Log snippet:

[2013-08-22 06:01:17.362781] I [monitor(monitor):81:set_state] Monitor: new state: Stable
[2013-08-22 06:04:34.478760] I [master(/rhs/brick1):878:crawl] _GMaster: processing xsync changelog /var/run/gluster/master1/ssh%3A%2F%2Froot%4010.70.37.219%3Agluster%3A%2F%2F127.0.0.1%3Aslave1/85acebcd7c65ee7c4550f76de44279a9/xsync/XSYNC-CHANGELOG.1377131419
[2013-08-22 06:04:49.218578] E [syncdutils(/rhs/brick1):206:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 133, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 513, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1059, in service_loop
    g1.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 369, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 880, in crawl
    self.process([self.fname()], done)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 734, in process
    if self.process_change(change, done, retry):
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 696, in process_change
    entries.append(edct(ty, stat=st, entry=en, gfid=gfid, link=os.readlink(en)))
OSError: [Errno 2] No such file or directory: '.gfid/572fabcb-e34f-4d09-889e-c2e99b0765ac/sbin-ip6tables-save.x86_64'
[2013-08-22 06:04:49.221683] I [syncdutils(/rhs/brick1):158:finalize] <top>: exiting.
[2013-08-22 06:04:49.236047] I [monitor(monitor):81:set_state] Monitor: new state: faulty

Actual results:
The worker crashes with the OSError above while processing the xsync changelog; the monitor marks the session faulty and respawns the worker, so the session cycles between Stable and faulty.

Expected results:
The worker should handle an entry that has disappeared from the brick without crashing, and the geo-replication session should stay Stable.

Additional info:
https://code.engineering.redhat.com/gerrit/#/c/12027
https://code.engineering.redhat.com/gerrit/#/c/12029
Not able to reproduce with 3.4.0.32rhs-1.el6_4.x86_64; hence marking as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html