Upstream Patch: http://review.gluster.org/#/c/15064/2 (master)
Upstream mainline: http://review.gluster.org/15064
Upstream 3.8: http://review.gluster.org/15138

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
Thanks to Kotresh for helping to verify this race condition.

With 3.1.3 the worker crashes and remains Faulty. With 3.2.0 it falls back to hybrid crawl and continues. Moving this bug to verified state.

On 3.1.3, set the stime to less than the earliest start time available, as follows:

[root@dhcp42-195 ~]# getfattr -d -m . -e hex /rhs/brick3/b3/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick3/b3/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.5c513245-edca-436e-a98f-9367f1105a6a.stime=0x59d24c220008dc36
trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.xtime=0x58c24c2c000acca9
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.volume-id=0xd820a210d7a74b6c807fbe7e5d5fbad6

[root@dhcp42-195 ~]# less /rhs/brick3/b3/.glusterfs/changelogs/htime/HTIME.1489086377
1489086392    <<<< start time available

[root@dhcp42-195 ~]# python
Python 2.7.5 (default, Oct 11 2015, 17:47:16)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xattr
>>> import os
>>> xattr.set("/rhs/brick3/b3/", "trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.5c513245-edca-436e-a98f-9367f1105a6a.stime", struct.pack("!II", 1489086300, 0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'struct' is not defined
>>> import struct
>>> xattr.set("/rhs/brick3/b3/", "trusted.glusterfs.d820a210-d7a7-4b6c-807f-be7e5d5fbad6.5c513245-edca-436e-a98f-9367f1105a6a.stime", struct.pack("!II", 1489086300, 0))
>>>
KeyboardInterrupt
>>> quit
Use quit() or Ctrl-D (i.e. EOF) to exit
>>>
[root@dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE              SLAVE NODE      STATUS    CRAWL STATUS       LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s    N/A             Faulty    N/A                N/A
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s    10.70.43.124    Active    Changelog Crawl    2017-03-10 01:48:15

[root@dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s stop
Stopping geo-replication session between r & 10.70.43.124::s has been successful
[root@dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s start
Starting geo-replication session between r & 10.70.43.124::s has been successful
[root@dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE              SLAVE NODE    STATUS             CRAWL STATUS    LAST_SYNCED
--------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s    N/A           Initializing...    N/A             N/A
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s    N/A           Initializing...    N/A             N/A

[root@dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE              SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s    10.70.43.124    Active    History Crawl    2017-03-10 01:48:12
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s    10.70.43.124    Active    History Crawl    2017-03-10 01:48:15

[root@dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE              SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s    N/A             Faulty    N/A              N/A
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s    10.70.43.124    Active    History Crawl    2017-03-10 01:48:15

[root@dhcp42-195 ~]# gluster volume geo-replication r 10.70.43.124::s status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE              SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    r             /rhs/brick3/b3    root          10.70.43.124::s    N/A             Faulty    N/A              N/A
10.70.42.195    r             /rhs/brick2/b2    root          10.70.43.124::s    10.70.43.124    Active    History Crawl    2017-03-10 01:48:15

[root@dhcp42-195 ~]# less /var/log/glusterfs/geo-replication/r/ssh%3A%2F%2Froot%4010.70.43.124%3Agluster%3A%2F%2F127.0.0.1%3As.log
[root@dhcp42-195 ~]#

It crashes with the following traceback:

[2017-03-10 07:40:44.911781] I [monitor(monitor):343:monitor] Monitor: worker(/rhs/brick3/b3) died in startup phase
[2017-03-10 07:40:55.161108] I [monitor(monitor):266:monitor] Monitor: ------------------------------------------------------------
[2017-03-10 07:40:55.161722] I [monitor(monitor):267:monitor] Monitor: starting gsyncd worker
[2017-03-10 07:40:55.412802] I [gsyncd(/rhs/brick3/b3):710:main_i] <top>: syncing: gluster://localhost:r -> ssh://root.43.124:gluster://localhost:s
[2017-03-10 07:40:55.417292] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
[2017-03-10 07:40:58.478969] I [master(/rhs/brick3/b3):83:gmaster_builder] <top>: setting up xsync change detection mode
[2017-03-10 07:40:58.479608] I [master(/rhs/brick3/b3):367:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:40:58.481811] I [master(/rhs/brick3/b3):83:gmaster_builder] <top>: setting up changelog change detection mode
[2017-03-10 07:40:58.482242] I [master(/rhs/brick3/b3):367:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:40:58.484389] I [master(/rhs/brick3/b3):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2017-03-10 07:40:58.484896] I [master(/rhs/brick3/b3):367:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:41:00.517904] I [master(/rhs/brick3/b3):1249:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/r/ssh%3A%2F%2Froot%4010.70.43.124%3Agluster%3A%2F%2F127.0.0.1%3As/c4853a41d2b23f970bf0b4e046255cd5/xsync
[2017-03-10 07:41:00.518466] I [resource(/rhs/brick3/b3):1491:service_loop] GLUSTER: Register time: 1489131660
[2017-03-10 07:41:00.547487] I [master(/rhs/brick3/b3):510:crawlwrap] _GMaster: primary master with volume id d820a210-d7a7-4b6c-807f-be7e5d5fbad6 ...
[2017-03-10 07:41:00.553791] I [master(/rhs/brick3/b3):519:crawlwrap] _GMaster: crawl interval: 1 seconds
[2017-03-10 07:41:00.562241] I [master(/rhs/brick3/b3):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1489086300, 0), etime: 1489131660
[2017-03-10 07:41:00.563826] E [repce(agent):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 54, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 100, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2017-03-10 07:41:00.565355] E [repce(/rhs/brick3/b3):207:__call__] RepceClient: call 11457:140177899616064:1489131660.56 (history) failed on peer with ChangelogException
[2017-03-10 07:41:00.565733] E [resource(/rhs/brick3/b3):1506:service_loop] GLUSTER: Changelog History Crawl failed, [Errno 2] No such file or directory
[2017-03-10 07:41:00.566288] I [syncdutils(/rhs/brick3/b3):220:finalize] <top>: exiting.
[2017-03-10 07:41:00.571651] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2017-03-10 07:41:00.572334] I [syncdutils(agent):220:finalize] <top>: exiting.
[2017-03-10 07:41:01.483324] I [monitor(monitor):343:monitor] Monitor: worker(/rhs/brick3/b3) died in startup phase
[2017-03-10 07:41:11.734217] I [monitor(monitor):266:monitor] Monitor: ------------------------------------------------------------
[2017-03-10 07:41:11.734749] I [monitor(monitor):267:monitor] Monitor: starting gsyncd worker
[2017-03-10 07:41:11.922350] I [gsyncd(/rhs/brick3/b3):710:main_i] <top>: syncing: gluster://localhost:r -> ssh://root.43.124:gluster://localhost:s
[2017-03-10 07:41:11.922379] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
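The ENOENT comes from the history API being asked for a window that starts before the first changelog recorded in the HTIME index (stime 1489086300 < start 1489086392), so the requested changelogs simply do not exist. A minimal sketch of the guard the fix effectively introduces; the function and variable names here are illustrative, not the actual gsyncd code:

```python
def choose_crawl(stime_sec, htime_start_sec):
    """Pick a crawl mode for a requested start time (stime).

    If the stime predates the earliest changelog recorded in
    HTIME, history crawl cannot serve it; fall back to hybrid
    (xsync) crawl instead of letting the history call fail
    with ENOENT and crash the worker, as 3.1.3 did.
    """
    if stime_sec < htime_start_sec:
        return "hybrid"
    return "history"

# Values from the 3.1.3 reproduction above:
# stime forced to 1489086300, earliest changelog at 1489086392.
print(choose_crawl(1489086300, 1489086392))  # hybrid
```

This matches the 3.2.0 behaviour shown below, where the worker logs "stime not available, abandoning history crawl" and continues with xsync instead of dying in the startup phase.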
[2017-03-10 07:41:14.980911] I [master(/rhs/brick3/b3):83:gmaster_builder] <top>: setting up xsync change detection mode
[2017-03-10 07:41:14.981561] I [master(/rhs/brick3/b3):367:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:41:14.983578] I [master(/rhs/brick3/b3):83:gmaster_builder] <top>: setting up changelog change detection mode
[2017-03-10 07:41:14.983989] I [master(/rhs/brick3/b3):367:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:41:14.985780] I [master(/rhs/brick3/b3):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2017-03-10 07:41:14.986200] I [master(/rhs/brick3/b3):367:__init__] _GMaster: using 'rsync' as the sync engine

======================================================================================================================================================

On 3.2: 1489129736 is the start time available. Set the stime to 1489129600. It does not crash; it goes to hybrid crawl instead.

[root@dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs stop
Stopping geo-replication session between ra & 10.70.41.211::rs has been successful
[root@dhcp42-7 ~]# python
Python 2.7.5 (default, Aug 2 2016, 04:20:16)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xattr
>>> import os
>>> import struct
>>> xattr.set("/rhs/brick1/b1", "trusted.glusterfs.04979d53-79f1-47a9-9eb8-8b19c6765368.39db963b-d234-4409-8b06-bb5bcea75911.stime", struct.pack("!II", 1489129600, 0))
>>>
[root@dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status

MASTER NODE   MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE               SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------
10.70.42.7    ra            /rhs/brick1/b1    root          10.70.41.211::rs    N/A           Stopped    N/A             N/A
10.70.42.7    ra            /rhs/brick2/b2    root          10.70.41.211::rs    N/A           Stopped    N/A             N/A

[root@dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs start
Starting geo-replication session between ra & 10.70.41.211::rs has been successful
[root@dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status

MASTER NODE   MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE               SLAVE NODE    STATUS             CRAWL STATUS    LAST_SYNCED
--------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.7    ra            /rhs/brick1/b1    root          10.70.41.211::rs    N/A           Initializing...    N/A             N/A
10.70.42.7    ra            /rhs/brick2/b2    root          10.70.41.211::rs    N/A           Initializing...    N/A             N/A

[root@dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status
[root@dhcp42-7 ~]#
[root@dhcp42-7 ~]#
[root@dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status

MASTER NODE   MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE               SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.7    ra            /rhs/brick1/b1    root          10.70.41.211::rs    10.70.41.211    Active    Hybrid Crawl     2017-03-10 07:09:25
10.70.42.7    ra            /rhs/brick2/b2    root          10.70.41.211::rs    10.70.41.211    Active    History Crawl    2017-03-10 07:09:25

[root@dhcp42-7 ~]# gluster vol geo-rep ra 10.70.41.211::rs status

MASTER NODE   MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE               SLAVE NODE      STATUS    CRAWL STATUS       LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.7    ra            /rhs/brick1/b1    root          10.70.41.211::rs    10.70.41.211    Active    Changelog Crawl    2017-03-10 07:09:13
10.70.42.7    ra            /rhs/brick2/b2    root          10.70.41.211::rs    10.70.41.211    Active    History Crawl      2017-03-10 07:09:25

Does not crash:

[2017-03-10 07:44:56.887825] I [gsyncd(/rhs/brick1/b1):747:main_i] <top>: syncing: gluster://localhost:ra -> ssh://root.41.211:gluster://localhost:rs
[2017-03-10 07:45:08.826819] I [master(/rhs/brick2/b2):83:gmaster_builder] <top>: setting up xsync change detection mode
[2017-03-10 07:45:08.827837] I [master(/rhs/brick2/b2):369:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:45:08.827426] I [master(/rhs/brick1/b1):83:gmaster_builder] <top>: setting up xsync change detection mode
[2017-03-10 07:45:08.829051] I [master(/rhs/brick1/b1):369:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:45:08.831597] I [master(/rhs/brick2/b2):83:gmaster_builder] <top>: setting up changelog change detection mode
[2017-03-10 07:45:08.832722] I [master(/rhs/brick2/b2):369:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:45:08.836221] I [master(/rhs/brick1/b1):83:gmaster_builder] <top>: setting up changelog change detection mode
[2017-03-10 07:45:08.837248] I [master(/rhs/brick1/b1):369:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:45:08.838807] I [master(/rhs/brick2/b2):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2017-03-10 07:45:08.839443] I [master(/rhs/brick2/b2):369:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:45:08.839672] I [master(/rhs/brick1/b1):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2017-03-10 07:45:08.840241] I [master(/rhs/brick1/b1):369:__init__] _GMaster: using 'rsync' as the sync engine
[2017-03-10 07:45:10.901714] I [master(/rhs/brick2/b2):1253:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/ra/ssh%3A%2F%2Froot%4010.70.41.211%3Agluster%3A%2F%2F127.0.0.1%3Ars/764586b145d7206a154a778f64bd2f50/xsync
[2017-03-10 07:45:10.902595] I [resource(/rhs/brick2/b2):1533:service_loop] GLUSTER: Register time: 1489131910
[2017-03-10 07:45:10.919552] I [master(/rhs/brick2/b2):512:crawlwrap] _GMaster: primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...
[2017-03-10 07:45:10.919812] I [master(/rhs/brick1/b1):1253:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/ra/ssh%3A%2F%2Froot%4010.70.41.211%3Agluster%3A%2F%2F127.0.0.1%3Ars/c19b89ac45352ab8c894d210d136dd56/xsync
[2017-03-10 07:45:10.920842] I [resource(/rhs/brick1/b1):1533:service_loop] GLUSTER: Register time: 1489131910
[2017-03-10 07:45:10.930277] I [master(/rhs/brick2/b2):521:crawlwrap] _GMaster: crawl interval: 1 seconds
[2017-03-10 07:45:10.930512] I [master(/rhs/brick1/b1):512:crawlwrap] _GMaster: primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...
[2017-03-10 07:45:10.955785] I [master(/rhs/brick1/b1):521:crawlwrap] _GMaster: crawl interval: 1 seconds
[2017-03-10 07:45:10.962408] I [master(/rhs/brick2/b2):1167:crawl] _GMaster: starting history crawl... turns: 1, stime: (1489129765, 0), etime: 1489131910
[2017-03-10 07:45:10.972481] I [master(/rhs/brick1/b1):1167:crawl] _GMaster: starting history crawl... turns: 1, stime: (-1, 0), etime: 1489131910
[2017-03-10 07:45:10.972976] I [master(/rhs/brick1/b1):1170:crawl] _GMaster: stime not available, abandoning history crawl
[2017-03-10 07:45:10.973434] I [resource(/rhs/brick1/b1):1548:service_loop] GLUSTER: No stime available, using xsync crawl
[2017-03-10 07:45:10.977698] I [master(/rhs/brick1/b1):512:crawlwrap] _GMaster: primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...
[2017-03-10 07:45:10.984404] I [master(/rhs/brick1/b1):521:crawlwrap] _GMaster: crawl interval: 60 seconds
[2017-03-10 07:45:10.992049] I [master(/rhs/brick1/b1):1275:crawl] _GMaster: starting hybrid crawl..., stime: (-1, 0)
[2017-03-10 07:45:11.968451] I [master(/rhs/brick2/b2):1196:crawl] _GMaster: slave's time: (1489129765, 0)
[2017-03-10 07:45:12.15741] I [master(/rhs/brick1/b1):1285:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/ra/ssh%3A%2F%2Froot%4010.70.41.211%3Agluster%3A%2F%2F127.0.0.1%3Ars/c19b89ac45352ab8c894d210d136dd56/xsync/XSYNC-CHANGELOG.1489131910
[2017-03-10 07:45:13.115169] I [master(/rhs/brick1/b1):1282:crawl] _GMaster: finished hybrid crawl syncing, stime: (1489129753, 210362)
[2017-03-10 07:45:13.119362] I [master(/rhs/brick1/b1):512:crawlwrap] _GMaster: primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...
[2017-03-10 07:45:13.126692] I [master(/rhs/brick1/b1):521:crawlwrap] _GMaster: crawl interval: 3 seconds
[2017-03-10 07:45:13.235455] I [master(/rhs/brick2/b2):1210:crawl] _GMaster: finished history crawl syncing, endtime: 1489131896, stime: (1489129765, 0)
[2017-03-10 07:45:25.250497] I [master(/rhs/brick2/b2):1167:crawl] _GMaster: starting history crawl... turns: 2, stime: (1489129765, 0), etime: 1489131925
[2017-03-10 07:45:25.256480] I [master(/rhs/brick2/b2):1210:crawl] _GMaster: finished history crawl syncing, endtime: 1489131911, stime: (1489129765, 0)
[2017-03-10 07:45:25.261801] I [master(/rhs/brick2/b2):512:crawlwrap] _GMaster: primary master with volume id 04979d53-79f1-47a9-9eb8-8b19c6765368 ...
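For reference, the stime xattr values manipulated above are two big-endian unsigned 32-bit integers, (seconds, nanoseconds) — that is why struct.pack("!II", ...) is used in both Python sessions, and why the logs print stime as a tuple like (1489129753, 210362). A quick decode of the raw value read with getfattr in the 3.1.3 run:

```python
import struct

# Decode the stime value shown in the getfattr output above:
# trusted.glusterfs....stime=0x59d24c220008dc36
raw = bytes.fromhex("59d24c220008dc36")
sec, nsec = struct.unpack("!II", raw)  # big-endian (seconds, nanoseconds)
print(sec, nsec)

# Round-trip check: packing the value written during the test and
# unpacking it again yields the same (seconds, nanoseconds) tuple.
assert struct.unpack("!II", struct.pack("!II", 1489086300, 0)) == (1489086300, 0)
```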
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html