Description of problem:
=======================
After upgrading the system from 3.2.0 to 3.3.0, geo-replication status appears Faulty with the following traceback:

[2017-04-06 11:32:00.796592] E [syncdutils(/rhs/brick1/b1):296:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 779, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1572, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 570, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1169, in crawl
    if not data_stime or data_stime == URXTIME:
NameError: global name 'data_stime' is not defined
[2017-04-06 11:32:00.800887] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>: exiting.

[root@localhost ~]# gluster volume geo-replication status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                      SLAVE NODE     STATUS     CRAWL STATUS    LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.179    vol0          /rhs/brick1/b1    root          ssh://10.70.43.87::vol1    N/A            Faulty     N/A             N/A
10.70.43.179    vol0          /rhs/brick2/b3    root          ssh://10.70.43.87::vol1    N/A            Faulty     N/A             N/A
10.70.42.90     vol0          /rhs/brick1/b2    root          ssh://10.70.43.87::vol1    10.70.43.87    Passive    N/A             N/A
10.70.42.90     vol0          /rhs/brick2/b4    root          ssh://10.70.43.87::vol1    10.70.43.87    Passive    N/A             N/A
[root@localhost ~]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.8.4-21.el6rhs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a geo-replication setup with 3.2.0 builds
2. Stop the geo-replication session to continue the upgrade
3. Follow the in-service upgrade path to upgrade to 3.3.0
4. Start the geo-replication session

Actual results:
===============
The geo-replication session becomes Faulty

Expected results:
=================
All workers should be either Active or Passive
The following code in master.py was causing this issue. A diff between the 3.2.0 and 3.3.0 versions of master.py shows these additional lines:

    if not data_stime or data_stime == URXTIME:
        raise NoStimeAvailable()

After commenting these lines out and restarting geo-replication, it works.
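The traceback arises because data_stime is bound only on some code paths before the new guard reads it, so the worker dies with a NameError instead of raising NoStimeAvailable. The sketch below is a minimal, self-contained illustration of that failure mode and the defensive fix of binding the name up front; the function and variable names mimic gsyncd but this is not the actual gsyncd crawl logic or the downstream patch.

```python
URXTIME = (-1, 0)  # hypothetical sentinel stime for illustration

class NoStimeAvailable(Exception):
    pass

def crawl_broken(have_stime):
    # data_stime is assigned only on one branch, as in the buggy 3.3.0 code
    if have_stime:
        data_stime = (1491478380, 0)
    # When have_stime is False this line raises
    # NameError/UnboundLocalError: 'data_stime' is not defined
    if not data_stime or data_stime == URXTIME:
        raise NoStimeAvailable()
    return data_stime

def crawl_fixed(have_stime):
    data_stime = None  # bind the name unconditionally before the guard
    if have_stime:
        data_stime = (1491478380, 0)
    # Now a missing stime raises the intended exception instead of NameError
    if not data_stime or data_stime == URXTIME:
        raise NoStimeAvailable()
    return data_stime
```

With the name bound up front, a worker that has no stime (e.g. right after an upgrade) hits NoStimeAvailable, which gsyncd handles, rather than crashing with an unhandled NameError and going Faulty.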
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/102726/
Verified with build: glusterfs-geo-replication-3.8.4-22.el6rhs.x86_64

After upgrading the Master/Slave cluster from 3.2.0 to the latest 3.3.0 version, geo-replication can be started; it goes into History Crawl and then moves to Changelog Crawl. It is working as expected. Moving the bug to the Verified state.

[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol start
Starting geo-replication session between firstvol & 10.70.43.185::secvol has been successful

[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol status

MASTER NODE     MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE                   SLAVE NODE      STATUS             CRAWL STATUS    LAST_SYNCED
--------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.30     firstvol      /rochelle/brick1/b2    root          10.70.43.185::secvol    N/A             Initializing...    N/A             N/A
10.70.43.30     firstvol      /rochelle/brick5/b3    root          10.70.43.185::secvol    N/A             Initializing...    N/A             N/A
10.70.43.148    firstvol      /rochelle/brick2/b2    root          10.70.43.185::secvol    N/A             Initializing...    N/A             N/A
10.70.43.148    firstvol      /rochelle/brick6/b3    root          10.70.43.185::secvol    10.70.43.158    Passive            N/A             N/A
[root@localhost ~]#

[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol status

MASTER NODE     MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE                   SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.30     firstvol      /rochelle/brick1/b2    root          10.70.43.185::secvol    10.70.43.185    Active     History Crawl    2017-04-10 22:53:07
10.70.43.30     firstvol      /rochelle/brick5/b3    root          10.70.43.185::secvol    10.70.43.185    Active     History Crawl    2017-04-10 22:53:08
10.70.43.148    firstvol      /rochelle/brick2/b2    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A              N/A
10.70.43.148    firstvol      /rochelle/brick6/b3    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A              N/A
[root@localhost ~]#

[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol status

MASTER NODE     MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE                   SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.30     firstvol      /rochelle/brick1/b2    root          10.70.43.185::secvol    10.70.43.185    Active     Changelog Crawl    2017-04-10 22:53:07
10.70.43.30     firstvol      /rochelle/brick5/b3    root          10.70.43.185::secvol    10.70.43.185    Active     Changelog Crawl    2017-04-10 22:53:08
10.70.43.148    firstvol      /rochelle/brick2/b2    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A                N/A
10.70.43.148    firstvol      /rochelle/brick6/b3    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A                N/A
[root@localhost ~]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774