Description of problem:
=======================
Observed a scenario where the last sync became zero post upgrade/reboot during history crawl. Before the upgrade started, the crawl status was "Changelog Crawl" with the last sync time "2017-07-21 12:51:55". However, after the upgrade and restart of geo-replication, the last sync for a few workers was shown as "0". The corresponding status files also show "0":

[root@dhcp42-79 ~]# gluster volume geo-replication master 10.70.41.209::slave status

MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A

[root@dhcp42-79 ~]# date
Sun Jul 23 11:04:25 IST 2017

[root@dhcp42-74 ~]# cd /var/lib/glusterd/geo-replication/master_10.70.41.209_slave/
[root@dhcp42-74 master_10.70.41.209_slave]# ls
brick_%2Frhs%2Fbrick1%2Fb3.status  brick_%2Frhs%2Fbrick2%2Fb7.status  brick_%2Frhs%2Fbrick3%2Fb11.status  gsyncd.conf  monitor.pid  monitor.status

[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick1%2Fb3.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0, "failures": 0, "entry": 583, "slave_node": "10.70.41.202", "data": 2083, "worker_status": "Active", "crawl_status": "History Crawl", "checkpoint_completion_time": 0}

[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick2%2Fb7.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0, "failures": 0, "entry": 584, "slave_node": "10.70.41.202", "data": 2059, "worker_status": "Active", "crawl_status": "History Crawl", "checkpoint_completion_time": 0}

[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick3%2Fb11.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0, "failures": 0, "entry": 586, "slave_node": "10.70.41.202", "data": 2101, "worker_status": "Active", "crawl_status": "History Crawl", "checkpoint_completion_time": 0}

[root@dhcp42-74 master_10.70.41.209_slave]# cat monitor.status
Started
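For reference, a minimal inspection sketch (a hypothetical helper script, not part of glusterfs/gsyncd) that reads the per-brick status files of a session directory like the one above and flags any brick whose last_synced has been reset to 0:

#!/usr/bin/env python
# check_last_synced.py -- hypothetical helper, not shipped with glusterfs.
# Scans the per-brick *.status files of a geo-rep session directory (the
# JSON layout shown above) and flags any brick whose last_synced is 0.
import glob
import json
import os
import sys
import time

try:
    from urllib import unquote          # Python 2
except ImportError:
    from urllib.parse import unquote    # Python 3

def check_session(session_dir):
    reset_bricks = []
    for path in sorted(glob.glob(os.path.join(session_dir, "brick_*.status"))):
        with open(path) as f:
            status = json.load(f)
        # File names carry the URL-encoded brick path, e.g. brick_%2Frhs%2Fbrick1%2Fb3.status
        brick = unquote(os.path.basename(path)[len("brick_"):-len(".status")])
        last = status.get("last_synced", 0)
        if last:
            print("%-20s last_synced=%s" % (brick, time.ctime(last)))
        else:
            print("%-20s last_synced=0  <-- reset" % brick)
            reset_bricks.append(brick)
    return reset_bricks

if __name__ == "__main__":
    session = sys.argv[1] if len(sys.argv) > 1 else \
        "/var/lib/glusterd/geo-replication/master_10.70.41.209_slave"
    sys.exit(1 if check_session(session) else 0)

Run against /var/lib/glusterd/geo-replication/master_10.70.41.209_slave on 10.70.42.74, it would flag b3, b7 and b11, matching the "N/A" entries in the status output.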
The status remained the same for more than 10 minutes, until a batch was synced:

MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A

Sun Jul 23 11:14:50 IST 2017

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.8.4-35.el7rhgs.x86_64

How reproducible:
=================
Seen only once before, upon a stop/start. The upgrade has been tried twice and the issue was hit once.

Steps to Reproduce:
===================
No specific steps; the systems were upgraded, and as part of the upgrade geo-replication was stopped and started.

Actual results:
===============
Last sync is "0".

Expected results:
=================
Last sync should be what it was before geo-rep was stopped. It looks like the brick status file was overwritten with "0" as the last synced time.
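To illustrate the suspected overwrite, a minimal, hypothetical sketch follows (the names and structure below are invented for illustration and are not the actual gsyncd code): if a worker rewrites its status file from a default template on (re)start instead of merging the template with what is already on disk, a previously recorded last_synced is lost; merging preserves it.

# status_reset_sketch.py -- illustrative only; not the actual gsyncd code.
# Demonstrates how writing a default status template on worker restart can
# wipe a previously recorded last_synced, and how merging with the on-disk
# contents preserves it.
import json
import os

DEFAULT_STATUS = {
    "worker_status": "Initializing...",
    "crawl_status": "N/A",
    "last_synced": 0,
    "entry": 0,
    "data": 0,
    "meta": 0,
    "failures": 0,
    "checkpoint_time": 0,
    "checkpoint_completed": "N/A",
    "checkpoint_completion_time": 0,
}

def reset_overwriting(path):
    # Suspected buggy behaviour: the template is written as-is, so the
    # last_synced recorded by the previous run is overwritten with 0.
    with open(path, "w") as f:
        json.dump(DEFAULT_STATUS, f)

def reset_preserving(path):
    # Safer behaviour: start from whatever is already on disk and only fill
    # in missing fields, so last_synced survives a stop/start or an upgrade.
    on_disk = {}
    if os.path.exists(path):
        try:
            with open(path) as f:
                on_disk = json.load(f)
        except ValueError:
            on_disk = {}               # empty/corrupt file: fall back to defaults
    merged = dict(DEFAULT_STATUS)
    merged.update(on_disk)             # values already on disk win over the template
    merged["worker_status"] = "Initializing..."   # purely runtime fields may still be reset
    with open(path, "w") as f:
        json.dump(merged, f)

if __name__ == "__main__":
    demo = "/tmp/brick_demo.status"
    with open(demo, "w") as f:
        # pre-upgrade value: 2017-07-21 12:51:55 IST
        json.dump({"last_synced": 1500621715}, f)
    reset_preserving(demo)
    with open(demo) as f:
        print(json.load(f)["last_synced"])   # still 1500621715, not 0

The actual fix is the upstream patch linked below; the sketch only illustrates the general merge-versus-overwrite distinction.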
Upstream Patch: https://review.gluster.org/18468 (master)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607