Bug 1474012
| Summary: | [geo-rep]: Incorrect last sync "0" during history crawl after upgrade/stop-start | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> | |
| Component: | geo-replication | Assignee: | Kotresh HR <khiremat> | |
| Status: | CLOSED ERRATA | QA Contact: | Rochelle <rallan> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | rhgs-3.3 | CC: | csaba, rhs-bugs, sheggodu, storage-qa-internal | |
| Target Milestone: | --- | |||
| Target Release: | RHGS 3.4.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | rebase | |||
| Fixed In Version: | glusterfs-3.12.2-1 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1500346 (view as bug list) | Environment: | ||
| Last Closed: | 2018-09-04 06:34:19 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1569490, 1575490, 1577862, 1611104 | |||
| Bug Blocks: | 1500346, 1500853, 1503134 | |||
Upstream Patch: https://review.gluster.org/18468 (master)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
Description of problem:
=======================
Observed a scenario where the last sync time became "0" after an upgrade/reboot, during the history crawl. Before the upgrade started, the crawl status was "Changelog Crawl" with a last sync time of "2017-07-21 12:51:55". After the upgrade, once geo-replication was started again, the last sync for a few workers was reset to "0" (displayed as "N/A" in the status output). The corresponding brick status files show "0":

```
[root@dhcp42-79 ~]# gluster volume geo-replication master 10.70.41.209::slave status

MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A

[root@dhcp42-79 ~]# date
Sun Jul 23 11:04:25 IST 2017
```

```
[root@dhcp42-74 ~]# cd /var/lib/glusterd/geo-replication/master_10.70.41.209_slave/
[root@dhcp42-74 master_10.70.41.209_slave]# ls
brick_%2Frhs%2Fbrick1%2Fb3.status  brick_%2Frhs%2Fbrick2%2Fb7.status  brick_%2Frhs%2Fbrick3%2Fb11.status  gsyncd.conf  monitor.pid  monitor.status

[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick1%2Fb3.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0, "failures": 0, "entry": 583, "slave_node": "10.70.41.202", "data": 2083, "worker_status": "Active", "crawl_status": "History Crawl", "checkpoint_completion_time": 0}

[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick2%2Fb7.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0, "failures": 0, "entry": 584, "slave_node": "10.70.41.202", "data": 2059, "worker_status": "Active", "crawl_status": "History Crawl", "checkpoint_completion_time": 0}

[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick3%2Fb11.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0, "failures": 0, "entry": 586, "slave_node": "10.70.41.202", "data": 2101, "worker_status": "Active", "crawl_status": "History Crawl", "checkpoint_completion_time": 0}

[root@dhcp42-74 master_10.70.41.209_slave]# cat monitor.status
Started
```

The status remained the same for more than 10 minutes, while no batch had yet synced:

```
MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A

Sun Jul 23 11:14:50 IST 2017
```

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.8.4-35.el7rhgs.x86_64

How reproducible:
=================
Seen once before on a stop/start. Upgrade was tried twice and the issue was seen once.

Steps to Reproduce:
===================
No specific steps; the systems were upgraded, and as part of the upgrade geo-replication was stopped and started.

Actual results:
===============
Last sync is "0".

Expected results:
=================
Last sync should be what it was before geo-replication was stopped. It looks like the brick status file was overwritten with "0" as the last synced time.
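The closing observation suggests the brick status file is rewritten with default values when a worker restarts. A minimal sketch of the expected behavior, assuming a JSON status file like the ones shown above (the `write_status` helper and `DEFAULTS` dict are hypothetical illustrations, not gsyncd's actual API): merge updates into the existing file rather than overwriting it, so a restart never resets `last_synced` back to 0.

```python
import json
import os

# Hypothetical defaults a restarting worker would otherwise write out.
# Field names are taken from the brick status files shown in this report.
DEFAULTS = {
    "last_synced": 0,
    "checkpoint_time": 0,
    "checkpoint_completed": "N/A",
    "crawl_status": "N/A",
    "worker_status": "Inactive",
}

def write_status(path, updates):
    """Merge updates into the existing brick status file instead of
    overwriting it, so values like last_synced survive a restart."""
    data = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path) as f:
            data.update(json.load(f))  # keep pre-restart values
    data.update(updates)               # apply only the fields that changed
    with open(path, "w") as f:
        json.dump(data, f)
```

With this merge, a worker coming back up after the upgrade would update only its crawl status, leaving the pre-stop `last_synced` timestamp in place instead of zeroing it.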