Bug 1474012 - [geo-rep]: Incorrect last sync "0" during history crawl after upgrade/stop-start
Summary: [geo-rep]: Incorrect last sync "0" during history crawl after upgrade/stop-start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: rhgs-3.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Kotresh HR
QA Contact: Rochelle
URL:
Whiteboard: rebase
Depends On: 1569490 1575490 1577862 1611104
Blocks: 1500346 1500853 1503134
 
Reported: 2017-07-23 06:27 UTC by Rahul Hinduja
Modified: 2018-09-14 05:36 UTC
CC List: 4 users

Fixed In Version: glusterfs-3.12.2-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1500346
Environment:
Last Closed: 2018-09-04 06:34:19 UTC


Attachments: None


Links
System ID                                Priority    Status    Summary    Last Updated
Red Hat Product Errata RHSA-2018:2607    None        None      None       2018-09-04 06:36:23 UTC

Description Rahul Hinduja 2017-07-23 06:27:21 UTC
Description of problem:
=======================

Observed a scenario where the last sync became zero after an upgrade/reboot during history crawl. Before the upgrade started, the crawl status was "Changelog Crawl" with a last sync time of "2017-07-21 12:51:55". However, after the upgrade and starting geo-rep, the last sync for a few workers was shown as "0", and the corresponding brick status files also show "0".

[root@dhcp42-79 ~]# gluster volume geo-replication master 10.70.41.209::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED                  
----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55          
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55          
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55          
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A                          
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A                          
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A                          
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A                          
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A                          
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A                          
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A                          
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A                          
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A                          
[root@dhcp42-79 ~]# 
[root@dhcp42-79 ~]# date
Sun Jul 23 11:04:25 IST 2017
[root@dhcp42-79 ~]#


[root@dhcp42-74 ~]# cd /var/lib/glusterd/geo-replication/master_10.70.41.209_slave/
[root@dhcp42-74 master_10.70.41.209_slave]# ls
brick_%2Frhs%2Fbrick1%2Fb3.status  brick_%2Frhs%2Fbrick2%2Fb7.status  brick_%2Frhs%2Fbrick3%2Fb11.status  gsyncd.conf  monitor.pid  monitor.status
[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick1%2Fb3.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0, "failures": 0, "entry": 583, "slave_node": "10.70.41.202", "data": 2083, "worker_status": "Active", "crawl_status": "History Crawl", "checkpoint_completion_time": 0}[root@dhcp42-74 master_10.70.41.209_slave]# 
[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick2%2Fb7.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0, "failures": 0, "entry": 584, "slave_node": "10.70.41.202", "data": 2059, "worker_status": "Active", "crawl_status": "History Crawl", "checkpoint_completion_time": 0}[root@dhcp42-74 master_10.70.41.209_slave]# 
[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick3%2Fb11.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0, "failures": 0, "entry": 586, "slave_node": "10.70.41.202", "data": 2101, "worker_status": "Active", "crawl_status": "History Crawl", "checkpoint_completion_time": 0}[root@dhcp42-74 master_10.70.41.209_slave]# 
[root@dhcp42-74 master_10.70.41.209_slave]# cat monitor.status
Started[root@dhcp42-74 master_10.70.41.209_slave]# 
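
Note the correspondence between the files and the CLI output above: bricks whose status file carries "last_synced": 0 show LAST_SYNCED as "N/A". A minimal Python sketch of that mapping (human_last_synced is a hypothetical helper; the actual rendering presumably lives in gsyncd's status code):

    #!/usr/bin/env python
    # Sketch: render last_synced from a brick status file the way the
    # status CLI appears to, based on the output in this report.
    import json
    import time

    def human_last_synced(path):
        with open(path) as f:
            status = json.load(f)
        ts = status.get("last_synced", 0)
        # An epoch of 0 means no sync time recorded; shown as N/A.
        if not ts:
            return "N/A"
        return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts))

    print(human_last_synced("brick_%2Frhs%2Fbrick1%2Fb3.status"))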


The status remained the same for more than 10 minutes, as no batch had synced yet:



MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED                  
----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55          
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55          
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55          
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A                          
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A                          
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A                          
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A                          
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A                          
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A                          
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A                          
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A                          
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A                          
Sun Jul 23 11:14:50 IST 2017


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-geo-replication-3.8.4-35.el7rhgs.x86_64


How reproducible:
=================

I remember seeing this only once before, on a stop/start. I have tried the upgrade twice and hit it once.

Steps to Reproduce:
===================

No specific steps; the systems were upgraded, and as part of the upgrade geo-replication was stopped and started (sketched below).
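
For reference, the stop/start portion of the sequence can be scripted. A rough sketch (Python; the master volume and slave names are taken from this report, and the upgrade step is a placeholder):

    # Sketch of the stop/upgrade/start sequence exercised in this report.
    import subprocess

    def georep(action):
        subprocess.check_call([
            "gluster", "volume", "geo-replication",
            "master", "10.70.41.209::slave", action,
        ])

    georep("stop")
    # ... upgrade glusterfs packages on all nodes here ...
    georep("start")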

Actual results:
===============

Last sync is "0"


Expected results:
=================

Last sync should be what it was before geo-rep was stopped. It looks like the brick status file was overwritten with "0" as the last synced time.
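
If that hypothesis is right, a defensive fix would merge defaults into the existing status file on worker start instead of rewriting it wholesale. A minimal sketch of that idea (illustrative only; not necessarily the approach taken in the linked patch):

    # Sketch: initialize a brick status file without clobbering existing
    # values; defaults fill in only the keys that are missing.
    import json
    import os

    DEFAULTS = {"last_synced": 0, "checkpoint_time": 0, "entry": 0,
                "data": 0, "meta": 0, "failures": 0}

    def init_status_file(path):
        current = {}
        if os.path.exists(path):
            with open(path) as f:
                current = json.load(f)
        merged = dict(DEFAULTS)
        merged.update(current)  # existing last_synced wins over the default 0
        with open(path, "w") as f:
            json.dump(merged, f)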

Comment 4 Kotresh HR 2017-10-10 12:43:50 UTC
Upstream Patch:

https://review.gluster.org/18468  (master)

Comment 8 errata-xmlrpc 2018-09-04 06:34:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

