Bug 1575490

Summary: [geo-rep]: Upgrade fails, session in FAULTY state
Product: [Community] GlusterFS
Reporter: Kotresh HR <khiremat>
Component: geo-replication
Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: unspecified
Version: mainline
CC: amukherj, bugs, csaba, rallan, rhinduja, rhs-bugs, sankarshan, storage-qa-internal
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-5.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1569490
Clones: 1577862, 1611104
Environment:
Last Closed: 2018-10-23 15:07:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1569490
Bug Blocks: 1474012, 1577862, 1611104

Description Kotresh HR 2018-05-07 05:54:53 UTC
Description of problem:
=======================
While upgrading from GlusterFS 3.8 to 3.12, the geo-replication session went FAULTY, with only one worker remaining ACTIVE.

[root@dhcp42-53 master]# gluster volume geo-replication master 10.70.42.164::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED          
------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.53     master        /rhs/brick1/b1    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A                  
10.70.42.53     master        /rhs/brick2/b4    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A                  
10.70.42.138    master        /rhs/brick1/b3    root          10.70.42.164::slave    10.70.42.164    Active    History Crawl    N/A                  
10.70.42.138    master        /rhs/brick2/b6    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A                  
10.70.42.160    master        /rhs/brick1/b2    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A                  
10.70.42.160    master        /rhs/brick2/b5    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A  



Traceback in geo-rep logs:
--------------------------------
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 802, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1676, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1470, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1370, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1204, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1123, in process_change
    entry_stime_to_update[0])
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncdstatus.py", line 200, in set_field
    return self._update(merger)
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncdstatus.py", line 161, in _update
    data = mergerfunc(data)
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncdstatus.py", line 194, in merger
    if data[key] == value:
KeyError: 'last_synced_entry'
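
The failing check is the merger's "data[key] == value" comparison in the last
frame: a status file written by a pre-upgrade version never contained the
'last_synced_entry' key, so the lookup raises KeyError. Below is a minimal
sketch of the failure mode, condensed from the traceback (simplified,
hypothetical names; not the actual gsyncdstatus.py code):

    # Status dict as loaded from a pre-upgrade status file: the
    # 'last_synced_entry' key was never written by the old version.
    data = {"worker_status": "Active", "last_synced": 0}

    def set_field(data, key, value):
        # Mirrors the merger logic shown in the traceback: it compares
        # the old value before updating, assuming the key already exists.
        if data[key] == value:
            return data
        data[key] = value
        return data

    try:
        set_field(data, "last_synced_entry", 1525672493)
    except KeyError as e:
        print("KeyError:", e)   # KeyError: 'last_synced_entry'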


Version-Release number of selected component (if applicable):
=============================================================



How reproducible:
=================
1/1


Actual results:
===============
Session is FAULTY.

Expected results:
=================
Session should not be FAULTY.

Comment 1 Worker Ant 2018-05-07 06:06:22 UTC
REVIEW: https://review.gluster.org/19969 (geo-rep: Fix upgrade issue) posted (#1) for review on master by Kotresh HR

Comment 2 Worker Ant 2018-05-07 10:17:41 UTC
COMMIT: https://review.gluster.org/19969 committed in master by "Aravinda VK" <avishwan> with a commit message- geo-rep: Fix upgrade issue

Cause and Analysis:
The last synced changelog for entry operations is
marked in the current version so that entry
operations already processed in a batch are not
re-processed after a crash/restart of geo-rep.
This marker was not present in previous versions.

The marker is maintained in the dictionary with the
key 'last_synced_entry' and dictionary is persisted
into status file. So upgrading to current version in
which the marker is present was failing with KeyError.

Solution:
Load the dictionary with the default keys first,
which include every key the current version knows
about, and then overlay the values from the status
file, instead of doing it the other way around.

fixes: bz#1575490
Change-Id: Ic654e6f9a3c97f616761f1362f890352a2186fb4
Signed-off-by: Kotresh HR <khiremat>
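
In other words, the fix loads the defaults first and then overlays the on-disk
values, so keys missing from an old status file keep their defaults. A minimal
sketch of that pattern (hypothetical names; it assumes a JSON-encoded status
file and is not the exact gsyncdstatus.py code):

    import json

    DEFAULTS = {
        "worker_status": "Created",
        "last_synced": 0,
        "last_synced_entry": 0,   # marker added in the current version
    }

    def load_status(path):
        # Start from the defaults, which contain every key the current
        # version knows about, then overlay whatever the (possibly older)
        # status file provides. A key absent from an old file keeps its
        # default value instead of raising KeyError later.
        data = dict(DEFAULTS)
        with open(path) as f:
            data.update(json.load(f))
        return data

Loading the file first and treating its keys as authoritative is exactly what
broke upgrades: files written before the marker existed simply do not have it.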

Comment 3 Worker Ant 2018-05-15 03:03:58 UTC
REVISION POSTED: https://review.gluster.org/20018 (geo-rep: Fix upgrade issue) posted (#2) for review on release-3.12 by Kotresh HR

Comment 4 Worker Ant 2018-08-02 05:03:54 UTC
REVISION POSTED: https://review.gluster.org/20606 (geo-rep: Fix upgrade issue) posted (#2) for review on release-4.1 by Kotresh HR

Comment 6 Shyamsundar 2018-10-23 15:07:33 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/