1611104 – [geo-rep]: Upgrade fails, session in FAULTY state

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	geo-replication
Sub Component:
Version:	4.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Kotresh HR
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1569490 1575490
Blocks:	1474012 1503137 1577862

Reported:	2018-08-02 04:51 UTC by Kotresh HR
Modified:	2018-08-29 12:44 UTC
CC List:	8 users
Fixed In Version:	glusterfs-4.1.3
Clone Of:	1575490
Environment:
Last Closed:	2018-08-29 12:44:28 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Kotresh HR 2018-08-02 04:51:27 UTC

+++ This bug was initially created as a clone of Bug #1575490 +++

Description of problem:
=======================
While upgrading from Gluster version 3.8 to 3.12, the geo-replication session went FAULTY, with only one worker ACTIVE.

[root@dhcp42-53 master]# gluster volume geo-replication master 10.70.42.164::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED          
------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.53     master        /rhs/brick1/b1    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A                  
10.70.42.53     master        /rhs/brick2/b4    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A                  
10.70.42.138    master        /rhs/brick1/b3    root          10.70.42.164::slave    10.70.42.164    Active    History Crawl    N/A                  
10.70.42.138    master        /rhs/brick2/b6    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A                  
10.70.42.160    master        /rhs/brick1/b2    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A                  
10.70.42.160    master        /rhs/brick2/b5    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A  



Traceback in geo-rep logs:
--------------------------------
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 802, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1676, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1470, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1370, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1204, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1123, in process_change
    entry_stime_to_update[0])
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncdstatus.py", line 200, in set_field
    return self._update(merger)
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncdstatus.py", line 161, in _update
    data = mergerfunc(data)
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncdstatus.py", line 194, in merger
    if data[key] == value:
KeyError: 'last_synced_entry'


Version-Release number of selected component (if applicable):
=============================================================



How reproducible:
=================
1/1


Actual results:
===============
Session is FAULTY.

Expected results:
=================
Session should not be FAULTY.

--- Additional comment from Worker Ant on 2018-05-07 02:06:22 EDT ---

REVIEW: https://review.gluster.org/19969 (geo-rep: Fix upgrade issue) posted (#1) for review on master by Kotresh HR

--- Additional comment from Worker Ant on 2018-05-07 06:17:41 EDT ---

COMMIT: https://review.gluster.org/19969 committed in master by "Aravinda VK" <avishwan> with a commit message- geo-rep: Fix upgrade issue

Cause and Analysis:
The current version records the last synced changelog
for entry operations so that entry operations already
processed in a batch are not re-processed when geo-rep
crashes or restarts. Previous versions did not record
this marker.

The marker is kept in a dictionary under the key
'last_synced_entry', and the dictionary is persisted
to the status file. A status file written by a previous
version lacks this key, so after upgrading to the
current version the lookup failed with KeyError.

Solution:
Load the dictionary with the default keys first, which
include all keys including the latest ones, and then
load the values from the status file, rather than the
other way around.

fixes: bz#1575490
Change-Id: Ic654e6f9a3c97f616761f1362f890352a2186fb4
Signed-off-by: Kotresh HR <khiremat>
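
The fix described in the commit message above can be illustrated with a minimal sketch. This is not the code from gsyncdstatus.py: the names DEFAULT_STATUS and load_status, the standalone set_field helper, the JSON encoding of the status file, and the timestamp values are all assumptions made for illustration. Only the key 'last_synced_entry' and the failing comparison `data[key] == value` come from this report.

```python
import json

# Hypothetical defaults: only 'last_synced_entry' is named in this report;
# the other key and both values are illustrative assumptions.
DEFAULT_STATUS = {
    "last_synced": 0,
    "last_synced_entry": 0,
}


def load_status(raw):
    """Start from the defaults, then overlay the values persisted on disk."""
    data = dict(DEFAULT_STATUS)
    if raw:
        # Assumes the status file holds a JSON object.
        data.update(json.loads(raw))
    return data


def set_field(data, key, value):
    """Update one field; return False when the stored value is unchanged."""
    if data[key] == value:  # raises KeyError if the key was never loaded
        return False
    data[key] = value
    return True


# A status file as an older version might have written it,
# without the 'last_synced_entry' key.
old_status_file = json.dumps({"last_synced": 1525673182})

status = load_status(old_status_file)
changed = set_field(status, "last_synced_entry", 1525673200)
```

Because the defaults are loaded first, a key introduced by a newer version is always present, while values already persisted by the older version are preserved. Loading only the file contents would leave 'last_synced_entry' missing and reproduce the KeyError from the traceback.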

--- Additional comment from Worker Ant on 2018-05-14 23:03:58 EDT ---

REVISION POSTED: https://review.gluster.org/20018 (geo-rep: Fix upgrade issue) posted (#2) for review on release-3.12 by Kotresh HR

Comment 1 Worker Ant 2018-08-02 05:03:58 UTC

REVIEW: https://review.gluster.org/20606 (geo-rep: Fix upgrade issue) posted (#2) for review on release-4.1 by Kotresh HR

Comment 2 Worker Ant 2018-08-15 18:40:58 UTC

COMMIT: https://review.gluster.org/20606 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message- geo-rep: Fix upgrade issue

Cause and Analysis:
The current version records the last synced changelog
for entry operations so that entry operations already
processed in a batch are not re-processed when geo-rep
crashes or restarts. Previous versions did not record
this marker.

The marker is kept in a dictionary under the key
'last_synced_entry', and the dictionary is persisted
to the status file. A status file written by a previous
version lacks this key, so after upgrading to the
current version the lookup failed with KeyError.

Solution:
Load the dictionary with the default keys first, which
include all keys including the latest ones, and then
load the values from the status file, rather than the
other way around.

Backport of:
 > BUG: 1575490
 > Change-Id: Ic654e6f9a3c97f616761f1362f890352a2186fb4
 > Signed-off-by: Kotresh HR <khiremat>
 (cherry picked from commit 23c1385b5f6f6103e820d15ecfe1df31940fdb45)

fixes: bz#1611104
Change-Id: Ic654e6f9a3c97f616761f1362f890352a2186fb4
Signed-off-by: Kotresh HR <khiremat>
(cherry picked from commit 23c1385b5f6f6103e820d15ecfe1df31940fdb45)

Comment 3 Shyamsundar 2018-08-29 12:44:28 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.1.3, please open a new bug report.

glusterfs-4.1.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-August/000111.html
[2] https://www.gluster.org/pipermail/gluster-users/
