Bug 1439708 - [geo-rep]: Geo-replication goes to faulty after upgrade from 3.2.0 to 3.3.0
Summary: [geo-rep]: Geo-replication goes to faulty after upgrade from 3.2.0 to 3.3.0
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Kotresh HR
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 1417151
 
Reported: 2017-04-06 12:16 UTC by Rochelle
Modified: 2017-09-21 04:37 UTC
CC List: 4 users

Fixed In Version: glusterfs-3.8.4-22
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:37:54 UTC
Embargoed:




Links
Red Hat Product Errata RHBA-2017:2774 (Priority: normal, Status: SHIPPED_LIVE): glusterfs bug fix and enhancement update, last updated 2017-09-21 08:16:29 UTC

Description Rochelle 2017-04-06 12:16:59 UTC
Description of problem:

After upgrading the system from 3.2.0 to 3.3.0, the geo-replication status goes to Faulty with the following traceback:



[2017-04-06 11:32:00.796592] E [syncdutils(/rhs/brick1/b1):296:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 779, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1572, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 570, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1169, in crawl
    if not data_stime or data_stime == URXTIME:
NameError: global name 'data_stime' is not defined
[2017-04-06 11:32:00.800887] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>: exiting.
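For context, a NameError of this kind on Python 2 means the crawl path in master.py looks up a name (data_stime) that was never bound in that function or at module scope. A minimal, self-contained sketch of this failure mode, using hypothetical names rather than the actual syncdaemon code:

    URXTIME = (-1, 0)  # hypothetical sentinel meaning "stime never recorded"

    class HybridCrawl(object):
        def crawl(self):
            # 'data_stime' is never assigned in this method or at module level,
            # so the lookup below falls through to a global lookup and fails with
            # NameError (Python 2 wording: "global name 'data_stime' is not defined").
            if not data_stime or data_stime == URXTIME:
                raise RuntimeError("no stime available")

    HybridCrawl().crawl()  # raises NameError, just like the worker traceback above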


[root@localhost ~]# gluster volume geo-replication status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                      SLAVE NODE     STATUS     CRAWL STATUS    LAST_SYNCED          
---------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.179    vol0          /rhs/brick1/b1    root          ssh://10.70.43.87::vol1    N/A            Faulty     N/A             N/A                  
10.70.43.179    vol0          /rhs/brick2/b3    root          ssh://10.70.43.87::vol1    N/A            Faulty     N/A             N/A                  
10.70.42.90     vol0          /rhs/brick1/b2    root          ssh://10.70.43.87::vol1    10.70.43.87    Passive    N/A             N/A                  
10.70.42.90     vol0          /rhs/brick2/b4    root          ssh://10.70.43.87::vol1    10.70.43.87    Passive    N/A             N/A                  
[root@localhost ~]# 



Version-Release number of selected component (if applicable): 
=============================================================

glusterfs-geo-replication-3.8.4-21.el6rhs.x86_64

How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create a geo-replication setup with 3.2.0 builds
2. Stop the geo-replication session to proceed with the upgrade
3. Follow the in-service upgrade path to upgrade to 3.3.0
4. Start the geo-replication session

Actual results:
===============

The geo-replication session becomes Faulty.

Expected results:
=================

All workers should be either Active or Passive.

Comment 4 Rahul Hinduja 2017-04-06 12:34:55 UTC
The following code in master.py was causing this issue. A diff between the 3.2.0 and 3.3.0 versions of master.py reveals these additional lines:

        if not data_stime or data_stime == URXTIME:
            raise NoStimeAvailable()


After commenting out these lines and restarting geo-replication, it works.
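One possible shape of a fix, sketched here for illustration only, is to bind data_stime before the guard runs; the actual downstream change may differ (see the Gerrit link in comment 6). The helper below is hypothetical, standing in for however the worker reads the brick's stime:

    URXTIME = (-1, 0)  # hypothetical sentinel meaning "stime never recorded"

    class NoStimeAvailable(Exception):
        pass

    def fetch_brick_stime():
        # Hypothetical stand-in for reading the stime xattr from the brick root.
        return None  # pretend no stime has been recorded yet

    def crawl():
        # Bind the name before the guard so the check itself can never raise
        # NameError; the guard then does the job it was written for.
        data_stime = fetch_brick_stime()
        if not data_stime or data_stime == URXTIME:
            raise NoStimeAvailable()
        # ... continue crawling from data_stime ...

    try:
        crawl()
    except NoStimeAvailable:
        print("no stime recorded yet; the caller handles this case")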

Comment 6 Atin Mukherjee 2017-04-07 05:27:10 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/102726/

Comment 9 Rahul Hinduja 2017-04-12 10:25:53 UTC
verified with build: glusterfs-geo-replication-3.8.4-22.el6rhs.x86_64

After upgrading the Master/Slave cluster from 3.2.0 to the latest 3.3.0 version, geo-replication can be started; it goes into History Crawl and then moves to Changelog Crawl. It is working as expected. Moving the bug to the Verified state.

[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol start
Starting geo-replication session between firstvol & 10.70.43.185::secvol has been successful
[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol status
 
MASTER NODE     MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE                   SLAVE NODE      STATUS             CRAWL STATUS    LAST_SYNCED          
--------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.30     firstvol      /rochelle/brick1/b2    root          10.70.43.185::secvol    N/A             Initializing...    N/A             N/A                  
10.70.43.30     firstvol      /rochelle/brick5/b3    root          10.70.43.185::secvol    N/A             Initializing...    N/A             N/A                  
10.70.43.148    firstvol      /rochelle/brick2/b2    root          10.70.43.185::secvol    N/A             Initializing...    N/A             N/A                  
10.70.43.148    firstvol      /rochelle/brick6/b3    root          10.70.43.185::secvol    10.70.43.158    Passive            N/A             N/A                  
[root@localhost ~]# 

[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol status
 
MASTER NODE     MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE                   SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED                  
---------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.30     firstvol      /rochelle/brick1/b2    root          10.70.43.185::secvol    10.70.43.185    Active     History Crawl    2017-04-10 22:53:07          
10.70.43.30     firstvol      /rochelle/brick5/b3    root          10.70.43.185::secvol    10.70.43.185    Active     History Crawl    2017-04-10 22:53:08          
10.70.43.148    firstvol      /rochelle/brick2/b2    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A              N/A                          
10.70.43.148    firstvol      /rochelle/brick6/b3    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A              N/A                          
[root@localhost ~]# 


[root@localhost ~]# gluster volume geo-replication firstvol 10.70.43.185::secvol status
 
MASTER NODE     MASTER VOL    MASTER BRICK           SLAVE USER    SLAVE                   SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.43.30     firstvol      /rochelle/brick1/b2    root          10.70.43.185::secvol    10.70.43.185    Active     Changelog Crawl    2017-04-10 22:53:07          
10.70.43.30     firstvol      /rochelle/brick5/b3    root          10.70.43.185::secvol    10.70.43.185    Active     Changelog Crawl    2017-04-10 22:53:08          
10.70.43.148    firstvol      /rochelle/brick2/b2    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A                N/A                          
10.70.43.148    firstvol      /rochelle/brick6/b3    root          10.70.43.185::secvol    10.70.43.158    Passive    N/A                N/A                          
[root@localhost ~]#

Comment 11 errata-xmlrpc 2017-09-21 04:37:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

