Bug 1445591

Summary: Unable to take snapshot on a geo-replicated volume, even after stopping the session
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Atin Mukherjee <amukherj>
Component: geo-replication
Assignee: Kotresh HR <khiremat>
Status: CLOSED ERRATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, asrivast, csaba, divya, khiremat, olim, rcyriac, rhinduja, rhs-bugs, sanandpa, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.2.0 Async
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.4-18.1
Doc Type: Bug Fix
Doc Text:
Previously, snapshot creation sometimes failed on a geo-replicated volume, even after the geo-replication session was stopped. This was caused by a bug in the way glusterd built its in-memory state of active geo-replication sessions. With this fix, you can successfully create snapshots of a geo-replicated volume.
Story Points: ---
Clone Of: 1416024
Environment:
Last Closed: 2017-06-08 09:36:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1443977
Bug Blocks:

Comment 2 Atin Mukherjee 2017-04-26 04:42:02 UTC
upstream patch : https://review.gluster.org/17093

Comment 3 Kotresh HR 2017-05-04 09:34:19 UTC
Downstream 3.2 patch:

https://code.engineering.redhat.com/gerrit/#/c/105181/

Comment 8 Rahul Hinduja 2017-05-24 11:59:46 UTC
Validated with build: glusterfs-geo-replication-3.8.4-18.1.el7rhgs.x86_64

With Stop and restart of glusterd:
==================================

[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.37.152    master        /rhs/brick1/b1    root          10.70.37.155::slave    10.70.37.157    Active     Changelog Crawl    2017-05-24 16:28:04          
10.70.37.152    master        /rhs/brick2/b3    root          10.70.37.155::slave    10.70.37.157    Passive    N/A                N/A                          
10.70.37.153    master        /rhs/brick1/b2    root          10.70.37.155::slave    10.70.37.155    Passive    N/A                N/A                          
10.70.37.153    master        /rhs/brick2/b4    root          10.70.37.155::slave    10.70.37.155    Active     Changelog Crawl    2017-05-24 16:28:04          
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# service glusterd restart
Redirecting to /bin/systemctl restart  glusterd.service
[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave stop
Stopping geo-replication session between master & 10.70.37.155::slave has been successful
[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED          
----------------------------------------------------------------------------------------------------------------------------------------
10.70.37.152    master        /rhs/brick1/b1    root          10.70.37.155::slave    N/A           Stopped    N/A             N/A                  
10.70.37.152    master        /rhs/brick2/b3    root          10.70.37.155::slave    N/A           Stopped    N/A             N/A                  
10.70.37.153    master        /rhs/brick1/b2    root          10.70.37.155::slave    N/A           Stopped    N/A             N/A                  
10.70.37.153    master        /rhs/brick2/b4    root          10.70.37.155::slave    N/A           Stopped    N/A             N/A                  
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# gluster snapshot create snap1 master
snapshot create: success: Snap snap1_GMT-2017.05.24-10.59.03 created successfully
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# gluster snapshot list
snap1_GMT-2017.05.24-10.59.03
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave start
Starting geo-replication session between master & 10.70.37.155::slave has been successful
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED                  
---------------------------------------------------------------------------------------------------------------------------------------------------
10.70.37.152    master        /rhs/brick1/b1    root          10.70.37.155::slave    10.70.37.157    Passive    N/A              N/A                          
10.70.37.152    master        /rhs/brick2/b3    root          10.70.37.155::slave    10.70.37.157    Passive    N/A              N/A                          
10.70.37.153    master        /rhs/brick1/b2    root          10.70.37.155::slave    10.70.37.155    Active     History Crawl    2017-05-24 16:28:07          
10.70.37.153    master        /rhs/brick2/b4    root          10.70.37.155::slave    10.70.37.155    Active     History Crawl    2017-05-24 16:28:07          
[root@dhcp37-152 scripts]# 


With Pause and restart of glusterd:
===================================

[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED                  
---------------------------------------------------------------------------------------------------------------------------------------------------
10.70.37.152    master        /rhs/brick1/b1    root          10.70.37.155::slave    10.70.37.157    Passive    N/A              N/A                          
10.70.37.152    master        /rhs/brick2/b3    root          10.70.37.155::slave    10.70.37.157    Passive    N/A              N/A                          
10.70.37.153    master        /rhs/brick1/b2    root          10.70.37.155::slave    10.70.37.155    Active     History Crawl    2017-05-24 16:28:07          
10.70.37.153    master        /rhs/brick2/b4    root          10.70.37.155::slave    10.70.37.155    Active     History Crawl    2017-05-24 16:28:07          
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# service glusterd restart
Redirecting to /bin/systemctl restart  glusterd.service
[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave pause
Pausing geo-replication session between master & 10.70.37.155::slave has been successful
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# 
[root@dhcp37-152 scripts]# gluster snapshot create snap2 master
snapshot create: success: Snap snap2_GMT-2017.05.24-11.00.29 created successfully
[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED          
---------------------------------------------------------------------------------------------------------------------------------------
10.70.37.152    master        /rhs/brick1/b1    root          10.70.37.155::slave    N/A           Paused    N/A             N/A                  
10.70.37.152    master        /rhs/brick2/b3    root          10.70.37.155::slave    N/A           Paused    N/A             N/A                  
10.70.37.153    master        /rhs/brick1/b2    root          10.70.37.155::slave    N/A           Paused    N/A             N/A                  
10.70.37.153    master        /rhs/brick2/b4    root          10.70.37.155::slave    N/A           Paused    N/A             N/A                  
[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave resume
Resuming geo-replication session between master & 10.70.37.155::slave has been successful
[root@dhcp37-152 scripts]# gluster volume geo-replication master 10.70.37.155::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.37.152    master        /rhs/brick1/b1    root          10.70.37.155::slave    10.70.37.157    Passive    N/A                N/A                          
10.70.37.152    master        /rhs/brick2/b3    root          10.70.37.155::slave    10.70.37.157    Passive    N/A                N/A                          
10.70.37.153    master        /rhs/brick1/b2    root          10.70.37.155::slave    10.70.37.155    Active     Changelog Crawl    2017-05-24 16:28:07          
10.70.37.153    master        /rhs/brick2/b4    root          10.70.37.155::slave    10.70.37.155    Active     Changelog Crawl    2017-05-24 16:28:07          
[root@dhcp37-152 scripts]# 
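
The manual check in both scenarios above (confirm that every brick shows Stopped or Paused, then take the snapshot) could be scripted roughly as follows. This is only a sketch: all_bricks_in_state is a hypothetical helper, not part of the gluster CLI, and it assumes the status table layout shown in the transcripts, where the first six columns are single tokens and STATUS is therefore the seventh whitespace-separated field.

```shell
# Hypothetical helper (not part of the gluster CLI): reads the output of
# "gluster volume geo-replication <master> <slave> status" on stdin and
# succeeds only if every brick row reports the expected STATUS value.
all_bricks_in_state() {
    awk -v want="$1" '
        # Skip the header row, the dashed rule, and blank lines.
        /^MASTER NODE/ || /^---/ || NF == 0 { next }
        # Assumption: the first six columns are single tokens, so STATUS is field 7.
        { rows++; if ($7 != want) bad++ }
        # Fail on empty input as well as on any row in a different state.
        END { if (rows > 0 && bad == 0) exit 0; exit 1 }
    '
}
```

For example, "gluster volume geo-replication master 10.70.37.155::slave status | all_bricks_in_state Stopped && gluster snapshot create snap1 master" would attempt the snapshot only when every brick reports Stopped; passing Paused instead covers the second scenario.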


Basic validation passed; snapshot creation works in both cases. Moving this bug to verified state.

Comment 9 Divya 2017-05-29 09:50:48 UTC
Kotresh,

Could you review and sign-off the edited doc text?

Comment 10 Kotresh HR 2017-05-29 10:43:07 UTC
(In reply to Divya from comment #9)
> Kotresh,
> 
> Could you review and sign-off the edited doc text?

Minor comment. Snapshot was not always failing. You can add the word 'sometimes'.
Rest of it looks good

Comment 11 Divya 2017-05-29 10:59:59 UTC
(In reply to Kotresh HR from comment #10)
> (In reply to Divya from comment #9)
> > Kotresh,
> > 
> > Could you review and sign-off the edited doc text?
> 
> Minor comment. Snapshot was not always failing. You can add the word
> 'sometimes'.
> Rest of it looks good

Added "sometimes". Thanks for the review.

Comment 13 errata-xmlrpc 2017-06-08 09:36:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1418