Bug 1380122

Summary: Labelled geo-rep checkpoints hide geo-replication status
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Chris Blum <cblum>
Component: geo-replication Assignee: Aravinda VK <avishwan>
Status: CLOSED ERRATA QA Contact: Rahul Hinduja <rhinduja>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: amukherj, avishwan, csaba, rhs-bugs, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1388401 Environment:
Last Closed: 2017-03-23 05:50:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351528, 1388401, 1395626, 1395627, 1395628    

Description Chris Blum 2016-09-28 17:30:35 UTC
Description of problem:
When assigning a label to a checkpoint, geo-replication status will throw:
No active geo-replication sessions between [masternode] and [geo-rep target]

Version-Release number of selected component (if applicable):
[root@RHGS1 rep01]# rpm -qa | grep gluster
glusterfs-rdma-3.7.9-10.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-10.el7rhgs.x86_64
glusterfs-libs-3.7.9-10.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-10.el7rhgs.x86_64
python-gluster-3.7.9-10.el7rhgs.noarch
glusterfs-fuse-3.7.9-10.el7rhgs.x86_64
glusterfs-cli-3.7.9-10.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-10.el7rhgs.x86_64
glusterfs-3.7.9-10.el7rhgs.x86_64
glusterfs-api-3.7.9-10.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
samba-vfs-glusterfs-4.4.3-7.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.5.el7rhgs.noarch
glusterfs-server-3.7.9-10.el7rhgs.x86_64
[root@RHGS1 rep01]# gluster --version
glusterfs 3.7.9 built on Jun 10 2016 06:32:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
Always

Steps to Reproduce:
1. Set up a geo-replication session
2. Once started, create a checkpoint like this:
# gluster volume geo-replication rep01 RHGS3::slave config checkpoint chris
3. Run # gluster volume geo-replication rep01 RHGS3::slave status

Actual results:
No active geo-replication sessions between rep01 and RHGS3::slave

Expected results:
MASTER NODE    MASTER VOL    MASTER BRICK         SLAVE USER    SLAVE           SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------
RHGS1          rep01         /rhs/brick1/rep01    root          RHGS3::slave    RHGS3         Active     Changelog Crawl    2016-09-28 13:13:45
RHGS2          rep01         /rhs/brick1/rep01    root          RHGS3::slave    RHGS4         Passive    N/A                N/A

Additional info:

It seems like the geo-replication continues even though status says there is no active connection :(

[root@RHGS1 rep01]# gluster volume geo-replication rep01 RHGS3::slave status

MASTER NODE    MASTER VOL    MASTER BRICK         SLAVE USER    SLAVE           SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------
RHGS1          rep01         /rhs/brick1/rep01    root          RHGS3::slave    RHGS3         Active     Changelog Crawl    2016-09-28 13:13:45
RHGS2          rep01         /rhs/brick1/rep01    root          RHGS3::slave    RHGS4         Passive    N/A                N/A
[root@RHGS1 rep01]# gluster volume geo-replication rep01 RHGS3::slave config checkpoint chris
geo-replication config updated successfully
[root@RHGS1 rep01]# gluster volume geo-replication rep01 RHGS3::slave status
No active geo-replication sessions between rep01 and RHGS3::slave
[root@RHGS1 rep01]# gluster volume geo-replication rep01 RHGS3::slave config checkpoint now
geo-replication config updated successfully
[root@RHGS1 rep01]# gluster volume geo-replication rep01 RHGS3::slave status

MASTER NODE    MASTER VOL    MASTER BRICK         SLAVE USER    SLAVE           SLAVE NODE    STATUS     CRAWL STATUS       LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------
RHGS1          rep01         /rhs/brick1/rep01    root          RHGS3::slave    RHGS3         Active     Changelog Crawl    2016-09-28 13:13:45
RHGS2          rep01         /rhs/brick1/rep01    root          RHGS3::slave    RHGS4         Passive    N/A                N/A
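Until the label is validated, a monitoring script has to tell a real status table apart from the "No active geo-replication sessions" message. The following is only a rough sketch of how the status output shown above could be parsed. It assumes that columns are separated by two or more spaces, as in the pasted output; the function name parse_status is hypothetical and not part of gluster.

```python
import re

# Message prefix taken from the CLI output shown in this report.
NO_SESSION_PREFIX = "No active geo-replication sessions"


def parse_status(output):
    """Parse the text of a geo-replication status command into a list of dicts.

    Returns an empty list when the CLI reports no active sessions.
    Assumption: columns are separated by two or more spaces.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines or lines[0].startswith(NO_SESSION_PREFIX):
        return []
    header = re.split(r"\s{2,}", lines[0])
    rows = []
    for line in lines[1:]:
        # Skip the dashed separator line under the header.
        if set(line) == {"-"}:
            continue
        fields = re.split(r"\s{2,}", line)
        rows.append(dict(zip(header, fields)))
    return rows
```

An empty result after setting a checkpoint would then point at this bug rather than at a stopped session, since syncing appears to continue in the background.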

Comment 2 Aravinda VK 2016-09-29 06:20:52 UTC
Set the checkpoint to the current time using:

gluster volume geo-replication rep01 RHGS3::slave config checkpoint now

As mentioned in the description, we need to validate inputs other than "now".

Comment 3 Aravinda VK 2016-10-25 09:19:22 UTC
Added validation for the label format. The geo-rep checkpoint label now accepts only "now" or a valid date in the format "YYYY-MM-DD HH:MM:SS", for example "2016-10-25 14:30:45".

Upstream patch sent to fix the issue:
http://review.gluster.org/15721
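
The validation rule described in this comment can be illustrated roughly as follows. This is only a sketch in Python and not the code from the patch; the function name is hypothetical, and it assumes that "now" stays accepted alongside the date format.

```python
from datetime import datetime

# Date format described in this comment: "YYYY-MM-DD HH:MM:SS".
CHECKPOINT_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_valid_checkpoint_label(label):
    """Return True if label is "now" or parses as a date in CHECKPOINT_FORMAT."""
    if label == "now":
        return True
    try:
        datetime.strptime(label, CHECKPOINT_FORMAT)
    except ValueError:
        # Free-form labels such as "chris" end up here.
        return False
    return True
```

With this rule, a free-form label such as "chris" is rejected up front instead of being stored and later hiding the status output.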

Comment 4 Chris Blum 2016-10-25 15:26:05 UTC
I don't approve of this... A 'label' should be a text string that I can assign (like "pre-prod"), not a very strictly defined date-time-stamp.
Why is it not possible to use a string here?

Comment 5 Aravinda VK 2016-10-25 15:57:09 UTC
(In reply to Chris Blum from comment #4)
> I don't approve of this... A 'label' should be a text string that I can
> assign (like "pre-prod"), not a very strictly defined date-time-stamp.
> Why is it not possible to use a string here?

Currently, geo-replication uses the checkpoint date to determine whether sync is complete up to that time. Checkpoint completion means that everything created on the master before the checkpoint time has been synced to the slave.


Example usage of Checkpoint:
    gluster volume geo-replication rep01 RHGS3::slave config checkpoint "2016-10-25 20:00:00"

Watch the checkpoint status using the geo-rep status command. If the status says Checkpoint completed=Yes, then all the files created or modified in the master volume before 2016-10-25 20:00:00 have been synced to the slave volume.



Maybe I am missing something here. What is the use case for a non-date checkpoint? We can enhance geo-replication to support that use case.

Comment 6 Chris Blum 2016-10-25 17:17:21 UTC
OK that makes more sense then - so the label is then implemented so that I can find out if things 5 days ago have been properly synced to the other side?
Will the 'checkpoint completed' timestamp then show me when the files 5 days ago have been synced? Because why else would I be interested in an earlier date other than now if not?

Comment 7 Aravinda VK 2016-10-26 04:56:25 UTC
(In reply to Chris Blum from comment #6)
> OK that makes more sense then - so the label is then implemented so that I
> can find out if things 5 days ago have been properly synced to the other
> side?
> Will the 'checkpoint completed' timestamp then show me when the files 5 days
> ago have been synced? Because why else would I be interested in an earlier
> date other than now if not?

With the "last synced" column in the status output, an earlier checkpoint date is not very useful. If the last synced time from every Active worker is later than the required time, the checkpoint can be considered complete.
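
As a rough illustration of that rule, the check could look like the sketch below. It is not how geo-replication computes checkpoint completion internally. It assumes the caller has already extracted (STATUS, LAST_SYNCED) pairs from the status output, treats "at or after" the required time as sufficient, and uses a hypothetical function name.

```python
from datetime import datetime

# Timestamp format used by the LAST_SYNCED column in the status output.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def synced_up_to(workers, required_time):
    """Return True if every Active worker has synced up to required_time.

    workers: iterable of (status, last_synced) string pairs.
    Passive workers are ignored. An Active worker with "N/A" as its
    last synced time counts as not yet synced (assumption).
    """
    target = datetime.strptime(required_time, TIME_FORMAT)
    active = [last for status, last in workers if status == "Active"]
    if not active:
        return False
    for last in active:
        if last == "N/A":
            return False
        if datetime.strptime(last, TIME_FORMAT) < target:
            return False
    return True
```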

A labeled checkpoint is more useful for setting future times. For example, if a checkpoint is required at midnight of the current day, it can be set in advance using a label instead of running the command with "now" at midnight.
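
The midnight example might be scripted roughly as shown below. This is a sketch only: both function names are hypothetical, and the command line simply mirrors the "config checkpoint" invocation shown earlier in this bug. The sketch builds the argument list but does not run it.

```python
from datetime import datetime, timedelta

# Label format accepted by the checkpoint validation.
CHECKPOINT_FORMAT = "%Y-%m-%d %H:%M:%S"


def next_midnight_label(now):
    """Return the upcoming midnight after `now` as a checkpoint label."""
    tomorrow = (now + timedelta(days=1)).date()
    midnight = datetime.combine(tomorrow, datetime.min.time())
    return midnight.strftime(CHECKPOINT_FORMAT)


def checkpoint_command(master_volume, slave_url, label):
    """Build the argument list for setting a checkpoint (does not execute it)."""
    return [
        "gluster", "volume", "geo-replication",
        master_volume, slave_url,
        "config", "checkpoint", label,
    ]
```

Passing the label as a single list element keeps the space between date and time intact, which matches the quoting used in the example command above.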

Comment 10 Aravinda VK 2016-11-16 10:26:37 UTC
Upstream Patches:
Mainline:    http://review.gluster.org/15721
Release 3.7: http://review.gluster.org/15856
Release 3.8: http://review.gluster.org/15855
Release 3.9: http://review.gluster.org/15854

Downstream Patch:
https://code.engineering.redhat.com/gerrit/90316

Comment 12 Rahul Hinduja 2017-03-05 13:12:43 UTC
Verified with the build: glusterfs-geo-replication-3.8.4-17.el7rhgs.x86_64

Checkpoint does not accept values other than "now" and a date in the format (Y-m-d H:M:S).

3.1.3:
======

[root@dhcp42-195 scripts]# gluster volume geo-replication master 10.70.43.63::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                 SLAVE NODE      STATUS    CRAWL STATUS       LAST_SYNCED                  
---------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.195    master        /rhs/brick1/b1    root          10.70.43.63::slave    10.70.42.54     Active    Changelog Crawl    2017-03-05 07:23:15          
10.70.42.195    master        /rhs/brick2/b4    root          10.70.43.63::slave    10.70.42.54     Active    Changelog Crawl    2017-03-05 07:23:15          
10.70.43.93     master        /rhs/brick1/b3    root          10.70.43.63::slave    10.70.43.178    Active    Changelog Crawl    2017-03-05 07:23:15          
10.70.43.93     master        /rhs/brick2/b6    root          10.70.43.63::slave    10.70.43.178    Active    Changelog Crawl    2017-03-05 07:23:15          
10.70.43.124    master        /rhs/brick1/b2    root          10.70.43.63::slave    10.70.43.63     Active    Changelog Crawl    2017-03-05 07:23:23          
10.70.43.124    master        /rhs/brick2/b5    root          10.70.43.63::slave    10.70.43.63     Active    Changelog Crawl    2017-03-05 07:23:15          
[root@dhcp42-195 scripts]# gluster volume geo-replication master 10.70.43.63::slave config checkpoint rahul
geo-replication config updated successfully
[root@dhcp42-195 scripts]# gluster volume geo-replication master 10.70.43.63::slave config checkpoint 
rahul
[root@dhcp42-195 scripts]# gluster volume geo-replication master 10.70.43.63::slave status
No active geo-replication sessions between master and 10.70.43.63::slave
[root@dhcp42-195 scripts]# 


3.2.0:
======

[root@dhcp42-7 scripts]# gluster volume geo-replication master 10.70.43.249::slave config checkpoint rahul
Invalid Checkpoint label. Use format "Y-m-d H:M:S", Example: 2016-10-25 15:30:45
Usage: volume geo-replication [<VOLNAME>] [<SLAVE-URL>] {create [[ssh-port n] [[no-verify]|[push-pem]]] [force]|start [force]|stop [force]|pause [force]|resume [force]|config|status [detail]|delete [reset-sync-time]} [options...]
[root@dhcp42-7 scripts]# 


Moving the bug to verified state.

Comment 16 errata-xmlrpc 2017-03-23 05:50:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html