Bug 1437244 - geo-rep not detecting changes
Summary: geo-rep not detecting changes
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-29 21:30 UTC by jeremiah
Modified: 2018-06-20 18:26 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 18:26:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Master volume log 1 (335 bytes, text/plain)
2017-03-30 07:41 UTC, jeremiah
Master volume log 2 (77.50 KB, text/plain)
2017-03-30 07:42 UTC, jeremiah
Master volume log 3 (7.62 KB, text/plain)
2017-03-30 07:42 UTC, jeremiah
Updated master log 1 (66.37 KB, text/plain)
2017-04-26 23:50 UTC, jeremiah
Updated master log 2 (160 bytes, text/plain)
2017-04-26 23:50 UTC, jeremiah
Updated master log 3 (8.27 KB, text/plain)
2017-04-26 23:51 UTC, jeremiah

Description jeremiah 2017-03-29 21:30:54 UTC
Description of problem: Initial sync is successful; however, further filesystem changes are not detected or synced.

Version-Release number of selected component (if applicable): Both 3.8 & 3.10

How reproducible: 100%

Steps to Reproduce: The setup consists of two servers, both running fully updated CentOS 7 with no SELinux.

* ill: Local server. Master volume named "foobar".
* aws: Remote server. Slave volume named "foobar".

* Both servers are running ntpd.
* Their clocks are in the same time zone and in sync.
* Passwordless SSH is set up and working.
* common_secret.pem.pub was generated on the local host "ill".
* 'create push-pem' ran successfully on the local host "ill". Verified that the remote side had the two expected "command=" entries in ~/.ssh/authorized_keys.
* The geo-rep session between the two volumes was successfully created and started.
* The local filesystem successfully does an initial sync to the remote filesystem.
* No subsequent changes are detected or synced.

* 'status detail' looks clean, but LAST_SYNCED never changes:

MASTER NODE: ill.franz.com
MASTER VOL: foobar
MASTER BRICK: /gv0/foobar
SLAVE USER: root
SLAVE NODE: aws.franz.com::foobar
STATUS: Active
CRAWL STATUS: Changelog Crawl
LAST_SYNCED: 2017-03-29 13:44:23
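One way to confirm the "LAST_SYNCED never changes" symptom is to capture 'status detail' twice, writing a change to the master volume in between, and check whether LAST_SYNCED advanced. A minimal Python sketch, assuming the output has already been captured in the KEY: value layout transcribed above (the CLI's own layout may differ, so the parsing is an assumption); `parse_status` and `sync_stalled` are hypothetical helpers, not part of gsyncd.

```python
from datetime import datetime
from typing import Dict

# Timestamp layout used in the LAST_SYNCED field above.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_status(text: str) -> Dict[str, str]:
    """Parse 'KEY: value' lines, splitting on the first colon only.

    Splitting on the first colon keeps values such as
    'aws.franz.com::foobar' and '2017-03-29 13:44:23' intact.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def sync_stalled(before: str, after: str) -> bool:
    """Return True if an Active session's LAST_SYNCED did not advance.

    Only meaningful if the master volume was modified between the two samples.
    """
    first, second = parse_status(before), parse_status(after)
    if second.get("STATUS") != "Active":
        return False
    t0 = datetime.strptime(first["LAST_SYNCED"], TIME_FORMAT)
    t1 = datetime.strptime(second["LAST_SYNCED"], TIME_FORMAT)
    return t1 <= t0


if __name__ == "__main__":
    sample = "STATUS: Active\nLAST_SYNCED: 2017-03-29 13:44:23"
    # Two identical samples: LAST_SYNCED never moved.
    print(sync_stalled(sample, sample))  # True
```

Comparing parsed timestamps rather than raw strings avoids false alarms if the field formatting changes slightly between samples.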


Actual results: No changes detected or synced


Expected results: Changes to be detected and synced


Additional info: I went with the default 'config' options. However, during troubleshooting I noticed that some of the default values seem suspicious or wrong.

For example, 'remote_gsyncd' is set to '/nonexistent/gsyncd'.

Also, some of the variable values are incorrectly guessed. For example, 'gluster_log_file' is guessed as:

/var/log/glusterfs/geo-replication/foobar/ssh%3A%2F%2Froot%4054.165.144.9%3Agluster%3A%2F%2F127.0.0.1%3Afoobar.gluster.log

but the real file is:

/var/log/glusterfs/geo-replication/foobar/ssh%3A%2F%2Froot%4054.165.144.9%3Agluster%3A%2F%2F127.0.0.1%3Afoobar.%2Fgv0%2Ffoobar.gluster.log
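Both file names are percent-encoded, which makes the mismatch hard to spot by eye. Decoding them with Python's standard-library `urllib.parse.unquote` shows that the real file name carries one extra component, the master brick path /gv0/foobar (the same value reported as MASTER BRICK above). A minimal sketch; `decode` and `extra_component` are hypothetical helpers written for this comparison, not part of gsyncd.

```python
from urllib.parse import unquote

# File names copied from the report above.
GUESSED = ("ssh%3A%2F%2Froot%4054.165.144.9%3Agluster%3A%2F%2F"
           "127.0.0.1%3Afoobar.gluster.log")
ACTUAL = ("ssh%3A%2F%2Froot%4054.165.144.9%3Agluster%3A%2F%2F"
          "127.0.0.1%3Afoobar.%2Fgv0%2Ffoobar.gluster.log")

SUFFIX = ".gluster.log"


def decode(name: str) -> str:
    """Undo the percent-encoding (%3A -> ':', %2F -> '/', %40 -> '@')."""
    return unquote(name)


def extra_component(guessed: str, actual: str) -> str:
    """Return the text the actual name has that the guessed name lacks."""
    g, a = decode(guessed), decode(actual)
    # Both names end in the same suffix; compare what precedes it.
    g_stem, a_stem = g[: -len(SUFFIX)], a[: -len(SUFFIX)]
    if not a_stem.startswith(g_stem):
        raise ValueError("names do not share a common prefix")
    return a_stem[len(g_stem):]


if __name__ == "__main__":
    print(decode(GUESSED))
    print(decode(ACTUAL))
    print(extra_component(GUESSED, ACTUAL))  # ./gv0/foobar
```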

I tried updating these variables to values that seemed more correct, but none of the changes had any effect on the problem.

I tried changing the 'change_detector' to xsync. No change in behavior.

I tried with both ext4 & xfs filesystems. No change in behavior.

I tried setting a checkpoint. No change in behavior.

I'm out of ideas at this point, but I'm happy to try anything and provide logs. Thanks so much for your time and help debugging!

Comment 1 jeremiah 2017-03-30 07:41:56 UTC
Created attachment 1267436
Master volume log 1

Comment 2 jeremiah 2017-03-30 07:42:25 UTC
Created attachment 1267437
Master volume log 2

Comment 3 jeremiah 2017-03-30 07:42:42 UTC
Created attachment 1267438
Master volume log 3

Comment 4 jeremiah 2017-04-26 23:50:19 UTC
Created attachment 1274471
Updated master log 1

Here's the new batch of log files; they no longer contain any of the 'transport endpoint not connected' errors.

Comment 5 jeremiah 2017-04-26 23:50:53 UTC
Created attachment 1274472
Updated master log 2

Comment 6 jeremiah 2017-04-26 23:51:19 UTC
Created attachment 1274473
Updated master log 3

Comment 7 Michael Watters 2017-04-28 19:41:58 UTC
I've also just noticed this issue. Geo-replication is working according to the status output; however, the data on my slave nodes does *not* match what is on the master volume.

[root@mdct-gluster-srv1 ~]# gluster volume geo-replication gv0 status
 
MASTER NODE          MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                SLAVE NODE           STATUS     CRAWL STATUS       LAST_SYNCED                  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
mdct-gluster-srv1    gv0           /var/mnt/gluster/brick2    root          ssh://mdct-gluster-srv3::slavevol    mdct-gluster-srv3    Active     Changelog Crawl    2017-04-28 08:35:58          
mdct-gluster-srv2    gv0           /var/mnt/gluster/brick     root          ssh://mdct-gluster-srv3::slavevol    mdct-gluster-srv3    Passive    N/A                N/A                          
mdct-gluster-srv1    gv0           /var/mnt/gluster/brick2    root          ssh://mdct-gluster-srv4::slavevol    mdct-gluster-srv4    Active     Changelog Crawl    2017-04-28 08:35:58          
mdct-gluster-srv2    gv0           /var/mnt/gluster/brick     root          ssh://mdct-gluster-srv4::slavevol    mdct-gluster-srv4    Passive    N/A                N/A

ls shows different data, as shown below:

[root@mdct-00fs-cent7 ~]# ls /var/mnt/shadow/pub/fedora/
dart  releases  updates

[root@mdct-00fs-cent7 ~]# ls /var/mnt/gluster2/pub/fedora/
20  21  22  24  25  dart  README  releases  updates

/var/mnt/shadow is the master volume.
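To turn the two listings into a concrete list of mismatches, a path-level comparison of the two mount points is enough. A minimal Python sketch using only `os.walk` and set differences, assuming /var/mnt/gluster2 is a mount of the slave volume (the comment implies this but does not state it); `compare_entries`, `relative_paths` and `compare_trees` are hypothetical helpers, and only path names are compared, not file contents.

```python
import os
from typing import Dict, Iterable, List, Set


def compare_entries(master: Iterable[str], slave: Iterable[str]) -> Dict[str, List[str]]:
    """Return the entries present on only one side, sorted."""
    m, s = set(master), set(slave)
    return {"only_on_master": sorted(m - s), "only_on_slave": sorted(s - m)}


def relative_paths(root: str) -> Set[str]:
    """Collect every file and directory under root, relative to root."""
    paths: Set[str] = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def compare_trees(master_root: str, slave_root: str) -> Dict[str, List[str]]:
    """Recursively compare two mount points by path name only (no contents)."""
    return compare_entries(relative_paths(master_root), relative_paths(slave_root))


if __name__ == "__main__":
    # Top-level listings copied from the comment above.
    master = ["dart", "releases", "updates"]
    slave = ["20", "21", "22", "24", "25", "dart", "README", "releases", "updates"]
    print(compare_entries(master, slave))
    # {'only_on_master': [], 'only_on_slave': ['20', '21', '22', '24', '25', 'README']}
```

On a live system, `compare_trees("/var/mnt/shadow/pub/fedora", "/var/mnt/gluster2/pub/fedora")` would walk both trees; on large volumes that walk can be slow, so starting from a subdirectory is sensible.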

Comment 8 Shyamsundar 2018-06-20 18:26:22 UTC
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and the Version field be marked appropriately.

