Bug 1026831 - Dist-geo-rep : In the newly added node, the gsyncd uses xsync as change_detector instead of changelog,
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.0
Assigned To: Aravinda VK
Rahul Hinduja
consistency
:
Depends On:
Blocks: 987980 1032445 1202842 1223636
Reported: 2013-11-05 08:57 EST by Vijaykumar Koppad
Modified: 2015-07-29 00:29 EDT

See Also:
Fixed In Version: glusterfs-3.7.0-2.el6rhs
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-29 00:29:24 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Vijaykumar Koppad 2013-11-05 08:57:31 EST
Description of problem: On the newly added node, gsyncd uses xsync as the change_detector instead of changelog, because the master cluster's xtime is not found. After the add-brick, files were being created on the master while rebalance and geo-rep were running.

The geo-rep logs

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-05 18:45:11.960931] I [master(/bricks/brick5):922:crawl] _GMaster: processing xsync changelog /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.159%3Agluster%3A%2F%2F127.0.0.1%3Aslave/fae3e853f0a57380f09943a77fd57fb1/xsync/XSYNC-CHANGELOG.1383657311
[2013-11-05 18:45:11.965917] I [master(/bricks/brick5):917:crawl] _GMaster: finished hybrid crawl syncing
[2013-11-05 18:46:12.36593] I [master(/bricks/brick5):426:crawlwrap] _GMaster: 1 crawls, 1 turns
[2013-11-05 18:46:12.119884] I [master(/bricks/brick5):912:crawl] _GMaster: starting hybrid crawl
[2013-11-05 18:46:15.474694] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.536650] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.537344] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.551314] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.580162] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.600104] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.653172] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.669501] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.689486] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.711652] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.741478] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.742023] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:16.68834] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:16.78027] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:23.131655] I [master(/bricks/brick5):922:crawl] _GMaster: processing xsync changelog /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.159%3Agluster%3A%2F%2F127.0.0.1%3Aslave/fae3e853f0a57380f09943a77fd57fb1/xsync/XSYNC-CHANGELOG.1383657372
[2013-11-05 18:46:34.580599] I [master(/bricks/brick5):917:crawl] _GMaster: finished hybrid crawl syncing
[2013-11-05 18:47:34.645342] I [master(/bricks/brick5):426:crawlwrap] _GMaster: 1 crawls, 1 turns
[2013-11-05 18:47:34.698522] I [master(/bricks/brick5):912:crawl] _GMaster: starting hybrid crawl
[2013-11-05 18:47:44.712555] I [master(/bricks/brick5):922:crawl] _GMaster: processing xsync changelog /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.159%3Agluster%3A%2F%2F127.0.0.1%3Aslave/fae3e853f0a57380f09943a77fd57fb1/xsync/XSYNC-CHANGELOG.1383657454

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
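To triage a log like the one above, it can help to count the warnings and the crawl types per worker. The following is a minimal Python sketch, not part of gsyncd: the line layout is inferred only from the excerpt above, and the names `LOG_LINE` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt above (not from gsyncd sources):
# [timestamp] LEVEL [module(brick):lineno:function] Class: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<module>\w+)\((?P<brick>[^)]*)\):(?P<lineno>\d+):(?P<func>\w+)\]\s+"
    r"(?P<cls>\w+):\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Return (warnings, crawls): counts of warning messages and crawl types."""
    warnings = Counter()
    crawls = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # skip wrapped or truncated lines
        msg = m.group("msg")
        if m.group("level") == "W":
            warnings[msg] += 1
        elif msg.startswith("starting ") and "crawl" in msg:
            # "starting hybrid crawl" -> "hybrid"
            crawls[msg.split()[1]] += 1
    return warnings, crawls


sample = [
    "[2013-11-05 18:46:12.119884] I [master(/bricks/brick5):912:crawl] "
    "_GMaster: starting hybrid crawl",
    "[2013-11-05 18:46:15.474694] W [master(/bricks/brick5):987:Xcrawl] "
    "_GMaster: master cluster's xtime not found",
    "[2013-11-05 18:46:15.536650] W [master(/bricks/brick5):987:Xcrawl] "
    "_GMaster: master cluster's xtime not found",
]
warnings, crawls = summarize(sample)
```

A worker that only ever reports hybrid crawls, as brick5 does here, is one that has stayed on xsync.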


Version-Release number of selected component (if applicable): glusterfs-3.4.0.39rhs-1


How reproducible: Didn't try to reproduce


Steps to Reproduce:
1. Create and start a geo-rep relationship between the master (dist-rep) and the slave.
2. Create some data on the master and let it sync to the slave.
3. Add a new node to the master cluster.
4. Start creating files on the master and start a rebalance in parallel.


Actual results: The newly added node starts using xsync as the change_detector instead of changelog.


Expected results: It should use changelog as the change_detector.


Additional info:

Logs from changes.log 

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 /bricks/brick5/.glusterfs/changelogs/CHANGELOG.1383655863
[2013-11-05 12:51:18.801265] D [gf-changelog-process.c:548:gf_changelog_ext_change] 0-glusterfs: processing changelog: /bricks/brick5/.glusterfs/changelogs/CHANGELOG.1383655878
[2013-11-05 12:51:22.148076] I [gf-changelog-process.c:589:gf_changelog_process] 0-glusterfs: close from changelog notification translator.
[2013-11-05 12:51:22.148117] I [gf-changelog.c:165:gf_changelog_notification_init] 0-glusterfs: Reconnecting...
[2013-11-05 12:51:22.148163] I [gf-changelog.c:179:gf_changelog_notification_init] 0-glusterfs: connecting to changelog socket: /var/run/gluster/changelog-fae3e853f0a57380f09943a77fd57fb1.sock (brick: /bricks/brick5)
[2013-11-05 12:51:22.148184] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 1/5...
[2013-11-05 12:51:24.148461] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 2/5...
[2013-11-05 12:51:26.148763] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 3/5...
[2013-11-05 12:51:28.156235] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 4/5...
[2013-11-05 12:51:30.156486] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 5/5...
[2013-11-05 12:51:32.156809] E [gf-changelog.c:204:gf_changelog_notification_init] 0-glusterfs: could not connect to changelog socket! bailing out...
[2013-11-05 12:51:32.156922] D [gf-changelog-process.c:616:gf_changelog_process] 0-glusterfs: byebye (1) from processing thread...

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
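The changes.log excerpt above shows a bounded reconnect: five attempts roughly two seconds apart, then the consumer gives up. A minimal Python sketch of that pattern follows. It is an illustration only, not the libgfchangelog implementation (which is C); the function name `connect_with_retries` and its parameters are hypothetical.

```python
import socket
import time


def connect_with_retries(sock_path, attempts=5, delay=2.0, sleep=time.sleep):
    """Connect to a Unix domain socket, retrying a fixed number of times.

    Returns the connected socket, or raises ConnectionError once every
    attempt has failed. `sleep` is injectable so the wait can be stubbed.
    """
    last_err = None
    for attempt in range(1, attempts + 1):
        print(f"connection attempt {attempt}/{attempts}...")
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(sock_path)
            return sock
        except OSError as err:
            sock.close()
            last_err = err
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
    raise ConnectionError(f"could not connect to {sock_path}") from last_err
```

In the log above every attempt fails, so the processing thread exits and no further changelogs are consumed through that connection.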
Comment 6 Rahul Hinduja 2015-07-16 08:05:11 EDT
Verified with the build: glusterfs-3.7.1-10.el6rhs.x86_64

When the new node was added, it first tried a history crawl, which failed because no changelog was present. It then performed an xsync crawl, and once the xsync crawl succeeded, it set the stime and used changelog as the change detector.

[2015-07-16 17:10:59.505544] I [master(/rhs/brick2/b2):528:crawlwrap] _GMaster: crawl interval: 1 seconds
[2015-07-16 17:10:59.510324] I [master(/rhs/brick1/b1):528:crawlwrap] _GMaster: crawl interval: 1 seconds
[2015-07-16 17:10:59.541496] I [master(/rhs/brick2/b2):1123:crawl] _GMaster: starting history crawl... turns: 1, stime: (-1, 0)
[2015-07-16 17:10:59.541810] I [master(/rhs/brick2/b2):1126:crawl] _GMaster: stime not available, abandoning history crawl
[2015-07-16 17:10:59.542121] I [resource(/rhs/brick2/b2):1444:service_loop] GLUSTER: No stime available, using xsync crawl
[2015-07-16 17:10:59.550285] I [master(/rhs/brick2/b2):519:crawlwrap] _GMaster: primary master with volume id 61887764-6ecb-4956-bf10-0cae54b8f497 ...
[2015-07-16 17:10:59.562691] I [master(/rhs/brick2/b2):528:crawlwrap] _GMaster: crawl interval: 60 seconds
[2015-07-16 17:10:59.577857] I [master(/rhs/brick2/b2):1230:crawl] _GMaster: starting hybrid crawl..., stime: (-1, 0)
[2015-07-16 17:10:59.586750] I [master(/rhs/brick2/b2):1237:crawl] _GMaster: finished hybrid crawl syncing, stime: (1437046859, 0)
[2015-07-16 17:10:59.591370] I [master(/rhs/brick2/b2):519:crawlwrap] _GMaster: primary master with volume id 61887764-6ecb-4956-bf10-0cae54b8f497 ...
[2015-07-16 17:10:59.597722] I [master(/rhs/brick2/b2):528:crawlwrap] _GMaster: crawl interval: 3 seconds
[2015-07-16 17:11:02.645987] I [master(/rhs/brick2/b2):1084:crawl] _GMaster: slave's time: (1437046859, 0)
[2015-07-16 17:11:59.808649] I [master(/rhs/brick1/b1):541:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-07-16 17:12:00.626524] I [master(/rhs/brick2/b2):541:crawlwrap] _GMaster: 20 crawls, 1 turns
[2015-07-16 17:13:00.102729] I [master(/rhs/brick1/b1):541:crawlwrap] _GMaster: 0 crawls, 0 turns

This is expected in 3.1. Moving this bug to verified state.
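The fallback sequence seen in the verification log can be summarized in a small Python sketch. This is a simplified model built only from the log lines above, not the gsyncd code; `NO_STIME`, `choose_detector` and `run_worker` are hypothetical names.

```python
NO_STIME = (-1, 0)  # value printed in the log while no stime is recorded


def choose_detector(stime):
    """Use xsync while there is no stime, changelog once one is set."""
    return "xsync" if stime == NO_STIME else "changelog"


def run_worker(stime, crawl_finished_at):
    """Model one worker start: return (detectors used in order, final stime)."""
    used = [choose_detector(stime)]
    if used[0] == "xsync":
        # Assumption: a successful xsync (hybrid) crawl records the stime,
        # as in "finished hybrid crawl syncing, stime: (1437046859, 0)".
        stime = (crawl_finished_at, 0)
        used.append(choose_detector(stime))
    return used, stime


detectors, final_stime = run_worker(NO_STIME, 1437046859)
```

Under this model a freshly added brick does one xsync crawl and then stays on changelog, which matches the behaviour reported as expected in 3.1.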
Comment 8 errata-xmlrpc 2015-07-29 00:29:24 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
