1026831 – Dist-geo-rep: On a newly added node, gsyncd uses xsync as change_detector instead of changelog

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	geo-replication
Sub Component:
Version:	2.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.1.0
Assignee:	Aravinda VK
QA Contact:	Rahul Hinduja
Docs Contact:
URL:
Whiteboard:	consistency
Depends On:
Blocks:	987980 1032445 1202842 1223636

Reported:	2013-11-05 13:57 UTC by Vijaykumar Koppad
Modified:	2015-07-29 04:29 UTC
CC List:	6 users
Fixed In Version:	glusterfs-3.7.0-2.el6rhs
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-07-29 04:29:24 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:1495	0	normal	SHIPPED_LIVE	Important: Red Hat Gluster Storage 3.1 update	2015-07-29 08:26:26 UTC

Description Vijaykumar Koppad 2013-11-05 13:57:31 UTC

Description of problem: On a newly added node, gsyncd uses xsync as the change_detector instead of changelog because the master cluster's xtime is not found. After the add-brick, files were being created on the master while rebalance and geo-rep were running.

The geo-rep logs

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-05 18:45:11.960931] I [master(/bricks/brick5):922:crawl] _GMaster: processing xsync changelog /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.159%3Agluster%3A%2F%2F127.0.0.1%3Aslave/fae3e853f0a57380f09943a77fd57fb1/xsync/XSYNC-CHANGELOG.1383657311
[2013-11-05 18:45:11.965917] I [master(/bricks/brick5):917:crawl] _GMaster: finished hybrid crawl syncing
[2013-11-05 18:46:12.36593] I [master(/bricks/brick5):426:crawlwrap] _GMaster: 1 crawls, 1 turns
[2013-11-05 18:46:12.119884] I [master(/bricks/brick5):912:crawl] _GMaster: starting hybrid crawl
[2013-11-05 18:46:15.474694] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.536650] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.537344] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.551314] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.580162] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.600104] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.653172] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.669501] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.689486] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.711652] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.741478] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:15.742023] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:16.68834] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:16.78027] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found
[2013-11-05 18:46:23.131655] I [master(/bricks/brick5):922:crawl] _GMaster: processing xsync changelog /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.159%3Agluster%3A%2F%2F127.0.0.1%3Aslave/fae3e853f0a57380f09943a77fd57fb1/xsync/XSYNC-CHANGELOG.1383657372
[2013-11-05 18:46:34.580599] I [master(/bricks/brick5):917:crawl] _GMaster: finished hybrid crawl syncing
[2013-11-05 18:47:34.645342] I [master(/bricks/brick5):426:crawlwrap] _GMaster: 1 crawls, 1 turns
[2013-11-05 18:47:34.698522] I [master(/bricks/brick5):912:crawl] _GMaster: starting hybrid crawl
[2013-11-05 18:47:44.712555] I [master(/bricks/brick5):922:crawl] _GMaster: processing xsync changelog /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.159%3Agluster%3A%2F%2F127.0.0.1%3Aslave/fae3e853f0a57380f09943a77fd57fb1/xsync/XSYNC-CHANGELOG.1383657454

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
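To quantify the symptom in a log like the one above, here is a minimal sketch that tallies hybrid crawls and "xtime not found" warnings per brick. The regular expression is an assumption inferred only from the line shape shown in this report, not from any documented gsyncd log format, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the excerpt in this report:
# [timestamp] LEVEL [module(brick):lineno:func] Class: message
LINE_RE = re.compile(
    r"^\[?(?P<ts>[\d-]+ [\d:.]+)\] (?P<level>[A-Z]) "
    r"\[(?P<module>\w+)\((?P<brick>[^)]*)\):(?P<lineno>\d+):(?P<func>\w+)\] "
    r"(?P<cls>\w+): (?P<msg>.*)$"
)

def summarize(lines):
    """Count hybrid crawls and 'xtime not found' warnings per brick."""
    stats = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip separators and wrapped fragments
        brick, msg = m["brick"], m["msg"]
        if msg.startswith("starting hybrid crawl"):
            stats[(brick, "hybrid_crawls")] += 1
        elif msg == "master cluster's xtime not found":
            stats[(brick, "xtime_missing")] += 1
    return stats

sample = [
    "[2013-11-05 18:46:12.119884] I [master(/bricks/brick5):912:crawl] _GMaster: starting hybrid crawl",
    "[2013-11-05 18:46:15.474694] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found",
    "[2013-11-05 18:46:15.536650] W [master(/bricks/brick5):987:Xcrawl] _GMaster: master cluster's xtime not found",
]
stats = summarize(sample)
for key, count in sorted(stats.items()):
    print(key, count)
```

A brick that keeps logging hybrid crawls minute after minute, as brick5 does here, is still on xsync rather than changelog.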


Version-Release number of selected component (if applicable): glusterfs-3.4.0.39rhs-1


How reproducible: Didn't try to reproduce


Steps to Reproduce:
1. Create and start a geo-rep relationship between master (dist-rep) and slave.
2. Create some data on the master and let it sync to the slave.
3. Add a new node to the master cluster.
4. Start creating files on the master and start rebalance in parallel.
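The report does not list the exact commands used. As a hedged sketch, the following builds (but does not run) the gluster CLI invocations that would typically correspond to steps 1, 3 and 4. The volume name, slave specification, host names, and brick paths are hypothetical placeholders; prerequisites such as SSH key setup are omitted; and the exact syntax may differ between releases.

```python
import shlex

# Hypothetical names; substitute your own volume, slave, hosts, and brick paths.
MASTER_VOL = "master"
SLAVE = "slavehost::slave"
# A dist-rep volume needs bricks added in multiples of the replica count
# (replica 2 is assumed here).
NEW_BRICKS = ["newnode1:/bricks/brick5", "newnode2:/bricks/brick5"]

def repro_commands(master_vol, slave, new_bricks):
    """Return the CLI invocations for steps 1, 3 and 4 as argument lists."""
    georep = ["gluster", "volume", "geo-replication", master_vol, slave]
    new_hosts = sorted({b.split(":", 1)[0] for b in new_bricks})
    return (
        # Step 1: create and start the geo-rep session.
        [georep + ["create", "push-pem"], georep + ["start"]]
        # Step 3: add the new nodes and their bricks to the master cluster.
        + [["gluster", "peer", "probe", h] for h in new_hosts]
        + [["gluster", "volume", "add-brick", master_vol] + new_bricks,
           # Step 4: start rebalance (file creation runs in parallel).
           ["gluster", "volume", "rebalance", master_vol, "start"]]
    )

commands = repro_commands(MASTER_VOL, SLAVE, NEW_BRICKS)
for cmd in commands:
    print(shlex.join(cmd))
```

Step 2 and the file-creation workload in step 4 are ordinary I/O on a mounted master volume and are left out of the sketch.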


Actual results: the newly added node starts using xsync as change_detector instead of changelog.


Expected results: it should use changelog as change_detector.


Additional info:

Logs from changes.log 

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 /bricks/brick5/.glusterfs/changelogs/CHANGELOG.1383655863
[2013-11-05 12:51:18.801265] D [gf-changelog-process.c:548:gf_changelog_ext_change] 0-glusterfs: processing changelog: /bricks/brick5/.glusterfs/changelogs/CHANGELOG.1383655878
[2013-11-05 12:51:22.148076] I [gf-changelog-process.c:589:gf_changelog_process] 0-glusterfs: close from changelog notification translator.
[2013-11-05 12:51:22.148117] I [gf-changelog.c:165:gf_changelog_notification_init] 0-glusterfs: Reconnecting...
[2013-11-05 12:51:22.148163] I [gf-changelog.c:179:gf_changelog_notification_init] 0-glusterfs: connecting to changelog socket: /var/run/gluster/changelog-fae3e853f0a57380f09943a77fd57fb1.sock (brick: /bricks/brick5)
[2013-11-05 12:51:22.148184] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 1/5...
[2013-11-05 12:51:24.148461] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 2/5...
[2013-11-05 12:51:26.148763] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 3/5...
[2013-11-05 12:51:28.156235] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 4/5...
[2013-11-05 12:51:30.156486] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 5/5...
[2013-11-05 12:51:32.156809] E [gf-changelog.c:204:gf_changelog_notification_init] 0-glusterfs: could not connect to changelog socket! bailing out...
[2013-11-05 12:51:32.156922] D [gf-changelog-process.c:616:gf_changelog_process] 0-glusterfs: byebye (1) from processing thread...

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
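The changes.log excerpt shows the changelog consumer retrying the socket connection five times and then bailing out. A minimal sketch for spotting that pattern in a log follows; the matched strings are copied from the lines above, while `changelog_connect_status` is a hypothetical helper name.

```python
import re

ATTEMPT_RE = re.compile(r"connection attempt (\d+)/(\d+)")
FAIL_MARK = "could not connect to changelog socket"

def changelog_connect_status(lines):
    """Return (last_attempt, max_attempts, gave_up) for changes.log lines."""
    last, limit, gave_up = 0, 0, False
    for line in lines:
        m = ATTEMPT_RE.search(line)
        if m:
            last, limit = int(m.group(1)), int(m.group(2))
        elif FAIL_MARK in line:
            gave_up = True
    return last, limit, gave_up

sample = [
    "[2013-11-05 12:51:28.156235] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 4/5...",
    "[2013-11-05 12:51:30.156486] W [gf-changelog.c:189:gf_changelog_notification_init] 0-glusterfs: connection attempt 5/5...",
    "[2013-11-05 12:51:32.156809] E [gf-changelog.c:204:gf_changelog_notification_init] 0-glusterfs: could not connect to changelog socket! bailing out...",
]
result = changelog_connect_status(sample)
print(result)
```

A result with `gave_up` set to `True` means the retries were exhausted, as happened at 12:51:32 in the excerpt.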

Comment 6 Rahul Hinduja 2015-07-16 12:05:11 UTC

Verified with the build: glusterfs-3.7.1-10.el6rhs.x86_64

When the new node was added, it first tried a history crawl, which failed because no changelog was present. It then performed an xsync crawl; once the xsync succeeded, it set the stime and used changelog as the change detector.

[2015-07-16 17:10:59.505544] I [master(/rhs/brick2/b2):528:crawlwrap] _GMaster: crawl interval: 1 seconds
[2015-07-16 17:10:59.510324] I [master(/rhs/brick1/b1):528:crawlwrap] _GMaster: crawl interval: 1 seconds
[2015-07-16 17:10:59.541496] I [master(/rhs/brick2/b2):1123:crawl] _GMaster: starting history crawl... turns: 1, stime: (-1, 0)
[2015-07-16 17:10:59.541810] I [master(/rhs/brick2/b2):1126:crawl] _GMaster: stime not available, abandoning history crawl
[2015-07-16 17:10:59.542121] I [resource(/rhs/brick2/b2):1444:service_loop] GLUSTER: No stime available, using xsync crawl
[2015-07-16 17:10:59.550285] I [master(/rhs/brick2/b2):519:crawlwrap] _GMaster: primary master with volume id 61887764-6ecb-4956-bf10-0cae54b8f497 ...
[2015-07-16 17:10:59.562691] I [master(/rhs/brick2/b2):528:crawlwrap] _GMaster: crawl interval: 60 seconds
[2015-07-16 17:10:59.577857] I [master(/rhs/brick2/b2):1230:crawl] _GMaster: starting hybrid crawl..., stime: (-1, 0)
[2015-07-16 17:10:59.586750] I [master(/rhs/brick2/b2):1237:crawl] _GMaster: finished hybrid crawl syncing, stime: (1437046859, 0)
[2015-07-16 17:10:59.591370] I [master(/rhs/brick2/b2):519:crawlwrap] _GMaster: primary master with volume id 61887764-6ecb-4956-bf10-0cae54b8f497 ...
[2015-07-16 17:10:59.597722] I [master(/rhs/brick2/b2):528:crawlwrap] _GMaster: crawl interval: 3 seconds
[2015-07-16 17:11:02.645987] I [master(/rhs/brick2/b2):1084:crawl] _GMaster: slave's time: (1437046859, 0)
[2015-07-16 17:11:59.808649] I [master(/rhs/brick1/b1):541:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-07-16 17:12:00.626524] I [master(/rhs/brick2/b2):541:crawlwrap] _GMaster: 20 crawls, 1 turns
[2015-07-16 17:13:00.102729] I [master(/rhs/brick1/b1):541:crawlwrap] _GMaster: 0 crawls, 0 turns

This is expected in 3.1. Moving this bug to verified state.
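The sequence verified in this comment can be modeled as a small decision function. This is an illustrative sketch of the behavior described here, not the gsyncd source: `pick_crawl`, `stime_to_utc`, and the mode names are hypothetical, `(-1, 0)` is treated as "no stime" only because that is how it appears in the log, and reading the first stime field as Unix epoch seconds is an assumption.

```python
from datetime import datetime, timezone

NO_STIME = (-1, 0)  # value logged when no stime is recorded (per the excerpt)

def pick_crawl(stime):
    """Choose a crawl mode from the brick's stime (hypothetical model)."""
    # Without an stime there is no starting point for history or changelog
    # processing, so an xsync (hybrid) crawl has to run first.
    return "xsync" if stime == NO_STIME else "changelog"

def stime_to_utc(stime):
    """Interpret the first stime field as seconds since the Unix epoch."""
    return datetime.fromtimestamp(stime[0], tz=timezone.utc)

before = pick_crawl(NO_STIME)           # state of the freshly added brick
after = pick_crawl((1437046859, 0))     # state once the hybrid crawl set stime
decoded = stime_to_utc((1437046859, 0)).isoformat()
print(before, "->", after, "| stime:", decoded)
# prints: xsync -> changelog | stime: 2015-07-16T11:40:59+00:00
```

The decoded value, 11:40:59 UTC, lines up with the 17:10:59 log timestamp if the node's clock is at UTC+05:30, which is an assumption about the test environment.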

Comment 8 errata-xmlrpc 2015-07-29 04:29:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
