Bug 1009265 - [RFE] Dist-geo-rep : It's better to have different config file for all instances running on a same RHSS node
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Hardware: x86_64   OS: Linux
Priority: medium   Severity: medium
Assigned To: Bug Updates Notification Mailing List
Keywords: FutureFeature
Reported: 2013-09-18 01:46 EDT by Rachana Patel
Modified: 2015-08-04 01:19 EDT (History)
CC: 5 users

Doc Type: Enhancement
Last Closed: 2015-08-04 01:19:02 EDT
Type: Bug

Attachments: None
Description Rachana Patel 2013-09-18 01:46:45 EDT
Description of problem:
When all geo-replication instances running on a node use the same config file (multiple bricks of the master volume on the same server), one instance can go into xsync mode and write change_detector = xsync into that shared config file. After that, whenever the other instances restart (for any reason), they also fall back to xsync mode. A minimal sketch of this mechanism follows.
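For illustration only, here is a minimal, hypothetical Python sketch of the shared-config problem. This is not the actual gsyncd code; the config location, section name, and option layout are made up, but the flow matches the behavior described above: one worker persists xsync into the per-session config, and every worker that later restarts reads it back.

#!/usr/bin/env python3
# Hypothetical sketch of the shared-config problem (not gsyncd itself).
import configparser
import os
import tempfile

# One config file shared by every brick worker of the session on this node.
shared_conf = os.path.join(tempfile.mkdtemp(), "gsyncd.conf")
cp = configparser.ConfigParser()
cp["peers"] = {"change_detector": "changelog"}
with open(shared_conf, "w") as f:
    cp.write(f)

def worker_start(brick):
    """Each worker reads its change detector from the shared config on start."""
    cp = configparser.ConfigParser()
    cp.read(shared_conf)
    mode = cp["peers"]["change_detector"]
    print("worker %s starting with change_detector=%s" % (brick, mode))

def fallback_to_xsync(brick):
    """The worker for the removed brick falls back to xsync and persists it."""
    cp = configparser.ConfigParser()
    cp.read(shared_conf)
    cp["peers"]["change_detector"] = "xsync"
    with open(shared_conf, "w") as f:
        cp.write(f)
    print("worker %s fell back to xsync and wrote it to the shared config" % brick)

bricks = ["/rhs/brick2/x1", "/rhs/brick2/x2", "/rhs/brick2/x3"]
for b in bricks:
    worker_start(b)            # all start in changelog mode

fallback_to_xsync(bricks[2])   # brick removed -> its worker persists xsync

for b in bricks:
    worker_start(b)            # after restart, every worker now uses xsync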

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create and start a dist-rep volume and mount it. Start creating data on the master volume from the mount point.
--> Make sure the master volume has more than one brick on the same RHSS node.

mount point:-
mount | grep remove_xsync
 on /mnt/remove_xsync type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
 on /mnt/remove_xsync_nfs type nfs (rw,addr=

2. Create and start a geo-rep session between the master and slave volume.
[root@old5 ~]# gluster volume geo remove_xsync status
NODE                           MASTER           SLAVE                                HEALTH    UPTIME                
old5.lab.eng.blr.redhat.com    remove_xsync    ssh://    Stable    4 days 07:12:33       
old6.lab.eng.blr.redhat.com    remove_xsync    ssh://    Stable    4 days 23:52:43 

--> One RHSS node has 3 bricks, so all 3 geo-replication instances on that node share the same config file.

3. Remove brick(s) from the master volume with the start option.

--> gluster volume remove-brick remove_xsync start

4. Once remove-brick is completed, perform the commit operation.
 gluster volume remove-brick remove_xsync status
 gluster volume remove-brick remove_xsync commit

[root@old5 ~]# gluster v info remove_change
Volume Name: remove_change
Type: Distributed-Replicate
Volume ID: eb500199-37d4-4cb9-96ed-ae5bc1bf2498
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on

5. On the remove-brick commit operation, that brick process gets killed, so the geo-replication instance for that brick falls back to xsync, and the same is written to the config file.

On getting ECONNABORTED, the other instances on the same RHSS node were restarted, and since the config file now has change_detector = xsync, all the other instances fall back to xsync as well.

[root@old5 ~]# gluster volume geo remove_xsync config | grep change_detector
change_detector: xsync

log snippet:-
 less /var/log/glusterfs/geo-replication/remove_xsync/ssh%3A%2F%2Froot%4010.70.37.195%3Agluster%3A%2F%2F127.0.0.1%3Aremove_xsync.log 

[2013-09-16 14:56:33.944725] I [master(/rhs/brick2/x3):587:fallback_xsync] _GMaster: falling back to xsync mode
[2013-09-16 14:56:48.72854] I [syncdutils(/rhs/brick2/x3):159:finalize] <top>: exiting.
[2013-09-16 14:56:50.587552] E [syncdutils(/rhs/brick2/x1):201:log_raise_exception] <top>: glusterfs session went down [ECONNABORTED]
[2013-09-16 14:56:52.982089] I [syncdutils(/rhs/brick2/x1):159:finalize] <top>: exiting.
[2013-09-16 14:56:51.429940] E [syncdutils(/rhs/brick2/x2):201:log_raise_exception] <top>: glusterfs session went down [ECONNABORTED]
[2013-09-16 14:56:53.641541] I [syncdutils(/rhs/brick2/x2):159:finalize] <top>: exiting.
[2013-09-16 14:56:56.116944] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-09-16 14:57:12.589235] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-09-16 14:57:12.786187] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-09-16 14:57:12.730447] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-09-16 14:57:12.844243] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-09-16 14:57:13.646564] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-09-16 14:57:13.647228] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-09-16 14:57:14.677306] I [gsyncd(/rhs/brick2/x2):503:main_i] <top>: syncing: gluster://localhost:remove_xsync -> ssh://root@10.70
[2013-09-16 14:57:14.682374] I [gsyncd(/rhs/brick2/x3):503:main_i] <top>: syncing: gluster://localhost:remove_xsync -> ssh://root@10.70
[2013-09-16 14:57:14.684375] I [gsyncd(/rhs/brick2/x1):503:main_i] <top>: syncing: gluster://localhost:remove_xsync -> ssh://root@10.70
[2013-09-16 14:57:21.670073] I [master(/rhs/brick2/x2):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:21.676136] I [master(/rhs/brick2/x2):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:21.688627] I [master(/rhs/brick2/x2):816:register] _GMaster: xsync temp directory: /var/run/gluster/remove_xsync/ssh%3A%2F%2Froot%4010.70.37.195%3Agluster%3A%2F%2F127.0.0.1%3Aremove_xsync/9b86668c9bd1c074e1e2720fc5005e44/xsync
[2013-09-16 14:57:21.688901] I [master(/rhs/brick2/x2):816:register] _GMaster: xsync temp directory: /var/run/gluster/remove_xsync/ssh%3A%2F%2Froot%4010.70.37.195%3Agluster%3A%2F%2F127.0.0.1%3Aremove_xsync/9b86668c9bd1c074e1e2720fc5005e44/xsync
[2013-09-16 14:57:22.300641] I [master(/rhs/brick2/x3):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:22.320192] I [master(/rhs/brick2/x1):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:22.320787] I [master(/rhs/brick2/x3):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:22.323508] I [master(/rhs/brick2/x1):57:gmaster_builder] <top>: setting up xsync change detection mode

Actual results:
As all instances use the same config file, the other instances also started using xsync mode instead of changelog.
Comment 2 Rachana Patel 2013-09-18 05:11:17 EDT
Here remove-brick is one use case; this can happen in other use cases where one instance updates the shared file.
Comment 5 Aravinda VK 2015-08-04 01:19:02 EDT
Now we don't update the config file when falling back to xsync. Xsync usage is only temporary. With RHGS 3.1 we don't switch to xsync except for a new brick or during the initial run.

We don't have brick-specific config details. Using a single config file for all the bricks of the same geo-rep session is expected behavior. Multiple geo-rep sessions will have different config files.

Closing this bug since the behavior of switching to xsync has changed and a single config for all bricks of the same session is expected behavior. Please reopen if this issue is found again.
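Purely as an illustration of the changed behavior described in this comment, here is a hypothetical Python sketch (not the actual RHGS 3.1 gsyncd code): the temporary xsync fallback is kept in the worker's memory only and is never written back to the shared config file, so restarted workers still come up in changelog mode.

# Hypothetical sketch of the changed behavior (not gsyncd itself): the
# xsync fallback is per-worker and in-memory, never persisted.
import configparser
import os
import tempfile

shared_conf = os.path.join(tempfile.mkdtemp(), "gsyncd.conf")
cp = configparser.ConfigParser()
cp["peers"] = {"change_detector": "changelog"}
with open(shared_conf, "w") as f:
    cp.write(f)

class Worker:
    def __init__(self, brick):
        cp = configparser.ConfigParser()
        cp.read(shared_conf)
        self.brick = brick
        # The effective mode comes from the shared config at start...
        self.change_detector = cp["peers"]["change_detector"]

    def fallback_to_xsync(self):
        # ...and a temporary fallback changes only this worker's in-memory
        # state; the shared config file is left untouched.
        self.change_detector = "xsync"

w1 = Worker("/rhs/brick2/x1")
w2 = Worker("/rhs/brick2/x2")
w1.fallback_to_xsync()
w2_restarted = Worker("/rhs/brick2/x2")
print(w1.change_detector)            # xsync (temporary, this worker only)
print(w2_restarted.change_detector)  # changelog (shared config unchanged)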
