Bug 1009265 - [RFE] Dist-geo-rep : It's better to have different config file for all instances running on a same RHSS node
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Hardware: x86_64   OS: Linux
Priority: medium   Severity: medium
Assigned To: Bug Updates Notification Mailing List
Keywords: FutureFeature
Reported: 2013-09-18 01:46 EDT by Rachana Patel
Modified: 2015-08-04 01:19 EDT (History)
CC: 5 users

Doc Type: Enhancement
Last Closed: 2015-08-04 01:19:02 EDT
Type: Bug

Attachments: None
Description Rachana Patel 2013-09-18 01:46:45 EDT
Description of problem:
When all geo-replication instances running on a node use the same config file (multiple bricks of the master volume on the same server), one instance can go into xsync mode and write change_detector = xsync into that shared config file. After that, whenever the other instances restart (for any reason), they also fall back to xsync mode. A minimal sketch of this mechanism follows.
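For illustration only, here is a minimal, hypothetical Python sketch of the shared-config problem. This is not the actual gsyncd code; the config location, section name, and option layout are made up, but the flow matches the behavior described above: one worker persists xsync into the per-session config, and every worker that later restarts reads it back.

#!/usr/bin/env python3
# Hypothetical sketch of the shared-config problem (not gsyncd itself).
import configparser
import os
import tempfile

# One config file shared by every brick worker of the session on this node.
shared_conf = os.path.join(tempfile.mkdtemp(), "gsyncd.conf")
cp = configparser.ConfigParser()
cp["peers"] = {"change_detector": "changelog"}
with open(shared_conf, "w") as f:
    cp.write(f)

def worker_start(brick):
    """Each worker reads its change detector from the shared config on start."""
    cp = configparser.ConfigParser()
    cp.read(shared_conf)
    mode = cp["peers"]["change_detector"]
    print("worker %s starting with change_detector=%s" % (brick, mode))

def fallback_to_xsync(brick):
    """The worker for the removed brick falls back to xsync and persists it."""
    cp = configparser.ConfigParser()
    cp.read(shared_conf)
    cp["peers"]["change_detector"] = "xsync"
    with open(shared_conf, "w") as f:
        cp.write(f)
    print("worker %s fell back to xsync and wrote it to the shared config" % brick)

bricks = ["/rhs/brick2/x1", "/rhs/brick2/x2", "/rhs/brick2/x3"]
for b in bricks:
    worker_start(b)            # all start in changelog mode

fallback_to_xsync(bricks[2])   # brick removed -> its worker persists xsync

for b in bricks:
    worker_start(b)            # after restart, every worker now uses xsync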

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create and start a dist-rep volume and mount it. Start creating data on the master volume from the mount point.
--> Make sure the master volume has more than one brick on the same RHSS node.

mount point:-
mount | grep remove_xsync
 on /mnt/remove_xsync type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
 on /mnt/remove_xsync_nfs type nfs (rw,addr=

2. Create and start a geo-rep session between the master and slave volume.
[root@old5 ~]# gluster volume geo remove_xsync status
NODE                           MASTER           SLAVE                                HEALTH    UPTIME                
old5.lab.eng.blr.redhat.com    remove_xsync    ssh://    Stable    4 days 07:12:33       
old6.lab.eng.blr.redhat.com    remove_xsync    ssh://    Stable    4 days 23:52:43 

--> One RHSS node has 3 bricks, so all 3 geo-replication instances on that node share the same config file.

3. Remove brick(s) from the master volume with the start option.

--> gluster volume remove-brick remove_xsync start

4. Once remove-brick is completed, perform the commit operation.
 gluster volume remove-brick remove_xsync status
 gluster volume remove-brick remove_xsync commit

[root@old5 ~]# gluster v info remove_change
Volume Name: remove_change
Type: Distributed-Replicate
Volume ID: eb500199-37d4-4cb9-96ed-ae5bc1bf2498
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on

5. On the remove-brick commit operation, that brick process gets killed, so the geo-replication instance for that brick falls back to xsync, and the same is written to the config file.

On getting ECONNABORTED, the other instances on the same RHSS node were restarted, and since the config file now has change_detector = xsync, all the other instances fall back to xsync as well.

[root@old5 ~]# gluster volume geo remove_xsync config | grep change_detector
change_detector: xsync

log snippet:-
 less /var/log/glusterfs/geo-replication/remove_xsync/ssh%3A%2F%2Froot%4010.70.37.195%3Agluster%3A%2F%2F127.0.0.1%3Aremove_xsync.log 

[2013-09-16 14:56:33.944725] I [master(/rhs/brick2/x3):587:fallback_xsync] _GMaster: falling back to xsync mode
[2013-09-16 14:56:48.72854] I [syncdutils(/rhs/brick2/x3):159:finalize] <top>: exiting.
[2013-09-16 14:56:50.587552] E [syncdutils(/rhs/brick2/x1):201:log_raise_exception] <top>: glusterfs session went down [ECONNABORTED]
[2013-09-16 14:56:52.982089] I [syncdutils(/rhs/brick2/x1):159:finalize] <top>: exiting.
[2013-09-16 14:56:51.429940] E [syncdutils(/rhs/brick2/x2):201:log_raise_exception] <top>: glusterfs session went down [ECONNABORTED]
[2013-09-16 14:56:53.641541] I [syncdutils(/rhs/brick2/x2):159:finalize] <top>: exiting.
[2013-09-16 14:56:56.116944] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-09-16 14:57:12.589235] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-09-16 14:57:12.786187] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-09-16 14:57:12.730447] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-09-16 14:57:12.844243] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-09-16 14:57:13.646564] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-09-16 14:57:13.647228] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-09-16 14:57:14.677306] I [gsyncd(/rhs/brick2/x2):503:main_i] <top>: syncing: gluster://localhost:remove_xsync -> ssh://root@10.70
[2013-09-16 14:57:14.682374] I [gsyncd(/rhs/brick2/x3):503:main_i] <top>: syncing: gluster://localhost:remove_xsync -> ssh://root@10.70
[2013-09-16 14:57:14.684375] I [gsyncd(/rhs/brick2/x1):503:main_i] <top>: syncing: gluster://localhost:remove_xsync -> ssh://root@10.70
[2013-09-16 14:57:21.670073] I [master(/rhs/brick2/x2):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:21.676136] I [master(/rhs/brick2/x2):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:21.688627] I [master(/rhs/brick2/x2):816:register] _GMaster: xsync temp directory: /var/run/gluster/remove_xsync/ssh%3A%2F%2Froot%4010.70.37.195%3Agluster%3A%2F%2F127.0.0.1%3Aremove_xsync/9b86668c9bd1c074e1e2720fc5005e44/xsync
[2013-09-16 14:57:21.688901] I [master(/rhs/brick2/x2):816:register] _GMaster: xsync temp directory: /var/run/gluster/remove_xsync/ssh%3A%2F%2Froot%4010.70.37.195%3Agluster%3A%2F%2F127.0.0.1%3Aremove_xsync/9b86668c9bd1c074e1e2720fc5005e44/xsync
[2013-09-16 14:57:22.300641] I [master(/rhs/brick2/x3):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:22.320192] I [master(/rhs/brick2/x1):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:22.320787] I [master(/rhs/brick2/x3):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:22.323508] I [master(/rhs/brick2/x1):57:gmaster_builder] <top>: setting up xsync change detection mode

Actual results:
As all instances use the same config file, the other instances also started using xsync mode instead of changelog.
Comment 2 Rachana Patel 2013-09-18 05:11:17 EDT
Here remove-brick is one use case; this can happen in other use cases where one instance updates the shared file.
Comment 5 Aravinda VK 2015-08-04 01:19:02 EDT
Now we don't update the config file when falling back to xsync. Xsync usage is only temporary. With RHGS 3.1 we don't switch to xsync except for a new brick or during the initial run.

We don't have brick-specific config details. Using a single config file for all the bricks of the same geo-rep session is expected behavior. Multiple geo-rep sessions will have different config files.

Closing this bug since the behavior of switching to xsync has changed and a single config for all bricks of the same session is expected behavior. Please reopen if this issue is found again.
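Purely as an illustration of the changed behavior described in this comment, here is a hypothetical Python sketch (not the actual RHGS 3.1 gsyncd code): the temporary xsync fallback is kept in the worker's memory only and is never written back to the shared config file, so restarted workers still come up in changelog mode.

# Hypothetical sketch of the changed behavior (not gsyncd itself): the
# xsync fallback is per-worker and in-memory, never persisted.
import configparser
import os
import tempfile

shared_conf = os.path.join(tempfile.mkdtemp(), "gsyncd.conf")
cp = configparser.ConfigParser()
cp["peers"] = {"change_detector": "changelog"}
with open(shared_conf, "w") as f:
    cp.write(f)

class Worker:
    def __init__(self, brick):
        cp = configparser.ConfigParser()
        cp.read(shared_conf)
        self.brick = brick
        # The effective mode comes from the shared config at start...
        self.change_detector = cp["peers"]["change_detector"]

    def fallback_to_xsync(self):
        # ...and a temporary fallback changes only this worker's in-memory
        # state; the shared config file is left untouched.
        self.change_detector = "xsync"

w1 = Worker("/rhs/brick2/x1")
w2 = Worker("/rhs/brick2/x2")
w1.fallback_to_xsync()
w2_restarted = Worker("/rhs/brick2/x2")
print(w1.change_detector)            # xsync (temporary, this worker only)
print(w2_restarted.change_detector)  # changelog (shared config unchanged)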
