Bug 1009265 - [RFE] Dist-geo-rep : It's better to have different config file for all instances running on a same RHSS node
Status: CLOSED CURRENTRELEASE
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64 Linux
Priority: medium, Severity: medium
Assigned To: Bug Updates Notification Mailing List
storage-qa-internal@redhat.com
config
Keywords: FutureFeature
Depends On:
Blocks:
Reported: 2013-09-18 01:46 EDT by Rachana Patel
Modified: 2015-08-04 01:19 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-04 01:19:02 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Rachana Patel 2013-09-18 01:46:45 EDT
Description of problem:
When all geo-replication instances running on a node use the same config file (multiple bricks on the same server), a problem arises: if one instance falls back to xsync mode and writes change_detector: xsync to the config file, then on any subsequent restart (for whatever reason) every other instance on that node also falls back to xsync mode.
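The hazard can be sketched with a minimal, hypothetical model using Python's configparser (the file name, section, and keys here are illustrative stand-ins, not gsyncd's actual config layout):

```python
import configparser
import os
import tempfile

# One shared file stands in for the per-session config that every
# brick worker on the node reads at startup.
shared_conf = os.path.join(tempfile.mkdtemp(), "gsyncd.conf")

def read_change_detector(path):
    """What a worker does on (re)start: read the shared config file."""
    cp = configparser.ConfigParser()
    cp.read(path)
    return cp.get("peers", "change_detector", fallback="changelog")

def fallback_to_xsync(path):
    """What the failing worker does: persist its private fallback
    into the file that all sibling workers share."""
    cp = configparser.ConfigParser()
    cp.read(path)
    if not cp.has_section("peers"):
        cp.add_section("peers")
    cp.set("peers", "change_detector", "xsync")
    with open(path, "w") as f:
        cp.write(f)

# Healthy workers start in changelog mode...
assert read_change_detector(shared_conf) == "changelog"

# ...one worker's brick is removed, so it falls back and writes the file...
fallback_to_xsync(shared_conf)

# ...and every sibling worker restarted after ECONNABORTED now reads xsync.
assert read_change_detector(shared_conf) == "xsync"
```

The write is private to one worker but the read is global to all workers on the node, which is exactly why a per-instance (or at least non-persisted) setting avoids the problem.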


Version-Release number of selected component (if applicable):
3.4.0.33rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create and start a dist-rep volume and mount it. Start creating data on the master volume from the mount point.
--> Make sure the master volume has more than one brick on the same RHSS node.

Mount point:
mount | grep remove_xsync
10.70.35.179:/remove_xsync on /mnt/remove_xsync type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
10.70.35.179:/remove_xsync on /mnt/remove_xsync_nfs type nfs (rw,addr=10.70.35.179)


2. Create and start a geo-rep session between the master and slave volumes.
[root@old5 ~]# gluster volume geo remove_xsync status
NODE                           MASTER           SLAVE                                HEALTH    UPTIME                
-----------------------------------------------------------------------------------------------------------------
old5.lab.eng.blr.redhat.com    remove_xsync    ssh://10.70.37.195::remove_xsync    Stable    4 days 07:12:33       
old6.lab.eng.blr.redhat.com    remove_xsync    ssh://10.70.37.195::remove_xsync    Stable    4 days 23:52:43 

--> One RHSS node has 3 bricks, so all 3 geo-replication instances on that node share the same config file.

3. Remove brick(s) from the master volume with the start option.

--> gluster volume remove-brick remove_xsync 10.70.35.179:/rhs/brick3/x3 10.70.35.235:/rhs/brick3/x3 start

4. Once remove-brick is completed, perform the commit operation.
 gluster volume remove-brick remove_xsync 10.70.35.179:/rhs/brick3/x3 10.70.35.235:/rhs/brick3/x3 status
 gluster volume remove-brick remove_xsync 10.70.35.179:/rhs/brick3/x3 10.70.35.235:/rhs/brick3/x3 commit

[root@old5 ~]# gluster v info remove_change
 
Volume Name: remove_change
Type: Distributed-Replicate
Volume ID: eb500199-37d4-4cb9-96ed-ae5bc1bf2498
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.179:/rhs/brick3/c1
Brick2: 10.70.35.235:/rhs/brick3/c1
Brick3: 10.70.35.179:/rhs/brick3/c2
Brick4: 10.70.35.235:/rhs/brick3/c2
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on

5. On the remove-brick commit operation, the brick process gets killed, so the geo-replication instance for that brick falls back to xsync, and that setting is written to the config file.

On getting ECONNABORTED, the other instances on the same RHSS node are restarted, and because the config file now has change_detector: xsync, all of them also fall back to xsync.

[root@old5 ~]# gluster volume geo remove_xsync 10.70.37.195::remove_xsync config | grep change_detector
change_detector: xsync

Log snippet:
 less /var/log/glusterfs/geo-replication/remove_xsync/ssh%3A%2F%2Froot%4010.70.37.195%3Agluster%3A%2F%2F127.0.0.1%3Aremove_xsync.log 

[2013-09-16 14:56:33.944725] I [master(/rhs/brick2/x3):587:fallback_xsync] _GMaster: falling back to xsync mode
[2013-09-16 14:56:48.72854] I [syncdutils(/rhs/brick2/x3):159:finalize] <top>: exiting.
[2013-09-16 14:56:50.587552] E [syncdutils(/rhs/brick2/x1):201:log_raise_exception] <top>: glusterfs session went down [ECONNABORTED]
[2013-09-16 14:56:52.982089] I [syncdutils(/rhs/brick2/x1):159:finalize] <top>: exiting.
[2013-09-16 14:56:51.429940] E [syncdutils(/rhs/brick2/x2):201:log_raise_exception] <top>: glusterfs session went down [ECONNABORTED]
[2013-09-16 14:56:53.641541] I [syncdutils(/rhs/brick2/x2):159:finalize] <top>: exiting.
[2013-09-16 14:56:56.116944] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-09-16 14:57:12.589235] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-09-16 14:57:12.786187] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-09-16 14:57:12.730447] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-09-16 14:57:12.844243] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-09-16 14:57:13.646564] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-09-16 14:57:13.647228] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-09-16 14:57:14.677306] I [gsyncd(/rhs/brick2/x2):503:main_i] <top>: syncing: gluster://localhost:remove_xsync -> ssh://root@10.70.37.195:gluster://localhost:remove_xsync
[2013-09-16 14:57:14.682374] I [gsyncd(/rhs/brick2/x3):503:main_i] <top>: syncing: gluster://localhost:remove_xsync -> ssh://root@10.70.37.98:gluster://localhost:remove_xsync
[2013-09-16 14:57:14.684375] I [gsyncd(/rhs/brick2/x1):503:main_i] <top>: syncing: gluster://localhost:remove_xsync -> ssh://root@10.70.37.98:gluster://localhost:remove_xsync
[2013-09-16 14:57:21.670073] I [master(/rhs/brick2/x2):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:21.676136] I [master(/rhs/brick2/x2):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:21.688627] I [master(/rhs/brick2/x2):816:register] _GMaster: xsync temp directory: /var/run/gluster/remove_xsync/ssh%3A%2F%2Froot%4010.70.37.195%3Agluster%3A%2F%2F127.0.0.1%3Aremove_xsync/9b86668c9bd1c074e1e2720fc5005e44/xsync
[2013-09-16 14:57:21.688901] I [master(/rhs/brick2/x2):816:register] _GMaster: xsync temp directory: /var/run/gluster/remove_xsync/ssh%3A%2F%2Froot%4010.70.37.195%3Agluster%3A%2F%2F127.0.0.1%3Aremove_xsync/9b86668c9bd1c074e1e2720fc5005e44/xsync
[2013-09-16 14:57:22.300641] I [master(/rhs/brick2/x3):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:22.320192] I [master(/rhs/brick2/x1):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:22.320787] I [master(/rhs/brick2/x3):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-09-16 14:57:22.323508] I [master(/rhs/brick2/x1):57:gmaster_builder] <top>: setting up xsync change detection mode

Actual results:
As all instances use the same config file, the other instances also started using xsync mode instead of changelog.
Comment 2 Rachana Patel 2013-09-18 05:11:17 EDT
Here remove-brick is one use case; this can happen in other cases as well, wherever one instance updates the shared file.
Comment 5 Aravinda VK 2015-08-04 01:19:02 EDT
Now we don't update the config file when falling back to xsync; xsync usage is only temporary. With RHGS 3.1 we don't switch to xsync except for a new brick or during the initial run.

We don't have brick-specific config details. Using a single config file for all bricks of the same geo-rep session is expected behavior; multiple geo-rep sessions will have different config files.

Closing this bug since the behavior of switching to xsync has changed, and a single config file for all bricks of the same session is expected behavior. Please reopen if this issue is found again.
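The changed behavior can be sketched as keeping the xsync fallback in worker memory instead of persisting it to the shared file (hypothetical class and names; a sketch of the idea, not the actual gsyncd code):

```python
import configparser
import os
import tempfile

# Shared per-session config file, as in the expected single-file layout.
shared_conf = os.path.join(tempfile.mkdtemp(), "gsyncd.conf")

class Worker:
    """Hypothetical per-brick worker: the xsync fallback is kept in
    memory only, so it never leaks to sibling workers via the shared
    config file."""

    def __init__(self, conf_path):
        self.conf_path = conf_path
        self.change_detector = self._read_configured()

    def _read_configured(self):
        cp = configparser.ConfigParser()
        cp.read(self.conf_path)
        return cp.get("peers", "change_detector", fallback="changelog")

    def fallback_to_xsync(self):
        # Temporary and in-memory only: no write to the shared file.
        self.change_detector = "xsync"

    def restart(self):
        # A restart re-reads the shared config, dropping the fallback.
        self.change_detector = self._read_configured()

w1, w2 = Worker(shared_conf), Worker(shared_conf)
w1.fallback_to_xsync()                      # one brick falls back...
assert w1.change_detector == "xsync"
assert w2.change_detector == "changelog"    # ...siblings are unaffected
w1.restart()
assert w1.change_detector == "changelog"    # the fallback was temporary
```

Because nothing is ever written back, a single config file per session stays safe: the fallback is scoped to the one worker that needed it and disappears on restart.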
