Description of problem:
Dist-geo-rep: remove-brick commit (for brick(s) on the master volume) should kill the geo-rep worker process for the bricks being removed.

Version-Release number of selected component (if applicable):
3.4.0.33rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create and start a geo-rep session between the master and slave volumes.

[root@old5 ~]# gluster volume geo remove_change status
NODE                           MASTER           SLAVE                                HEALTH    UPTIME
-----------------------------------------------------------------------------------------------------------------
old5.lab.eng.blr.redhat.com    remove_change    ssh://10.70.37.195::remove_chnage    Stable    4 days 07:12:33
old6.lab.eng.blr.redhat.com    remove_change    ssh://10.70.37.195::remove_chnage    Stable    4 days 23:52:43

2. Start creating data on the master volume from the mount point.

[root@rhs-client22 ~]# mount | grep remove_change
10.70.35.179:/remove_change on /mnt/remove_change type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
10.70.35.179:/remove_change on /mnt/remove_change_nfs type nfs (rw,addr=10.70.35.179)

3. Remove brick(s) from the master volume:

gluster volume remove-brick remove_change 10.70.35.179:/rhs/brick3/c3 10.70.35.235:/rhs/brick3/c3 start

4. Once the remove-brick operation has completed, perform the commit:

gluster volume remove-brick remove_change 10.70.35.179:/rhs/brick3/c3 10.70.35.235:/rhs/brick3/c3 status
gluster volume remove-brick remove_change 10.70.35.179:/rhs/brick3/c3 10.70.35.235:/rhs/brick3/c3 commit

[root@old5 ~]# gluster v info remove_change

Volume Name: remove_change
Type: Distributed-Replicate
Volume ID: eb500199-37d4-4cb9-96ed-ae5bc1bf2498
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.179:/rhs/brick3/c1
Brick2: 10.70.35.235:/rhs/brick3/c1
Brick3: 10.70.35.179:/rhs/brick3/c2
Brick4: 10.70.35.235:/rhs/brick3/c2
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on

5. Verify whether a geo-rep worker process is still running for the removed brick:

[root@old5 ~]# ps auxw | grep remove_change | grep feedback | grep 'local-path /rhs/brick3/c3'
root     24210  0.0  0.1 1037676  5456 ?   Sl   Sep12   2:45 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/rhs/brick3/c1 --path=/rhs/brick3/c2 --path=/rhs/brick3/c3 -c /var/lib/glusterd/geo-replication/remove_change_10.70.37.195_remove_chnage/gsyncd.conf :remove_change --glusterd-uuid=448e09ce-4626-40d0-b352-5ef26a46124f 10.70.37.195::remove_chnage -N -p --slave-id e6e125c3-4237-4c00-bc2a-679202df0505 --feedback-fd 8 --local-path /rhs/brick3/c3 --local-id .%2Frhs%2Fbrick3%2Fc3 --resource-remote ssh://root.37.98:gluster://localhost:remove_chnage

Log snippet:

[2013-09-17 10:11:28.143104] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:11:37.337871] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:12:27.765177] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:12:28.252641] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:12:37.467040] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:13:27.910901] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:13:28.317553] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:13:37.596528] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:14:28.7318] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:14:28.409152] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:14:37.719461] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:15:28.86055] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:15:28.572752] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:15:38.276532] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:16:28.238033] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:16:28.841550] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns

Actual results:
The geo-rep worker process for the removed brick is still running and continues to crawl that brick for data changes.

Expected results:
After the commit operation the brick(s) are no longer part of the volume, so there is no need for a geo-rep worker process for those bricks; the worker(s) should be killed.

Additional info:
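Until a fix is available, the stale worker can be cleared by hand. A minimal sketch of such a workaround, reusing the volume, slave and brick names from this report; it assumes (not confirmed here) that restarting the session makes the monitor re-read the brick list and spawn workers only for bricks still in the volume:

# confirm a worker is still crawling the removed brick (same check as step 5)
ps auxw | grep remove_change | grep feedback | grep 'local-path /rhs/brick3/c3'

# stop and start the geo-rep session so workers are respawned for the current bricks only
gluster volume geo-replication remove_change 10.70.37.195::remove_chnage stop
gluster volume geo-replication remove_change 10.70.37.195::remove_chnage start

# the same ps check should now return nothing
ps auxw | grep remove_change | grep feedback | grep 'local-path /rhs/brick3/c3'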
Verified with the build: glusterfs-3.7.0-2.el6rhs.x86_64

As mentioned in comment 3, the remove-brick steps have changed: commit is not allowed while a geo-rep session is active, and the correct error message is shown. To perform a commit, the geo-rep session must first be stopped, which kills all geo-rep processes. Hence the original issue reported in this bug is no longer seen. Moving the bug to VERIFIED.

[root@georep1 scripts]# gluster volume remove-brick master 10.70.46.96:/rhs/brick2/b2 10.70.46.97:/rhs/brick2/b2 10.70.46.93:/rhs/brick2/b2 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: failed: geo-replication sessions are active for the volume master. Stop geo-replication sessions involved in this volume. Use 'volume geo-replication status' command for more info.

[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave stop
Stopping geo-replication session between master & 10.70.46.154::slave has been successful

[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave status detail

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
georep1        master        /rhs/brick1/b1    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     2015-05-27 20:43:15
georep1        master        /rhs/brick2/b2    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     2015-05-27 20:44:17
georep3        master        /rhs/brick1/b1    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
georep3        master        /rhs/brick2/b2    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
georep2        master        /rhs/brick1/b1    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
georep2        master        /rhs/brick2/b2    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A

[root@georep1 scripts]# gluster volume remove-brick master 10.70.46.96:/rhs/brick2/b2 10.70.46.97:/rhs/brick2/b2 10.70.46.93:/rhs/brick2/b2 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated. If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.

[root@georep1 scripts]# ps auxw | grep master | grep feedback | grep /rhs/brick
[root@georep1 scripts]# ps -eaf | grep gsync
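To summarize the workflow on the fixed build: stop the geo-rep session, commit the remove-brick, then restart the session. A brief sketch, reusing the volume, slave and brick names from the verification output above; the restart step is assumed as the usual follow-up and is not part of the captured output:

gluster volume geo-replication master 10.70.46.154::slave stop
gluster volume remove-brick master 10.70.46.96:/rhs/brick2/b2 10.70.46.97:/rhs/brick2/b2 10.70.46.93:/rhs/brick2/b2 commit
gluster volume geo-replication master 10.70.46.154::slave start
gluster volume geo-replication master 10.70.46.154::slave status detail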
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html