Description of problem:
Dist-geo-rep: remove-brick commit (for brick(s) on the master volume) should kill the geo-rep worker process for the bricks being removed.

Version-Release number of selected component (if applicable):
3.4.0.33rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create and start a geo-rep session between the master and slave volumes.

[root@old5 ~]# gluster volume geo remove_change status
NODE                           MASTER           SLAVE                                HEALTH    UPTIME
-----------------------------------------------------------------------------------------------------------------
old5.lab.eng.blr.redhat.com    remove_change    ssh://10.70.37.195::remove_chnage    Stable    4 days 07:12:33
old6.lab.eng.blr.redhat.com    remove_change    ssh://10.70.37.195::remove_chnage    Stable    4 days 23:52:43

2. Start creating data on the master volume from the mount point.

[root@rhs-client22 ~]# mount | grep remove_change
10.70.35.179:/remove_change on /mnt/remove_change type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
10.70.35.179:/remove_change on /mnt/remove_change_nfs type nfs (rw,addr=10.70.35.179)

3. Remove brick(s) from the master volume:

gluster volume remove-brick remove_change 10.70.35.179:/rhs/brick3/c3 10.70.35.235:/rhs/brick3/c3 start

4. Once the remove-brick operation has completed, perform the commit:

gluster volume remove-brick remove_change 10.70.35.179:/rhs/brick3/c3 10.70.35.235:/rhs/brick3/c3 status
gluster volume remove-brick remove_change 10.70.35.179:/rhs/brick3/c3 10.70.35.235:/rhs/brick3/c3 commit

[root@old5 ~]# gluster v info remove_change

Volume Name: remove_change
Type: Distributed-Replicate
Volume ID: eb500199-37d4-4cb9-96ed-ae5bc1bf2498
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.179:/rhs/brick3/c1
Brick2: 10.70.35.235:/rhs/brick3/c1
Brick3: 10.70.35.179:/rhs/brick3/c2
Brick4: 10.70.35.235:/rhs/brick3/c2
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on

5. Verify whether a geo-rep worker process is still running for the removed brick:

[root@old5 ~]# ps auxw | grep remove_change | grep feedback | grep 'local-path /rhs/brick3/c3'
root     24210  0.0  0.1 1037676  5456 ?   Sl   Sep12   2:45 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/rhs/brick3/c1 --path=/rhs/brick3/c2 --path=/rhs/brick3/c3 -c /var/lib/glusterd/geo-replication/remove_change_10.70.37.195_remove_chnage/gsyncd.conf :remove_change --glusterd-uuid=448e09ce-4626-40d0-b352-5ef26a46124f 10.70.37.195::remove_chnage -N -p --slave-id e6e125c3-4237-4c00-bc2a-679202df0505 --feedback-fd 8 --local-path /rhs/brick3/c3 --local-id .%2Frhs%2Fbrick3%2Fc3 --resource-remote ssh://root.37.98:gluster://localhost:remove_chnage

Log snippet:

[2013-09-17 10:11:28.143104] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:11:37.337871] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:12:27.765177] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:12:28.252641] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:12:37.467040] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:13:27.910901] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:13:28.317553] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:13:37.596528] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:14:28.7318] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:14:28.409152] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:14:37.719461] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:15:28.86055] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:15:28.572752] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:15:38.276532] I [master(/rhs/brick3/c2):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:16:28.238033] I [master(/rhs/brick3/c1):358:crawlwrap] _GMaster: 1 crawls, 0 turns
[2013-09-17 10:16:28.841550] I [master(/rhs/brick3/c3):358:crawlwrap] _GMaster: 1 crawls, 0 turns

Actual results:
The geo-rep worker process for the removed brick is still running and continues to crawl that brick for data changes.

Expected results:
After the commit operation the brick(s) are no longer part of the volume, so there is no need for a geo-rep worker process for those bricks; the worker(s) should be killed.

Additional info:
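Until a fix is available, the stale worker can be cleared by hand. A minimal sketch of such a workaround, reusing the volume, slave and brick names from this report; it assumes (not confirmed here) that restarting the session makes the monitor re-read the brick list and spawn workers only for bricks still in the volume:

# confirm a worker is still crawling the removed brick (same check as step 5)
ps auxw | grep remove_change | grep feedback | grep 'local-path /rhs/brick3/c3'

# stop and start the geo-rep session so workers are respawned for the current bricks only
gluster volume geo-replication remove_change 10.70.37.195::remove_chnage stop
gluster volume geo-replication remove_change 10.70.37.195::remove_chnage start

# the same ps check should now return nothing
ps auxw | grep remove_change | grep feedback | grep 'local-path /rhs/brick3/c3'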
Verified with the build: glusterfs-3.7.0-2.el6rhs.x86_64

As mentioned in comment 3, the remove-brick steps have changed: commit is not allowed while a geo-rep session is active, and the correct error message is shown. To perform a commit, the geo-rep session must first be stopped, which kills all geo-rep processes. Hence the original issue reported in this bug is no longer seen. Moving the bug to VERIFIED.

[root@georep1 scripts]# gluster volume remove-brick master 10.70.46.96:/rhs/brick2/b2 10.70.46.97:/rhs/brick2/b2 10.70.46.93:/rhs/brick2/b2 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: failed: geo-replication sessions are active for the volume master. Stop geo-replication sessions involved in this volume. Use 'volume geo-replication status' command for more info.

[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave stop
Stopping geo-replication session between master & 10.70.46.154::slave has been successful

[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave status detail

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
georep1        master        /rhs/brick1/b1    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     2015-05-27 20:43:15
georep1        master        /rhs/brick2/b2    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     2015-05-27 20:44:17
georep3        master        /rhs/brick1/b1    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
georep3        master        /rhs/brick2/b2    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
georep2        master        /rhs/brick1/b1    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A
georep2        master        /rhs/brick2/b2    root          10.70.46.154::slave    N/A           Stopped    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A

[root@georep1 scripts]# gluster volume remove-brick master 10.70.46.96:/rhs/brick2/b2 10.70.46.97:/rhs/brick2/b2 10.70.46.93:/rhs/brick2/b2 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated. If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.

[root@georep1 scripts]# ps auxw | grep master | grep feedback | grep /rhs/brick
[root@georep1 scripts]# ps -eaf | grep gsync
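To summarize the workflow on the fixed build: stop the geo-rep session, commit the remove-brick, then restart the session. A brief sketch, reusing the volume, slave and brick names from the verification output above; the restart step is assumed as the usual follow-up and is not part of the captured output:

gluster volume geo-replication master 10.70.46.154::slave stop
gluster volume remove-brick master 10.70.46.96:/rhs/brick2/b2 10.70.46.97:/rhs/brick2/b2 10.70.46.93:/rhs/brick2/b2 commit
gluster volume geo-replication master 10.70.46.154::slave start
gluster volume geo-replication master 10.70.46.154::slave status detail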
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html