Description of problem:
=======================
The bricks backing the ACTIVE workers of a geo-replication session were killed, but the workers remain ACTIVE even after the bricks have gone down.

Before the bricks were killed:
------------------------------
[root@dhcp42-18 scripts]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:09:32
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:06:17
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:06:17
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

[root@dhcp42-18 scripts]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.41.239:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       28814
Brick 10.70.43.179:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       27173
Brick dhcp42-18.lab.eng.blr.redhat.com:/var
/lib/glusterd/ss_brick                      49152     0          Y       9969
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks
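The three ACTIVE bricks on 10.70.42.18 were then detached (killed) with gf_attach. A minimal sketch of the invocation; gf_attach -d takes the brick process's unix domain socket and the brick path. The socket name below is a hypothetical placeholder (the real one lives under /var/run/gluster/):

# Detach each ACTIVE brick from its brick process; <brick-process> is a
# placeholder for the actual socket file name on this node.
[root@dhcp42-18 scripts]# gf_attach -d /var/run/gluster/<brick-process>.socket /rhs/brick1/b1
[root@dhcp42-18 scripts]# gf_attach -d /var/run/gluster/<brick-process>.socket /rhs/brick2/b4
[root@dhcp42-18 scripts]# gf_attach -d /var/run/gluster/<brick-process>.socket /rhs/brick3/b7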
After the bricks were killed using gf_attach:
---------------------------------------------
[root@dhcp42-18 scripts]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.41.239:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       28814
Brick 10.70.43.179:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       27173
Brick dhcp42-18.lab.eng.blr.redhat.com:/var
/lib/glusterd/ss_brick                      49152     0          Y       9969
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: master
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.18:/rhs/brick1/b1            N/A       N/A        N       N/A
Brick 10.70.41.239:/rhs/brick1/b2           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick1/b3           49152     0          Y       27173
Brick 10.70.42.18:/rhs/brick2/b4            N/A       N/A        N       N/A
Brick 10.70.41.239:/rhs/brick2/b5           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick2/b6           49152     0          Y       27173
Brick 10.70.42.18:/rhs/brick3/b7            N/A       N/A        N       N/A
Brick 10.70.41.239:/rhs/brick3/b8           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick3/b9           49152     0          Y       27173
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892

Task Status of Volume master
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-18 scripts]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:11:33
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:12:02
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:12:18
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp42-18 scripts]# rpm -qa | grep gluster
glusterfs-rdma-3.12.2-13.el7rhgs.x86_64
glusterfs-server-3.12.2-13.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-3.12.2-13.el7rhgs.x86_64
glusterfs-cli-3.12.2-13.el7rhgs.x86_64
python2-gluster-3.12.2-13.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
glusterfs-3.12.2-13.el7rhgs.x86_64
glusterfs-api-3.12.2-13.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-libs-3.12.2-13.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-fuse-3.12.2-13.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-13.el7rhgs.x86_64
glusterfs-events-3.12.2-13.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
1. Create a geo-replication session (3x3 master and slave volumes); see the sketch under "Additional info" below.
2. Mount the master and slave volumes.
3. Create files on the master.
4. Kill an ACTIVE brick using gf_attach, as shown above.

Actual results:
===============
The workers for the killed bricks still remain ACTIVE.

Expected results:
=================
The 3 ACTIVE workers should go to FAULTY, and the 3 PASSIVE workers should become ACTIVE and take over the syncing.
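Additional info:
================
A sketch of the commands behind the reproduction steps. The mount points and the file-creation loop are illustrative assumptions, and passwordless SSH from the master nodes to the slave node is assumed to be set up already:

# Step 1: create and start the geo-replication session
gluster volume geo-replication master 10.70.43.116::slave create push-pem
gluster volume geo-replication master 10.70.43.116::slave start

# Step 2: mount the master and slave volumes (hypothetical mount points)
mount -t glusterfs 10.70.42.18:/master /mnt/master
mount -t glusterfs 10.70.43.116:/slave /mnt/slave

# Step 3: create files on the master mount
for i in $(seq 1 100); do echo "data" > /mnt/master/file$i; done

# Step 4: detach an ACTIVE brick with gf_attach, as shown earlier in this report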
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249