Bug 1599587 - [geo-rep]: Worker still ACTIVE after killing bricks on brick-mux enabled setup
Summary: [geo-rep]: Worker still ACTIVE after killing bricks on brick-mux enabled setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: RHGS 3.5.0
Assignee: Mohit Agrawal
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 1483541 1600145 1601314 1696807
TreeView+ depends on / blocked
 
Reported: 2018-07-10 06:24 UTC by Rochelle
Modified: 2019-10-30 12:20 UTC (History)
11 users (show)

Fixed In Version: glusterfs-6.0-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1600145 (view as bug list)
Environment:
Last Closed: 2019-10-30 12:19:38 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2019:3249 0 None None None 2019-10-30 12:20:03 UTC

Description Rochelle 2018-07-10 06:24:14 UTC
Description of problem:
=======================
The ACTIVE brick processes for a geo-replication session were killed but it remains ACTIVE even after going down.

Before the bricks were killed:
-----------------------------
[root@dhcp42-18 scripts]# gluster volume geo-replication master 10.70.43.116::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:09:32          
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:06:17          
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:06:17          
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A                          
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A                          
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A                          
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A                          
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A                          
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A                          
[root@dhcp42-18 scripts]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.41.239:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       28814
Brick 10.70.43.179:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       27173
Brick dhcp42-18.lab.eng.blr.redhat.com:/var
/lib/glusterd/ss_brick                      49152     0          Y       9969 
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892
 
Task Status of Volume gluster_shared_storage
-----------------------------------------------------------------------------



After the bricks were killed using gf_attach:
---------------------------------------------
[root@dhcp42-18 scripts]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.41.239:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       28814
Brick 10.70.43.179:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       27173
Brick dhcp42-18.lab.eng.blr.redhat.com:/var
/lib/glusterd/ss_brick                      49152     0          Y       9969 
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892
 
Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: master
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.18:/rhs/brick1/b1            N/A       N/A        N       N/A  
Brick 10.70.41.239:/rhs/brick1/b2           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick1/b3           49152     0          Y       27173
Brick 10.70.42.18:/rhs/brick2/b4            N/A       N/A        N       N/A  
Brick 10.70.41.239:/rhs/brick2/b5           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick2/b6           49152     0          Y       27173
Brick 10.70.42.18:/rhs/brick3/b7            N/A       N/A        N       N/A  
Brick 10.70.41.239:/rhs/brick3/b8           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick3/b9           49152     0          Y       27173
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892
 
Task Status of Volume master
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp42-18 scripts]# gluster volume geo-replication master 10.70.43.116::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:11:33          
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:12:02          
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:12:18          
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A                          
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A                          
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A                          
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A                          
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A                          
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A                          



Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp42-18 scripts]# rpm -qa | grep gluster
glusterfs-rdma-3.12.2-13.el7rhgs.x86_64
glusterfs-server-3.12.2-13.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-3.12.2-13.el7rhgs.x86_64
glusterfs-cli-3.12.2-13.el7rhgs.x86_64
python2-gluster-3.12.2-13.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
glusterfs-3.12.2-13.el7rhgs.x86_64
glusterfs-api-3.12.2-13.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-libs-3.12.2-13.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-fuse-3.12.2-13.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-13.el7rhgs.x86_64
glusterfs-events-3.12.2-13.el7rhgs.x86_64


How reproducible:
=================
2/2


Steps to Reproduce:
1.Create a geo-replication session (3x3 master and slave volume)
2.Mount the master and slave volume
3.Create files on the master
4.kill brick using gf_attach 

Actual results:
===============
The workers still remain ACTIVE


Expected results:
================
The 3 ACTIVE workers should go to FAULTY and 3 PASSIVE workers should become ACTIVE and do the syncing

Comment 32 errata-xmlrpc 2019-10-30 12:19:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249


Note You need to log in before you can comment on or make changes to this bug.