REVIEW: https://review.gluster.org/20494 ([geo-rep]: Worker still ACTIVE after killing bricks) posted (#1) for review on master by MOHIT AGRAWAL
Description of problem:
=======================
The bricks backing the ACTIVE workers of a geo-replication session were killed, but the workers remained ACTIVE even after their bricks went down.

Before the bricks were killed:
------------------------------
[root@dhcp42-18 scripts]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:09:32
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:06:17
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:06:17
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

[root@dhcp42-18 scripts]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.41.239:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       28814
Brick 10.70.43.179:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       27173
Brick dhcp42-18.lab.eng.blr.redhat.com:/var
/lib/glusterd/ss_brick                      49152     0          Y       9969
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------

After the bricks were killed using gf_attach:
---------------------------------------------
[root@dhcp42-18 scripts]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.41.239:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       28814
Brick 10.70.43.179:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       27173
Brick dhcp42-18.lab.eng.blr.redhat.com:/var
/lib/glusterd/ss_brick                      49152     0          Y       9969
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: master
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.18:/rhs/brick1/b1            N/A       N/A        N       N/A
Brick 10.70.41.239:/rhs/brick1/b2           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick1/b3           49152     0          Y       27173
Brick 10.70.42.18:/rhs/brick2/b4            N/A       N/A        N       N/A
Brick 10.70.41.239:/rhs/brick2/b5           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick2/b6           49152     0          Y       27173
Brick 10.70.42.18:/rhs/brick3/b7            N/A       N/A        N       N/A
Brick 10.70.41.239:/rhs/brick3/b8           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick3/b9           49152     0          Y       27173
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892

Task Status of Volume master
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-18 scripts]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:11:33
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:12:02
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:12:18
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

Version-Release number of selected component (if applicable):
=============================================================
mainline

How reproducible:
=================
2/2

Steps to Reproduce:
1. Create a geo-replication session (3x3 master and slave volumes)
2. Mount the master and slave volumes
3. Create files on the master
4. Kill the bricks backing the ACTIVE workers using gf_attach (see the sketch at the end of this report)

Actual results:
===============
The three workers whose bricks were killed still remain ACTIVE.

Expected results:
================
The 3 ACTIVE workers should go to FAULTY, and the 3 PASSIVE workers should become ACTIVE and take over the syncing.
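A minimal sketch of step 4, killing a brick by detaching it from its multiplexed brick process. This assumes brick multiplexing is enabled; the detach form shown (gf_attach -d <socket> <brick-path>) follows the tool's usage on mainline, and the brick-process socket name under /var/run/gluster/ is an assumption and is host-specific:

# On the node hosting an ACTIVE brick (10.70.42.18), locate the brick-process socket
ls /var/run/gluster/*.socket

# Detach (kill) the brick from the running brick process
gf_attach -d /var/run/gluster/<brick-process>.socket /rhs/brick1/b1

# Verify the brick shows offline, then re-check the worker status
gluster volume status master
gluster volume geo-replication master 10.70.43.116::slave status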
REVIEW: https://review.gluster.org/21078 ([geo-rep]: Worker still ACTIVE after killing bricks) posted (#1) for review on master by MOHIT AGRAWAL
REVIEW: https://review.gluster.org/21079 ([geo-rep]: Worker still ACTIVE after killing bricks) posted (#1) for review on master by MOHIT AGRAWAL
REVIEW: https://review.gluster.org/20645 ([geo-rep]: Worker still ACTIVE after killing bricks) posted (#20) for review on master by MOHIT AGRAWAL
REVIEW: https://review.gluster.org/20645 ([geo-rep]: Worker still ACTIVE after killing bricks) posted (#35) for review on master by Amar Tumballi
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/