Bug 1549996 - [BMux] : Stale brick processes on the nodes after vol deletion.
Summary: [BMux] : Stale brick processes on the nodes after vol deletion.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard: brick-multiplexing
Depends On: 1548829
Blocks: 1549023
 
Reported: 2018-02-28 09:15 UTC by Mohit Agrawal
Modified: 2018-10-24 12:21 UTC
CC: 8 users

Fixed In Version: glusterfs-5.0
Clone Of: 1548829
Environment:
Last Closed: 2018-10-24 12:21:12 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Mohit Agrawal 2018-02-28 09:15:22 UTC
+++ This bug was initially created as a clone of Bug #1548829 +++

Description of problem:
------------------------

Create a multi-brick EC volume.

Start it.

Delete it.

Check for running glusterfsd on all the nodes in the Trusted Storage Pool.

Nodes will still have stale glusterfsd processes after the volume stop/delete; these should ideally be removed when a volume is deleted.


Example :

I created two volumes - drogon and drogon2. And deleted them.

<snip>

[root@gqas007 ~]# gluster v list
No volumes present in cluster
[root@gqas007 ~]# 

[root@gqas007 ~]# 
[root@gqas007 ~]# ps -ef|grep fsd
root     21148     1  0 05:43 ?        00:00:01 /usr/sbin/glusterfsd -s gqas007 --volfile-id drogon.gqas007.bricks1-A1 -p /var/run/gluster/vols/drogon/gqas007-bricks1-A1.pid -S /var/run/gluster/e5102c357100a19ec60edec10a566e61.socket --brick-name /bricks1/A1 -l /var/log/glusterfs/bricks/bricks1-A1.log --xlator-option *-posix.glusterd-uuid=e72fdebf-3130-4d05-8cf5-966f4c4926c4 --brick-port 49152 --xlator-option drogon-server.listen-port=49152
root     21639     1  0 05:55 ?        00:00:01 /usr/sbin/glusterfsd -s gqas007 --volfile-id drogon2.gqas007.bricks1-A1 -p /var/run/gluster/vols/drogon2/gqas007-bricks1-A1.pid -S /var/run/gluster/2bd4d8669b0cdb67f9e15f99776d1e36.socket --brick-name /bricks1/A1 -l /var/log/glusterfs/bricks/bricks1-A1.log --xlator-option *-posix.glusterd-uuid=e72fdebf-3130-4d05-8cf5-966f4c4926c4 --brick-port 49153 --xlator-option drogon2-server.listen-port=49153
root     22042 19908  0 06:00 pts/0    00:00:00 grep --color=auto fsd
[root@gqas007 ~]# 

</snip>


I suspect multiplexed bricks to be the problem, and hence am raising it against glusterd. Feel free to change the component if that's not the case.
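
If it helps with triage, the cluster-wide multiplexing setting can be double-checked first; a quick sketch, assuming the upstream option name cluster.brick-multiplex:

<snip>

# confirm whether brick multiplexing is enabled for the pool
gluster volume get all cluster.brick-multiplex

</snip>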


Version-Release number of selected component (if applicable):
--------------------------------------------------------------

glusterfs-3.12.2-4.el7rhgs.x86_64

How reproducible:
-----------------

2/2, on the same machines.

Steps to Reproduce:
--------------------

As in description.
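
A concrete sketch of those steps, assuming brick multiplexing is enabled cluster-wide and using placeholder hostnames (n1, n2, n3) and brick paths in place of the real ones:

<snip>

# enable brick multiplexing for the pool (assumed already on in the affected setup)
gluster volume set all cluster.brick-multiplex on

# create and start a multi-brick EC (disperse) volume; hosts and bricks are placeholders
gluster volume create drogon disperse 6 redundancy 2 \
        n1:/bricks1/A1 n1:/bricks2/A1 n2:/bricks1/A1 \
        n2:/bricks2/A1 n3:/bricks1/A1 n3:/bricks2/A1 force
gluster volume start drogon

# stop and delete it (both commands prompt for confirmation)
gluster volume stop drogon
gluster volume delete drogon

# on every node in the trusted storage pool:
gluster volume list        # reports "No volumes present in cluster"
ps -ef | grep glusterfsd   # stale brick processes still show up here

</snip>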

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-02-25 06:03:46 EST ---

This bug is automatically being proposed for the release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Ambarish on 2018-02-25 06:08:56 EST ---

For some reason I see brick not found errors in brick logs :

[2018-02-25 10:56:40.535257] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks1/A1 - not found
[2018-02-25 10:56:40.535358] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks2/A1 - not found
[2018-02-25 10:56:40.535487] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks3/A1 - not found
[2018-02-25 10:56:40.535595] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks4/A1 - not found
[2018-02-25 10:56:40.536902] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks5/A1 - not found
[2018-02-25 10:56:40.538366] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks6/A1 - not found
[2018-02-25 10:56:40.538566] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks7/A1 - not found
[2018-02-25 10:56:40.538807] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks8/A1 - not found
[2018-02-25 10:56:40.539040] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks9/A1 - not found
[2018-02-25 10:56:40.539261] E [glusterfsd-mgmt.c:232:glusterfs_handle_terminate] 0-glusterfs: can't terminate /bricks10/A1 - not found
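
To correlate these errors with the surviving processes (paths taken from the ps output above; the brick log location is the default one shown there):

<snip>

# the stale glusterfsd still advertises the volfile-id of the deleted volume
ps -ef | grep '[g]lusterfsd' | grep -o 'volfile-id [^ ]*'

# one "can't terminate ... not found" line is logged for each brick that the
# terminate request could not match inside the multiplexed brick process
grep 'glusterfs_handle_terminate' /var/log/glusterfs/bricks/bricks1-A1.log

</snip>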

--- Additional comment from Ambarish on 2018-02-25 06:13:04 EST ---

sosreports : http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1548829/

--- Additional comment from Ambarish on 2018-02-26 02:16:55 EST ---

I could not reproduce on 3.8.4-54.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-02-26 02:17:02 EST ---

This bug report has Keywords: Regression or TestBlocker.

Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release.

Please resolve ASAP.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-02-28 03:20:19 EST ---

This bug is automatically being provided 'pm_ack+' for the release flag 'rhgs-3.4.0', having been appropriately marked for the release, and having been provided ACK from Development and QE.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-02-28 04:14:31 EST ---

Since this bug has been approved for the RHGS 3.4.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.4.0+', and through the Internal Whiteboard entry of '3.4.0', the Target Release is being automatically set to 'RHGS 3.4.0'.

Comment 1 Amar Tumballi 2018-10-24 12:21:12 UTC
https://review.gluster.org/#/c/19734/

