Bug 1811631 - brick crashed when creating and deleting volumes over time (with brick mux enabled only)
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On: 1790336
Blocks: 1656682
 
Reported: 2020-03-09 11:41 UTC by Mohit Agrawal
Modified: 2020-03-20 04:31 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1790336
Environment:
Last Closed: 2020-03-20 04:31:56 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 24221 0 None Merged Posix: Use simple approach to close fd 2020-03-20 04:31:55 UTC

Comment 1 Mohit Agrawal 2020-03-09 11:53:20 UTC
In a brick-mux environment, when volumes are created and stopped in a loop over a long period, the main brick process crashes.
It crashes because the main brick process does not clean up the memory of all objects at the time a volume is detached.
The following objects are missed at the time of detaching a volume:
1) the xlator object for the brick graph
2) the local_pool for the posix_lock xlator
3) the rpc object in the quota xlator
4) an inode leak in the brick xlator

To avoid the leaks, all of these objects need to be cleaned up at the time a brick is detached.
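One way to observe leaks like these from the outside is to watch the resident memory of the consolidated brick process across create/delete cycles; steadily rising RSS while the volume count returns to its baseline is consistent with per-detach leaks. A minimal Linux-only sketch (reads /proc; finding the brick PID via pgrep is an assumption, not part of the original report):

```shell
#!/bin/sh
# Print the resident set size (kB) of a process, read from /proc.
# Linux-only sketch; pass the brick PID, e.g. "$(pgrep -f glusterfsd | head -n1)".
brick_rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Example: sample the current shell's own RSS.
brick_rss_kb "$$"
```

Sampling this value once per create/delete batch gives a simple trend line without attaching a debugger to the brick.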

Reproducer:
1. Created a 3-node setup with brick mux enabled.
2. Created 2 volumes: one named base_x3 of type x3, and another of type arbiter named "basevol_bitter".
   These two volumes are never deleted throughout the test.
3. Started creating volumes, starting them, and then deleting them, in batches of 100.
4. Step 3 was left running for a few days.
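The churn loop in step 3 can be sketched as a shell script. The volume names, brick paths, replica type, and the DRY_RUN wrapper below are illustrative assumptions, not the exact commands used in the original test:

```shell
#!/bin/sh
# Sketch of the create/start/stop/delete churn from step 3.
# Hypothetical names: testvol_N, node1..node3, /bricks/... paths.
# With DRY_RUN=1 (the default) commands are only echoed, not executed.
BATCH=100
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "gluster $*"
    else
        gluster "$@"
    fi
}

batch_cycle() {
    # create and start a batch of replica-3 volumes
    i=1
    while [ "$i" -le "$BATCH" ]; do
        run volume create "testvol_$i" replica 3 \
            "node1:/bricks/testvol_$i/brick" \
            "node2:/bricks/testvol_$i/brick" \
            "node3:/bricks/testvol_$i/brick" force
        run volume start "testvol_$i"
        i=$((i + 1))
    done
    # then stop and delete the whole batch again
    i=1
    while [ "$i" -le "$BATCH" ]; do
        run --mode=script volume stop "testvol_$i"
        run --mode=script volume delete "testvol_$i"
        i=$((i + 1))
    done
}

batch_cycle
```

Repeating batch_cycle for days, as in the original test, is what let the per-detach leaks accumulate until the brick process crashed.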

Comment 2 Worker Ant 2020-03-12 12:58:06 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/977, and will be tracked there from now on. Visit the GitHub issue URL for further details.

Comment 3 Worker Ant 2020-03-20 04:10:41 UTC
REVIEW: https://review.gluster.org/24221 (Posix: Use simple approach to close fd) posted (#10) for review on master by MOHIT AGRAWAL

Comment 4 Worker Ant 2020-03-20 04:31:56 UTC
REVIEW: https://review.gluster.org/24221 (Posix: Use simple approach to close fd) merged (#10) on master by MOHIT AGRAWAL

