Bug 1811631

Summary: brick crashed when creating and deleting volumes over time (with brick mux enabled only)
Product: [Community] GlusterFS Reporter: Mohit Agrawal <moagrawa>
Component: core Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: unspecified    
Version: mainline CC: bugs, moagrawa, nchilaka, pasik, rhinduja, rhs-bugs, storage-qa-internal, ykaul
Target Milestone: --- Keywords: Reopened, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1790336 Environment:
Last Closed: 2020-03-20 04:31:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1790336    
Bug Blocks: 1656682    

Comment 1 Mohit Agrawal 2020-03-09 11:53:20 UTC
In a brick_mux environment, when volumes are created and stopped in a loop over a long period, the main brick process crashes.
The brick crashes because the main brick process did not free the memory of all objects when a volume was detached.
The following objects were missed when detaching a volume:
1) xlator object for a brick graph
2) local_pool for posix_lock xlator
3) rpc object cleanup at quota xlator
4) inode leak at brick xlator

To avoid the leak, all of these objects need to be cleaned up when a brick is detached.
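
The effect of an incomplete detach can be illustrated with a small toy model. This is only a sketch of the accounting, not GlusterFS code: the BrickProcess class, its methods, and the object labels are hypothetical names chosen for illustration, and the model assumes each attached volume allocates one object of each kind listed above.

```python
# Toy model only: names here are hypothetical and do not correspond to
# GlusterFS data structures or functions.

# The four kinds of per-volume objects listed in the comment above.
PER_VOLUME_OBJECTS = ("xlator", "local_pool", "rpc", "inode")


class BrickProcess:
    """One multiplexed brick process that hosts many volumes over time."""

    def __init__(self):
        self.live = {}  # volume name -> set of object kinds still allocated

    def attach(self, volume):
        # Assumption: attaching a volume allocates one object of each kind.
        self.live[volume] = set(PER_VOLUME_OBJECTS)

    def detach(self, volume, cleaned):
        """Release only the object kinds named in `cleaned`."""
        remaining = self.live[volume] - set(cleaned)
        if remaining:
            # Anything left here is leaked for the lifetime of the process.
            self.live[volume] = remaining
        else:
            del self.live[volume]

    def leaked_objects(self):
        return sum(len(kinds) for kinds in self.live.values())


def run_cycles(cycles, cleaned):
    """Attach and detach `cycles` volumes, return the leaked object count."""
    proc = BrickProcess()
    for i in range(cycles):
        name = f"vol{i}"
        proc.attach(name)
        proc.detach(name, cleaned)
    return proc.leaked_objects()


if __name__ == "__main__":
    print(run_cycles(1000, cleaned=()))                  # detach frees none of the four
    print(run_cycles(1000, cleaned=PER_VOLUME_OBJECTS))  # detach frees all four
```

In this model the leaked count grows linearly with the number of create/delete cycles unless detach releases every object kind, which is why the problem only shows up after the loop has run for a long time.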

Reproducer:
1. Created a 3-node setup with brick mux enabled.
2. Created 2 volumes: one named base_x3 of type x3, and an arbiter volume named "basevol_bitter".
These 2 volumes are never deleted during the test.
3. Started creating volumes and starting them in batches of 100, then deleting them.
4. Step 3 was set to repeat for a few days.
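
A rough sketch of how step 3 of the reproducer could be scripted is shown below. The host names, brick root, and test volume names are hypothetical, and the volume type (replica 3) is an assumption, since the report does not say what type the batch volumes were. The script only prints the gluster commands unless dry_run is set to False.

```python
import subprocess

HOSTS = ["node1", "node2", "node3"]  # hypothetical host names
BRICK_ROOT = "/bricks"               # hypothetical brick directory
BATCH_SIZE = 100                     # batch size from step 3


def gluster(*args):
    """Build a non-interactive `gluster volume ...` command line."""
    return ["gluster", "--mode=script", "volume", *args]


def batch_commands(batch, size=BATCH_SIZE):
    """Commands to create and start one batch of volumes, then delete them."""
    names = [f"testvol_{batch}_{i}" for i in range(size)]  # hypothetical names
    cmds = []
    for name in names:
        bricks = [f"{host}:{BRICK_ROOT}/{name}" for host in HOSTS]
        # Assumption: the test volumes are replica 3.
        cmds.append(gluster("create", name, "replica", "3", *bricks))
        cmds.append(gluster("start", name))
    for name in names:
        # A started volume has to be stopped before it can be deleted.
        cmds.append(gluster("stop", name))
        cmds.append(gluster("delete", name))
    return cmds


def run(batches, dry_run=True):
    """Enable brick multiplexing, then run the given number of batches."""
    cmds = [gluster("set", "all", "cluster.brick-multiplex", "on")]
    for batch in range(batches):
        cmds.extend(batch_commands(batch))
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(batches=1)
```

The two base volumes from step 2 are left out of the sketch on purpose; they would be created once before the loop and never touched by it.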

Comment 2 Worker Ant 2020-03-12 12:58:06 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/977, and will be tracked there from now on. Visit GitHub issues URL for further details

Comment 3 Worker Ant 2020-03-20 04:10:41 UTC
REVIEW: https://review.gluster.org/24221 (Posix: Use simple approach to close fd) posted (#10) for review on master by MOHIT AGRAWAL

Comment 4 Worker Ant 2020-03-20 04:31:56 UTC
REVIEW: https://review.gluster.org/24221 (Posix: Use simple approach to close fd) merged (#10) on master by MOHIT AGRAWAL