+++ This bug was initially created as a clone of Bug #1535281 +++

Description of problem:
With brick multiplexing on, when volume creation and deletion were run continuously for ~12 hours, the glusterfsd process on each of the three nodes consumed close to 14 GB of memory with a single volume in the system. This is quite high. Note that throughout the test, the heketidb volume is not deleted, so the same brick process remains alive for the entire test.

Version-Release number of selected component (if applicable):
sh-4.2# rpm -qa | grep 'gluster'
glusterfs-libs-3.8.4-54.el7rhgs.x86_64
glusterfs-3.8.4-54.el7rhgs.x86_64
glusterfs-api-3.8.4-54.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.el7rhgs.x86_64
glusterfs-server-3.8.4-54.el7rhgs.x86_64
gluster-block-0.2.1-14.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. On a CNS setup, run the following script for 12 hours:

while true; do
    for i in {1..5}; do heketi-cli volume create --size=1; done
    heketi-cli volume list | awk '{print $1}' | cut -c 4- >> vollist
    while read i; do heketi-cli volume delete $i; sleep 2; done < vollist
    rm vollist
done

Actual results:
glusterfsd process consumes ~14 GB with 1 volume

Expected results:
Typically, glusterfsd would consume < 1 GB for a volume

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-01-16 22:20:26 EST ---

This bug is automatically being proposed for the release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.
--- Additional comment from krishnaram Karthick on 2018-01-16 23:15:35 EST ---

Logs are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1535281/

--- Additional comment from Atin Mukherjee on 2018-01-16 23:28:54 EST ---

A probable RCA: when a brick instance is detached from a brick process, the individual xlators and their respective memory allocations should be freed. Although some of the xlators do have their destructor functions (fini ()) in place, they are not invoked, AFAIK. So even though deleting a volume eventually detaches its respective brick instance(s) from the existing brick process, the xlators and their allocated memory are not freed up, and after this many detach operations the leaks become quite significant.

The effort to make the fini () handler of every xlator work properly might be quite significant, and we definitely need to assess it, since with brick multiplexing the impact is severe. I've assigned this bug to Mohit to begin with estimating the effort required here. I believe a lot of collaboration and effort will be required from the owners of the individual xlators.

--- Additional comment from Prasanth on 2018-01-17 17:47:54 EST ---

Karthick, please have a clone of this BZ created against CNS for tracking purposes and propose it for the next immediate release.

--- Additional comment from krishnaram Karthick on 2018-01-18 01:09:11 EST ---

(In reply to Prasanth from comment #4)
> Karthick, please have a clone of this BZ created against CNS for tracking
> purpose and propose it for the next immediate release.

Done.
--- Additional comment from nchilaka on 2018-01-19 05:08:51 EST ---

Added this case (as mentioned in the description) to the RHGS brick-mux (non-containerized) test plan.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-01-24 00:09:13 EST ---

This bug is automatically being provided 'pm_ack+' for the release flag 'rhgs-3.4.0', having been appropriately marked for the release, and having been provided ACK from Development and QE.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-01-24 07:26:33 EST ---

Since this bug has been approved for the RHGS 3.4.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.4.0+', and through the Internal Whiteboard entry of '3.4.0', the Target Release is being automatically set to 'RHGS 3.4.0'.
REVIEW: https://review.gluster.org/19537 (glusterfsd: Memleak in glusterfsd process while brick mux is on) posted (#1) for review on master by MOHIT AGRAWAL
COMMIT: https://review.gluster.org/19537 committed in master by "Jeff Darcy" <jeff.us> with a commit message- glusterfsd: Memleak in glusterfsd process while brick mux is on

Problem: At the time of stopping a volume while brick multiplexing is enabled, memory is not cleaned up from all server-side xlators.

Solution: To clean up memory for all server-side xlators, call fini in glusterfs_handle_terminate after sending the GF_EVENT_CLEANUP notification to the top xlator.

BUG: 1544090
Change-Id: Ifa1525e25b697371276158705026b421b4f81140
Signed-off-by: Mohit Agrawal <moagrawa>
REVIEW: https://review.gluster.org/19580 (Revert "glusterfsd: Memleak in glusterfsd process while brick mux is on") posted (#1) for review on master by MOHIT AGRAWAL
COMMIT: https://review.gluster.org/19580 committed in master by "Amar Tumballi" <amarts> with a commit message- Revert "glusterfsd: Memleak in glusterfsd process while brick mux is on"

Some code paths still remain where cleanup is required while brick mux is on. I will upload a new patch after resolving all code paths.

This reverts commit b313d97faa766443a7f8128b6e19f3d2f1b267dd.

BUG: 1544090
Change-Id: I26ef1d29061092bd9a409c8933d5488e968ed90e
Signed-off-by: Mohit Agrawal <moagrawa>
REVIEW: https://review.gluster.org/19616 (glusterfsd: Memleak in glusterfsd process while brick mux is on) posted (#1) for review on master by MOHIT AGRAWAL
COMMIT: https://review.gluster.org/19616 committed in master by "Amar Tumballi" <amarts> with a commit message- glusterfsd: Memleak in glusterfsd process while brick mux is on

Problem: At the time of stopping a volume while brick multiplexing is enabled, memory is not cleaned up from all server-side xlators.

Solution: To clean up memory for all server-side xlators, call fini in glusterfs_handle_terminate after sending the GF_EVENT_CLEANUP notification to the top xlator.

BUG: 1544090
Signed-off-by: Mohit Agrawal <moagrawa>

Note: All test cases were run in a separate build (https://review.gluster.org/19574) with the same patch after forcefully enabling brick mux; all test cases passed.

Change-Id: Ia10dc7f2605aa50f2b90b3fe4eb380ba9299e2fc
REVIEW: https://review.gluster.org/19734 (gluster: Sometimes Brick process is crashed at the time of stopping brick) posted (#1) for review on master by MOHIT AGRAWAL
REVIEW: https://review.gluster.org/19734 (gluster: Sometimes Brick process is crashed at the time of stopping brick) posted (#5) for review on master by MOHIT AGRAWAL
COMMIT: https://review.gluster.org/19734 committed in master by "Raghavendra G" <rgowdapp> with a commit message- gluster: Sometimes Brick process is crashed at the time of stopping brick

Problem: Sometimes the brick process crashes at the time of stopping a brick while brick mux is enabled.

Solution: The brick process was crashing because the rpc connection was not cleaned up properly while brick mux is enabled. With this patch, after sending the GF_EVENT_CLEANUP notification to the xlator (server), we wait for all rpc client connections of that specific xlator to be destroyed. Once the rpc connections for all clients associated with that brick are destroyed in server_rpc_notify, xlator_mem_cleanup is called for the brick xlator as well as all child xlators. To avoid races at the time of cleanup, two new flags are introduced on each xlator: cleanup_starting and call_cleanup.

BUG: 1544090
Signed-off-by: Mohit Agrawal <moagrawa>

Note: All test cases were run in a separate build (https://review.gluster.org/#/c/19700/) with the same patch after forcefully enabling brick mux; all test cases passed.

Change-Id: Ic4ab9c128df282d146cf1135640281fcb31997bf
updates: bz#1544090
REVIEW: https://review.gluster.org/19910 (gluster: Brick process can be crash at the time of call xlator cbks) posted (#1) for review on master by MOHIT AGRAWAL
REVIEW: https://review.gluster.org/19912 (glusterd: build is failed for glusterd2) posted (#1) for review on master by MOHIT AGRAWAL
COMMIT: https://review.gluster.org/19912 committed in master by "MOHIT AGRAWAL" <moagrawa> with a commit message- server: fix unresolved symbols by moving them to libglusterfs

Problem: The glusterd2 build fails due to undefined symbols (xlator_mem_cleanup, glusterfsd_ctx) in server.so.

Solution: Two changes resolve this:
1) Move the xlator_mem_cleanup code from glusterfsd-mgmt.c to xlator.c so that it becomes part of libglusterfs.so.
2) Replace glusterfsd_ctx with this->ctx, because the symbol glusterfsd_ctx is not part of server.so.

BUG: 1544090
Change-Id: Ie5e6fba9ed458931d08eb0948d450aa962424ae5
fixes: bz#1544090
Signed-off-by: Mohit Agrawal <moagrawa>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/