Bug 1529451
| Summary: | glusterd leaks memory when vol status is issued | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
| Status: | CLOSED ERRATA | QA Contact: | Bala Konda Reddy M <bmekala> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, rhinduja, rhs-bugs, srmukher, storage-qa-internal, vbellur |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.12.2-7 | Doc Type: | Bug Fix |
| Doc Text: | Previously, executing the gluster volume status command multiple times caused the memory consumption of the glusterd process to grow. With this fix, repeated volume status calls no longer increase glusterd memory consumption. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| | 1550339 (view as bug list) | Environment: | |
| Last Closed: | 2018-09-04 06:40:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1503137 | | |
Description
Nag Pavan Chilakam 2017-12-28 07:28:01 UTC
sosreport along with 2 statedumps (with a difference of one iteration of vol status) @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1529451/

############# vols are now STARTED; no brickmux; 300 vols ###########
Wed Dec 27 21:06:55 IST 2017
total used free shared buff/cache available
Mem: 7.6G 4.2G 2.2G 15M 1.2G 3.0G
Swap: 7.9G 0B 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 604876 125432 4236 S 35.0 1.6 11:10.54 glusterd

############# loop 1 ###########
Wed Dec 27 21:06:55 IST 2017
total used free shared buff/cache available
Mem: 7.6G 4.2G 2.2G 15M 1.2G 3.0G
Swap: 7.9G 0B 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 604876 125432 4236 S 29.4 1.6 11:10.59 glusterd

############# loop 2 ###########
Wed Dec 27 21:21:55 IST 2017
total used free shared buff/cache available
Mem: 7.6G 4.3G 1.9G 15M 1.5G 2.9G
Swap: 7.9G 0B 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 604876 156960 4236 S 50.0 2.0 21:57.55 glusterd

############# loop 3 ###########
Wed Dec 27 21:36:56 IST 2017
total used free shared buff/cache available
Mem: 7.6G 4.4G 1.5G 15M 1.7G 2.8G
Swap: 7.9G 0B 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 670412 187476 4236 S 62.5 2.3 32:39.39 glusterd

############# loop 4 ###########
Wed Dec 27 21:51:56 IST 2017
total used free shared buff/cache available
Mem: 7.6G 4.5G 1.2G 15M 1.9G 2.6G
Swap: 7.9G 0B 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 670412 225880 4236 S 73.7 2.8 43:15.54 glusterd

############# loop 5 ###########
Wed Dec 27 22:06:56 IST 2017
total used free shared buff/cache available
Mem: 7.6G 4.6G 862M 15M 2.2G 2.5G
Swap: 7.9G 0B 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 735948 264340 4236 S 65.2 3.3 53:49.32 glusterd

############# loop 6 ###########
Wed Dec 27 22:21:56 IST 2017
total used free shared buff/cache available
Mem: 7.6G 4.7G 521M 15M 2.4G 2.4G
Swap: 7.9G 0B 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 735948 301480 4236 S 71.4 3.8 64:18.65 glusterd

############# loop 7 ###########
Wed Dec 27 22:36:57 IST 2017
total used free shared buff/cache available
Mem: 7.6G 4.8G 294M 15M 2.5G 2.4G
Swap: 7.9G 0B 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 801484 327488 4236 S 0.0 4.1 71:44.56 glusterd

############# loop 8 ###########
Wed Dec 27 22:51:57 IST 2017
total used free shared buff/cache available
Mem: 7.6G 4.8G 286M 15M 2.5G 2.4G
Swap: 7.9G 0B 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 801484 327488 4236 S 0.0 4.1 71:44.60 glusterd

############# loop 59 ###########
Thu Dec 28 11:37:11 IST 2017
total used free shared buff/cache available
Mem: 7.6G 5.2G 227M 13M 2.2G 1.9G
Swap: 7.9G 2.1M 7.9G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26560 root 20 0 801484 327488 4236 S 0.0 4.1 71:47.20 glusterd
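For reference, a minimal sketch of the kind of monitoring loop that produces snapshots like the ones above: run one iteration of vol status across every volume, record free/top for the glusterd process, and take a statedump for later comparison. The 15-minute interval, iteration count, and statedump handling here are assumptions, not the exact script that was used.

```bash
#!/bin/bash
# Minimal sketch, assuming all volumes are already created and started.
# The interval and iteration count are assumptions, not the exact values used.
GD_PID=$(pidof glusterd)

for i in $(seq 1 60); do
    echo "############# loop $i ###########"
    date
    free -h
    top -b -n 1 -p "$GD_PID" | tail -n 2
    # one iteration of 'vol status' across every volume
    for vol in $(gluster volume list); do
        gluster volume status "$vol" > /dev/null
    done
    # trigger a glusterd statedump (written under /var/run/gluster) for later comparison
    kill -SIGUSR1 "$GD_PID"
    sleep 900
done
```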
Statedump comparison in RHGS-3.4.0 bits: statedump data set 2 vs set 1, where set 2 was collected a minute later than set 1.

[mgmt/glusterd.management - usage-type gf_common_mt_gf_timer_t memusage]
size=192 num_allocs=3 max_size=384 max_num_allocs=6 total_allocs=167077
vs
size=256 num_allocs=4 max_size=384 max_num_allocs=6 total_allocs=431416

[mgmt/glusterd.management - usage-type gf_common_mt_asprintf memusage]
size=2000 num_allocs=106 max_size=4614 max_num_allocs=231 total_allocs=5059477
vs
size=2003 num_allocs=107 max_size=4614 max_num_allocs=231 total_allocs=12971322

[mgmt/glusterd.management - usage-type gf_common_mt_strdup memusage]
size=2468525 num_allocs=167048 max_size=2469156 max_num_allocs=167057 total_allocs=2616401
vs
size=6333253 num_allocs=429924 max_size=6333379 max_num_allocs=429929 total_allocs=6686295

[mgmt/glusterd.management - usage-type gf_common_mt_socket_private_t memusage]
size=70040 num_allocs=103 max_size=80240 max_num_allocs=118 total_allocs=7080
vs
size=70040 num_allocs=103 max_size=80240 max_num_allocs=118 total_allocs=15842

[mgmt/glusterd.management - usage-type gf_common_mt_char memusage]
size=12570 num_allocs=543 max_size=40750 max_num_allocs=1807 total_allocs=24288723
vs
size=12830 num_allocs=561 max_size=40750 max_num_allocs=1807 total_allocs=62658480

[mgmt/glusterd.management - usage-type gf_common_mt_rpcsvc_wrapper_t memusage]
size=64 num_allocs=2 max_size=96 max_num_allocs=3 total_allocs=6979
vs
size=64 num_allocs=2 max_size=96 max_num_allocs=3 total_allocs=15741

[mgmt/glusterd.management - usage-type gf_common_mt_rpc_trans_t memusage]
size=131016 num_allocs=103 max_size=150096 max_num_allocs=118 total_allocs=7081
vs
size=131016 num_allocs=103 max_size=150096 max_num_allocs=118 total_allocs=15843
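A comparison like the one above can be produced by flattening the memusage counters of two glusterd statedumps and diffing them. A minimal sketch, assuming the two dump files already exist; the file names below are examples, not the actual dumps from this report (statedumps are typically written as glusterdump.<pid>.dump.<timestamp> under /var/run/gluster).

```bash
#!/bin/bash
# Flatten the per-allocation-type counters of two glusterd statedumps and diff them.
# The dump file names are examples, not the actual files referenced in this bug.
flatten() {
    awk '/usage-type/ { type = $(NF-1) }
         /^(size|num_allocs|total_allocs)=/ { print type, $0 }' "$1"
}

flatten /var/run/gluster/glusterdump.26560.dump.set1 > /tmp/set1.allocs
flatten /var/run/gluster/glusterdump.26560.dump.set2 > /tmp/set2.allocs
diff /tmp/set1.allocs /tmp/set2.allocs
```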
The data structures below are not present in the first statedump, which leads me to believe that a different code path is hit after the first statedump was collected.

[mgmt/glusterd.management - usage-type gf_common_mt_iobref memusage]
size=64 num_allocs=1 max_size=192 max_num_allocs=3 total_allocs=7872201

[mgmt/glusterd.management - usage-type gf_common_mt_memdup memusage]
size=24 num_allocs=3 max_size=700 max_num_allocs=71 total_allocs=45676311

[mgmt/glusterd.management - usage-type gf_common_mt_iobrefs memusage]
size=128 num_allocs=1 max_size=384 max_num_allocs=3 total_allocs=7872201

[mgmt/glusterd.management - usage-type gf_common_mt_uuid_t memusage]
size=64 num_allocs=4 max_size=64 max_num_allocs=4 total_allocs=4376562

[mgmt/glusterd.management - usage-type gf_common_mt_mgmt_v3_lock_obj_t memusage]
size=16 num_allocs=1 max_size=16 max_num_allocs=1 total_allocs=429073

[mgmt/glusterd.management - usage-type gf_common_mt_txn_opinfo_obj_t memusage]
size=112 num_allocs=1 max_size=112 max_num_allocs=1 total_allocs=443344

[mgmt/glusterd.management - usage-type gf_common_mt_synctask memusage]
size=5264 num_allocs=1 max_size=10528 max_num_allocs=2 total_allocs=443379

[mgmt/glusterd.management - usage-type gf_common_mt_syncstack memusage]
size=2097152 num_allocs=1 max_size=4194304 max_num_allocs=2 total_allocs=443379

[mgmt/glusterd.management - usage-type gf_common_mt_mgmt_v3_lock_timer_t memusage]
size=16 num_allocs=1 max_size=16 max_num_allocs=1 total_allocs=429073

glusterd memory is increasing when vol status is issued concurrently on all the nodes in a trusted storage pool of 3 nodes (a reproduction sketch is given after the comments below):
1. Created 30 1x3 replica volumes on a 3-node cluster
2. Performed vol status on all three nodes in a loop for 1 hour
3. glusterd memory shot up to 500 MB in 1 hour

Before starting vol status in a loop:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20499 root 20 0 672228 27032 4392 S 0.0 0.3 0:32.34 glusterd

After 1 hour:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20499 root 20 0 1139176 528012 4392 S 64.7 6.6 63:21.12 glusterd

Statedumps before and after the vol status run: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/bmekala/bug.1529451/

As per my debugging, I found that the gf_common_mt_strdup structure is consuming more memory. Its num_allocs keeps growing with every status call for a volume.

In simple mathematical terms, if we have 10 volumes and volume status is executed 10 times, then num_allocs for gf_common_mt_strdup increases by 10 * 10 = 100.

A timer was introduced in the locking phase to take care of the stale lock issue, and it is this timer that has caused the leak.

Debugging of the other structures is in progress.

(In reply to Gaurav Yadav from comment #5)
> As per my debugging, I found that the gf_common_mt_strdup structure is
> consuming more memory. Its num_allocs keeps growing with every status call
> for a volume.
>
> In simple mathematical terms, if we have 10 volumes and volume status is
> executed 10 times, then num_allocs for gf_common_mt_strdup increases by
> 10 * 10 = 100.
>
> A timer was introduced in the locking phase to take care of the stale lock
> issue, and it is this timer that has caused the leak.

As this leak is in the locking phase, it will be present for other commands as well, such as volume create, volume set, etc.

> Debugging of the other structures is in progress.
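As referenced in the reproduction steps above, here is a rough sketch of that scenario (30 replica-3 volumes on a 3-node pool, then vol status in a loop for an hour). The hostnames and brick paths are placeholders, and the status loop would be started concurrently on every node of the pool.

```bash
#!/bin/bash
# Rough reproduction sketch for the 3-node scenario described above.
# node1/node2/node3 and the brick paths are placeholders.
for i in $(seq 1 30); do
    gluster volume create "vol$i" replica 3 \
        node1:/bricks/brick"$i"/b node2:/bricks/brick"$i"/b node3:/bricks/brick"$i"/b force
    gluster volume start "vol$i"
done

# run 'volume status' against every volume in a loop for one hour
end=$((SECONDS + 3600))
while [ "$SECONDS" -lt "$end" ]; do
    for vol in $(gluster volume list); do
        gluster volume status "$vol" > /dev/null
    done
done
```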
Build: 3.12.2-8

On a three-node setup with brick-mux enabled and 30 volumes (1 x 3), volume status was executed for more than 30,000 iterations; glusterd memory increased from roughly 20 MB to roughly 140 MB.

############### 0 iterations #############
total used free shared buff/cache available
Mem: 7.6G 495M 6.9G 9.0M 227M 6.9G
Swap: 2.0G 0B 2.0G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1634 root 20 0 670584 20232 4336 S 0.0 0.3 0:03.03 glusterd
3352 root 20 0 5430220 194448 5256 S 0.0 2.4 0:06.22 glusterfsd

############### 32008 iterations #############
total used free shared buff/cache available
Mem: 7.6G 798M 4.8G 9.0M 2.1G 6.5G
Swap: 2.0G 0B 2.0G
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1634 root 20 0 736120 142612 4388 S 0.0 1.8 69:54.29 glusterd
3352 root 20 0 6217036 365096 5256 S 0.0 4.6 2:36.96 glusterfsd

As the increase in glusterd memory is not significant and is acceptable, marking the bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607