Description of problem:

While executing the non-functional test "RHG3-13273 - Memory Leak Testing: check for memory leaks when all but one volume are created and deleted in loop".

The idea is to check for a memory leak across volume creates and deletes, without disturbing the first volume created (so as to avoid stopping glusterfsd, since at least one volume is always linked to it).
The test involves creating 100 volumes of either arbiter/x3 type and then deleting them. The above is executed over many iterations.

However, after about 4 days, glusterfsd crashed with the details below.

tail of glusterfsd log:
=======================
[2020-01-13 06:01:12.384086] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-basevol_x3-server: Shutting down connection CTX_ID:0f899e45-573a-4983-be30-3b6c9a583a02-GRAPH_ID:0-PID:27959-HOST:rhsqa12.lab.eng.blr.redhat.com-PC_NAME:arb_4yk1nis18o9rk-client-2-RECON_NO:-0
[2020-01-13 06:01:12.384210] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-basevol_x3-server: Shutting down connection CTX_ID:0f899e45-573a-4983-be30-3b6c9a583a02-GRAPH_ID:0-PID:27959-HOST:rhsqa12.lab.eng.blr.redhat.com-PC_NAME:arb_aj5xr35kgw9ju-client-2-RECON_NO:-0
[2020-01-13 06:01:12.384398] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-basevol_x3-server: Shutting down connection CTX_ID:0f899e45-573a-4983-be30-3b6c9a583a02-GRAPH_ID:0-PID:27959-HOST:rhsqa12.lab.eng.blr.redhat.com-PC_NAME:arb_bdxziwl7awxly-client-1-RECON_NO:-0
[2020-01-13 06:01:14.649922] E [posix-helpers.c:2301:posix_disk_space_check] (-->/lib64/libpthread.so.0(+0x7ea5) [0x7f4a1d347ea5] -->/usr/lib64/glusterfs/6.0/xlator/storage/posix.so(+0xc3c4) [0x7f4a0fbc73c4] -->/usr/lib64/glusterfs/6.0/xlator/storage/posix.so(+0xc269) [0x7f4a0fbc7269] ) 0-rep_rviudtsghmn2j-posix: invalid argument: priv [Invalid argument]
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2020-01-13 06:01:14
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x271e0)[0x7f4a1e5091e0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f4a1e513c04]
/lib64/libc.so.6(+0x363f0)[0x7f4a1cb453f0]
/usr/lib64/glusterfs/6.0/xlator/storage/posix.so(+0xc3c4)[0x7f4a0fbc73c4]
/lib64/libpthread.so.0(+0x7ea5)[0x7f4a1d347ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f4a1cc0d8cd]
---------

gdb /core.5248

Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/b5/a5458535a1397fa6baaf5e8c13a6395426a1b2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s rhsqa10.lab.eng.blr.redhat.com --volfile-id basevol_x3.'.
Program terminated with signal 11, Segmentation fault.
#0  posix_disk_space_check_thread_proc (data=0x7f0440edbb20) at posix-helpers.c:2358
2358            if (!priv->disk_space_check_active)
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libacl-2.2.51-14.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-61.el7_7.1.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 sqlite-3.7.17-8.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  posix_disk_space_check_thread_proc (data=0x7f0440edbb20) at posix-helpers.c:2358
#1  0x00007f4a1d347ea5 in start_thread () from /lib64/libpthread.so.0
#2  0x00007f4a1cc0d8cd in clone () from /lib64/libc.so.6
(gdb)
#0  posix_disk_space_check_thread_proc (data=0x7f0440edbb20) at posix-helpers.c:2358
#1  0x00007f4a1d347ea5 in start_thread () from /lib64/libpthread.so.0
#2  0x00007f4a1cc0d8cd in clone () from /lib64/libc.so.6
(gdb)
#0
posix_disk_space_check_thread_proc (data=0x7f0440edbb20) at posix-helpers.c:2358
#1  0x00007f4a1d347ea5 in start_thread () from /lib64/libpthread.so.0
#2  0x00007f4a1cc0d8cd in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 1071 (Thread 0x7f051ffff700 (LWP 44662)):
#0  0x00007f4a1d34ba35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4a0fbc760b in posix_fsyncer_pick (this=this@entry=0x7f060fd2bbe0, head=head@entry=0x7f051fffe140) at posix-helpers.c:2417
#2  0x00007f4a0fbc7895 in posix_fsyncer (d=0x7f060fd2bbe0) at posix-helpers.c:2507
#3  0x00007f4a1d347ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a1cc0d8cd in clone () from /lib64/libc.so.6

Thread 1070 (Thread 0x7f4a072fb700 (LWP 5259)):
#0  0x00007f4a1d34ba35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4a0ea73df3 in changelog_ev_connector (data=0x7f4a08073678) at changelog-ev-handle.c:221
#2  0x00007f4a1d347ea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4a1cc0d8cd in clone () from /lib64/libc.so.6

Thread 1069 (Thread 0x7f01c4e91700 (LWP 50570)):
#0  0x00007f4a1d34ba35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4a0ea73df3 in changelog_ev_connector (data=0x7f081260ec68) at changelog-ev-handle.c:221
#2  0x00007f4a1d347ea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4a1cc0d8cd in clone () from /lib64/libc.so.6

Thread 1068 (Thread 0x7f01c7e97700 (LWP 50349)):
#0  0x00007f4a1d34ba35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4a0fbc760b in posix_fsyncer_pick (this=this@entry=0x7f03267fd250, head=head@entry=0x7f01c7e96140) at posix-helpers.c:2417
---Type <return> to continue, or q <return> to quit---
#2  0x00007f4a0fbc7895 in posix_fsyncer (d=0x7f03267fd250) at posix-helpers.c:2507
#3  0x00007f4a1d347ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a1cc0d8cd in clone () from /lib64/libc.so.6

Thread 1067 (Thread 0x7f01c0688700 (LWP 50576)):
#0
0x00007f4a1d34ba35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4a0fbc760b in posix_fsyncer_pick (this=this@entry=0x7f0490344540, head=head@entry=0x7f01c0687140) at posix-helpers.c:2417
#2  0x00007f4a0fbc7895 in posix_fsyncer (d=0x7f0490344540) at posix-helpers.c:2507
#3  0x00007f4a1d347ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a1cc0d8cd in clone () from /lib64/libc.so.6

Thread 1066 (Thread 0x7f0199ffb700 (LWP 51193)):
#0  0x00007f4a1d34ba35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4a0fbc760b in posix_fsyncer_pick (this=this@entry=0x7f099b003070, head=head@entry=0x7f0199ffa140) at posix-helpers.c:2417
#2  0x00007f4a0fbc7895 in posix_fsyncer (d=0x7f099b003070) at posix-helpers.c:2507
#3  0x00007f4a1d347ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a1cc0d8cd in clone () from /lib64/libc.so.6

Thread 1065 (Thread 0x7f02edf93700 (LWP 46493)):
#0  0x00007f4a1d34ba35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4a0fbc760b in posix_fsyncer_pick (this=this@entry=0x7f0796332880, head=head@entry=0x7f02edf92140) at posix-helpers.c:2417
#2  0x00007f4a0fbc7895 in posix_fsyncer (d=0x7f0796332880) at posix-helpers.c:2507

Version-Release number of selected component (if applicable):
=================
6.0.25

How reproducible:
=============
Given that this is a non-functional test run over days, the crash was hit on the first run of this test.

Steps to Reproduce:
===============
1. Created a 3-node setup, brick-mux enabled.
2. Created 2 volumes: one called base_x3 of type x3, and another of type arbiter named "basevol_bitter". These 2 volumes are never deleted throughout the test.
3. Started creating volumes, starting them in batches of 100, and then deleting them.
4. Step 3 was to go on for a few days.

Actual results:
===============
Brick crashed on n1, with a core dump file of size *256GB*.

Expected results:

Additional info:
Marking as regression, as this test passed on 3.5.0. Also proposing as a blocker.

Will attach the logs; note, however, that the core file is >256GB in size.
Can you please share the logs?

Thanks,
Mohit Agrawal
Hi,

As per the core dump, glusterfsd crashed because posix_disk_space_check_thread_proc was not able to access the priv object:

(gdb) p data
$1 = (void *) 0x7f0440edbb20
(gdb) p (xlator_t*) data
$2 = (xlator_t *) 0x7f0440edbb20
(gdb) p *(xlator_t*) data
$3 = {name = 0x7f0cf75b1b00 "rep_rviudtsghmn2j-posix", type = 0x7f09123c7a70 "storage/posix", instance_name = 0x0, next = 0x0,
  prev = 0x7f091266b410, parents = 0x7f0913b82f60, children = 0x0, options = 0x7f09123c7948, dlhandle = 0x7f4a08008f10,
  fops = 0x7f4a0fe050e0 <fops>, cbks = 0x7f4a0fe05080 <cbks>, dumpops = 0x7f4a0fe05480 <dumpops>,
  volume_options = {next = 0x7f0cf774d520, prev = 0x7f00cd02b120}, fini = 0x7f4a0fbf8260 <posix_fini>,
  init = 0x7f4a0fbf68e0 <posix_init>, reconfigure = 0x7f4a0fbf5f60 <posix_reconfigure>, mem_acct_init = 0x7f4a0fbf5f40 <mem_acct_init>,
  dump_metrics = 0x0, notify = 0x7f4a0fbf5d90 <posix_notify>, loglevel = GF_LOG_NONE,
  stats = {total = {metrics = {{fop = {lk = 0x7f0440edbbc8 "", value = 0}, cbk = {lk = 0x7f0440edbbd0 "", value = 0}} <repeats 59 times>},
  count = {lk = 0x7f0440edbf78 "", value = 0}}, interval = {latencies = {{min = 0, max = 0, total = 0, count = 0} <repeats 59 times>},
  metrics = {{fop = {lk = 0x7f0440edc6e0 "", value = 0}, cbk = {lk = 0x7f0440edc6e8 "", value = 0}} <repeats 59 times>},
  count = {lk = 0x7f0440edca90 "", value = 0}}}, history = 0x0, ctx = 0x55ceca7a5010, graph = 0x7f0cf774d400, itable = 0x0,
  init_succeeded = 0 '\000', private = 0x0, mem_acct = 0x7f0440eea4f0, winds = 0, switched = 0 '\000', local_pool = 0x0,
  is_autoloaded = false, volfile_id = 0x0, xl_id = 1, op_version = {1, 0, 0, 0}, flags = 0, id = 0, identifier = 0x7f4a0fbfa520 "posix",
  pass_through = false, pass_through_fops = 0x7f4a1e7f0c40 <_default_fops>, cleanup_starting = 0, call_cleanup = 0,
  category = GF_MAINTAINED, xprtrefcnt = {lk = 0x7f0440edcb40 "", value = 0}, notify_down = 0, parent_down = 0}

Note that private = 0x0 and init_succeeded = 0 in the dump above.

The core dump shows the brick xlator name is "rep_rviudtsghmn2j". As we can see in the brick logs, the brick (rep_rviudtsghmn2j)
graph was not successfully built, because a thread could not be spawned after the "Cannot allocate memory" error:

>>>>>>>>>>>>
[2020-01-13 06:01:09.544728] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 6d70754d-230b-494f-a981-fc3a31c632d0
[2020-01-13 06:01:09.544744] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-basevol_x3-server: accepted client from CTX_ID:0f899e45-573a-4983-be30-3b6c9a583a02-GRAPH_ID:0-PID:27959-HOST:rhsqa12.lab.eng.blr.redhat.com-PC_NAME:arb_bdxziwl7awxly-client-1-RECON_NO:-0 (version: 6.0) with subvol /gluster/brick1/arb_bdxziwl7awxly
[2020-01-13 06:01:09.640607] I [glusterfsd-mgmt.c:928:glusterfs_handle_attach] 0-glusterfs: got attach for /var/lib/glusterd/vols/rep_rviudtsghmn2j/rep_rviudtsghmn2j.rhsqa10.lab.eng.blr.redhat.com.gluster-brick1-rep_rviudtsghmn2j.vol
[2020-01-13 06:01:09.648163] I [socket.c:962:__socket_server_bind] 31293-socket.rep_rviudtsghmn2j-changelog: closing (AF_UNIX) reuse check socket 185
[2020-01-13 06:01:09.648414] I [MSGID: 121050] [ctr-helper.c:248:extract_ctr_options] 31293-gfdbdatastore: CTR Xlator is disabled.
[2020-01-13 06:01:09.648446] I [MSGID: 121050] [changetimerecorder.c:2144:init] 31293-gfdbdatastore: CTR Xlator is not enabled so skip ctr init
[2020-01-13 06:01:09.648489] I [trash.c:2450:init] 31293-rep_rviudtsghmn2j-trash: no option specified for 'eliminate', using NULL
[2020-01-13 06:01:09.649592] E [MSGID: 101072] [common-utils.c:4030:gf_thread_vcreate] 31293-rep_rviudtsghmn2j-posix: Thread creation failed [Resource temporarily unavailable]
[2020-01-13 06:01:09.649626] E [MSGID: 113075] [posix-helpers.c:2276:posix_spawn_health_check_thread] 31293-rep_rviudtsghmn2j-posix: unable to setup health-check thread [Cannot allocate memory]
[2020-01-13 06:01:09.649647] E [MSGID: 101019] [xlator.c:629:xlator_init] 0-rep_rviudtsghmn2j-posix: Initialization of volume 'rep_rviudtsghmn2j-posix' failed, review your volfile again
[2020-01-13 06:01:09.649661] E [MSGID: 101066] [graph.c:362:glusterfs_graph_init] 0-rep_rviudtsghmn2j-posix: initializing translator failed
[2020-01-13 06:01:09.649674] W [graph.c:1235:glusterfs_graph_attach] 0-glusterfs: failed to initialize graph for xlator /gluster/brick1/rep_rviudtsghmn2j
[2020-01-13 06:01:09.649731] E [rpcsvc.c:631:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor (4867634:2:12) failed to complete successfully
[2020-01-13 06:01:09.906526] W [socket.c:781:__socket_rwv] 0-tcp.basevol_x3-server: readv on 10.70.44.10:45580 failed (No data available)
[2020-01-13 06:01:09.906526] W [socket.c:781:__socket_rwv] 0-tcp.basevol_x3-server: readv on 10.70.44.10:45588 failed (No data available)
[2020-01-13 06:01:09.906597] I [MSGID: 115036] [server.c:498:server_rpc_notify] 0-basevol_x3-server: disconnecting connection from CTX_ID:4f05dd6d-85f2-445d-87a2-59cb5d8b3e9f-GRAPH_ID:0-PID:61541-HOST:rhsqa10.lab.eng.blr.redhat.com-PC_NAME:arb_1ubdr7lvu47a0-client-2-RECON_NO:-0
[2020-01-13 06:01:09.906613] I [MSGID: 115036] [server.c:498:server_rpc_notify] 0-basevol_x3-server: disconnecting connection from CTX_ID:4f05dd6d-85f2-445d-87a2-59cb5d8b3e9f-GRAPH_ID:0-PID:61541-HOST:rhsqa10.lab.eng.blr.redhat.com-PC_NAME:arb_0swl6dsidwusm-client-1-RECON_NO:-0
[2020-01-13 06:01:09.906692] W [socket.c:781:__socket_rwv] 0-tcp.basevol_x3-server: readv on 10.70.44.10:45576 failed (No data available)
[2020-01-13 06:01:09.906744] I [MSGID: 115036] [server.c:498:server_rpc_notify] 0-basevol_x3-server: disconnecting connection from CTX_ID:4f05dd6d-85f2-445d-87a2-59cb5d8b3e9f-GRAPH_ID:0-PID:61541-HOST:rhsqa10.lab.eng.blr.redhat.com-PC_NAME:arb_2c7dd68qsbr7i-client-1-RECON_NO:-0
[2020-01-13 06:01:09.906741] W [socket.c:781:__socket_rwv] 0-tcp.basevol_x3-server: readv on 10.70.44.10:45568 failed (No data available)
>>>>>>>>>>>>>>>

Checking the script logs shows that at that moment brick memory consumption was around 19.2 GB; it seems the brick has a leak.

#### top output and volume count after creating additional 100 volumes and before deleting them in iteration 313
Mon Jan 13 11:50:53 IST 2020
 5248 root      20   0  291.6g  19.2g   5008 D  22.2 30.6   2301:06 glusterfsd
 4326 root      20   0  629532 133116   4192 S   5.6  0.2  317:11.07 glusterd
61877 root      20   0 1125156  16312   4316 S   0.0  0.0    0:00.43 glusterfsd
62166 root      20   0 1123908  16540   4336 S   0.0  0.0    0:00.40 glusterfsd
62630 root      20   0 2929700  43540   4348 S   0.0  0.1    0:00.90 glusterfsd
64488 root      20   0  266996   4004   2852 S   0.0  0.0    0:00.00 gluster
64490 root      20   0 4083240 152816   3204 S   0.0  0.2    0:03.51 glusterfs

To fix this I have to reproduce the issue and find the leaking code path; then I can provide the fix.

Thanks,
Mohit Agrawal
There are some known leaks that existed in previous releases as well. It is a coincidence that the brick did not crash in previous releases when memory was unavailable; I believe this is not a regression.

I am still testing the patch; it will take some time. I will provide an update as soon as the testing finishes.

Two known patches to resolve a memory leak:
https://review.gluster.org/#/c/glusterfs/+/23642/
https://review.gluster.org/#/c/glusterfs/+/22738/
Nag,

Can you please try to reproduce on 3.5 to prove it is not a regression?
I believe the memory leak is not specific to 3.5.1.
(In reply to Mohit Agrawal from comment #7)
> Nag,
>
> Can you please try to reproduce in 3.5 to prove it is not a regression?
> I believe memory leak is not specific to 3.5.1?

Mohit, as per previous test results, this test was run (and passed) on 3.5.0, hence it was marked as a regression.
Hi,

This is not a regression; on 3.5 as well the brick process was consuming similar memory, and we would only need to wait for the crash. It depends on the system, but after some time, when a brick is not able to get memory, it will also crash.

As we can see on the same system, on 3.5.1 at attempt (277) brick usage was 17.6 GB:

grep -iw "volume count at start of iteration 277" 351_memleak-RHG3-13273.log -A 5
#### top output and volume count at start of iteration 277
Mon Jan 13 01:10:25 IST 2020
 4326 root      20   0  624796 126136   5368 S   0.0  0.2  267:11.85 glusterd
 5248 root      20   0  255.3g  17.6g   4988 S   0.0 28.0   2013:52 glusterfsd
74732 root      20   0  704148   7740   3096 S   0.0  0.0    0:00.08 glusterfs
#### top output and volume count after creating additional 100 volumes and before deleting them in iteration 277

And on 3.5, where the same test case is currently running, consumption is 17.5 GB, almost the same:

#### top output and volume count at start of iteration 277
Mon Jan 20 14:28:14 IST 2020
 4365 root      20   0  624796 125896   5416 S   0.0  0.2  262:45.19 glusterd
 5295 root      20   0  255.1g  17.5g   4996 S   0.0 27.9   1774:22 glusterfsd
74508 root      20   0  704148   7740   3092 S   0.0  0.0    0:00.08 glusterfs

This leak has existed in brick_mux from day 1. At this point it would be difficult to fix, because regression testing is required to validate the patch. The fix is in a core part: to avoid the leak we either have to delete detached xlator objects completely, or implement some other mechanism so that the brick does not crash if two threads access a detached xl object at the same time.

@Nag, can you please delete this bug from the blocker list for 3.5.1?

Regards,
Mohit Agrawal
Hi,

Yes, it is the same as I mentioned earlier. As we can see in the mem-script logs, for attempt 318 new brick processes were spawned:

>>>>>>>>>>>>>
#### top output and volume count at start of iteration 318
Tue Jan 21 01:38:02 IST 2020
 4365 root      20   0  629916 132380   5436 S   0.0  0.2  314:55.83 glusterd
 5295 root      20   0  291.3g  19.2g   4996 S   0.0 30.6   2026:53 glusterfsd
38392 root      20   0  458364   9668   3064 S   0.0  0.0    0:00.05 glusterfs
#### top output and volume count after creating additional 100 volumes and before deleting them in iteration 318
Tue Jan 21 02:32:39 IST 2020
62047 root      20   0 3926132 151844   3156 S 177.8  0.2    0:04.14 glusterfs
39566 root      20   0   33.8g 777152   4712 S  11.1  1.2    2:32.47 glusterfsd
 4365 root      20   0  630172 131292   4236 S   0.0  0.2  315:19.46 glusterd
39334 root      20   0 1125156  24836   4556 S   0.0  0.0    0:02.58 glusterfsd
39396 root      20   0 1125156  24092   4564 S   0.0  0.0    0:02.65 glusterfsd
39463 root      20   0 1125156  22760   4560 S   0.0  0.0    0:02.39 glusterfsd
62046 root      20   0  146712   2084   1456 S   0.0  0.0    0:00.02 glusterfs
#### top output and volume count after deleting all the volumes created in iteration 318
Tue Jan 21 02:40:34 IST 2020
#### top output and volume count at start of iteration 319
Tue Jan 21 02:40:34 IST 2020
 4365 root      20   0  630300 131316   4236 S   0.0  0.2  315:37.06 glusterd
78430 root      20   0  458364   7624   3068 S   0.0  0.0    0:00.06 glusterfs
>>>>>>>>>>>>

The logs also confirm that the brick crashed because it was not able to get memory:

>>>>>>>>>>>>>>>
[2020-01-20 20:08:13.142913] I [glusterfsd-mgmt.c:928:glusterfs_handle_attach] 0-glusterfs: got attach for /var/lib/glusterd/vols/rep_2szcvqrcdn6xt/rep_2szcvqrcdn6xt.rhsqa10.lab.eng.blr.redhat.com.gluster-brick1-rep_2szcvqrcdn6xt.vol
[2020-01-20 20:08:13.159599] A [MSGID: 0] [mem-pool.c:145:__gf_calloc] : no memory available for size (9437112) current memory usage in kilobytes 20114652 [call stack follows]
/lib64/libglusterfs.so.0(+0x271e0)[0x7fdff80811e0]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
[2020-01-21 05:59:10.756378] I [glusterfsd-mgmt.c:928:glusterfs_handle_attach] 0-glusterfs: got attach for /var/lib/glusterd/vols/rep_pr9ynhpxpyk5s/rep_pr9ynhpxpyk5s.rhsqa12.lab.eng.blr.redhat.com.gluster-brick1-rep_pr9ynhpxpyk5s.vol
[2020-01-21 05:59:10.771123] A [MSGID: 0] [mem-pool.c:145:__gf_calloc] : no memory available for size (9437112) current memory usage in kilobytes 20425156 [call stack follows]
/lib64/libglusterfs.so.0(+0x271e0)[0x7f5e76f201e0]
/lib64/libglusterfs.so.0(_gf_msg_nomem+0x401)[0x7f5e76f207e1]
/lib64/libglusterfs.so.0(__gf_calloc+0xf8)[0x7f5e76f4be08]
/usr/lib64/glusterfs/6.0/xlator/debug/io-stats.so(+0xf425)[0x7f5e61320425]
/usr/lib64/glusterfs/6.0/xlator/debug/io-stats.so(+0xf4a5)[0x7f5e613204a5]
[2020-01-21 05:59:10.771711] E [io-stats.c:3955:init] 31573-rep_pr9ynhpxpyk5s-io-stats: Out of memory.
>>>>>>>>>>>>>>>>>>

Thanks,
Mohit Agrawal
*** Bug 1653225 has been marked as a duplicate of this bug. ***
(In reply to Mohit Agrawal from comment #12)
> >>>>>>>>>>>>>>>
>
> [2020-01-20 20:08:13.142913] I [glusterfsd-mgmt.c:928:glusterfs_handle_attach] 0-glusterfs: got attach for /var/lib/glusterd/vols/rep_2szcvqrcdn6xt/rep_2szcvqrcdn6xt.rhsqa10.lab.eng.blr.redhat.com.gluster-brick1-rep_2szcvqrcdn6xt.vol
> [2020-01-20 20:08:13.159599] A [MSGID: 0] [mem-pool.c:145:__gf_calloc] : no memory available for size (9437112) current memory usage in kilobytes 20114652 [call stack follows]

What is trying to allocate so much memory (9437112 bytes)? Does the io-stats xlator require that much memory with so many bricks?
*** Bug 1761423 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603