Description of problem:
=======================
OOM kill of glusterfsd is seen during parallel untar + continuous add-bricks.

Version-Release number of selected component (if applicable):
3.8.4-10.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
===================
1) Create a ganesha cluster and create a distributed-replicate volume.
2) Enable nfs-ganesha on the volume with mdcache settings.
3) Mount the volume.
4) On the mount point, copy the linux kernel tarball and create three empty directories, say 1, 2 and 3.
5) Now, simultaneously start untarring the kernel tarball into directories 1, 2 and 3:
tar -xvf linux-4.4.1.tar -C 1
tar -xvf linux-4.4.1.tar -C 2
tar -xvf linux-4.4.1.tar -C 3
6) While the untar is in progress, continuously add bricks to the volume.

glusterfsd was OOM killed:

Jan 9 17:54:09 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:29592:stderr [ 0 bytes (0 B) copied, 11.162 s, 0.0 kB/s ]
Jan 9 17:54:24 dhcp47-155 corosync[14072]: [MAIN ] Corosync main process was not scheduled for 2967.8682 ms (threshold is 1840.0000 ms). Consider token timeout increase.
Jan 9 17:54:24 dhcp47-155 kernel: lrmd invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
Jan 9 17:54:24 dhcp47-155 kernel: lrmd cpuset=/ mems_allowed=0
Jan 9 17:54:24 dhcp47-155 kernel: CPU: 1 PID: 14091 Comm: lrmd Not tainted 3.10.0-514.2.2.el7.x86_64 #1
Jan 9 17:54:24 dhcp47-155 kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Jan 9 17:54:24 dhcp47-155 kernel: ffff880036283ec0 000000003402a8ea ffff88021aaebae8 ffffffff816860cc
Jan 9 17:54:24 dhcp47-155 kernel: ffff88021aaebb78 ffffffff81681077 ffffffff812ae87b ffff8800bb68efa0
Jan 9 17:54:24 dhcp47-155 kernel: ffff8800bb68efb8 ffffffff00000206 fbfeebff00000000 000000000000000d
Jan 9 17:54:24 dhcp47-155 kernel: Call Trace:
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff816860cc>] dump_stack+0x19/0x1b
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff81681077>] dump_header+0x8e/0x225
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff812ae87b>] ? cred_has_capability+0x6b/0x120
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff8113cc03>] ? proc_do_uts_string+0x123/0x130
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff8118476e>] oom_kill_process+0x24e/0x3c0
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff810937ee>] ? has_capability_noaudit+0x1e/0x30
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff81184fa6>] out_of_memory+0x4b6/0x4f0
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff81681b80>] __alloc_pages_slowpath+0x5d7/0x725
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff8118b0b5>] __alloc_pages_nodemask+0x405/0x420
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff810831cd>] copy_process+0x1dd/0x1960
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff81084b01>] do_fork+0x91/0x2c0
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff8121cc80>] ? get_unused_fd_flags+0x30/0x40
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff8121ccd7>] ? __fd_install+0x47/0x60
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff81084db6>] SyS_clone+0x16/0x20
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff81696a19>] stub_clone+0x69/0x90
Jan 9 17:54:24 dhcp47-155 kernel: [<ffffffff816966c9>] ? system_call_fastpath+0x16/0x1b
Jan 9 17:54:24 dhcp47-155 kernel: Mem-Info:
Jan 9 17:54:24 dhcp47-155 kernel: active_anon:1576807 inactive_anon:311606 isolated_anon:0 active_file:376 inactive_file:1858 isolated_file:0 unevictable:8617 dirty:0 writeback:6 unstable:0 slab_reclaimable:11488 slab_unreclaimable:18027 mapped:12310 shmem:93443 pagetables:9098 bounce:0 free:25869 free_pcp:19 free_cma:0
Jan 9 17:54:24 dhcp47-155 kernel: Node 0 DMA free:15908kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 9 17:54:24 dhcp47-155 kernel: lowmem_reserve[]: 0 2816 7807 7807
Jan 9 17:54:24 dhcp47-155 kernel: Node 0 DMA32 free:44460kB min:24324kB low:30404kB high:36484kB active_anon:2178252kB inactive_anon:558164kB active_file:916kB inactive_file:4532kB unevictable:10208kB isolated(anon):0kB isolated(file):0kB present:3129336kB managed:2884596kB mlocked:10208kB dirty:0kB writeback:12kB mapped:14092kB shmem:114240kB slab_reclaimable:18136kB slab_unreclaimable:24584kB kernel_stack:5616kB pagetables:11996kB unstable:0kB bounce:0kB free_pcp:28kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:7473 all_unreclaimable? yes
Jan 9 17:54:28 dhcp47-155 kernel: lowmem_reserve[]: 0 0 4990 4990
Jan 9 17:54:28 dhcp47-155 kernel: Node 0 Normal free:43108kB min:43120kB low:53900kB high:64680kB active_anon:4128976kB inactive_anon:688260kB active_file:588kB inactive_file:2900kB unevictable:24260kB isolated(anon):0kB isolated(file):0kB present:5242880kB managed:5110044kB mlocked:24260kB dirty:0kB writeback:12kB mapped:35148kB shmem:259532kB slab_reclaimable:27816kB slab_unreclaimable:47524kB kernel_stack:13424kB pagetables:24396kB unstable:0kB bounce:0kB free_pcp:48kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5880 all_unreclaimable? yes
Jan 9 17:54:28 dhcp47-155 kernel: lowmem_reserve[]: 0 0 0 0
Jan 9 17:54:28 dhcp47-155 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Jan 9 17:54:28 dhcp47-155 kernel: Node 0 DMA32: 5017*4kB (UE) 3064*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44580kB
Jan 9 17:54:28 dhcp47-155 kernel: Node 0 Normal: 10808*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43232kB
Jan 9 17:54:28 dhcp47-155 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 9 17:54:28 dhcp47-155 kernel: 147527 total pagecache pages
Jan 9 17:54:28 dhcp47-155 kernel: 49426 pages in swap cache
Jan 9 17:54:28 dhcp47-155 kernel: Swap cache stats: add 1417257, delete 1367831, find 6662219/6739670
Jan 9 17:54:28 dhcp47-155 kernel: Free swap = 0kB
Jan 9 17:54:28 dhcp47-155 kernel: Total swap = 2097148kB
Jan 9 17:54:28 dhcp47-155 kernel: 2097052 pages RAM
Jan 9 17:54:28 dhcp47-155 kernel: 0 pages HighMem/MovableOnly
Jan 9 17:54:28 dhcp47-155 kernel: 94415 pages reserved
Jan 9 17:54:28 dhcp47-155 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jan 9 17:54:28 dhcp47-155 kernel: [  698]   0   698   13398    4048   31     58     0 systemd-journal
Jan 9 17:54:28 dhcp47-155 kernel: [  705]   0   705  218768     813   42    637     0 lvmetad
Jan 9 17:54:28 dhcp47-155 kernel: [  723]   0   723   11580     674   23    430 -1000 systemd-udevd
Jan 9 17:54:28 dhcp47-155 kernel: [ 1328]   0  1328   13854     217   27     84 -1000 auditd
Jan 9 17:54:28 dhcp47-155 kernel: [ 1347] 998  1347  132067     472   54   1846     0 polkitd
Jan 9 17:54:28 dhcp47-155 kernel: [ 1348]  81  1348   24617     310   18     62  -900 dbus-daemon
Jan 9 17:54:28 dhcp47-155 kernel: [ 1352]   0  1352    6066     291   15     30     0 systemd-logind
Jan 9 17:54:28 dhcp47-155 kernel: [ 1354]   0  1354    4827     213   14     44     0 irqbalance
Jan 9 17:54:28 dhcp47-155 kernel: [ 1355]   0  1355   53131     264   57    410     0 abrtd
Jan 9 17:54:28 dhcp47-155 kernel: [ 1364] 997  1364   28962     313   27     36     0 chronyd
Jan 9 17:54:28 dhcp47-155 kernel: [ 1367]  32  1367   16263     183   35    118     0 rpcbind
Jan 9 17:54:28 dhcp47-155 kernel: [ 1369]   0  1369   50303     148   37    114     0 gssproxy
Jan 9 17:54:28 dhcp47-155 kernel: [ 1370]   0  1370   82866    4667   82   1779     0 firewalld
Jan 9 17:54:28 dhcp47-155 kernel: [ 1734]   0  1734   28206     147   53   3061     0 dhclient
Jan 9 17:54:28 dhcp47-155 kernel: [ 1828]   0  1828  138288    1830   87   1280     0 tuned
Jan 9 17:54:28 dhcp47-155 kernel: [ 1832]   0  1832   28335     120   11     17     0 rhsmcertd
Jan 9 17:54:28 dhcp47-155 kernel: [ 1847]   0  1847   20617      83   42    188 -1000 sshd
Jan 9 17:54:28 dhcp47-155 kernel: [ 1962]   0  1962   22244     196   42    236     0 master
Jan 9 17:54:28 dhcp47-155 kernel: [ 1964]  89  1964   22287     211   44    235     0 qmgr
Jan 9 17:54:28 dhcp47-155 kernel: [ 2354]   0  2354   31555     246   17    130     0 crond
Jan 9 17:54:28 dhcp47-155 kernel: [ 2362]   0  2362   26978     130    9      9     0 rhnsd
Jan 9 17:54:28 dhcp47-155 kernel: [ 2378]   0  2378   27509     165   10     30     0 agetty
Jan 9 17:54:28 dhcp47-155 kernel: [19586]   0 19586  179084    8344   48      0 -1000 dmeventd
Jan 9 17:54:28 dhcp47-155 kernel: [ 6147]   0  6147  137119    3257  121     91     0 rsyslogd
Jan 9 17:54:28 dhcp47-155 kernel: [14689]   0 14689  243755    3755   91   7687     0 glusterfs
Jan 9 17:54:28 dhcp47-155 kernel: [17037]   0 17037  172515   23542  135  11287     0 glusterd
Jan 9 17:54:28 dhcp47-155 kernel: [15978]   0 15978  263238     721  106   8845     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [15998]   0 15998  263238     745  105   8190     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [16018]   0 16018  263238     177  104   8846     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [11397]   0 11397  314084    2379  110   8227     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [11417]   0 11417  330983    2919  111   8186     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [11437]   0 11437  330982    2398  113   8552     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [30763]   0 30763  262627     782  108   9807     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [30782]   0 30782  262626     839  105   9788     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [30801]   0 30801  262626     822  106   8770     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [15232]   0 15232  494829    3141  131   8902     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [14072]   0 14072   49475    5891   62   1220     0 corosync
Jan 9 17:54:28 dhcp47-155 kernel: [14088]   0 14088   33158     442   66    972     0 pacemakerd
Jan 9 17:54:28 dhcp47-155 kernel: [14089] 189 14089   34571    2375   67   1201     0 cib
Jan 9 17:54:28 dhcp47-155 kernel: [14090]   0 14090   34287    1119   69    364     0 stonithd
Jan 9 17:54:28 dhcp47-155 kernel: [14091]   0 14091   26273     459   52    227     0 lrmd
Jan 9 17:54:28 dhcp47-155 kernel: [14092] 189 14092   31731    1286   62    410     0 attrd
Jan 9 17:54:28 dhcp47-155 kernel: [14093] 189 14093   39003    3989   71    182     0 pengine
Jan 9 17:54:28 dhcp47-155 kernel: [14094] 189 14094   46982    4041   81    703     0 crmd
Jan 9 17:54:28 dhcp47-155 kernel: [14099]   0 14099  244360   10766  108    620     0 pcsd
Jan 9 17:54:28 dhcp47-155 kernel: [21690]   0 21690 1008670  407551 1106  55342     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [21709]   0 21709 1005223  346311 1125 128491     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [21736]   0 21736 1010977  307333 1124 163215     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [32553]   0 32553  151450     400   66   6330     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [22781]   0 22781  625453  229961  817  18368     0 glusterfs
Jan 9 17:54:28 dhcp47-155 kernel: [ 8894]   0  8894 1480087  336954 1214      0     0 ganesha.nfsd
Jan 9 17:54:28 dhcp47-155 kernel: [27848]   0 27848   35743     602   71      0     0 sshd
Jan 9 17:54:28 dhcp47-155 kernel: [27860]   0 27860   28879     329   14      0     0 bash
Jan 9 17:54:28 dhcp47-155 kernel: [ 9987]  89  9987   22270     472   42      0     0 pickup
Jan 9 17:54:28 dhcp47-155 kernel: [21817]   0 21817   35726     595   72      0     0 sshd
Jan 9 17:54:28 dhcp47-155 kernel: [21894]   0 21894   28879     289   14      0     0 bash
Jan 9 17:54:28 dhcp47-155 kernel: [29371]   0 29371  367307    8841  116      0     0 glusterfsd
Jan 9 17:54:28 dhcp47-155 kernel: [29745]   0 29745   39435    2870   26      0     0 glusterfs
Jan 9 17:54:28 dhcp47-155 kernel: [29746]   0 29746  205479   23309  131      0     0 glusterfs
Jan 9 17:54:28 dhcp47-155 kernel: [29753]   0 29753   39435    2869   27      0     0 glusterfs
Jan 9 17:54:28 dhcp47-155 kernel: [29754]   0 29754  111964   12874   87      0     0 glusterfs
Jan 9 17:54:28 dhcp47-155 kernel: [29771]   0 29771   28811     231   13      0     0 S28Quota-root-x
Jan 9 17:54:28 dhcp47-155 kernel: [29777]   0 29777   40032    3400   28      0     0 glusterfs
Jan 9 17:54:28 dhcp47-155 kernel: [29779]   0 29779   90002    3539   45      0     0 glusterfs
Jan 9 17:54:28 dhcp47-155 kernel: [29784]   0 29784    9807     135   21    418     0 systemd-udevd
Jan 9 17:54:28 dhcp47-155 kernel: Out of memory: Kill process 21709 (glusterfsd) score 182 or sacrifice child
Jan 9 17:54:28 dhcp47-155 kernel: Killed process 21709 (glusterfsd) total-vm:4020892kB, anon-rss:1384648kB, file-rss:596kB, shmem-rss:0kB
Jan 9 17:54:28 dhcp47-155 corosync[14072]: [TOTEM ] A processor failed, forming new configuration.
Jan 9 17:54:28 dhcp47-155 corosync[14072]: [TOTEM ] Process pause detected for 2433 ms, flushing membership messages.
Jan 9 17:54:28 dhcp47-155 corosync[14072]: [TOTEM ] A new membership (10.70.46.42:2700) was formed. Members
Jan 9 17:54:28 dhcp47-155 corosync[14072]: [QUORUM] Members[4]: 1 2 3 4
Jan 9 17:54:28 dhcp47-155 corosync[14072]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 9 17:54:28 dhcp47-155 crmd[14094]: notice: High CPU load detected: 9.650000
Jan 9 17:54:29 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:29839:stderr [ 0+0 records in ]
Jan 9 17:54:29 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:29839:stderr [ 0+0 records out ]
Jan 9 17:54:29 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:29839:stderr [ 0 bytes (0 B) copied, 0.0124017 s, 0.0 kB/s ]
Jan 9 17:54:40 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30094:stderr [ 0+0 records in ]
Jan 9 17:54:40 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30094:stderr [ 0+0 records out ]
Jan 9 17:54:40 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30094:stderr [ 0 bytes (0 B) copied, 0.0972941 s, 0.0 kB/s ]
Jan 9 17:54:50 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30200:stderr [ 0+0 records in ]
Jan 9 17:54:50 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30200:stderr [ 0+0 records out ]
Jan 9 17:54:50 dhcp47-155 lrmd[14091]: notice: dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30200:stderr [ 0 bytes (0 B) copied, 0.0104596 s, 0.0 kB/s ]
Jan 9 17:54:57 dhcp47-155 crmd[14094]: notice: High CPU load detected: 8.460000

Actual results:
===============
OOM kill of glusterfsd during continuous add-bricks.

Expected results:
=================
There should not be any OOM kills.
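For reference, the kill message can be cross-checked against the oom-killer process table above: total_vm and rss in the table are counted in 4 kB pages, while the "Killed process" line reports kB. A minimal sketch of the arithmetic for the killed brick process (PID 21709):

```shell
# total_vm/rss from the [21709] row of the oom table are 4 kB pages.
total_vm_pages=1005223
rss_pages=346311
# 1005223 * 4 = 4020892 kB, matching "total-vm:4020892kB" in the kill line.
echo "total-vm: $((total_vm_pages * 4)) kB"
# 346311 * 4 = 1385244 kB, matching anon-rss (1384648) + file-rss (596).
echo "rss:      $((rss_pages * 4)) kB"
```

So roughly 1.4 GB resident in one glusterfsd, with free swap already at 0 kB, is what pushed the node over the edge.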
(In reply to Prasad Desala from comment #0)
> Steps to Reproduce:
> ===================
> 1) Create ganesha cluster and create a distributed-replicate volume.
> 2) Enable nfs-ganesha on the volume with mdcache settings.
> 3) Mount the volume.
> 4) On the mount point, copy linux kernel package and create three empty
> directories say 1,2 and 3
> 5) Now, simultaneously start untarring the kernel package to directories 1,2
> and 3.
> tar -xvf linux-4.4.1.tar -C 1
> tar -xvf linux-4.4.1.tar -C 2
> tar -xvf linux-4.4.1.tar -C 3
> 6) While untar is in-progress, continuously add bricks to the volume.

How many add-brick commands here?
(In reply to Atin Mukherjee from comment #2)
> > 6) While untar is in-progress, continuously add bricks to the volume.
>
> How many add-brick commands here?

Initially, the volume is 6x2. I started the parallel untar and then started adding bricks. Below are the add-brick commands that were executed:

gluster v add-brick distrep replica 2 10.70.46.42:/bricks/brick3/b3 10.70.46.101:/bricks/brick3/b3
gluster v add-brick distrep replica 2 10.70.47.155:/bricks/brick3/b3 10.70.47.167:/bricks/brick3/b3
gluster v add-brick distrep replica 2 10.70.46.42:/bricks/brick4/b4 10.70.46.101:/bricks/brick4/b4
gluster v add-brick distrep replica 2 10.70.47.155:/bricks/brick4/b4 10.70.47.167:/bricks/brick4/b4
swapents oom_score_adj name > > Jan 9 17:54:28 dhcp47-155 kernel: [ 698] 0 698 13398 4048 > > 31 58 0 systemd-journal > > Jan 9 17:54:28 dhcp47-155 kernel: [ 705] 0 705 218768 813 > > 42 637 0 lvmetad > > Jan 9 17:54:28 dhcp47-155 kernel: [ 723] 0 723 11580 674 > > 23 430 -1000 systemd-udevd > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1328] 0 1328 13854 217 > > 27 84 -1000 auditd > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1347] 998 1347 132067 472 > > 54 1846 0 polkitd > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1348] 81 1348 24617 310 > > 18 62 -900 dbus-daemon > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1352] 0 1352 6066 291 > > 15 30 0 systemd-logind > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1354] 0 1354 4827 213 > > 14 44 0 irqbalance > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1355] 0 1355 53131 264 > > 57 410 0 abrtd > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1364] 997 1364 28962 313 > > 27 36 0 chronyd > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1367] 32 1367 16263 183 > > 35 118 0 rpcbind > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1369] 0 1369 50303 148 > > 37 114 0 gssproxy > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1370] 0 1370 82866 4667 > > 82 1779 0 firewalld > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1734] 0 1734 28206 147 > > 53 3061 0 dhclient > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1828] 0 1828 138288 1830 > > 87 1280 0 tuned > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1832] 0 1832 28335 120 > > 11 17 0 rhsmcertd > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1847] 0 1847 20617 83 > > 42 188 -1000 sshd > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1962] 0 1962 22244 196 > > 42 236 0 master > > Jan 9 17:54:28 dhcp47-155 kernel: [ 1964] 89 1964 22287 211 > > 44 235 0 qmgr > > Jan 9 17:54:28 dhcp47-155 kernel: [ 2354] 0 2354 31555 246 > > 17 130 0 crond > > Jan 9 17:54:28 dhcp47-155 kernel: [ 2362] 0 2362 26978 130 > > 9 9 0 rhnsd > > Jan 9 17:54:28 dhcp47-155 kernel: [ 2378] 0 2378 27509 165 > > 10 30 0 agetty > > Jan 9 17:54:28 dhcp47-155 kernel: [19586] 0 19586 179084 8344 > > 48 0 -1000 dmeventd > > 
Jan 9 17:54:28 dhcp47-155 kernel: [ 6147] 0 6147 137119 3257 > > 121 91 0 rsyslogd > > Jan 9 17:54:28 dhcp47-155 kernel: [14689] 0 14689 243755 3755 > > 91 7687 0 glusterfs > > Jan 9 17:54:28 dhcp47-155 kernel: [17037] 0 17037 172515 23542 > > 135 11287 0 glusterd > > Jan 9 17:54:28 dhcp47-155 kernel: [15978] 0 15978 263238 721 > > 106 8845 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [15998] 0 15998 263238 745 > > 105 8190 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [16018] 0 16018 263238 177 > > 104 8846 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [11397] 0 11397 314084 2379 > > 110 8227 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [11417] 0 11417 330983 2919 > > 111 8186 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [11437] 0 11437 330982 2398 > > 113 8552 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [30763] 0 30763 262627 782 > > 108 9807 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [30782] 0 30782 262626 839 > > 105 9788 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [30801] 0 30801 262626 822 > > 106 8770 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [15232] 0 15232 494829 3141 > > 131 8902 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [14072] 0 14072 49475 5891 > > 62 1220 0 corosync > > Jan 9 17:54:28 dhcp47-155 kernel: [14088] 0 14088 33158 442 > > 66 972 0 pacemakerd > > Jan 9 17:54:28 dhcp47-155 kernel: [14089] 189 14089 34571 2375 > > 67 1201 0 cib > > Jan 9 17:54:28 dhcp47-155 kernel: [14090] 0 14090 34287 1119 > > 69 364 0 stonithd > > Jan 9 17:54:28 dhcp47-155 kernel: [14091] 0 14091 26273 459 > > 52 227 0 lrmd > > Jan 9 17:54:28 dhcp47-155 kernel: [14092] 189 14092 31731 1286 > > 62 410 0 attrd > > Jan 9 17:54:28 dhcp47-155 kernel: [14093] 189 14093 39003 3989 > > 71 182 0 pengine > > Jan 9 17:54:28 dhcp47-155 kernel: [14094] 189 14094 46982 4041 > > 81 703 0 crmd > > Jan 9 17:54:28 dhcp47-155 kernel: [14099] 0 14099 244360 10766 > > 108 620 0 pcsd > > Jan 9 17:54:28 dhcp47-155 kernel: [21690] 
0 21690 1008670 407551 > > 1106 55342 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [21709] 0 21709 1005223 346311 > > 1125 128491 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [21736] 0 21736 1010977 307333 > > 1124 163215 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [32553] 0 32553 151450 400 > > 66 6330 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [22781] 0 22781 625453 229961 > > 817 18368 0 glusterfs > > Jan 9 17:54:28 dhcp47-155 kernel: [ 8894] 0 8894 1480087 336954 > > 1214 0 0 ganesha.nfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [27848] 0 27848 35743 602 > > 71 0 0 sshd > > Jan 9 17:54:28 dhcp47-155 kernel: [27860] 0 27860 28879 329 > > 14 0 0 bash > > Jan 9 17:54:28 dhcp47-155 kernel: [ 9987] 89 9987 22270 472 > > 42 0 0 pickup > > Jan 9 17:54:28 dhcp47-155 kernel: [21817] 0 21817 35726 595 > > 72 0 0 sshd > > Jan 9 17:54:28 dhcp47-155 kernel: [21894] 0 21894 28879 289 > > 14 0 0 bash > > Jan 9 17:54:28 dhcp47-155 kernel: [29371] 0 29371 367307 8841 > > 116 0 0 glusterfsd > > Jan 9 17:54:28 dhcp47-155 kernel: [29745] 0 29745 39435 2870 > > 26 0 0 glusterfs > > Jan 9 17:54:28 dhcp47-155 kernel: [29746] 0 29746 205479 23309 > > 131 0 0 glusterfs > > Jan 9 17:54:28 dhcp47-155 kernel: [29753] 0 29753 39435 2869 > > 27 0 0 glusterfs > > Jan 9 17:54:28 dhcp47-155 kernel: [29754] 0 29754 111964 12874 > > 87 0 0 glusterfs > > Jan 9 17:54:28 dhcp47-155 kernel: [29771] 0 29771 28811 231 > > 13 0 0 S28Quota-root-x > > Jan 9 17:54:28 dhcp47-155 kernel: [29777] 0 29777 40032 3400 > > 28 0 0 glusterfs > > Jan 9 17:54:28 dhcp47-155 kernel: [29779] 0 29779 90002 3539 > > 45 0 0 glusterfs > > Jan 9 17:54:28 dhcp47-155 kernel: [29784] 0 29784 9807 135 > > 21 418 0 systemd-udevd > > Jan 9 17:54:28 dhcp47-155 kernel: Out of memory: Kill process 21709 > > (glusterfsd) score 182 or sacrifice child > > Jan 9 17:54:28 dhcp47-155 kernel: Killed process 21709 (glusterfsd) > > total-vm:4020892kB, anon-rss:1384648kB, file-rss:596kB, shmem-rss:0kB > > Jan 9 
17:54:28 dhcp47-155 corosync[14072]: [TOTEM ] A processor failed, > > forming new configuration. > > Jan 9 17:54:28 dhcp47-155 corosync[14072]: [TOTEM ] Process pause detected > > for 2433 ms, flushing membership messages. > > Jan 9 17:54:28 dhcp47-155 corosync[14072]: [TOTEM ] A new membership > > (10.70.46.42:2700) was formed. Members > > Jan 9 17:54:28 dhcp47-155 corosync[14072]: [QUORUM] Members[4]: 1 2 3 4 > > Jan 9 17:54:28 dhcp47-155 corosync[14072]: [MAIN ] Completed service > > synchronization, ready to provide service. > > Jan 9 17:54:28 dhcp47-155 crmd[14094]: notice: High CPU load detected: > > 9.650000 > > Jan 9 17:54:29 dhcp47-155 lrmd[14091]: notice: > > dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:29839:stderr [ > > 0+0 records in ] > > Jan 9 17:54:29 dhcp47-155 lrmd[14091]: notice: > > dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:29839:stderr [ > > 0+0 records out ] > > Jan 9 17:54:29 dhcp47-155 lrmd[14091]: notice: > > dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:29839:stderr [ 0 > > bytes (0 B) copied, 0.0124017 s, 0.0 kB/s ] > > Jan 9 17:54:40 dhcp47-155 lrmd[14091]: notice: > > dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30094:stderr [ > > 0+0 records in ] > > Jan 9 17:54:40 dhcp47-155 lrmd[14091]: notice: > > dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30094:stderr [ > > 0+0 records out ] > > Jan 9 17:54:40 dhcp47-155 lrmd[14091]: notice: > > dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30094:stderr [ 0 > > bytes (0 B) copied, 0.0972941 s, 0.0 kB/s ] > > Jan 9 17:54:50 dhcp47-155 lrmd[14091]: notice: > > dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30200:stderr [ > > 0+0 records in ] > > Jan 9 17:54:50 dhcp47-155 lrmd[14091]: notice: > > dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30200:stderr [ > > 0+0 records out ] > > Jan 9 17:54:50 dhcp47-155 lrmd[14091]: notice: > > 
dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:30200:stderr [ 0 > > bytes (0 B) copied, 0.0104596 s, 0.0 kB/s ] > > Jan 9 17:54:57 dhcp47-155 crmd[14094]: notice: High CPU load detected: > > 8.460000 > > > > Actual results: > > =============== > > OOM kill of glusterfsd during continuous add-bricks. > > > > Expected results: > > ================= > > There should not be any OOM kills
Prasad,

Some questions:
* Is it possible to get a statedump of the process which crashed (I assume it's the mount process)?
* Is it just add-brick, or were you also running rebalance? In other words, was there a data migration and/or fix-layout being run?
There is a patch [1] which fixes leaks in inprogress/migration_check tasks. However, these codepaths get executed in the mount process only if there is data migration going on for files on which I/O is done.

[1] http://review.gluster.org/16308
<raghuhg> Prasad, ping
<Prasad> raghuhg, pong
<raghuhg> on bz 1411329,
<raghuhg> is there a statedump you collected?
<raghuhg> was it the mount process which crashed?
<Prasad> nope, its a brick process
<Prasad> for the second question.. I was doing only add-bricks
<raghuhg> ok. If its brick process, my comment is invalid
<Prasad> hmm
<raghuhg> Is it possible to collect statedumps?
<raghuhg> you can possibly get statedumps of other bricks if they are still up
<Prasad> yes, but the system is not in the same state and I have done few add-brick/remove-brick operations.. will it help if I take the statedumps of brick processes now?
<raghuhg> its fine.
<raghuhg> how is the memory consumption of other bricks?
<raghuhg> are they too consuming too much memory?
<Prasad> ok, I will collect it now
<Prasad> I think so..
<raghuhg> also get mem consumption of other bricks
<raghuhg> and the brick that crashed, was it a new brick or old brick?
<Prasad> raghuhg, it was an old brick
<raghuhg> ok. If possible also post the sequence of add/remove bricks done
<raghuhg> it'll help us to figure out older bricks
<raghuhg> If one brick crashed, its more likely that other old bricks also consume more memory
<Prasad> yes, I have updated the add-brick sequence in the BZ, comment 4
<raghuhg> (of course based on I/O happening on the files present on the brick)
<raghuhg> Prasad, you didn't do any add/remove-brick after that?
<raghuhg> Comment 4
<Prasad> nope, I hit this issue while I was adding bricks
<raghuhg> ok
There seems to be a dict leak in the brick process:

pool-name=glusterfs:data_t
hot-count=16384
cold-count=0
padded_sizeof=92
alloc-count=13191046
max-alloc=16384
pool-misses=11984577
cur-stdalloc=276176
max-stdalloc=276228
-----=-----
pool-name=glusterfs:data_pair_t
hot-count=481
cold-count=15903
padded_sizeof=68
alloc-count=11089148
max-alloc=724
pool-misses=0
cur-stdalloc=0
max-stdalloc=0
-----=-----
pool-name=glusterfs:dict_t
hot-count=4096
cold-count=0
padded_sizeof=172
alloc-count=5819264
max-alloc=4096
pool-misses=5717508
cur-stdalloc=287995
max-stdalloc=287998
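A rough way to read these counters (a sketch, assuming cur-stdalloc counts live objects that fell back to standard allocation once the pool was exhausted, and padded_sizeof is the per-object size in bytes):

```python
# Estimate memory still held by objects allocated outside the pre-allocated
# mem-pool (pool-misses) and not yet freed (cur-stdalloc), using the counters
# from the brick statedump above.
pools = {
    "glusterfs:data_t": {"padded_sizeof": 92, "cur_stdalloc": 276176},
    "glusterfs:dict_t": {"padded_sizeof": 172, "cur_stdalloc": 287995},
}

for name, c in pools.items():
    live_bytes = c["padded_sizeof"] * c["cur_stdalloc"]
    print(f"{name}: ~{live_bytes / 2**20:.1f} MiB still allocated")
```

With these numbers that is roughly 24 MiB of data_t and 47 MiB of dict_t objects still outstanding, which is hard to explain by in-flight fops alone.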
Note that there were no fops in progress:

pool-name=glusterfs:rpcsvc_request_t
hot-count=0
cold-count=8
padded_sizeof=2820
alloc-count=0
max-alloc=0
pool-misses=0
cur-stdalloc=0
max-stdalloc=0
There seems to be some leak of memory allocated with memdup:

[protocol/server.distrep-server - usage-type gf_common_mt_memdup memusage]
size=125110
num_allocs=6761
max_size=130591
max_num_allocs=6845
total_allocs=4002730

Went through server_setvolume, but couldn't find any obvious leaks. Maybe I should go through the code once again.
Marker and upcall are the worst offenders in mem-pool usage, together accounting for ~(23 + 47) = 70MB of memory:

[features/marker.distrep-marker - usage-type gf_common_mt_mem_pool memusage]
size=24820556
num_allocs=269788
max_size=24821004
max_num_allocs=269792
total_allocs=4025204
--
[features/upcall.distrep-upcall - usage-type gf_common_mt_mem_pool memusage]
size=49547464
num_allocs=288157
max_size=49547636
max_num_allocs=288158
total_allocs=1435066
(In reply to Raghavendra G from comment #12)
> Marker and Upcall are worst offenders with mem-pool usage. Both accounting
> to a total memory of ~(23 + 47) = 70MB
>
> [features/upcall.distrep-upcall - usage-type gf_common_mt_mem_pool memusage]
> size=49547464
> num_allocs=288157
> max_size=49547636
> max_num_allocs=288158
> total_allocs=1435066

Poornima/Soumya,

Is it possible to take a look at why there is so much memory consumption?
(In reply to Raghavendra G from comment #12)
> Marker and Upcall are worst offenders with mem-pool usage. Both accounting
> to a total memory of ~(23 + 47) = 70MB
>
> [features/marker.distrep-marker - usage-type gf_common_mt_mem_pool memusage]
> size=24820556
> num_allocs=269788
> max_size=24821004
> max_num_allocs=269792
> total_allocs=4025204

Sanoj,

Can you take a look at why marker is consuming so much memory in mem-pools?
Hi,

As per the attached sosreport, it seems the memory footprint is high for glusterfsd and ganesha.nfsd.

@Prasad
Is it possible for you to share the statedump of ganesha.nfsd also?

Regards,
Mohit Agrawal
In the code, we first try allocating from the required mem-pool and, if that fails, fall back to memory accounted under type gf_common_mt_mem_pool. Looking at which allocation types dominate in the statedump:

marker: gf_common_mt_asprintf
upcall: gf_common_mt_char

gf_common_mt_asprintf is used in gf_vasprintf. This function expects the caller to free the result, so all the callers will have to be investigated. We can suspect either a memory leak, or frequent logging that is building up a large queue which is drained slowly. Looking further into where the logging happens in the marker code path, to map it against the brick logs.
(In reply to Mohit Agrawal from comment #15)
> Hi,
>
> As per attached sosrepot, it seems memory footprint is high for glusterfsd
> and gansesha.nfsd.
>
> @Prasad
> Is it possible for you to share the statedump of ganesha.nfsd also.

The statedump of the ganesha process is not possible at this moment.
[features/marker.distrep-marker - usage-type gf_common_mt_asprintf memusage]
size=570964
num_allocs=285468
max_size=570970
max_num_allocs=285471
total_allocs=2194054

init_xtime_priv (xlator_t *this, dict_t *options)
{
        ret = gf_asprintf (&(priv->marker_xattr), "%s.%s.%s",
                           MARKER_XATTR_PREFIX, priv->volume_uuid, XTIME);
}

Note that this unconditionally allocates into priv->marker_xattr and is called in the reconfigure codepath. This is a definite leak.
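A minimal model of why this leaks (a sketch, not gluster code; the names Priv, reconfigure_*, and the allocation counter are illustrative stand-ins): every graph reconfigure overwrites the pointer without freeing the previous string, so one allocation is orphaned per reconfigure, and each add-brick triggers a reconfigure on every brick.

```python
# Toy model of the init_xtime_priv leak: 'live' tracks allocations that were
# never freed. Names here (Priv, reconfigure_*) are illustrative, not gluster's.
live = 0

def alloc_xattr_name():
    global live
    live += 1              # stands in for gf_asprintf allocating a new string
    return "trusted.marker.volume-uuid.xtime"

class Priv:
    marker_xattr = None

def reconfigure_leaky(priv):
    priv.marker_xattr = alloc_xattr_name()   # previous string is orphaned

def reconfigure_fixed(priv):
    global live
    if priv.marker_xattr is not None:        # free the old value first
        live -= 1
    priv.marker_xattr = alloc_xattr_name()

p = Priv()
for _ in range(4):         # e.g. four add-bricks -> four reconfigures
    reconfigure_leaky(p)
print(live)                # 4: one leaked allocation per reconfigure

live, p = 0, Priv()
for _ in range(4):
    reconfigure_fixed(p)
print(live)                # 1: only the current string stays allocated
```

This matches the statedump above: num_allocs grows with every reconfigure instead of staying constant.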
All the above comments were made by referring to statedumps/server-1/bricks-brick7-b7.2428.dump.1484029391
Mentioning a point here: before the OOM kill was seen, I had also issued rm -rf multiple times from multiple clients to clear the dataset on the mount point.
[features/upcall.distrep-upcall - usage-type gf_common_mt_mem_pool memusage]
size=49547464
num_allocs=288157
max_size=49547636
max_num_allocs=288158
total_allocs=1435066

[features/upcall.distrep-upcall - usage-type gf_upcall_mt_upcall_inode_ctx_t memusage]
size=18336
num_allocs=191
max_size=2038560
max_num_allocs=21235
total_allocs=271612

Wrt upcall, we observed that the number of inode_ctx allocations matches the number of active+lru inodes. The only thing unknown is why there are so many mem-pool allocations; that could be expected with so many fops being sent to the server. As mentioned by Sanoj in earlier comments, maybe in this case too it is taking a while to clean up this memory. Taking statedumps at frequent intervals and analysing the memory consumption may help in determining if it is indeed a leak.
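One way to do that comparison is to diff the memusage sections of two statedumps taken some time apart (a sketch; the section-header and key=value layout is taken from the dumps quoted above, and the inline sample texts are placeholders for real dump files). Counters such as size or num_allocs that only ever grow while the workload is steady point at a leak rather than slow cleanup.

```python
import re

def parse_memusage(text):
    """Parse '[xlator - usage-type TYPE memusage]' sections of a statedump
    into {section: {key: int}}."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"\[(.+ memusage)\]$", line)
        if m:
            current = sections.setdefault(m.group(1), {})
        elif current is not None and "=" in line:
            key, _, val = line.partition("=")
            if val.isdigit():
                current[key] = int(val)
    return sections

def growing_counters(old, new):
    """Report sections whose size/num_allocs grew between two dumps."""
    out = {}
    for name, counters in new.items():
        prev = old.get(name, {})
        delta = {k: counters[k] - prev.get(k, 0)
                 for k in ("size", "num_allocs")
                 if k in counters and counters[k] > prev.get(k, 0)}
        if delta:
            out[name] = delta
    return out

# Placeholder dumps standing in for two statedump files taken minutes apart.
dump1 = """[features/upcall.distrep-upcall - usage-type gf_common_mt_mem_pool memusage]
size=49000000
num_allocs=280000"""
dump2 = """[features/upcall.distrep-upcall - usage-type gf_common_mt_mem_pool memusage]
size=49547464
num_allocs=288157"""

print(growing_counters(parse_memusage(dump1), parse_memusage(dump2)))
```

Running this over dumps taken every few minutes during the untar + add-brick workload would show whether the upcall mem-pool numbers plateau (delayed cleanup) or climb monotonically (leak).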
Hi,

Currently we have found one code path below that holds a dict leak (the same applies to up_fsetxattr also), but there could be other paths that are leaking as well:

0x7ff423dac373 : mem_get0+0x13/0x90 [/usr/lib64/libglusterfs.so.0.0.1]
0x7ff423d7d355 : get_new_dict_full+0x25/0x120 [/usr/lib64/libglusterfs.so.0.0.1]
0x7ff423d7dbab : dict_new+0xb/0x20 [/usr/lib64/libglusterfs.so.0.0.1]
0x7ff423d7fa0a : dict_copy_with_ref+0x3a/0xe0 [/usr/lib64/libglusterfs.so.0.0.1]
0x7ff41419733a : up_setxattr+0x3a/0x450 [/usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so]
0x7ff423e16684 : default_setxattr_resume+0x1d4/0x250 [/usr/lib64/libglusterfs.so.0.0.1]
0x7ff423da86ed : call_resume+0x7d/0xd0 [/usr/lib64/libglusterfs.so.0.0.1]
0x7ff40fdf9957 : iot_worker+0x117/0x220 [/usr/lib64/glusterfs/3.8.4/xlator/performance/io-threads.so]
0x7ff422be6dc5 : 0x7ff422be6dc5 [/usr/lib64/libpthread-2.17.so+0x7dc5/0x218000]

I am trying to find other paths also, and will send a patch after spending some more time on this.

Regards,
Mohit Agrawal
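The shape of this leak can be sketched with a toy reference-counting model (the classes and function names below are illustrative stand-ins, not gluster's actual dict API): dict_copy_with_ref hands the caller a dict holding a reference, and if the fop path never drops that reference, the dict is never destroyed.

```python
# Toy refcount model of the up_setxattr dict leak. Dict/copy_with_ref are
# stand-ins for gluster's dict_t API, not the real signatures.
destroyed = 0

class Dict:
    def __init__(self):
        self.refs = 1                  # a new dict starts with one reference

    def unref(self):
        global destroyed
        self.refs -= 1
        if self.refs == 0:
            destroyed += 1             # stands in for dict_destroy()

def copy_with_ref(src):
    return Dict()                      # fresh copy, reference owned by caller

def setxattr_leaky(xattr):
    copy = copy_with_ref(xattr)        # reference taken...
    # ...but never released on the completion path: the copy leaks

def setxattr_fixed(xattr):
    copy = copy_with_ref(xattr)
    copy.unref()                       # drop the reference when the fop is done

for _ in range(1000):
    setxattr_leaky(Dict())
print(destroyed)                       # 0: no copy is ever destroyed

destroyed = 0
for _ in range(1000):
    setxattr_fixed(Dict())
print(destroyed)                       # 1000: every copy is freed
```

Under a metadata-heavy workload like parallel kernel untars, one leaked dict per setxattr adds up quickly, which is consistent with the cur-stdalloc numbers in the dict_t pool above.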
upstream patch : http://review.gluster.org/16392
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/95350
*** Bug 1415632 has been marked as a duplicate of this bug. ***
This issue is not reproduced on glusterfs version 3.8.4-13.el7rhgs.x86_64.

Steps:
=====
1) Created a ganesha cluster and created a distributed-replicate volume (4x2).
2) Enabled nfs-ganesha on the volume with mdcache settings.
3) Mounted the volume.
4) On the mount point, copied the linux kernel package and created four empty directories, say 1, 2, 3 and 4.
5) Now, simultaneously started untarring the kernel package to directories 1, 2, 3 and 4.
tar -xvf linux-4.4.1.tar -C 1
tar -xvf linux-4.4.1.tar -C 2
tar -xvf linux-4.4.1.tar -C 3
tar -xvf linux-4.4.1.tar -C 4
6) While untar was in progress, continuously added bricks to the volume.

Initially the volume was 4x2; while untar was in progress, I continuously added bricks and scaled it up to 14x2. During these add-brick operations I didn't see OOM kills of glusterfsd. Hence, moving this BZ to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html