Description of problem:
=======================
I am running a test that creates and deletes volumes of different types on a brick-multiplexing setup. Two volumes stay constant throughout the cycle: a 5 x (4 + 2) EC volume (1_base_ecvol) and a 2 x 3 distributed-replicate volume (1_basevol).
The test, driven by a script, creates volumes of every type: single-brick, 1x3, nx3, 1x(4+2), nx(4+2), 1x(2+1) arbiter and nx(2+1). It creates about 100 volumes, starts them, stops them and deletes them (the two base volumes always exist alongside them).
The test ran successfully for about a week without any issues, after which the EC volume bricks were OOM-killed on all nodes.

Version-Release number of selected component (if applicable):
===================
6.0-15

How reproducible:
=============
Hit once, after one week of test cycles.

Steps to Reproduce:
====================
1. 6-node cluster, brick multiplexing enabled.
2. Two volumes created: 1_base_ecvol -> 5 x (4 + 2) = 30 and 1_basevol -> 2 x 3 volume.
3. Ran a test that loops indefinitely over {create 100 volumes, start, stop, delete}, with sleeps between operations.

Actual results:
----------------
EC volume bricks OOM-killed on all nodes.

Additional info:
==================
[root@dhcp35-75 ~]# gluster v info 1_base_ecvol

Volume Name: 1_base_ecvol
Type: Distributed-Disperse
Volume ID: aa4eeab4-1012-44f7-8b0b-c13a99e31346
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x (4 + 2) = 30
Transport-type: tcp
Bricks:
Brick1: dhcp35-75.lab.eng.blr.redhat.com:/gluster/brick2/1_base_ecvol
Brick2: dhcp35-194.lab.eng.blr.redhat.com:/gluster/brick2/1_base_ecvol
Brick3: dhcp35-173.lab.eng.blr.redhat.com:/gluster/brick2/1_base_ecvol
Brick4: dhcp35-108.lab.eng.blr.redhat.com:/gluster/brick2/1_base_ecvol
Brick5: dhcp35-42.lab.eng.blr.redhat.com:/gluster/brick2/1_base_ecvol
Brick6: dhcp35-182.lab.eng.blr.redhat.com:/gluster/brick2/1_base_ecvol
Brick7: dhcp35-75.lab.eng.blr.redhat.com:/gluster/brick3/1_base_ecvol
Brick8: dhcp35-194.lab.eng.blr.redhat.com:/gluster/brick3/1_base_ecvol
Brick9: dhcp35-173.lab.eng.blr.redhat.com:/gluster/brick3/1_base_ecvol
Brick10: dhcp35-108.lab.eng.blr.redhat.com:/gluster/brick3/1_base_ecvol
Brick11: dhcp35-42.lab.eng.blr.redhat.com:/gluster/brick3/1_base_ecvol
Brick12: dhcp35-182.lab.eng.blr.redhat.com:/gluster/brick3/1_base_ecvol
Brick13: dhcp35-75.lab.eng.blr.redhat.com:/gluster/brick4/1_base_ecvol
Brick14: dhcp35-194.lab.eng.blr.redhat.com:/gluster/brick4/1_base_ecvol
Brick15: dhcp35-173.lab.eng.blr.redhat.com:/gluster/brick4/1_base_ecvol
Brick16: dhcp35-108.lab.eng.blr.redhat.com:/gluster/brick4/1_base_ecvol
Brick17: dhcp35-42.lab.eng.blr.redhat.com:/gluster/brick4/1_base_ecvol
Brick18: dhcp35-182.lab.eng.blr.redhat.com:/gluster/brick4/1_base_ecvol
Brick19: dhcp35-75.lab.eng.blr.redhat.com:/gluster/brick5/1_base_ecvol
Brick20: dhcp35-194.lab.eng.blr.redhat.com:/gluster/brick5/1_base_ecvol
Brick21: dhcp35-173.lab.eng.blr.redhat.com:/gluster/brick5/1_base_ecvol
Brick22: dhcp35-108.lab.eng.blr.redhat.com:/gluster/brick5/1_base_ecvol
Brick23: dhcp35-42.lab.eng.blr.redhat.com:/gluster/brick5/1_base_ecvol
Brick24: dhcp35-182.lab.eng.blr.redhat.com:/gluster/brick5/1_base_ecvol
Brick25: dhcp35-75.lab.eng.blr.redhat.com:/gluster/brick6/1_base_ecvol
Brick26: dhcp35-194.lab.eng.blr.redhat.com:/gluster/brick6/1_base_ecvol
Brick27: dhcp35-173.lab.eng.blr.redhat.com:/gluster/brick6/1_base_ecvol
Brick28: dhcp35-108.lab.eng.blr.redhat.com:/gluster/brick6/1_base_ecvol
Brick29: dhcp35-42.lab.eng.blr.redhat.com:/gluster/brick6/1_base_ecvol
Brick30: dhcp35-182.lab.eng.blr.redhat.com:/gluster/brick6/1_base_ecvol
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
cluster.brick-multiplex: enable

[root@dhcp35-75 ~]# gluster v info 1_basevol

Volume Name: 1_basevol
Type: Distributed-Replicate
Volume ID: 8f5d0079-6a4e-4109-b1e3-8ca420aa01e5
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: dhcp35-75.lab.eng.blr.redhat.com:/gluster/brick1/1_basevol
Brick2: dhcp35-194.lab.eng.blr.redhat.com:/gluster/brick1/1_basevol
Brick3: dhcp35-173.lab.eng.blr.redhat.com:/gluster/brick1/1_basevol
Brick4: dhcp35-108.lab.eng.blr.redhat.com:/gluster/brick1/1_basevol
Brick5: dhcp35-42.lab.eng.blr.redhat.com:/gluster/brick1/1_basevol
Brick6: dhcp35-182.lab.eng.blr.redhat.com:/gluster/brick1/1_basevol
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: enable

[root@dhcp35-75 ~]# rpm -qa|grep gluster
glusterfs-6.0-15.el7rhgs.x86_64
glusterfs-server-6.0-15.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-6.0-15.el7rhgs.x86_64
glusterfs-geo-replication-6.0-15.el7rhgs.x86_64
glusterfs-events-6.0-15.el7rhgs.x86_64
glusterfs-rdma-6.0-15.el7rhgs.x86_64
glusterfs-libs-6.0-15.el7rhgs.x86_64
glusterfs-api-6.0-15.el7rhgs.x86_64
glusterfs-fuse-6.0-15.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-23.el7_7.1.x86_64
vdsm-gluster-4.30.18-1.0.el7rhgs.x86_64
glusterfs-cli-6.0-15.el7rhgs.x86_64
python2-gluster-6.0-15.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64

[Fri Oct 11 21:54:08 2019] CPU: 5 PID: 24895 Comm: glfs_epoll00f Kdump: loaded Not tainted 3.10.0-1062.1.2.el7.x86_64 #1
[Fri Oct 11 21:54:08 2019] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[Fri Oct 11 21:54:08 2019] Call Trace:
[Fri Oct 11 21:54:08 2019]  [<ffffffffadb792c2>] dump_stack+0x19/0x1b
[Fri Oct 11 21:54:08 2019]  [<ffffffffadb73c64>] dump_header+0x90/0x229
[Fri Oct 11 21:54:08 2019]  [<ffffffffad70825b>] ? cred_has_capability+0x6b/0x120
[Fri Oct 11 21:54:08 2019]  [<ffffffffadb86fed>] ? do_async_page_fault+0x6d/0xf0
[Fri Oct 11 21:54:08 2019]  [<ffffffffad5bfd74>] oom_kill_process+0x254/0x3e0
[Fri Oct 11 21:54:08 2019]  [<ffffffffad70833e>] ? selinux_capable+0x2e/0x40
[Fri Oct 11 21:54:08 2019]  [<ffffffffad5c05c6>] out_of_memory+0x4b6/0x4f0
[Fri Oct 11 21:54:08 2019]  [<ffffffffadb7477c>] __alloc_pages_slowpath+0x5d6/0x724
[Fri Oct 11 21:54:08 2019]  [<ffffffffad5c6b84>] __alloc_pages_nodemask+0x404/0x420
[Fri Oct 11 21:54:08 2019]  [<ffffffffad618105>] alloc_pages_vma+0xb5/0x200
[Fri Oct 11 21:54:08 2019]  [<ffffffffad5e5407>] ? anon_vma_interval_tree_insert+0x97/0xa0
[Fri Oct 11 21:54:08 2019]  [<ffffffffad5efa54>] handle_pte_fault+0x984/0xe20
[Fri Oct 11 21:54:08 2019]  [<ffffffffad5f200d>] handle_mm_fault+0x39d/0x9b0
[Fri Oct 11 21:54:08 2019]  [<ffffffffadb87653>] __do_page_fault+0x213/0x500
[Fri Oct 11 21:54:08 2019]  [<ffffffffadb87a26>] trace_do_page_fault+0x56/0x150
[Fri Oct 11 21:54:08 2019]  [<ffffffffadb86fa2>] do_async_page_fault+0x22/0xf0
[Fri Oct 11 21:54:08 2019]  [<ffffffffadb837a8>] async_page_fault+0x28/0x30
[Fri Oct 11 21:54:08 2019] Mem-Info:
[Fri Oct 11 21:54:08 2019] active_anon:1503183 inactive_anon:293959 isolated_anon:0 active_file:0 inactive_file:0 isolated_file:0 unevictable:14123 dirty:8 writeback:0 unstable:0 slab_reclaimable:23017 slab_unreclaimable:58554 mapped:6476 shmem:3411 pagetables:34129 bounce:0 free:25763 free_pcp:4041 free_cma:0
[Fri Oct 11 21:54:08 2019] Node 0 DMA free:15908kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Fri Oct 11 21:54:08 2019] lowmem_reserve[]: 0 2812 7799 7799
[Fri Oct 11 21:54:08 2019] Node 0 DMA32 free:44028kB min:24320kB low:30400kB high:36480kB active_anon:2063388kB inactive_anon:517544kB active_file:0kB inactive_file:0kB unevictable:23496kB isolated(anon):0kB isolated(file):0kB present:3129328kB managed:2882956kB mlocked:23496kB dirty:0kB writeback:0kB mapped:8048kB shmem:4612kB slab_reclaimable:37468kB slab_unreclaimable:89748kB kernel_stack:15600kB pagetables:48172kB unstable:0kB bounce:0kB free_pcp:6844kB local_pcp:616kB free_cma:0kB writeback_tmp:0kB pages_scanned:14 all_unreclaimable? yes
[Fri Oct 11 21:54:08 2019] lowmem_reserve[]: 0 0 4987 4987
[Fri Oct 11 21:54:08 2019] Node 0 Normal free:43116kB min:43128kB low:53908kB high:64692kB active_anon:3949344kB inactive_anon:658292kB active_file:0kB inactive_file:0kB unevictable:32996kB isolated(anon):0kB isolated(file):0kB present:5242880kB managed:5106928kB mlocked:32996kB dirty:32kB writeback:0kB mapped:17856kB shmem:9032kB slab_reclaimable:54600kB slab_unreclaimable:144468kB kernel_stack:8848kB pagetables:88344kB unstable:0kB bounce:0kB free_pcp:9320kB local_pcp:684kB free_cma:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? yes
[Fri Oct 11 21:54:08 2019] lowmem_reserve[]: 0 0 0 0
[Fri Oct 11 21:54:08 2019] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[Fri Oct 11 21:54:08 2019] Node 0 DMA32: 4563*4kB (UE) 1430*8kB (UE) 101*16kB (UE) 113*32kB (UEM) 89*64kB (UEM) 19*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43052kB
[Fri Oct 11 21:54:08 2019] Node 0 Normal: 10234*4kB (UEM) 2*8kB (E) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 40952kB
[Fri Oct 11 21:54:08 2019] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Fri Oct 11 21:54:08 2019] 19333 total pagecache pages
[Fri Oct 11 21:54:08 2019] 12920 pages in swap cache
[Fri Oct 11 21:54:08 2019] Swap cache stats: add 2933098, delete 2920182, find 1907978/2049507
[Fri Oct 11 21:54:08 2019] Free swap  = 0kB
[Fri Oct 11 21:54:08 2019] Total swap = 8257532kB
[Fri Oct 11 21:54:08 2019] 2097050 pages RAM
[Fri Oct 11 21:54:08 2019] 0 pages HighMem/MovableOnly
[Fri Oct 11 21:54:08 2019] 95602 pages reserved
[Fri Oct 11 21:54:08 2019] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[Fri Oct 11 21:54:08 2019] [ 1093] 0 1093 13867 5143 32 51 0 systemd-journal
[Fri Oct 11 21:54:08 2019] [ 1128] 0 1128 9608 634 24 621 -1000 systemd-udevd
[Fri Oct 11 21:54:08 2019] [ 1170] 0 1170 80723 1928 39 0 -1000 multipathd
[Fri Oct 11 21:54:08 2019] [ 1536] 0 1536 263111 12924 67 0 -1000 dmeventd
[Fri Oct 11 21:54:08 2019] [ 2197] 0 2197 13882 106 29 87 -1000 auditd
[Fri Oct 11 21:54:08 2019] [ 2224] 0 2224 6620 280 18 43 0 systemd-logind
[Fri Oct 11 21:54:08 2019] [ 2228] 0 2228 4225 175 14 44 0 alsactl
[Fri Oct 11 21:54:08 2019] [ 2229] 999 2229 153245 356 61 1673 0 polkitd
[Fri Oct 11 21:54:08 2019] [ 2238] 0 2238 1097 97 8 34 0 acpid
[Fri Oct 11 21:54:08 2019] [ 2239] 0 2239 13224 277 30 181 0 smartd
[Fri Oct 11 21:54:08 2019] [ 2240] 0 2240 57069 272 62 483 0 abrtd
[Fri Oct 11 21:54:08 2019] [ 2245] 0 2245 5408 233 16 41 0 irqbalance
[Fri Oct 11 21:54:08 2019] [ 2248] 81 2248 16644 377 34 102 -900 dbus-daemon
[Fri Oct 11 21:54:08 2019] [ 2259] 0 2259 22641 161 47 227 0 rngd
[Fri Oct 11 21:54:08 2019] [ 2261] 0 2261 56438 247 61 360 0 abrt-watch-log
[Fri Oct 11 21:54:08 2019] [ 2276] 32 2276 17319 165 37 106 0 rpcbind
[Fri Oct 11 21:54:08 2019] [ 2281] 998 2281 29451 236 28 87 0 chronyd
[Fri Oct 11 21:54:08 2019] [ 2283] 0 2283 90658 534 99 6418 0 firewalld
[Fri Oct 11 21:54:08 2019] [ 2290] 0 2290 48776 116 35 130 0 gssproxy
[Fri Oct 11 21:54:08 2019] [ 2303] 0 2303 156444 497 91 267 0 NetworkManager
[Fri Oct 11 21:54:08 2019] [ 2459] 0 2459 25724 324 52 457 0 dhclient
[Fri Oct 11 21:54:08 2019] [ 2672] 0 2672 28230 270 57 256 -1000 sshd
[Fri Oct 11 21:54:08 2019] [ 2678] 0 2678 146587 401 101 3235 0 tuned
[Fri Oct 11 21:54:08 2019] [ 2681] 0 2681 28928 121 13 16 0 rhsmcertd
[Fri Oct 11 21:54:08 2019] [ 2704] 0 2704 251168 749 153 1876 0 libvirtd
[Fri Oct 11 21:54:08 2019] [ 2716] 0 2716 6477 143 18 48 0 atd
[Fri Oct 11 21:54:08 2019] [ 2719] 0 2719 27527 164 10 33 0 agetty
[Fri Oct 11 21:54:08 2019] [ 2922] 0 2922 22424 179 44 239 0 master
[Fri Oct 11 21:54:08 2019] [ 2938] 89 2938 22494 254 44 235 0 qmgr
[Fri Oct 11 21:54:08 2019] [ 3029] 0 3029 26993 136 10 6 0 rhnsd
[Fri Oct 11 21:54:08 2019] [17378] 0 17378 39195 345 81 319 0 sshd
[Fri Oct 11 21:54:08 2019] [17382] 0 17382 28913 329 13 43 0 bash
[Fri Oct 11 21:54:08 2019] [18086] 0 18086 31573 247 19 130 0 crond
[Fri Oct 11 21:54:08 2019] [18089] 0 18089 64863 2497 56 154 0 rsyslogd
[Fri Oct 11 21:54:08 2019] [18097] 994 18097 11224 165 26 178 0 nrpe
[Fri Oct 11 21:54:08 2019] [18762] 0 18762 155079 18561 121 15677 0 glusterd
[Fri Oct 11 21:54:08 2019] [18952] 0 18952 39195 357 81 310 0 sshd
[Fri Oct 11 21:54:08 2019] [18956] 0 18956 28913 329 13 44 0 bash
[Fri Oct 11 21:54:08 2019] [19007] 0 19007 32009 183 19 117 0 screen
[Fri Oct 11 21:54:08 2019] [19008] 0 19008 29087 466 13 86 0 bash
[Fri Oct 11 21:54:08 2019] [19123] 0 19123 31593224 903706 17115 1063764 0 glusterfsd
[Fri Oct 11 21:54:08 2019] [19252] 0 19252 27399980 814989 14788 953595 0 glusterfsd
[Fri Oct 11 21:54:08 2019] [21285] 0 21285 32009 167 16 122 0 screen
[Fri Oct 11 21:54:08 2019] [21286] 0 21286 28922 237 13 144 0 bash
[Fri Oct 11 21:54:08 2019] [26598] 0 26598 33608 264 21 4657 0 bash
[Fri Oct 11 21:54:08 2019] [ 6036] 89 6036 22450 441 45 0 0 pickup
[Fri Oct 11 21:54:08 2019] [ 1124] 0 1124 26989 74 9 0 0 sleep
[Fri Oct 11 21:54:08 2019] [ 3568] 0 3568 2384508 39899 449 0 0 glusterfs
[Fri Oct 11 21:54:08 2019] [ 3643] 0 3643 26989 74 10 0 0 sleep
[Fri Oct 11 21:54:08 2019] Out of memory: Kill process 19123 (glusterfsd) score 473 or sacrifice child
[Fri Oct 11 21:54:08 2019] Killed process 19123 (glusterfsd), UID 0, total-vm:126372896kB, anon-rss:3614824kB, file-rss:0kB, shmem-rss:0kB
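The OOM task dump reports total_vm, rss and swapents in pages, while the final "Killed process" line reports kB. The sketch below assumes 4 KiB base pages (the x86_64 default) and parses a few rows copied verbatim from the dump. Converting them shows that the page counts for PID 19123 match the kB figures in the kill message (126372896 kB total-vm, 3614824 kB anon-rss). It also shows that the two glusterfsd processes together held 8069436 kB of the 8257532 kB of swap (about 97.7%), which is consistent with "Free swap = 0kB". The regex, `Task` and `parse_task` are hypothetical helpers written for this illustration, not part of any gluster tooling.

```python
import re
from typing import NamedTuple, Optional

PAGE_KB = 4  # assumption: 4 KiB base pages (x86_64 default)

# One row of the kernel OOM task dump:
# [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)"
)


class Task(NamedTuple):
    pid: int
    name: str
    total_vm_kb: int
    rss_kb: int
    swap_kb: int


def parse_task(line: str) -> Optional[Task]:
    """Parse one task row and convert its page counts to kB; None if no row is found."""
    m = ROW.search(line)  # the leading "[Fri Oct ...]" timestamp does not match \[\s*\d+\]
    if m is None:
        return None
    pid, _uid, _tgid, total_vm, rss, _ptes, swap, _adj, name = m.groups()
    return Task(int(pid), name, int(total_vm) * PAGE_KB, int(rss) * PAGE_KB, int(swap) * PAGE_KB)


# Rows copied verbatim from the dump in this report.
LOG = """\
[Fri Oct 11 21:54:08 2019] [18762] 0 18762 155079 18561 121 15677 0 glusterd
[Fri Oct 11 21:54:08 2019] [19123] 0 19123 31593224 903706 17115 1063764 0 glusterfsd
[Fri Oct 11 21:54:08 2019] [19252] 0 19252 27399980 814989 14788 953595 0 glusterfsd
[Fri Oct 11 21:54:08 2019] [ 3568] 0 3568 2384508 39899 449 0 0 glusterfs
"""

TOTAL_SWAP_KB = 8257532  # "Total swap = 8257532kB" from the same dump

tasks = [t for t in map(parse_task, LOG.splitlines()) if t is not None]
bricks = [t for t in tasks if t.name == "glusterfsd"]
brick_swap_kb = sum(t.swap_kb for t in bricks)

for t in tasks:
    print(f"{t.pid:>6} {t.name:<11} vm={t.total_vm_kb} kB rss={t.rss_kb} kB swap={t.swap_kb} kB")
share = 100 * brick_swap_kb / TOTAL_SWAP_KB
print(f"glusterfsd swap: {brick_swap_kb} kB of {TOTAL_SWAP_KB} kB ({share:.1f}%)")
```

The same parser can be run over the full dump on each node to compare how the brick processes' resident and swapped memory grew between test cycles.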
*** This bug has been marked as a duplicate of bug 1790336 ***