Description of problem:
During sequential-write iozone runs I am experiencing an OOM:

Apr 2 12:58:56 gqas014 kernel: glusterfs invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Apr 2 12:58:56 gqas014 kernel: glusterfs cpuset=/ mems_allowed=0-1
Apr 2 12:58:56 gqas014 kernel: Pid: 11243, comm: glusterfs Not tainted 2.6.32-431.el6.x86_64 #1
Apr 2 12:58:56 gqas014 kernel: Call Trace:
Apr 2 12:58:56 gqas014 kernel: [<ffffffff810d05b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Apr 2 12:58:56 gqas014 kernel: [<ffffffff81122960>] ? dump_header+0x90/0x1b0
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8122798c>] ? security_real_capable_noaudit+0x3c/0x70
Apr 2 12:58:56 gqas014 kernel: [<ffffffff81122de2>] ? oom_kill_process+0x82/0x2a0
Apr 2 12:58:56 gqas014 kernel: [<ffffffff81122d21>] ? select_bad_process+0xe1/0x120
Apr 2 12:58:56 gqas014 kernel: [<ffffffff81123220>] ? out_of_memory+0x220/0x3c0
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8112fb3c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
Apr 2 12:58:56 gqas014 kernel: [<ffffffff81167a9a>] ? alloc_pages_current+0xaa/0x110
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8111fd57>] ? __page_cache_alloc+0x87/0x90
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8111f73e>] ? find_get_page+0x1e/0xa0
Apr 2 12:58:56 gqas014 kernel: [<ffffffff81120cf7>] ? filemap_fault+0x1a7/0x500
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8114a084>] ? __do_fault+0x54/0x530
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8114a657>] ? handle_pte_fault+0xf7/0xb00
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8144a040>] ? sock_aio_write+0x0/0x1c0
Apr 2 12:58:56 gqas014 kernel: [<ffffffff81188b3b>] ? do_sync_readv_writev+0xfb/0x140
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8100bc2e>] ? invalidate_interrupt1+0xe/0x20
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8114b28a>] ? handle_mm_fault+0x22a/0x300
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
Apr 2 12:58:56 gqas014 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
Apr 2 12:58:56 gqas014 kernel: [<ffffffff8152a815>] ? page_fault+0x25/0x30

Version-Release number of selected component (if applicable):
Client - glusterfs-3.5qa2-0.304.git0c1d78f.el6_4.x86_64
Server - glusterfs-3.5qa2-0.294.git00802b3.el6rhs.x86_64

How reproducible:
This happened on my first run with the 3.0 bits; I have only run it once so far.

Steps to Reproduce:
1. Build a 2x2 volume across 4 servers with 10G interfaces.
2. Run IOzone in distributed mode across 4 clients.
3.

Actual results:
OOM killer is invoked.

Expected results:
IOzone completes successfully.

Additional info:
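The gfp_mask in the oom-killer header can be decoded bit by bit. The Python sketch below uses a partial flag table taken from 2.6.32-era include/linux/gfp.h; that these exact values apply to 2.6.32-431.el6 is an assumption worth checking against that kernel's headers. Under that table, 0x201da decodes to GFP_HIGHUSER_MOVABLE | __GFP_COLD, and order=0 means a single 4 kB page. That fits the __page_cache_alloc / filemap_fault frames in the trace: an ordinary single-page page-cache allocation failed, not an unusually large request. The helper names are hypothetical.

```python
# Partial GFP flag table, as defined in 2.6.32-era include/linux/gfp.h.
# Assumption: the RHEL 6 kernel 2.6.32-431.el6 uses these same bit values.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x20000: "__GFP_HARDWALL",
}

# GFP_HIGHUSER_MOVABLE = __GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL
#                        | __GFP_HIGHMEM | __GFP_MOVABLE
GFP_HIGHUSER_MOVABLE = 0x10 | 0x40 | 0x80 | 0x20000 | 0x02 | 0x08


def decode_gfp(mask):
    """Return (names of table flags set in mask, leftover bits not in the table)."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]
    known = 0
    for bit in GFP_BITS:
        known |= bit
    return names, mask & ~known


def order_to_kb(order, page_kb=4):
    """An order-N allocation requests 2**N contiguous pages."""
    return (1 << order) * page_kb
```

Any bits reported as leftover are simply outside this partial table, not necessarily invalid.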
Observations:
- I am running across 4 servers and 3 clients, with 4 writer threads per client. The iozone command was:
- I can't reproduce the OOM with a single iozone run, although I suppose it would happen if I created a big enough file. I have been running the same command several times. The leak only shows up on the client side; maybe there is a problem in the io-cache xlator? I'll try a run with it disabled.
- Here is a sysrq+m dump taken with the client 20 GB into swap:

Apr 7 13:19:47 gqas014 kernel: SysRq : Show Memory
Apr 7 13:19:47 gqas014 kernel: Mem-Info:
Apr 7 13:19:47 gqas014 kernel: Node 0 Normal per-cpu:
Apr 7 13:19:47 gqas014 kernel: CPU 0: hi: 186, btch: 31 usd: 39
Apr 7 13:19:47 gqas014 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 2: hi: 186, btch: 31 usd: 144
Apr 7 13:19:47 gqas014 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 4: hi: 186, btch: 31 usd: 125
Apr 7 13:19:47 gqas014 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 6: hi: 186, btch: 31 usd: 37
Apr 7 13:19:47 gqas014 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 8: hi: 186, btch: 31 usd: 157
Apr 7 13:19:47 gqas014 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 12: hi: 186, btch: 31 usd: 77
Apr 7 13:19:47 gqas014 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 14: hi: 186, btch: 31 usd: 42
Apr 7 13:19:47 gqas014 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 16: hi: 186, btch: 31 usd: 65
Apr 7 13:19:47 gqas014 kernel: CPU 17: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 18: hi: 186, btch: 31 usd: 30
Apr 7 13:19:47 gqas014 kernel: CPU 19: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 21: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 22: hi: 186, btch: 31 usd: 157
Apr 7 13:19:47 gqas014 kernel: CPU 23: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: Node 1 DMA per-cpu:
Apr 7 13:19:47 gqas014 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 4: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 5: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 6: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 7: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 8: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 9: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 10: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 11: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 12: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 13: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 14: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 15: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 16: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 17: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 18: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 19: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 20: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 21: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 22: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 23: hi: 0, btch: 1 usd: 0
Apr 7 13:19:47 gqas014 kernel: Node 1 DMA32 per-cpu:
Apr 7 13:19:47 gqas014 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 1: hi: 186, btch: 31 usd: 161
Apr 7 13:19:47 gqas014 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 3: hi: 186, btch: 31 usd: 4
Apr 7 13:19:47 gqas014 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 13: hi: 186, btch: 31 usd: 4
Apr 7 13:19:47 gqas014 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 15: hi: 186, btch: 31 usd: 30
Apr 7 13:19:47 gqas014 kernel: CPU 16: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 17: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 18: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 19: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 21: hi: 186, btch: 31 usd: 64
Apr 7 13:19:47 gqas014 kernel: CPU 22: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 23: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: Node 1 Normal per-cpu:
Apr 7 13:19:47 gqas014 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 1: hi: 186, btch: 31 usd: 168
Apr 7 13:19:47 gqas014 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 3: hi: 186, btch: 31 usd: 173
Apr 7 13:19:47 gqas014 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 5: hi: 186, btch: 31 usd: 77
Apr 7 13:19:47 gqas014 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 13: hi: 186, btch: 31 usd: 47
Apr 7 13:19:47 gqas014 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 15: hi: 186, btch: 31 usd: 77
Apr 7 13:19:47 gqas014 kernel: CPU 16: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 17: hi: 186, btch: 31 usd: 41
Apr 7 13:19:47 gqas014 kernel: CPU 18: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 19: hi: 186, btch: 31 usd: 61
Apr 7 13:19:47 gqas014 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 21: hi: 186, btch: 31 usd: 177
Apr 7 13:19:47 gqas014 kernel: CPU 22: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: CPU 23: hi: 186, btch: 31 usd: 0
Apr 7 13:19:47 gqas014 kernel: active_anon:11172347 inactive_anon:894631 isolated_anon:0
Apr 7 13:19:47 gqas014 kernel: active_file:20569 inactive_file:42997 isolated_file:0
Apr 7 13:19:47 gqas014 kernel: unevictable:0 dirty:1 writeback:0 unstable:0
Apr 7 13:19:47 gqas014 kernel: free:59949 slab_reclaimable:13992 slab_unreclaimable:10548
Apr 7 13:19:47 gqas014 kernel: mapped:921 shmem:8 pagetables:37535 bounce:0
Apr 7 13:19:47 gqas014 kernel: Node 0 Normal free:62488kB min:45076kB low:56344kB high:67612kB active_anon:22763808kB inactive_anon:1560688kB active_file:36956kB inactive_file:80716kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:24821760kB mlocked:0kB dirty:0kB writeback:0kB mapped:1872kB shmem:16kB slab_reclaimable:20900kB slab_unreclaimable:22096kB kernel_stack:2936kB pagetables:78724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Apr 7 13:19:47 gqas014 kernel: lowmem_reserve[]: 0 0 0 0
Apr 7 13:19:47 gqas014 kernel: Node 1 DMA free:15740kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 7 13:19:47 gqas014 kernel: lowmem_reserve[]: 0 3243 24201 24201
Apr 7 13:19:47 gqas014 kernel: Node 1 DMA32 free:99264kB min:6032kB low:7540kB high:9048kB active_anon:2067080kB inactive_anon:597364kB active_file:8508kB inactive_file:21804kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3321540kB mlocked:0kB dirty:4kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:12688kB slab_unreclaimable:168kB kernel_stack:0kB pagetables:6396kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Apr 7 13:19:47 gqas014 kernel: lowmem_reserve[]: 0 0 20957 20957
Apr 7 13:19:47 gqas014 kernel: Node 1 Normal free:62304kB min:38972kB low:48712kB high:58456kB active_anon:19858500kB inactive_anon:1420472kB active_file:36812kB inactive_file:69468kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:21460480kB mlocked:0kB dirty:0kB writeback:0kB mapped:1804kB shmem:16kB slab_reclaimable:22380kB slab_unreclaimable:19928kB kernel_stack:520kB pagetables:65020kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Apr 7 13:19:47 gqas014 kernel: lowmem_reserve[]: 0 0 0 0
Apr 7 13:19:47 gqas014 kernel: Node 0 Normal: 1322*4kB 432*8kB 111*16kB 38*32kB 25*64kB 10*128kB 3*256kB 22*512kB 35*1024kB 0*2048kB 0*4096kB = 62488kB
Apr 7 13:19:47 gqas014 kernel: Node 1 DMA: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15740kB
Apr 7 13:19:47 gqas014 kernel: Node 1 DMA32: 488*4kB 224*8kB 90*16kB 14*32kB 11*64kB 52*128kB 111*256kB 57*512kB 16*1024kB 4*2048kB 1*4096kB = 99264kB
Apr 7 13:19:47 gqas014 kernel: Node 1 Normal: 510*4kB 605*8kB 168*16kB 104*32kB 54*64kB 17*128kB 15*256kB 28*512kB 9*1024kB 4*2048kB 2*4096kB = 62304kB
Apr 7 13:19:47 gqas014 kernel: 79286 total pagecache pages
Apr 7 13:19:47 gqas014 kernel: 15657 pages in swap cache
Apr 7 13:19:47 gqas014 kernel: Swap cache stats: add 5142086, delete 5126429, find 418/582
Apr 7 13:19:47 gqas014 kernel: Free swap = 4210248kB
Apr 7 13:19:47 gqas014 kernel: Total swap = 24772600kB
Apr 7 13:19:47 gqas014 kernel: 12582911 pages RAM
Apr 7 13:19:47 gqas014 kernel: 230576 pages reserved
Apr 7 13:19:47 gqas014 kernel: 56907 pages shared
Apr 7 13:19:47 gqas014 kernel: 12232719 pages non-shared

- Messages from the OOM itself:

Apr 7 13:37:45 gqas014 kernel: glusterfs invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Apr 7 13:37:45 gqas014 kernel: glusterfs cpuset=/ mems_allowed=0-1
Apr 7 13:37:45 gqas014 kernel: Pid: 11053, comm: glusterfs Not tainted 2.6.32-431.el6.x86_64 #1
Apr 7 13:37:45 gqas014 kernel: Call Trace:
Apr 7 13:37:45 gqas014 kernel: [<ffffffff810d05b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Apr 7 13:37:45 gqas014 kernel: [<ffffffff81122960>] ? dump_header+0x90/0x1b0
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8122798c>] ? security_real_capable_noaudit+0x3c/0x70
Apr 7 13:37:45 gqas014 kernel: [<ffffffff81122de2>] ? oom_kill_process+0x82/0x2a0
Apr 7 13:37:45 gqas014 kernel: [<ffffffff81122d21>] ? select_bad_process+0xe1/0x120
Apr 7 13:37:45 gqas014 kernel: [<ffffffff81123220>] ? out_of_memory+0x220/0x3c0
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8112fb3c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
Apr 7 13:37:45 gqas014 kernel: [<ffffffff81167a9a>] ? alloc_pages_current+0xaa/0x110
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8111fd57>] ? __page_cache_alloc+0x87/0x90
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8111f73e>] ? find_get_page+0x1e/0xa0
Apr 7 13:37:45 gqas014 kernel: [<ffffffff81120cf7>] ? filemap_fault+0x1a7/0x500
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8114a084>] ? __do_fault+0x54/0x530
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8114a657>] ? handle_pte_fault+0xf7/0xb00
Apr 7 13:37:45 gqas014 kernel: [<ffffffff814469f0>] ? sock_aio_read+0x0/0x1b0
Apr 7 13:37:45 gqas014 kernel: [<ffffffff810af1ce>] ? futex_wake+0x10e/0x120
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8114b28a>] ? handle_mm_fault+0x22a/0x300
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
Apr 7 13:37:45 gqas014 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
Apr 7 13:37:45 gqas014 kernel: [<ffffffff8152a815>] ? page_fault+0x25/0x30
Apr 7 13:37:45 gqas014 kernel: Mem-Info:
Apr 7 13:37:45 gqas014 kernel: Node 0 Normal per-cpu:
Apr 7 13:37:45 gqas014 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 16: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 17: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 18: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 19: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 21: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 22: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 23: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: Node 1 DMA per-cpu:
Apr 7 13:37:45 gqas014 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 4: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 5: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 6: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 7: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 8: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 9: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 10: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 11: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 12: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 13: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 14: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 15: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 16: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 17: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 18: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 19: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 20: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 21: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 22: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 23: hi: 0, btch: 1 usd: 0
Apr 7 13:37:45 gqas014 kernel: Node 1 DMA32 per-cpu:
Apr 7 13:37:45 gqas014 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 1: hi: 186, btch: 31 usd: 30
Apr 7 13:37:45 gqas014 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 13: hi: 186, btch: 31 usd: 30
Apr 7 13:37:45 gqas014 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 16: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 17: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 18: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 19: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 21: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 22: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 23: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: Node 1 Normal per-cpu:
Apr 7 13:37:45 gqas014 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 16: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 17: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 18: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 19: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 21: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 22: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: CPU 23: hi: 186, btch: 31 usd: 0
Apr 7 13:37:45 gqas014 kernel: active_anon:11229878 inactive_anon:921071 isolated_anon:0
Apr 7 13:37:45 gqas014 kernel: active_file:88 inactive_file:0 isolated_file:0
Apr 7 13:37:45 gqas014 kernel: unevictable:0 dirty:0 writeback:0 unstable:0
Apr 7 13:37:45 gqas014 kernel: free:47320 slab_reclaimable:2742 slab_unreclaimable:11296
Apr 7 13:37:45 gqas014 kernel: mapped:73 shmem:8 pagetables:41413 bounce:0
Apr 7 13:37:45 gqas014 kernel: Node 0 Normal free:44840kB min:45076kB low:56344kB high:67612kB active_anon:22886032kB inactive_anon:1583132kB active_file:84kB inactive_file:268kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:24821760kB mlocked:0kB dirty:0kB writeback:0kB mapped:92kB shmem:16kB slab_reclaimable:4860kB slab_unreclaimable:23768kB kernel_stack:2992kB pagetables:85800kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:532 all_unreclaimable? yes
Apr 7 13:37:45 gqas014 kernel: lowmem_reserve[]: 0 0 0 0
Apr 7 13:37:45 gqas014 kernel: Node 1 DMA free:15740kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 7 13:37:45 gqas014 kernel: lowmem_reserve[]: 0 3243 24201 24201
Apr 7 13:37:45 gqas014 kernel: Node 1 DMA32 free:89828kB min:6032kB low:7540kB high:9048kB active_anon:2072780kB inactive_anon:643676kB active_file:28kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3321540kB mlocked:0kB dirty:0kB writeback:0kB mapped:20kB shmem:0kB slab_reclaimable:232kB slab_unreclaimable:144kB kernel_stack:0kB pagetables:7364kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:46 all_unreclaimable? yes
Apr 7 13:37:45 gqas014 kernel: lowmem_reserve[]: 0 0 20957 20957
Apr 7 13:37:45 gqas014 kernel: Node 1 Normal free:38872kB min:38972kB low:48712kB high:58456kB active_anon:19960700kB inactive_anon:1457476kB active_file:240kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:21460480kB mlocked:0kB dirty:0kB writeback:0kB mapped:180kB shmem:16kB slab_reclaimable:5876kB slab_unreclaimable:21272kB kernel_stack:504kB pagetables:72488kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1176 all_unreclaimable? yes
Apr 7 13:37:45 gqas014 kernel: lowmem_reserve[]: 0 0 0 0
Apr 7 13:37:45 gqas014 kernel: Node 0 Normal: 449*4kB 244*8kB 171*16kB 120*32kB 61*64kB 23*128kB 8*256kB 14*512kB 19*1024kB 0*2048kB 0*4096kB = 45844kB
Apr 7 13:37:45 gqas014 kernel: Node 1 DMA: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15740kB
Apr 7 13:37:45 gqas014 kernel: Node 1 DMA32: 70*4kB 57*8kB 129*16kB 56*32kB 36*64kB 20*128kB 144*256kB 57*512kB 10*1024kB 2*2048kB 0*4096kB = 89840kB
Apr 7 13:37:45 gqas014 kernel: Node 1 Normal: 722*4kB 456*8kB 251*16kB 118*32kB 55*64kB 30*128kB 15*256kB 9*512kB 10*1024kB 0*2048kB 0*4096kB = 40376kB
Apr 7 13:37:45 gqas014 kernel: 7325 total pagecache pages
Apr 7 13:37:45 gqas014 kernel: 6919 pages in swap cache
Apr 7 13:37:45 gqas014 kernel: Swap cache stats: add 6204047, delete 6197128, find 680/913
Apr 7 13:37:45 gqas014 kernel: Free swap = 0kB
Apr 7 13:37:45 gqas014 kernel: Total swap = 24772600kB
Apr 7 13:37:45 gqas014 kernel: 12582911 pages RAM
Apr 7 13:37:45 gqas014 kernel: 230576 pages reserved
Apr 7 13:37:45 gqas014 kernel: 17037 pages shared
Apr 7 13:37:45 gqas014 kernel: 12282667 pages non-shared
Apr 7 13:37:45 gqas014 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Apr 7 13:37:45 gqas014 kernel: [ 697] 0 697 2727 11 2 -17 -1000 udevd
Apr 7 13:37:45 gqas014 kernel: [ 1762] 0 1762 6910 31 15 -17 -1000 auditd
Apr 7 13:37:45 gqas014 kernel: [ 1787] 0 1787 62271 121 0 0 0 rsyslogd
Apr 7 13:37:45 gqas014 kernel: [ 1816] 0 1816 2735 48 12 0 0 irqbalance
Apr 7 13:37:45 gqas014 kernel: [ 1830] 32 1830 4744 16 12 0 0 rpcbind
Apr 7 13:37:45 gqas014 kernel: [ 1848] 29 1848 5837 15 0 0 0 rpc.statd
Apr 7 13:37:45 gqas014 kernel: [ 5711] 81 5711 5351 9 13 0 0 dbus-daemon
Apr 7 13:37:45 gqas014 kernel: [ 5727] 0 5727 47333 2 0 0 0 cupsd
Apr 7 13:37:45 gqas014 kernel: [ 5752] 0 5752 1020 2 0 0 0 acpid
Apr 7 13:37:45 gqas014 kernel: [ 5761] 68 5761 9520 153 0 0 0 hald
Apr 7 13:37:45 gqas014 kernel: [ 5762] 0 5762 5081 1 4 0 0 hald-runner
Apr 7 13:37:45 gqas014 kernel: [ 5792] 0 5792 5611 1 3 0 0 hald-addon-inpu
Apr 7 13:37:45 gqas014 kernel: [ 5809] 68 5809 4483 3 4 0 0 hald-addon-acpi
Apr 7 13:37:45 gqas014 kernel: [ 5836] 0 5836 16651 27 14 -17 -1000 sshd
Apr 7 13:37:45 gqas014 kernel: [ 5882] 0 5882 23357 59 1 0 0 sendmail
Apr 7 13:37:45 gqas014 kernel: [ 5890] 51 5890 20175 10 15 0 0 sendmail
Apr 7 13:37:45 gqas014 kernel: [ 5913] 0 5913 27580 1 0 0 0 abrtd
Apr 7 13:37:45 gqas014 kernel: [ 5921] 0 5921 29325 24 0 0 0 crond
Apr 7 13:37:45 gqas014 kernel: [ 5932] 0 5932 5385 0 0 0 0 atd
Apr 7 13:37:45 gqas014 kernel: [ 5947] 0 5947 26004 2 2 0 0 rhsmcertd
Apr 7 13:37:45 gqas014 kernel: [ 5959] 0 5959 60028 3 1 0 0 beah-srv
Apr 7 13:37:45 gqas014 kernel: [ 5984] 0 5984 81868 892 13 0 0 beah-beaker-bac
Apr 7 13:37:45 gqas014 kernel: [ 5996] 0 5996 132094 8 13 0 0 beah-fwd-backen
Apr 7 13:37:45 gqas014 kernel: [ 8550] 38 8550 7679 43 12 0 0 ntpd
Apr 7 13:37:45 gqas014 kernel: [ 8552] 0 8552 2280 2 14 0 0 dhclient
Apr 7 13:37:45 gqas014 kernel: [11053] 0 11053 20650967 12121336 0 0 0 glusterfs
Apr 7 13:37:45 gqas014 kernel: [11271] 0 11271 5545 25 11 0 0 xinetd
Apr 7 13:37:45 gqas014 kernel: [11363] 0 11363 37528 294 12 0 0 beah-rhts-task
Apr 7 13:37:45 gqas014 kernel: [11560] 0 11560 1016 1 0 0 0 mingetty
Apr 7 13:37:45 gqas014 kernel: [11562] 0 11562 1016 3 2 0 0 mingetty
Apr 7 13:37:45 gqas014 kernel: [11564] 0 11564 1016 1 16 0 0 mingetty
Apr 7 13:37:45 gqas014 kernel: [11566] 0 11566 1016 1 4 0 0 mingetty
Apr 7 13:37:45 gqas014 kernel: [11567] 0 11567 1020 1 0 0 0 agetty
Apr 7 13:37:45 gqas014 kernel: [11569] 0 11569 1016 1 17 0 0 mingetty
Apr 7 13:37:45 gqas014 kernel: [11571] 0 11571 2726 10 15 -17 -1000 udevd
Apr 7 13:37:45 gqas014 kernel: [11572] 0 11572 2726 10 17 -17 -1000 udevd
Apr 7 13:37:45 gqas014 kernel: [11946] 0 11946 4872 37 0 0 0 anacron
Apr 7 13:37:45 gqas014 kernel: [12001] 0 12001 25091 84 1 0 0 sshd
Apr 7 13:37:45 gqas014 kernel: [12003] 0 12003 27076 60 1 0 0 bash
Apr 7 13:37:45 gqas014 kernel: [12097] 0 12097 25289 457 14 0 0 sshd
Apr 7 13:37:45 gqas014 kernel: [12099] 0 12099 27076 100 1 0 0 bash
Apr 7 13:37:45 gqas014 kernel: [20812] 0 20812 3827 139 1 0 0 top
Apr 7 13:37:45 gqas014 kernel: [20854] 0 20854 12315 4421 13 0 0 iozone
Apr 7 13:37:45 gqas014 kernel: [20855] 0 20855 12315 4400 15 0 0 iozone
Apr 7 13:37:45 gqas014 kernel: [20866] 0 20866 12315 4660 4 0 0 iozone
Apr 7 13:37:45 gqas014 kernel: [20867] 0 20867 12315 4654 15 0 0 iozone
Apr 7 13:37:45 gqas014 kernel: [20878] 0 20878 12315 4660 5 0 0 iozone
Apr 7 13:37:45 gqas014 kernel: [20879] 0 20879 12315 4655 15 0 0 iozone
Apr 7 13:37:45 gqas014 kernel: [20890] 0 20890 12315 4660 0 0 0 iozone
Apr 7 13:37:45 gqas014 kernel: [20891] 0 20891 12315 4654 15 0 0 iozone
Apr 7 13:37:45 gqas014 kernel: Out of memory: Kill process 11053 (glusterfs) score 958 or sacrifice child
Apr 7 13:37:45 gqas014 kernel: Killed process 11053, UID 0, (glusterfs) total-vm:82603868kB, anon-rss:48485296kB, file-rss:72kB
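The two dumps above can be cross-checked with a little arithmetic: each buddy free list is a sum of COUNT*SIZE terms, and the task table reports total_vm and rss in pages. The following is a minimal Python sketch, assuming 4 kB pages (x86_64) and the 2.6.32 log formats quoted here; the helper names are hypothetical and not part of any existing tool.

```python
import re

PAGE_KB = 4  # x86_64 base page size, in kB (assumption for this host)

# One "COUNT*SIZEkB" term of a buddy free-list line, e.g. "1322*4kB".
BUDDY_TERM = re.compile(r"(\d+)\*(\d+)kB")
# Start of a per-zone summary line, e.g. "Node 0 Normal free:44840kB min:45076kB".
ZONE = re.compile(r"Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB")
# Task-table row: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
TASK = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(-?\d+)\s+(\S+)"
)


def buddy_free_kb(line):
    """Return (sum of the COUNT*SIZE terms, total printed after '='), in kB."""
    terms, _, printed = line.rpartition("=")
    computed = sum(int(count) * int(size) for count, size in BUDDY_TERM.findall(terms))
    return computed, int(re.search(r"(\d+)kB", printed).group(1))


def zone_below_min(line):
    """True if a zone's free memory is under its 'min' watermark.

    Simplification: this ignores lowmem_reserve, which the allocator also
    applies when falling back to lower zones.
    """
    _node, _zone, free_kb, min_kb = ZONE.search(line).groups()
    return int(free_kb) < int(min_kb)


def task_usage_kb(line):
    """Convert a task-table row's total_vm and rss (both in pages) to kB."""
    m = TASK.search(line)
    total_vm_pages, rss_pages, name = int(m.group(4)), int(m.group(5)), m.group(9)
    return name, total_vm_pages * PAGE_KB, rss_pages * PAGE_KB


def kb_to_gib(kb):
    return kb / (1024 * 1024)
```

Applied to the logs above, buddy_free_kb reproduces every printed free-list total in both dumps. zone_below_min is true for both Normal zones in the OOM dump (44840kB < 45076kB and 38872kB < 38972kB) but false for both in the earlier sysrq dump. The glusterfs row converts to a total_vm of 82603868 kB, matching total-vm in the "Killed process" line, and to an RSS of about 46.2 GiB out of roughly 48 GiB of RAM (12582911 pages).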
This was reproducible with perf io-cache and perf write-behind disabled. Attempting to grab Valgrind output next.
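One way to toggle these two translators per volume is `gluster volume set <VOLNAME> performance.io-cache off` and `gluster volume set <VOLNAME> performance.write-behind off`. A small Python wrapper as a sketch; the volume name `testvol` and the function name are hypothetical, and it defaults to a dry run that only builds the command lines.

```python
import subprocess

# Volume options that enable/disable the two client-side translators.
XLATOR_OPTIONS = ("performance.io-cache", "performance.write-behind")


def toggle_xlators(volume, enabled, dry_run=True):
    """Build (and, unless dry_run, execute) the `gluster volume set` commands."""
    value = "on" if enabled else "off"
    commands = [["gluster", "volume", "set", volume, opt, value] for opt in XLATOR_OPTIONS]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the CLI reports failure.
            subprocess.run(cmd, check=True)
    return commands
```

After a real run, `gluster volume info testvol` should list the changed options under "Options Reconfigured".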
Created attachment 883770: Valgrind output
Created attachment 883773: Valgrind 2 (Valgrind output from a different client)
Ben,

While there are a lot of small leaks, the biggest contributor seems to be the following:

==2702==
==2702== 478,952 (75,624 direct, 403,328 indirect) bytes in 3,151 blocks are definitely lost in loss record 386 of 400
==2702==    at 0x4A0577B: calloc (vg_replace_malloc.c:593)
==2702==    by 0x374D2463F2: __gf_calloc (in /usr/lib64/libglusterfs.so.0.0.0)
==2702==    by 0x374D247684: iobref_new (in /usr/lib64/libglusterfs.so.0.0.0)
==2702==    by 0xFA786F4: __wb_collapse_small_writes (in /usr/lib64/glusterfs/3.5qa2/xlator/performance/write-behind.so)
==2702==    by 0xFA789CA: __wb_preprocess_winds (in /usr/lib64/glusterfs/3.5qa2/xlator/performance/write-behind.so)
==2702==    by 0xFA78A4F: wb_process_queue (in /usr/lib64/glusterfs/3.5qa2/xlator/performance/write-behind.so)
==2702==    by 0xFA79397: wb_writev (in /usr/lib64/glusterfs/3.5qa2/xlator/performance/write-behind.so)
==2702==    by 0xFC8257B: ra_writev (in /usr/lib64/glusterfs/3.5qa2/xlator/performance/read-ahead.so)
==2702==    by 0xFE921A5: ioc_writev (in /usr/lib64/glusterfs/3.5qa2/xlator/performance/io-cache.so)
==2702==    by 0x1009EFDF: qr_writev (in /usr/lib64/glusterfs/3.5qa2/xlator/performance/quick-read.so)
==2702==    by 0x374D226EC1: default_writev_resume (in /usr/lib64/libglusterfs.so.0.0.0)
==2702==    by 0x374D23A081: call_resume (in /usr/lib64/libglusterfs.so.0.0.0)
==2702==

Pranith
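In a memcheck loss record of the form "T (D direct, I indirect) bytes", T = D + I. The direct bytes are the blocks that no pointer reaches any more, and the indirect bytes are memory reachable only through those lost blocks. A minimal Python sketch (hypothetical helper names) that splits such a record into per-block averages:

```python
import re

# Summary line of a memcheck loss record that reports direct and indirect bytes,
# e.g. "478,952 (75,624 direct, 403,328 indirect) bytes in 3,151 blocks are definitely lost"
LOSS_RECORD = re.compile(
    r"([\d,]+) \(([\d,]+) direct, ([\d,]+) indirect\) bytes in ([\d,]+) blocks are (\w+) lost"
)


def _to_int(text):
    """Parse a valgrind number with thousands separators, e.g. '3,151'."""
    return int(text.replace(",", ""))


def parse_loss_record(line):
    """Split a loss record into its kind, block count and per-block averages."""
    m = LOSS_RECORD.search(line)
    if m is None:
        raise ValueError("not a direct/indirect loss record: " + line)
    total, direct, indirect, blocks = (_to_int(g) for g in m.groups()[:4])
    if total != direct + indirect:
        raise ValueError("total should equal direct + indirect")
    return {
        "kind": m.group(5),
        "blocks": blocks,
        "direct_per_block": direct / blocks,
        "indirect_per_block": indirect / blocks,
    }
```

For the record above this gives exactly 24 bytes direct and 128 bytes indirect per lost block. That is consistent with 3,151 same-sized losses from the iobref_new call under __wb_collapse_small_writes. Which structures those sizes correspond to inside libglusterfs is not visible from the trace.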
(In reply to Ben Turner from comment #6)
> This was reproducible with perf iocache and perf write behind disabled.
> Attempting to grab Valgrind output next.

This is interesting: according to the valgrind output, the leak stems from an allocation made in write-behind. I wonder what we are missing.

Pranith
Marking this as verified, since bug 1085511 has already been verified by Ben.
Setting flags required to add BZs to RHS 3.0 Errata
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html