Description of problem:
======================
glusterfs invoked the OOM killer while snapshots were being created and USS was being enabled and disabled in between snapshot creation, and the fuse mounts were not accessible.

Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.6.0.33

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create 4 dist-rep volumes, start them, and fuse and nfs mount the volumes
2. Start IO on all the mounts
3. While IO is going on, start creating snapshots on all volumes at the same time: create snapshots and activate them, enable USS, create snapshots again and activate them, and disable USS. Run the following script:

~~~~~~~~~~~~~~~~~~~~~~~~~
i=1
while [ $i -le 256 ]
do
    echo "================Running Test $i========================";
    gluster snapshot create $i vol0;
    gluster volume set vol0 uss on;
    gluster snapshot activate $i;
    i=$((i+1));
    echo "================Running Test $i========================";
    gluster snapshot create $i vol0;
    gluster volume set vol0 uss off;
    gluster snapshot activate $i;
    i=$((i+1));
done
~~~~~~~~~~~~~~~~~~~~~~~~~

=================Part of dmesg=========================================
glusterfs invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
glusterfs cpuset=/ mems_allowed=0
Pid: 5141, comm: glusterfs Not tainted 2.6.32-504.el6.x86_64 #1
Call Trace:
 [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
 [<ffffffff81127300>] ? dump_header+0x90/0x1b0
 [<ffffffff8122ea2c>] ? security_real_capable_noaudit+0x3c/0x70
 [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
 [<ffffffff8112767e>] ? select_bad_process+0x9e/0x120
 [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
 [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
 [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
 [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
 [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
 [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
 [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
 [<ffffffff810516b7>] ? pte_alloc_one+0x37/0x50
 [<ffffffff8100bc0e>] ? invalidate_interrupt0+0xe/0x20
 [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
 [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
 [<ffffffff8115435a>] ? vma_merge+0x29a/0x3e0
 [<ffffffff81041e98>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffff81040f2c>] ? kvm_clock_read+0x1c/0x20
 [<ffffffff81040f39>] ? kvm_clock_get_cycles+0x9/0x10
 [<ffffffff810a9af7>] ? getnstimeofday+0x57/0xe0
 [<ffffffff8152ffbe>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8152d375>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:

Actual results:
===============
glusterfs invoked the OOM killer while snapshots were being created and USS was enabled and disabled in between the snapshot creation.

Expected results:

Additional info:
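For reference, a minimal sketch of how the client's memory growth can be watched while the loop above runs. The mount path /mnt/vol0 and the pgrep match are assumptions for illustration, not part of the original report:

~~~~~~~~~~~~~~~~~~~~~~~~~
#!/bin/bash
# Sketch: sample the RSS of the fuse client for vol0 while the snapshot/USS
# loop is running. Assumes the volume is mounted at /mnt/vol0 (hypothetical
# path) and that the glusterfs client command line contains the mount point.
MOUNT=/mnt/vol0
PID=$(pgrep -f "glusterfs.*${MOUNT}" | head -n 1)
while kill -0 "$PID" 2>/dev/null
do
    # VmRSS is the resident memory of the client process; steady growth
    # across USS on/off cycles is consistent with the graph-switch leak.
    awk '/VmRSS/ {print strftime("%T"), $2, $3}' "/proc/${PID}/status"
    sleep 60
done
echo "client process ${PID} exited (possibly OOM-killed)"
~~~~~~~~~~~~~~~~~~~~~~~~~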
Enabling/disabling USS multiple times causes multiple client-graph changes (adding/removing the snapview-client translator). These graph changes leak memory and over a period of time can cause an OOM kill.
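To see the graph change described above, one can compare the client volfile across a USS toggle; a minimal sketch follows. The volume name vol0 comes from the report, while the getspec invocation, the mount path, and the statedump location are assumptions based on standard glusterfs behaviour:

~~~~~~~~~~~~~~~~~~~~~~~~~
# Fetch the client volfile before and after toggling USS; the snapview-client
# translator should appear in one and not the other.
gluster system:: getspec vol0 > /tmp/vol0-uss-off.vol
gluster volume set vol0 uss on
gluster system:: getspec vol0 > /tmp/vol0-uss-on.vol
diff /tmp/vol0-uss-off.vol /tmp/vol0-uss-on.vol | grep -i snapview

# A statedump of the fuse client (SIGUSR1) records per-translator memory
# accounting under /var/run/gluster; dumps taken after many toggles can show
# allocations belonging to old graphs that were never freed.
kill -USR1 "$(pgrep -f 'glusterfs.*/mnt/vol0' | head -n 1)"
ls -t /var/run/gluster/*.dump.* | head -n 1
~~~~~~~~~~~~~~~~~~~~~~~~~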
Hi Avra,

Can you please review the edited doc text for technical accuracy and sign off?
We already have a known memory leak during graph switches. Enabling and disabling USS triggers a graph switch and can therefore increase the memory utilization of the client process. This issue is tracked in a different bug.

*** This bug has been marked as a duplicate of bug 1394229 ***