Bug 1167648

Summary: [USS] :glusterfs (Fuse client) invoked OOM killer while creating snapshots and enabling and disabling USS in between snap creation
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: senaik
Component: snapshot Assignee: Avra Sengupta <asengupt>
Status: CLOSED DUPLICATE QA Contact: Anoop <annair>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rhgs-3.0 CC: asengupt, asriram, rhinduja, rhs-bugs, smohan
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: USS
Fixed In Version: Doc Type: Known Issue
Doc Text:
Performing operations that involve client graph changes, such as volume set operations or restoring a snapshot, eventually leads to out-of-memory conditions for the client processes that mount the volume.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-21 07:27:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1153907    

Description senaik 2014-11-25 09:04:05 UTC
Description of problem:
======================
glusterfs invoked the OOM killer while snapshots were being created and USS was being enabled and disabled between snapshot creations. The FUSE mounts were not accessible.

Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.6.0.33 

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create 4 distributed-replicate volumes and start them, then FUSE- and NFS-mount the volumes.

2. Start I/O on all the mounts.

3. While I/O is in progress, start creating snapshots on all volumes at the same time:
   Create snapshots, activate them, and enable USS.
   Create snapshots again, activate them, and disable USS.

Run the following script:
~~~~~~~~~~~~~~~~~~~~~~~~~
# Create and activate 256 snapshots of vol0, toggling USS
# between consecutive snapshot creations.
i=1
while [ "$i" -le 256 ]
do
    echo "================Running Test $i========================"
    gluster snapshot create "$i" vol0
    gluster volume set vol0 uss on      # enable USS
    gluster snapshot activate "$i"
    i=$((i+1))
    echo "================Running Test $i========================"
    gluster snapshot create "$i" vol0
    gluster volume set vol0 uss off     # disable USS
    gluster snapshot activate "$i"
    i=$((i+1))
done

=================Part of dmesg=========================================

glusterfs invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
glusterfs cpuset=/ mems_allowed=0
Pid: 5141, comm: glusterfs Not tainted 2.6.32-504.el6.x86_64 #1
Call Trace:
 [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
 [<ffffffff81127300>] ? dump_header+0x90/0x1b0
 [<ffffffff8122ea2c>] ? security_real_capable_noaudit+0x3c/0x70
 [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
 [<ffffffff8112767e>] ? select_bad_process+0x9e/0x120
 [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
 [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
 [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
 [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
 [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
 [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
 [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
 [<ffffffff810516b7>] ? pte_alloc_one+0x37/0x50
 [<ffffffff8100bc0e>] ? invalidate_interrupt0+0xe/0x20
 [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
 [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
 [<ffffffff8115435a>] ? vma_merge+0x29a/0x3e0
 [<ffffffff81041e98>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffff81040f2c>] ? kvm_clock_read+0x1c/0x20
 [<ffffffff81040f39>] ? kvm_clock_get_cycles+0x9/0x10
 [<ffffffff810a9af7>] ? getnstimeofday+0x57/0xe0
 [<ffffffff8152ffbe>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8152d375>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
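
To check how often, and by which process, the OOM killer was invoked in a kernel log like the excerpt above, the log can be filtered on the "invoked oom-killer" line. The following is a minimal sketch, not part of the original report. The function name oom_invokers is hypothetical, and the sketch assumes that process names contain no spaces.

```shell
# Read kernel log text on stdin and print each process name that
# invoked the OOM killer, prefixed by how many times it did so,
# most frequent first. (oom_invokers is a hypothetical helper name.)
oom_invokers() {
    # The process name is the field immediately before "invoked".
    # This also copes with an optional timestamp prefix on the line.
    awk '/invoked oom-killer/ {
             for (i = 2; i <= NF; i++)
                 if ($i == "invoked") { print $(i - 1); break }
         }' |
        sort | uniq -c | sort -rn
}
```

Typical use would be to pipe the kernel log into it, for example: dmesg | oom_invokers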


Actual results:
===============
glusterfs invoked the OOM killer while snapshots were being created and USS was enabled and disabled between snapshot creations.


Expected results:

Additional info:

Comment 3 Avra Sengupta 2014-11-26 13:09:57 UTC
Enabling and disabling USS multiple times causes multiple client-graph changes (adding and removing the snap-view client translator). These graph changes leak memory, and over time the leak can lead to an OOM kill.
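
One way to observe this growth while the reproduction script runs is to sample the resident set size (RSS) of the glusterfs client process at intervals. The following is a minimal sketch, not part of the original report. The function names rss_kb and sample_rss are hypothetical, and the sketch assumes a ps that supports the -o rss= and -p options.

```shell
# Print the resident set size (RSS, in KiB) of the given PID.
# Prints nothing if the process does not exist.
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# Sample the RSS of a PID a fixed number of times.
# Arguments: PID, number of samples, seconds between samples.
# Each output line is: <epoch seconds> <pid> <rss in KiB>
sample_rss() {
    pid=$1
    count=$2
    interval=$3
    n=0
    while [ "$n" -lt "$count" ]; do
        rss=$(rss_kb "$pid")
        # Stop early if the process has exited (for example, after an OOM kill).
        [ -n "$rss" ] || break
        echo "$(date +%s) $pid $rss"
        n=$((n + 1))
        if [ "$n" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

Running sample_rss with the PID of the FUSE client, a sample count, and an interval (for example, 60 samples 10 seconds apart) in a second terminal gives a series of readings. If the leak described above is present, the third column would be expected to rise as USS is toggled.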

Comment 5 Pavithra 2014-12-08 05:11:29 UTC
Hi Avra,

Can you please review the edited doc text for technical accuracy and sign off?

Comment 9 rjoseph 2017-02-21 07:27:37 UTC
We already have a known memory leak during graph switch. Enabling and disabling USS causes a graph switch and can therefore increase the memory utilization of the client process. This issue is tracked in a different bug.

*** This bug has been marked as a duplicate of bug 1394229 ***