Description of problem:
======================
glusterfs invoked the OOM killer while snapshots were being created and USS was being enabled and disabled in between snapshot creation, and the fuse mounts were not accessible.

Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.6.0.33

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create 4 dist-rep volumes, start them, and fuse and nfs mount the volumes
2. Start IO on all the mounts
3. While IO is going on, start creating snapshots on all volumes at the same time: create snapshots and activate them, enable USS, create snapshots again and activate them, and disable USS. Run the following script:

~~~~~~~~~~~~~~~~~~~~~~~~~
i=1
while [ $i -le 256 ]
do
    echo "================Running Test $i========================";
    gluster snapshot create $i vol0;
    gluster volume set vol0 uss on;
    gluster snapshot activate $i;
    i=$((i+1));
    echo "================Running Test $i========================";
    gluster snapshot create $i vol0;
    gluster volume set vol0 uss off;
    gluster snapshot activate $i;
    i=$((i+1));
done
~~~~~~~~~~~~~~~~~~~~~~~~~

=================Part of dmesg=========================================
glusterfs invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
glusterfs cpuset=/ mems_allowed=0
Pid: 5141, comm: glusterfs Not tainted 2.6.32-504.el6.x86_64 #1
Call Trace:
 [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
 [<ffffffff81127300>] ? dump_header+0x90/0x1b0
 [<ffffffff8122ea2c>] ? security_real_capable_noaudit+0x3c/0x70
 [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
 [<ffffffff8112767e>] ? select_bad_process+0x9e/0x120
 [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
 [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
 [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
 [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
 [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
 [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
 [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
 [<ffffffff810516b7>] ? pte_alloc_one+0x37/0x50
 [<ffffffff8100bc0e>] ? invalidate_interrupt0+0xe/0x20
 [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
 [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
 [<ffffffff8115435a>] ? vma_merge+0x29a/0x3e0
 [<ffffffff81041e98>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffff81040f2c>] ? kvm_clock_read+0x1c/0x20
 [<ffffffff81040f39>] ? kvm_clock_get_cycles+0x9/0x10
 [<ffffffff810a9af7>] ? getnstimeofday+0x57/0xe0
 [<ffffffff8152ffbe>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8152d375>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:

Actual results:
===============
glusterfs invoked the OOM killer while snapshots were being created and USS was enabled and disabled in between the snapshot creation.

Expected results:

Additional info:
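For reference, a minimal sketch of how the client's memory growth can be watched while the loop above runs. The mount path /mnt/vol0 and the pgrep match are assumptions for illustration, not part of the original report:

~~~~~~~~~~~~~~~~~~~~~~~~~
#!/bin/bash
# Sketch: sample the RSS of the fuse client for vol0 while the snapshot/USS
# loop is running. Assumes the volume is mounted at /mnt/vol0 (hypothetical
# path) and that the glusterfs client command line contains the mount point.
MOUNT=/mnt/vol0
PID=$(pgrep -f "glusterfs.*${MOUNT}" | head -n 1)
while kill -0 "$PID" 2>/dev/null
do
    # VmRSS is the resident memory of the client process; steady growth
    # across USS on/off cycles is consistent with the graph-switch leak.
    awk '/VmRSS/ {print strftime("%T"), $2, $3}' "/proc/${PID}/status"
    sleep 60
done
echo "client process ${PID} exited (possibly OOM-killed)"
~~~~~~~~~~~~~~~~~~~~~~~~~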
Enabling/disabling USS multiple times causes multiple client-graph changes (adding/removing the snapview-client translator). These graph changes leak memory and over a period of time can cause an OOM kill.
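To see the graph change described above, one can compare the client volfile across a USS toggle; a minimal sketch follows. The volume name vol0 comes from the report, while the getspec invocation, the mount path, and the statedump location are assumptions based on standard glusterfs behaviour:

~~~~~~~~~~~~~~~~~~~~~~~~~
# Fetch the client volfile before and after toggling USS; the snapview-client
# translator should appear in one and not the other.
gluster system:: getspec vol0 > /tmp/vol0-uss-off.vol
gluster volume set vol0 uss on
gluster system:: getspec vol0 > /tmp/vol0-uss-on.vol
diff /tmp/vol0-uss-off.vol /tmp/vol0-uss-on.vol | grep -i snapview

# A statedump of the fuse client (SIGUSR1) records per-translator memory
# accounting under /var/run/gluster; dumps taken after many toggles can show
# allocations belonging to old graphs that were never freed.
kill -USR1 "$(pgrep -f 'glusterfs.*/mnt/vol0' | head -n 1)"
ls -t /var/run/gluster/*.dump.* | head -n 1
~~~~~~~~~~~~~~~~~~~~~~~~~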
Hi Avra,

Can you please review the edited doc text for technical accuracy and sign off?
We already have a known memory leak during graph switches. Enabling and disabling USS triggers a graph switch and can therefore increase the memory utilization of the client process. This issue is tracked in a different bug.

*** This bug has been marked as a duplicate of bug 1394229 ***