1393316 – OOM Kill on client when heal is in progress on 1*(2+1) arbiter volume

Summary: OOM Kill on client when heal is in progress on 1*(2+1) arbiter volume

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	readdir-ahead
Sub Component:
Version:	rhgs-3.2
Hardware:	All
OS:	All
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Raghavendra G
QA Contact:	Karan Sandha
Docs Contact:
URL:
Whiteboard:
Depends On:	1356960 1408217 1408220 1408221
Blocks:	1277328 1351528

Reported:	2016-11-09 10:18 UTC by Karan Sandha
Modified:	2017-03-23 06:17 UTC
CC List:	11 users
Fixed In Version:	glusterfs-3.8.4-11
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1356960
Environment:
Last Closed:	2017-03-23 06:17:38 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:0486	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update	2017-03-23 09:18:45 UTC

Comment 2 Raghavendra G 2016-11-23 03:46:48 UTC

> Based on the statedump, one big leak I see is from dirents which are allocated by DHT but don't seem to be leaking in dht. I think some xlator above dht is not freeing it. Could you let me know the size of the directory you may have?

I think this can be readdir-ahead. Is it possible to turn off readdir-ahead and see whether it helps?

regards,
Raghavendra

Comment 3 Raghavendra G 2016-11-23 03:58:01 UTC

(In reply to Raghavendra G from comment #2)
> > Based on the statedump, one big leak I see is from dirents which are allocated by DHT but don't seem to be leaking in dht. I think some xlator above dht is not freeing it. Could you let me know the size of the directory you may have?
> 
> I think this can be readdir-ahead. Is it possible to turn off readdir-ahead
> and see whether it helps?

The reason I suspect readdir-ahead is that, as of now, there is no upper limit on the number of dentries readdir-ahead can store. It keeps populating the cache until EOD is reached or an error is encountered in readdir from lower xlators. So, in a scenario where readdirs from the application are infrequent and the directory is huge, all the dentries of the directory are cached in memory, and that could result in OOM. Please note that this is not a leak, but a bug in readdir-ahead: it does not enforce an upper limit.

> 
> regards,
> Raghavendra
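
The growth described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the readdir-ahead code: the function name, the tick model, and the parameters (`prefetch_per_tick`, `consume_per_tick`, `cache_limit`) are all hypothetical. The prefetcher fills the cache until EOD on every tick, while the application drains it slowly.

```python
from collections import deque


def peak_cached_dentries(total_entries, prefetch_per_tick, consume_per_tick,
                         ticks, cache_limit=None):
    """Return the peak number of dentries held in a prefetch cache.

    Hypothetical model: on each tick the prefetcher adds up to
    prefetch_per_tick entries (stopping at end of directory, or at
    cache_limit if one is given), then the application consumes up to
    consume_per_tick entries.
    """
    cache = deque()
    next_entry = 0
    peak = 0
    for _ in range(ticks):
        # Prefetch phase: keep filling until EOD or the optional limit.
        for _ in range(prefetch_per_tick):
            if next_entry >= total_entries:
                break  # EOD reached, nothing left to prefetch
            if cache_limit is not None and len(cache) >= cache_limit:
                break  # bounded variant: stop at the limit
            cache.append(next_entry)
            next_entry += 1
        peak = max(peak, len(cache))
        # Consume phase: the application issues infrequent readdirs.
        for _ in range(min(consume_per_tick, len(cache))):
            cache.popleft()
    return peak


if __name__ == "__main__":
    unbounded = peak_cached_dentries(1000, 100, 1, 50)
    bounded = peak_cached_dentries(1000, 100, 1, 50, cache_limit=50)
    print("peak without limit:", unbounded)
    print("peak with limit:", bounded)
```

Without a limit, the peak approaches the full directory size when the consumer is slow; with a limit, the peak stays at the limit regardless of directory size.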

Comment 4 Raghavendra G 2016-11-23 04:01:31 UTC

From https://bugzilla.redhat.com/show_bug.cgi?id=1356960#c5,

<comment>

I have performed the same steps with "performance.readdir-ahead off" on the gluster 3.8.4.3 build and I am not hitting this issue.

</comment>
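
For reference, the workaround quoted above corresponds to a `gluster volume set` call on the affected volume. The helper below is a minimal sketch: the function names and the volume name `testvol` are hypothetical, and only the `gluster volume set <VOLNAME> <KEY> <VALUE>` form and the `performance.readdir-ahead` key are taken as given.

```python
import subprocess


def readdir_ahead_cmd(volume, enabled):
    """Build the argv that toggles performance.readdir-ahead on a volume."""
    if not volume:
        raise ValueError("volume name must not be empty")
    value = "on" if enabled else "off"
    return ["gluster", "volume", "set", volume,
            "performance.readdir-ahead", value]


def set_readdir_ahead(volume, enabled):
    """Run the command; requires the gluster CLI on a storage node."""
    subprocess.run(readdir_ahead_cmd(volume, enabled), check=True)


if __name__ == "__main__":
    # Print the command only; "testvol" is a placeholder volume name.
    print(" ".join(readdir_ahead_cmd("testvol", False)))
```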

Comment 6 surabhi 2016-11-29 10:00:44 UTC

As per the triaging, we all agree that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack.

Comment 9 Poornima G 2016-12-09 06:33:00 UTC

The unlimited caching behaviour of readdir-ahead has been there from day 0. Implementing an upper cache limit for readdir-ahead in the 3.2 time frame is difficult, as the fix is intrusive.

Can this be deferred for 3.2?

Comment 10 Atin Mukherjee 2016-12-15 05:44:51 UTC

An upstream patch, http://review.gluster.org/#/c/16137/, has been posted for review.

Comment 11 Atin Mukherjee 2016-12-22 11:50:55 UTC

downstream patch : https://code.engineering.redhat.com/gerrit/#/c/93587

Comment 12 Atin Mukherjee 2016-12-22 16:53:27 UTC

A compilation failure was introduced by https://code.engineering.redhat.com/gerrit/#/c/93587, which is now fixed through https://code.engineering.redhat.com/gerrit/#/c/93622/ (this issue existed only in the downstream code).

Comment 14 Atin Mukherjee 2016-12-28 12:41:13 UTC

We'd need to pull in one more patch here to ensure rda-low-wmark & rda-high-wmark options are not exposed to the users.

upstream mainline patch : http://review.gluster.org/16297
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/93820/
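
The option names rda-low-wmark and rda-high-wmark suggest a low/high watermark scheme. The sketch below shows how such a pair of thresholds could bound a prefetch cache with hysteresis. It is an assumption based on the option names alone, not the patch itself: the class, its methods, and the units are hypothetical.

```python
class WatermarkPrefetcher:
    """Hypothetical hysteresis control for a prefetch cache.

    Prefetching pauses once the cached size reaches high_wmark and
    resumes only after consumption brings it down to low_wmark.
    """

    def __init__(self, low_wmark, high_wmark):
        if not 0 <= low_wmark < high_wmark:
            raise ValueError("need 0 <= low_wmark < high_wmark")
        self.low_wmark = low_wmark
        self.high_wmark = high_wmark
        self.cached = 0          # current cache size, arbitrary units
        self.prefetching = True  # whether further prefetch is allowed

    def on_fill(self, size):
        """Account for prefetched entries; pause at the high watermark."""
        self.cached += size
        if self.cached >= self.high_wmark:
            self.prefetching = False

    def on_consume(self, size):
        """Account for entries served; resume at the low watermark."""
        self.cached = max(0, self.cached - size)
        if self.cached <= self.low_wmark:
            self.prefetching = True


if __name__ == "__main__":
    p = WatermarkPrefetcher(low_wmark=4, high_wmark=10)
    p.on_fill(12)
    print("after fill:", p.cached, p.prefetching)
    p.on_consume(9)
    print("after consume:", p.cached, p.prefetching)
```

Using two thresholds rather than one avoids toggling the prefetcher on every single readdir when the cache size hovers around the limit.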

Comment 16 Milind Changire 2017-01-06 15:47:24 UTC

BZ added to erratum https://errata.devel.redhat.com/advisory/24866
Moving to ON_QA

Comment 19 errata-xmlrpc 2017-03-23 06:17:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html