Bug 1393316 - OOM Kill on client when heal is in progress on 1*(2+1) arbiter volume
Summary: OOM Kill on client when heal is in progress on 1*(2+1) arbiter volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: readdir-ahead
Version: rhgs-3.2
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Raghavendra G
QA Contact: Karan Sandha
URL:
Whiteboard:
Depends On: 1356960 1408217 1408220 1408221
Blocks: 1277328 1351528
Reported: 2016-11-09 10:18 UTC by Karan Sandha
Modified: 2017-03-23 06:17 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.8.4-11
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1356960
Environment:
Last Closed: 2017-03-23 06:17:38 UTC




Links
Red Hat Product Errata RHSA-2017:0486 (Priority: normal; Status: SHIPPED_LIVE; Last Updated: 2017-03-23 09:18:45 UTC): "Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update"

Comment 2 Raghavendra G 2016-11-23 03:46:48 UTC
> Based on the statedump, one big leak I see is from dirents which are allocated by DHT but don't seem to be leaking in dht. I think some xlator above dht is not freeing it. Could you let me know the size of the directory you may have?

I suspect this could be readdir-ahead. Is it possible to turn off readdir-ahead and check whether that helps?

regards,
Raghavendra

Comment 3 Raghavendra G 2016-11-23 03:58:01 UTC
(In reply to Raghavendra G from comment #2)
> > Based on the statedump, one big leak I see is from dirents which are allocated by DHT but don't seem to be leaking in dht. I think some xlator above dht is not freeing it. Could you let me know the size of the directory you may have?
> 
> I think this can be readdir-ahead. Is it possible to turn off readdir-ahead
> and see whether it helps?

The reason I suspect readdir-ahead is that, as of now, there is no upper limit on the number of dentries readdir-ahead can store. It keeps populating the cache until EOD (end of directory) is reached or a readdir from the lower xlators returns an error. So, in a scenario where the application issues readdirs infrequently and the directory is huge, all the dentries of the directory get cached in memory, which can result in an OOM. Please note that this is not a leak, but a bug in readdir-ahead: it has no upper limit on its cache.

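To make the mechanism above concrete, here is a minimal Python model of the unbounded fill loop. It is not the readdir-ahead C code: `lower_xlator_readdir` and `prefetch_unbounded` are hypothetical names, dentries are modelled as plain strings, and only the EOD stop condition is modelled (errors from the lower xlators are not).

```python
from collections import deque
from typing import Iterator, List


def lower_xlator_readdir(total_entries: int, batch: int) -> Iterator[List[str]]:
    """Hypothetical stand-in for readdir on the xlator below (e.g. DHT):
    yields batches of dentry names until end-of-directory (EOD)."""
    for start in range(0, total_entries, batch):
        end = min(start + batch, total_entries)
        yield [f"file-{i}" for i in range(start, end)]


def prefetch_unbounded(total_entries: int, batch: int = 128) -> deque:
    """Model of the pre-fix fill loop: keep issuing readdir to the lower
    layer until EOD, with no check on how much is already cached."""
    cache: deque = deque()
    for entries in lower_xlator_readdir(total_entries, batch):
        cache.extend(entries)  # nothing here ever pauses the fill
    return cache


if __name__ == "__main__":
    # The application has not consumed a single entry, yet the cache
    # ends up holding the entire directory.
    cached = prefetch_unbounded(100_000)
    print(len(cached))  # 100000
```

Because the only stop condition is EOD, the memory held by the cache scales with the directory size rather than with how fast the application reads, which is why a huge directory plus infrequent application readdirs can exhaust client memory.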

Comment 4 Raghavendra G 2016-11-23 04:01:31 UTC
From https://bugzilla.redhat.com/show_bug.cgi?id=1356960#c5,

<comment>

I performed the same steps with "performance.readdir-ahead off" on the gluster 3.8.4.3 build, and I am not hitting this issue.

</comment>

Comment 6 surabhi 2016-11-29 10:00:44 UTC
As per triage, we all agree that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack.

Comment 9 Poornima G 2016-12-09 06:33:00 UTC
The unlimited caching behaviour of readdir-ahead has been there since day 0. Implementing an upper cache limit for readdir-ahead in the 3.2 time frame is difficult, as the fix is intrusive.

Can this be deferred beyond 3.2?

Comment 10 Atin Mukherjee 2016-12-15 05:44:51 UTC
An upstream patch, http://review.gluster.org/#/c/16137/, has been posted for review.

Comment 11 Atin Mukherjee 2016-12-22 11:50:55 UTC
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/93587

Comment 12 Atin Mukherjee 2016-12-22 16:53:27 UTC
A compilation failure introduced by https://code.engineering.redhat.com/gerrit/#/c/93587 is now fixed by https://code.engineering.redhat.com/gerrit/#/c/93622/ (this issue existed only in the downstream code).

Comment 14 Atin Mukherjee 2016-12-28 12:41:13 UTC
We need to pull in one more patch here to ensure the rda-low-wmark and rda-high-wmark options are not exposed to users.

Upstream mainline patch: http://review.gluster.org/16297
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/93820/
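The rda-low-wmark and rda-high-wmark option names suggest the cache is bounded with a pair of watermarks. The following is a hypothetical Python model of that idea, not the patch itself. The class and method names are invented, limits are counted in entries rather than bytes, and the semantics are assumed: filling stops once the cache reaches the high watermark and resumes only after the application drains it below the low watermark.

```python
from collections import deque
from typing import Iterator, List


def lower_xlator_readdir(total_entries: int, batch: int) -> Iterator[List[str]]:
    """Hypothetical stand-in for readdir on the lower xlator: yields
    batches of dentry names until end-of-directory (EOD)."""
    for start in range(0, total_entries, batch):
        end = min(start + batch, total_entries)
        yield [f"file-{i}" for i in range(start, end)]


class WatermarkReaddirCache:
    """Hypothetical readdir-ahead cache bounded by two watermarks
    (measured in entries here for simplicity, not bytes)."""

    def __init__(self, source: Iterator[List[str]], low_wmark: int, high_wmark: int):
        if not 0 <= low_wmark < high_wmark:
            raise ValueError("expected 0 <= low_wmark < high_wmark")
        self.source = source
        self.low_wmark = low_wmark
        self.high_wmark = high_wmark
        self.cache: deque = deque()
        self.eod = False
        self.peak = 0  # largest number of entries ever held at once

    def _fill(self) -> None:
        # Prefetch until the high watermark is reached or EOD is hit.
        while not self.eod and len(self.cache) < self.high_wmark:
            batch = next(self.source, None)
            if batch is None:
                self.eod = True
            else:
                self.cache.extend(batch)
                self.peak = max(self.peak, len(self.cache))

    def readdir(self, count: int) -> List[str]:
        # Serve up to `count` cached entries to the application.
        if not self.cache:
            self._fill()
        served = [self.cache.popleft() for _ in range(min(count, len(self.cache)))]
        # Resume prefetching only once the cache has drained below the low watermark.
        if len(self.cache) < self.low_wmark:
            self._fill()
        return served


if __name__ == "__main__":
    rda = WatermarkReaddirCache(
        lower_xlator_readdir(100_000, 100), low_wmark=200, high_wmark=1000
    )
    total = 0
    while batch := rda.readdir(50):
        total += len(batch)
    print(total, rda.peak)  # 100000 1050
```

In this model the peak stays near the high watermark, overshooting it by less than one batch, no matter how large the directory is. The gap between the two watermarks avoids issuing a lower-layer readdir after every small application read.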

Comment 16 Milind Changire 2017-01-06 15:47:24 UTC
BZ added to erratum https://errata.devel.redhat.com/advisory/24866
Moving to ON_QA

Comment 19 errata-xmlrpc 2017-03-23 06:17:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

