Bug 1393316
Summary: | OOM Kill on client when heal is in progress on 1*(2+1) arbiter volume | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Karan Sandha <ksandha> |
Component: | readdir-ahead | Assignee: | Raghavendra G <rgowdapp> |
Status: | CLOSED ERRATA | QA Contact: | Karan Sandha <ksandha> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.2 | CC: | amukherj, bugs, ksandha, mchangir, pgurusid, pkarampu, ravishankar, rcyriac, rhs-bugs, sbhaloth, storage-qa-internal |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | RHGS 3.2.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.8.4-11 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | 1356960 | Environment: | |
Last Closed: | 2017-03-23 06:17:38 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1356960, 1408217, 1408220, 1408221 | ||
Bug Blocks: | 1277328, 1351528 |
(In reply to Raghavendra G from comment #2)
> Based on the statedump, one big leak I see is from dirents which are allocated by DHT but don't seem to be leaking in dht. I think some xlator above dht is not freeing it. Could you let me know the size of the directory you may have?
>
> I think this can be readdir-ahead. Is it possible to turn off readdir-ahead and see whether it helps? The reason I suspect readdir-ahead is that there is currently no upper limit on the amount of dentries readdir-ahead can store. It keeps populating the cache until EOD is reached or an error is encountered in readdir from lower xlators. So, in a scenario where readdirs from the application are infrequent and the directory is huge, all the dentries of the directory are cached in memory, and that could result in OOM. Please note that it is not a leak, but a bug in readdir-ahead: it has no upper limit.
>
> regards,
> Raghavendra

From https://bugzilla.redhat.com/show_bug.cgi?id=1356960#c5,
<comment> I have performed the same steps with "performance.readdir-ahead off" with gluster 3.8.4.3 build and I am not hitting this issue </comment>

As per the triaging, we all agree that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack.

The unlimited caching behaviour of readdir-ahead has been there since day 0. Implementing the upper cache limit for readdir-ahead in the 3.2 time frame is difficult as the fix is intrusive. Can this be deferred beyond 3.2?

An upstream patch http://review.gluster.org/#/c/16137/ has been posted for review.

downstream patch : https://code.engineering.redhat.com/gerrit/#/c/93587

A compilation failure was introduced by https://code.engineering.redhat.com/gerrit/#/c/93587, which is now fixed through https://code.engineering.redhat.com/gerrit/#/c/93622/ (this issue was only there in downstream code).

We'd need to pull in one more patch here to ensure the rda-low-wmark & rda-high-wmark options are not exposed to the users.
upstream mainline patch : http://review.gluster.org/16297
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/93820/

BZ added to erratum https://errata.devel.redhat.com/advisory/24866
Moving to ON_QA

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2017-0486.html
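
For readers following the thread, the difference between prefetching until EOD and prefetching up to a limit can be shown with a toy model. This is only a sketch and not the glusterfs readdir-ahead code: the class `BoundedPrefetcher`, the `low_wmark`/`high_wmark` parameters, and the accounting of memory by entry-name length are all hypothetical, and it assumes the watermarks act as "resume prefetch" and "stop prefetch" thresholds.

```python
from collections import deque
from typing import Deque, Iterator, List


class BoundedPrefetcher:
    """Toy directory prefetcher with low/high watermarks.

    Hypothetical illustration only; not the glusterfs implementation.
    """

    def __init__(self, source: Iterator[str], low_wmark: int, high_wmark: int):
        if not 0 <= low_wmark < high_wmark:
            raise ValueError("need 0 <= low_wmark < high_wmark")
        self.source = source
        self.low_wmark = low_wmark
        self.high_wmark = high_wmark
        self.cache: Deque[str] = deque()
        self.cached_bytes = 0   # assumed cost model: length of each name
        self.peak_bytes = 0     # largest cache size seen so far
        self.eod = False        # end of directory reached

    def _fill(self) -> None:
        """Prefetch until the high watermark or end of directory."""
        while not self.eod and self.cached_bytes < self.high_wmark:
            try:
                name = next(self.source)
            except StopIteration:
                self.eod = True
                break
            self.cache.append(name)
            self.cached_bytes += len(name)
            self.peak_bytes = max(self.peak_bytes, self.cached_bytes)

    def readdir(self, count: int) -> List[str]:
        """Serve up to `count` entries; refill only at or below the low watermark."""
        if self.cached_bytes <= self.low_wmark:
            self._fill()
        out: List[str] = []
        while self.cache and len(out) < count:
            name = self.cache.popleft()
            self.cached_bytes -= len(name)
            out.append(name)
        return out
```

In this model the cache can exceed `high_wmark` by at most one entry, however large the directory is. Setting `high_wmark` larger than the whole directory reproduces the behaviour described above, where every dentry ends up in memory if the application reads slowly.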