Bug 1381140
Summary: | OOM kill of glusterfs fuse mount process seen on both the clients with one doing rename and the other doing delete of same files | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka>
Component: | readdir-ahead | Assignee: | Raghavendra G <rgowdapp>
Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka>
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.2 | CC: | amukherj, nchilaka, rgowdapp, rhs-bugs, storage-qa-internal
Target Milestone: | --- | |
Target Release: | RHGS 3.2.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.8.4-10 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-03-23 06:07:28 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1351528 | |
Description
Nag Pavan Chilakam
2016-10-03 08:49:55 UTC
These can easily be hit by a customer; just think of an admin wanting to delete, in one shot, a directory that holds a huge list of files.

Logs are available at rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/nchilaka/bug.1381140

Is it possible to rerun the test when glusterfs is mounted with the following options?

1. --entry-timeout=0
2. --attribute-timeout=0

I just want to see whether a large number of inodes in the itable is the culprit.

(In reply to Raghavendra G from comment #4)
> Is it possible to rerun the test when glusterfs is mounted with the
> following options?
> 1. --entry-timeout=0
> 2. --attribute-timeout=0
>
> I just want to see whether a large number of inodes in the itable is the
> culprit.

Another thing to try is to disable readdirp (along with the above options). To summarize:

1. mount glusterfs with --use-readdirp=no --entry-timeout=0 --attribute-timeout=0
2. gluster volume set <volname> performance.force-readdirp off
3. gluster volume set <volname> dht.force-readdirp off

(these steps are combined into a sketch right below)

regards,
Raghavendra
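A minimal sketch of the three steps above, assuming the volume is mounted directly with the glusterfs client binary; the server host server1, volume name testvol, and mount point /mnt/glusterfs are placeholders, not values from this bug:

```sh
# Sketch of the suggested debugging setup (placeholder names: server1,
# testvol, /mnt/glusterfs).

# 1. Mount with readdirp disabled and kernel entry/attribute caching off:
glusterfs --volfile-server=server1 --volfile-id=testvol \
    --use-readdirp=no --entry-timeout=0 --attribute-timeout=0 \
    /mnt/glusterfs

# 2. Turn off forced readdirp on the volume as well:
gluster volume set testvol performance.force-readdirp off
gluster volume set testvol dht.force-readdirp off
```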
Is it possible to provide statedumps associated with this bz?

As discussed in the triage meeting, raised the following bugs:

1400067 - OOM kill of glusterfs fuse mount process seen on client where I was doing rename
1400071 - OOM kill of glusterfs fuse mount process seen on client where I was doing deletes

Also marked these two bugs with the internal whiteboard "3.2.0-beyond".

An upstream patch has been posted for review: http://review.gluster.org/#/c/16137/

downstream patch: https://code.engineering.redhat.com/gerrit/#/c/93587

I am blocked on validation of this bug due to 1409472 (brick crashed on systemic setup).

Hi Raghavendra,
As part of validation, I am incorporating the steps I mentioned while raising the bug itself to validate this bz. However, in https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c9 we mentioned breaking this bz into 3 different BZs, including this one. So how should I go about testing this? Shall I still proceed with what I mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c0, or do you want me to test only a part of the test scenario? Also, if you have any other idea for me to incorporate to test this, kindly let me know.

thanks,
nagpavan

(In reply to nchilaka from comment #16)
> As part of validation, I am incorporating the steps I mentioned while
> raising the bug itself to validate this bz. However, in
> https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c9 we mentioned
> breaking this bz into 3 different BZs, including this one. So how should I
> go about testing this? Shall I still proceed with what I mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c0, or do you want me
> to test only a part of the test scenario?

We need to test only the directory traversal part; no need for rm and mv. Please keep the same data set and just do find <glusterfs-mount> or ls -lR <glusterfs-mount> with readdir-ahead enabled.

> Also, if you have any other idea for me to incorporate to test this, kindly
> let me know.
>
> thanks,
> nagpavan

Validation:

Based on comment #17:

Setup 1: Created a 4x2 volume (default settings, meaning readdir-ahead enabled) and about 10 million files, then did a lookup of the whole filesystem from two different clients, using find * on c1 and ls -lRt on c2. I ran this in a loop 5 times (a rough sketch of the loop appears at the end of this report) and didn't see any fuse OOM kill.

Setup 2: On my systemic setup, did a lookup from two different clients; didn't hit any OOM kill.

Hence moving to verified on 3.8.4-14.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
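For illustration, the traversal-only verification described in the validation comment could be run roughly as follows; this is a sketch, not the exact QE script, and the mount path /mnt/glusterfs is a placeholder:

```sh
# Sketch of the traversal loop: run "find" on client c1 and "ls -lRt" on
# client c2 against the same mount, and watch the fuse client's resident
# memory for unbounded growth between iterations.
MOUNT=/mnt/glusterfs
for i in $(seq 1 5); do
    find "$MOUNT" > /dev/null               # on c2 use: ls -lRt "$MOUNT"
    pid=$(pgrep -of "glusterfs.*${MOUNT}")  # oldest matching fuse client
    grep VmRSS "/proc/${pid}/status"        # RSS after this pass
done
```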