Bug 1227197
| Field | Value |
|---|---|
| Summary | Disperse volume: Memory leak in client glusterfs |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | Bhaskarakiran <byarlaga> |
| Component | disperse |
| Assignee | Pranith Kumar K <pkarampu> |
| Status | CLOSED ERRATA |
| QA Contact | Bhaskarakiran <byarlaga> |
| Severity | high |
| Priority | high |
| Version | rhgs-3.1 |
| CC | annair, asrivast, byarlaga, mzywusko, nsathyan, pkarampu, rcyriac, rhs-bugs, storage-qa-internal, vagarwal, vbellur |
| Target Release | RHGS 3.1.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | glusterfs-3.7.1-7.el6 |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2015-07-29 04:54:50 UTC |
| Bug Blocks | 1202842, 1223636, 1230612 |
Description
Bhaskarakiran, 2015-06-02 06:33:08 UTC

Created attachment 1033606 [details]: sosreport of client
This is seen even when USS is off. Brought down 2 of the bricks in a 4+2 disperse volume, started a linux kernel untar, and brought the bricks back up. The untar hung and the glusterfs client process got killed.

I feel this is a blocker. Please mark it blocker+.

With the fix for bug 1227649, i.e. https://code.engineering.redhat.com/gerrit/49909, I am able to run the test given in the bug description without any OOM kills. The reason for the leaks is stale lock structures, which also hold references on inodes; these accumulate and eventually lead to the death of the mount.

Verified this on 3.7.1-3 and didn't see the issue. Marking this as fixed.

Ran iozone on 10 files simultaneously and saw the memory leak again. glusterfs is getting killed with OOM messages. Re-opening the bug. This is on 3.7.1-4:

    [root@rhs-client29 iozone]# 12
    Error reading block 587
    Error reading block 505
    Error reading block 888, fd= 3
    Filename testfile.7  Read returned -1
    Seeked to 796 Reclen = 4096
    Error reading block 562
    Error reading block 941, fd= 3
    Filename testfile.8  Read returned -1
    Seeked to 678 Reclen = 4096
    Error reading block 576
    Can not fdopen temp file: testfile.3 107
    Can not fdopen temp file: testfile.9 107
    fdopen: Transport endpoint is not connected
    read: Software caused connection abort
    fdopen: Transport endpoint is not connected
    read: Software caused connection abort
    Can not fdopen temp file: testfile.2 107
    read: Software caused connection abort
    read: Transport endpoint is not connected
    read: Software caused connection abort
    fdopen: Transport endpoint is not connected
    Can not fdopen temp file: testfile.1 107
    fdopen: Transport endpoint is not connected
    read: Software caused connection abort

dmesg output:

    Out of memory: Kill process 4169 (glusterfs) score 925 or sacrifice child
    Killed process 4169, UID 0, (glusterfs) total-vm:19552852kB, anon-rss:7615896kB, file-rss:8kB

Can you please provide sosreports and more details of the system in terms of resources etc. when the crash happened? Additionally, providing the exact command line used with iozone would help.

As per Bhaskar, this is not re-creatable in 3.7.1-4. He will close it if it is working fine with 3.7.1-5 as well.

Moving it to ON_QA based on comment #9.

The command I used is:

    for i in `seq 1 10`; do /opt/iozone3_430/src/current/iozone -az -i0 -i1 & done

and the client is a physical machine with 8 GB RAM. I failed to collect the sosreport while the crash happened. I will if I see this again on the latest build.

Bhaskar, I need the following information:
1) Is this bug intermittent?
2) When this issue happens, are a lot of self-heals triggered on the mount? In other words, do you see a lot of failures in the brick logs?

The only possibility I see for this is that the mount triggers too many heals, leading to the OOM issue. We probably need rate-limiting as a fix for this.

Pranith

Pranith,
1. No, reproducible with iozone consistently.
2. I haven't observed this. Need to check.

Verified this on the 3.7.1-7 build and didn't see the OOM kills. Marking this as fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
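For reference, the sketch below pulls together the two reproduction scenarios described in the comments (untar with two bricks down in a 4+2 disperse volume, and ten parallel iozone runs). It is a minimal illustration, not the exact QA procedure: the volume name `testvol`, the mount point `/mnt/testvol`, the tarball path, and the brick-matching patterns are all assumptions; only the iozone loop is taken verbatim from the report.

```bash
#!/bin/bash
# Sketch of the reproduction flow described in this bug.
# Assumed names: volume "testvol", FUSE mount /mnt/testvol, kernel tarball path.

VOL=testvol
MNT=/mnt/testvol

# Scenario 1: linux untar while two bricks of the 4+2 disperse volume are down.
# Kill two brick processes (the brick-path patterns are placeholders).
pkill -f "glusterfsd.*${VOL}.*brick5" || true
pkill -f "glusterfsd.*${VOL}.*brick6" || true

# Start the untar on the FUSE mount while the bricks are down.
tar -xf /root/linux-kernel.tar.xz -C "$MNT" &
TAR_PID=$!

# Let the untar make some progress, then bring the killed bricks back;
# "start force" respawns any brick processes that are not running.
sleep 30
gluster volume start "$VOL" force
wait "$TAR_PID"

# Scenario 2: ten parallel iozone runs, as given in the comments.
cd "$MNT"
for i in $(seq 1 10); do
    /opt/iozone3_430/src/current/iozone -az -i0 -i1 &
done
wait
```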
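Since the root cause cited in the fix discussion is stale lock structures holding inode references on the client, one way to watch for this kind of growth during a run is to track the client's RSS and take periodic statedumps (glusterfs writes a statedump when it receives SIGUSR1). The sketch below is an assumption-laden helper, not part of the original report: the default statedump directory `/var/run/gluster`, the `glusterdump.<pid>.dump.*` file naming, the `active_size`/`lru_size` field names, and the 60-second interval are all assumed.

```bash
#!/bin/bash
# Leak-watch sketch (assumptions: a single glusterfs FUSE client on this host,
# statedumps land in /var/run/gluster, gawk is available for strftime).

PID=$(pgrep -o -x glusterfs)      # oldest glusterfs process, assumed to be the client
DUMPDIR=/var/run/gluster

while kill -0 "$PID" 2>/dev/null; do
    # Resident memory of the client, from /proc.
    awk '/VmRSS/ {print strftime("%F %T"), $2, $3}' "/proc/$PID/status"

    # Ask the client for a statedump; glusterfs writes one on SIGUSR1.
    kill -USR1 "$PID"
    sleep 5

    # Rough growth indicators: inode-table counters in the latest dump
    # (field names assumed; adjust the pattern to the actual dump contents).
    latest=$(ls -t "$DUMPDIR"/glusterdump."$PID".dump.* 2>/dev/null | head -1)
    [ -n "$latest" ] && grep -E 'active_size|lru_size' "$latest" | head

    sleep 60
done
```

If the inode-table counters and RSS climb steadily under a constant workload, that matches the stale-lock/inode-ref pattern described in the fix comment; on a build containing the referenced fix they would be expected to level off.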