Bug 1261234

| Field | Value |
| --- | --- |
| Summary | Possible memory leak during rebalance with large quantity of files |
| Product | [Community] GlusterFS |
| Component | distribute |
| Version | 3.7.3 |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | urgent |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Max Gashkov <max> |
| Assignee | Susant Kumar Palai <spalai> |
| CC | bugs, gluster-bugs, jbyers, nbalacha, rgowdapp, rkavunga, spalai |
| Keywords | Triaged |
| Target Milestone | --- |
| Target Release | --- |
| Fixed In Version | glusterfs-3.7.5 |
| Doc Type | Bug Fix |
| Cloned as | 1266877 (view as bug list) |
| Bug Blocks | 1266877, 1272933 |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Last Closed | 2015-10-14 10:27:30 UTC |
**Susant Kumar Palai:**

Hi Max,

Can you share the rebalance logs? What was the memory usage of the rebalance process when it was OOM-killed?

**Max Gashkov (comment #2):**

Hi,

The rebalance log is rather large (about 600M); I can grep for specific strings if needed, or share the whole file privately (please indicate a method for contacting you directly).

OOM didn't kill the process, I did. It was around 2G RES at the time, and together with the other glusterfsd processes it started swapping to the point where the system became unstable.

**Susant Kumar Palai (in reply to Max Gashkov from comment #2):**

> Rebalance log is rather large (about 600M), I can grep for specific strings
> if needed or share whole file privately (please indicate method for
> contacting you directly).

Can you grep for error messages in the rebalance log and update the bug? For contact: on IRC [#gluster], nick: [spalai].

> OOM didn't kill the process, I did. It was around 2G RES at the time and
> with the other glusterfsd processes it started swapping to the point when
> system became unstable.

**Closing comment:**

This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.5, please open a new bug report.

glusterfs-3.7.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-October/023968.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
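Where the thread asks for the error lines out of a 600M rebalance log, grepping on the severity field is the practical approach. A minimal sketch, assuming the default /var/log/glusterfs log location and a hypothetical volume name `myvol` (neither path nor name is from the bug); glusterfs log lines carry a single-letter severity, e.g. ` E ` for errors, after the timestamp:

```
# Pull only error-severity lines out of the rebalance log.
# Log path and volume name ("myvol") are illustrative.
grep ' E ' /var/log/glusterfs/myvol-rebalance.log > rebalance-errors.log

# Strip the leading [timestamp] and count distinct messages to find
# the dominant error without reading the whole file:
grep ' E ' /var/log/glusterfs/myvol-rebalance.log \
  | sed 's/^\[[^]]*\] //' \
  | sort | uniq -c | sort -rn | head
```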
Created attachment 1071537 [details]
statedump of rebalance process

Description of problem:

A Gluster distributed volume with 4 bricks fails to rebalance due to memory exhaustion. All 4 bricks are on one physical server (this seems strange, but there are reasons for it), and the bricks are formatted with ext4. The volume spans 57T of storage space and currently contains ~2.5T in 30M files, mostly located on brick 1.

Rebalance fix-layout completed successfully, but the main rebalance fails to complete because the server runs out of memory. I have tried running:

    echo 2 > /proc/sys/vm/drop_caches

After approximately 24hrs the server starts thrashing.

Version-Release number of selected component (if applicable):
glusterfs 3.7.3 built on Jul 28 2015 14:28:57

How reproducible:
Always

Steps to Reproduce:
1. Start rebalance (see the sketch after this report)
2. Wait ~24hrs

Actual results:
Server starts thrashing due to memory exhaustion.

Expected results:
Memory occupied by gluster remains relatively constant.
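For context on reproducing this and on the attached statedump: a GlusterFS process writes a statedump when it receives SIGUSR1, which is the usual way to capture the per-translator memory accounting that helps localize a leak like this. A sketch under the assumptions that the rebalance daemon is findable by its command line and that the default dump directory /var/run/gluster is in use; the volume name `myvol` is again hypothetical:

```
# Kick off the rebalance and check progress (volume name is illustrative):
gluster volume rebalance myvol start
gluster volume rebalance myvol status

# Watch the resident size of the rebalance daemon (a glusterfs client process):
PID=$(pgrep -f 'glusterfs.*rebalance')
ps -o pid,rss,vsz,cmd -p "$PID"

# Ask the process to dump its state; glusterfs writes the dump on SIGUSR1,
# by default as /var/run/gluster/glusterdump.<pid>.dump.<timestamp>:
kill -USR1 "$PID"
ls /var/run/gluster/glusterdump."$PID".*
```

Comparing the RSS reported by `ps` over time against the memory-accounting sections of successive statedumps is what distinguishes a genuine leak in the rebalance process from ordinary cache growth.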