Bug 1207735
Summary: | Disperse volume: Huge memory leak of glusterfsd process | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Bhaskarakiran <byarlaga>
Component: | disperse | Assignee: | Pranith Kumar K <pkarampu>
Status: | CLOSED CURRENTRELEASE | QA Contact: |
Severity: | urgent | Docs Contact: |
Priority: | high | |
Version: | mainline | CC: | bugs, byarlaga, gluster-bugs, jahernan, jbyers, mzywusko, nsathyan, pkarampu, rgowdapp, rkavunga, vmallika
Target Milestone: | --- | Keywords: | Reopened, Triaged
Target Release: | --- | |
Hardware: | All | |
OS: | All | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.8rc2 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1224177 1229282 1247964 1259697 | Environment: |
Last Closed: | 2016-06-16 12:46:47 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1186580, 1224177, 1229282, 1247964, 1259697 | |
Attachments: | | |
Description
Bhaskarakiran
2015-03-31 15:11:28 UTC
Created attachment 1009111
statedump of node2
On recent builds, bricks and NFS servers are crashing with OOM messages. The sequence of events is:
1. client mount hangs
2. brick crashes
3. export of volume is not shown with rpcinfo
4. nfs server crashes with OOM.

I've tried to reproduce this issue with current master and have been unable to. Do you do anything else besides the add-brick and rebalance?

Even with a plain disperse volume and an NFS mount, the issue persists on the 3.7 beta2 build. I NFS-mounted the volume, ran iozone -a a couple of times, and am seeing the leak. The process is taking almost 40g.

    14314 root 20 0 17.1g 8.0g 2528 S 20.0 12.7 41:15.49 glusterfsd
    14396 root 20 0 17.1g 8.0g 2528 S 19.4 12.7 42:16.27 glusterfsd
    14397 root 20 0 17.1g 8.0g 2528 S 19.4 12.7 43:34.59 glusterfsd
    14721 root 20 0 17.1g 8.0g 2528 S 19.4 12.7 43:08.11 glusterfsd
    14697 root 20 0 17.1g 8.0g 2528 S 19.0 12.7 41:04.22 glusterfsd
    14702 root 20 0 17.1g 8.0g 2528 S 19.0 12.7 41:13.08 glusterfsd
    14722 root 20 0 17.1g 8.0g 2528 S 19.0 12.7 40:32.11 glusterfsd
    14713 root 20 0 65.3g 40g 2528 S 18.7 64.5 40:38.43 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14735 root 20 0 65.3g 40g 2528 S 18.7 64.5 41:52.18 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14392 root 20 0 17.1g 8.0g 2528 S 18.7 12.7 43:33.64 glusterfsd
    14704 root 20 0 17.1g 8.0g 2528 S 18.7 12.7 41:59.24 glusterfsd
    14714 root 20 0 65.3g 40g 2528 S 18.4 64.5 39:08.16 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14737 root 20 0 65.3g 40g 2528 S 18.4 64.5 41:03.79 glusterfsd
    14701 root 20 0 17.1g 8.0g 2528 S 18.4 12.7 41:18.25 glusterfsd
    14684 root 20 0 10.3g 4.4g 2532 S 18.4 7.0 38:15.19 glusterfsd
    14388 root 20 0 65.3g 40g 2528 S 18.1 64.5 40:20.30 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14716 root 20 0 65.3g 40g 2528 R 18.1 64.5 40:24.51 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14736 root 20 0 65.3g 40g 2528 R 18.1 64.5 38:40.43 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14703 root 20 0 17.1g 8.0g 2528 S 18.1 12.7 41:06.25 glusterfsd
    14331 root 20 0 10.3g 4.4g 2532 S 18.1 7.0 38:29.85 glusterfsd
    14294 root 20 0 65.3g 40g 2528 R 17.7 64.5 38:03.70 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14395 root 20 0 65.3g 40g 2528 R 17.7 64.5 38:51.38 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14705 root 20 0 17.1g 8.0g 2528 S 17.7 12.7 43:05.49 glusterfsd
    14723 root 20 0 17.1g 8.0g 2528 R 17.7 12.7 42:20.05 glusterfsd
    14740 root 20 0 17.1g 8.0g 2528 S 17.7 12.7 39:55.02 glusterfsd
    14389 root 20 0 10.3g 4.4g 2532 S 17.7 7.0 39:52.06 glusterfsd
    14675 root 20 0 10.3g 4.4g 2532 S 17.7 7.0 38:26.46 glusterfsd
    14678 root 20 0 65.3g 40g 2528 S 17.4 64.5 40:18.39 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14734 root 20 0 65.3g 40g 2528 S 17.4 64.5 39:07.99 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14328 root 20 0 10.3g 4.4g 2532 S 17.4 7.0 38:01.29 glusterfsd
    14393 root 20 0 10.3g 4.4g 2532 S 17.4 7.0 39:14.94 glusterfsd
    14683 root 20 0 10.3g 4.4g 2532 S 17.4 7.0 38:10.70 glusterfsd
    14696 root 20 0 65.3g 40g 2528 S 17.1 64.5 39:26.60 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14390 root 20 0 17.1g 8.0g 2528 S 17.1 12.7 41:03.34 glusterfsd
    14724 root 20 0 17.1g 8.0g 2528 S 17.1 12.7 41:06.26 glusterfsd
    14329 root 20 0 10.3g 4.4g 2532 S 17.1 7.0 38:46.04 glusterfsd
    14712 root 20 0 10.3g 4.4g 2532 S 17.1 7.0 38:18.10 glusterfsd
    14297 root 20 0 65.3g 40g 2528 S 16.7 64.5 40:29.80 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14670 root 20 0 65.3g 40g 2528 S 16.7 64.5 39:24.16 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14700 root 20 0 65.3g 40g 2528 R 16.7 64.5 40:00.28 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14715 root 20 0 65.3g 40g 2528 S 16.7 64.5 40:53.39 glusterfsd >>>>>>>>>>>>>>>>>>>>>>>
    14311 root 20 0 17.1g 8.0g 2528 S 16.7 12.7 39:05.23 glusterfsd
    14706 root 20 0 10.3g 4.4g 2532 S 16.7 7.0 37:28.30 glusterfsd
    14707 root 20 0 10.3g 4.4g 2532 S 16.7 7.0 37:52.83 glusterfsd

Thanks, I'll try again with NFS and iozone.

Bhaskarakiran, do you have sos-reports corresponding to the attached statedump? I need to go through the logs to understand the state of the system.
REVIEW: http://review.gluster.org/11044 (fd: Do fd_bind on successful open) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

This patch only fixes wrong fd_count being shown in statedump because fd_binds were not happening. Still looking into more fd leaks.

REVIEW: http://review.gluster.org/11044 (fd: Do fd_bind on successful open) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

REVIEW: http://review.gluster.org/11045 (features/quota: Fix ref-leak) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

COMMIT: http://review.gluster.org/11045 committed in master by Raghavendra G (rgowdapp)
------
commit 2b7ae84a5feb636f0e41d0ab36c04b7f3fbce520
Author: Pranith Kumar K <pkarampu>
Date: Tue Jun 2 17:58:00 2015 +0530

    features/quota: Fix ref-leak

    Change-Id: I0b44b70f07be441e044d9dfc5c2b64bd5b4cac18
    BUG: 1207735
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/11045
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>

COMMIT: http://review.gluster.org/11311 committed in master by Raghavendra G (rgowdapp)
------
commit 8ab6608accb62d6320d1fc1fbe651fcafd376270
Author: vmallika <vmallika>
Date: Thu Jun 18 14:30:16 2015 +0530

    quota/marker: fix mem-leak, free contribution node

    When removing contribution xattr, we also need to free
    contribution node in memory

    Change-Id: I5fe97813a8f39e2f00401976046bd280f2eea54d
    BUG: 1207735
    Signed-off-by: vmallika <vmallika>
    Reviewed-on: http://review.gluster.org/11311
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>

Patch #11311 can cause memory corruption or crash because of accessing already freed contribution node.
We need to fix this with ref/unref mechanism of cleaning contribution node

REVIEW: http://review.gluster.org/11361 (quota/marker: fix mem-leak in marker) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11362 (Revert "quota/marker: fix mem-leak, free contribution node") posted (#1) for review on master by Raghavendra Bhat (raghavendra)

Patch #11361, resulted in a memory corruption:
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11193/consoleFull

    #0 0x00007f6d5f53c420 in uuid_unpack (in=0xffffffffffffff28 <Address 0xffffffffffffff28 out of bounds>, uu=0x7f6d4040da70) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/uuid/unpack.c:43
    #1 0x00007f6d5f53be0a in gf_uuid_unparse_x (uu=0xffffffffffffff28 <Address 0xffffffffffffff28 out of bounds>, out=0x7f6d4040daf0 "bcaa777a-939a-49de-b41c-0eaa27cb8e02", fmt=0x7f6d5f54eb38 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/uuid/unparse.c:55
    #2 0x00007f6d5f53bf2e in gf_uuid_unparse (uu=0xffffffffffffff28 <Address 0xffffffffffffff28 out of bounds>, out=0x7f6d4040daf0 "bcaa777a-939a-49de-b41c-0eaa27cb8e02") at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/uuid/unparse.c:75
    10:27 <nbalacha> #3 0x00007f6d5067a200 in mq_inspect_file_xattr_task (opaque=0x7f6d4020db30)

we are accessing contribution->gfid here. Seems like contribution is corrupted. This can happen if contribution object is being used by someone else when we free it.
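To illustrate the ref/unref idea raised in the comments above, here is a minimal sketch in C. It is not the GlusterFS marker code: `contri_node_t`, `contri_new`, `contri_ref`, `contri_unref`, and the `nodes_freed` counter are hypothetical names invented for this example, and C11 atomics are assumed in place of whatever locking the real translator uses. The point it shows is that the node is freed only when the last holder drops its reference, so a task still inspecting the node cannot see freed memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a quota contribution node (not the real struct). */
typedef struct contri_node {
    atomic_int refcount;     /* number of holders currently using the node */
    long       contribution; /* payload that in-flight tasks may read */
} contri_node_t;

/* Demo-only counter so a caller can observe when a node is really freed. */
static atomic_int nodes_freed;

/* Allocate a node; the creator starts out holding one reference. */
contri_node_t *contri_new(void)
{
    contri_node_t *c = calloc(1, sizeof(*c));
    if (c != NULL)
        atomic_init(&c->refcount, 1);
    return c;
}

/* Take an extra reference before handing the node to another task. */
contri_node_t *contri_ref(contri_node_t *c)
{
    atomic_fetch_add(&c->refcount, 1);
    return c;
}

/* Drop a reference; only the holder that drops the last one frees the node. */
void contri_unref(contri_node_t *c)
{
    /* atomic_fetch_sub returns the value before the decrement. */
    if (atomic_fetch_sub(&c->refcount, 1) == 1) {
        free(c);
        atomic_fetch_add(&nodes_freed, 1);
    }
}
```

With this shape, the path that removes the contribution xattr would call `contri_unref` instead of freeing the node directly, and any task that still holds a reference keeps the memory valid until it calls `contri_unref` itself.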
s/patch #11361/patch #11311/

REVIEW: http://review.gluster.org/11361 (quota/marker: fix mem-leak in marker) posted (#2) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11361 (quota/marker: fix mem-leak in marker) posted (#3) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11361 (quota/marker: fix mem-leak in marker) posted (#4) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11361 (quota/marker: fix mem-leak in marker) posted (#5) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11362 (Revert "quota/marker: fix mem-leak, free contribution node") posted (#2) for review on master by Raghavendra G (rgowdapp)

REVIEW: http://review.gluster.org/11362 (Revert "quota/marker: fix mem-leak, free contribution node") posted (#3) for review on master by Raghavendra G (rgowdapp)

COMMIT: http://review.gluster.org/11362 committed in master by Raghavendra G (rgowdapp)
------
commit 01e42f25ebdc44847e8b1dce02f7fd486b40dbc2
Author: Raghavendra Bhat <raghavendra>
Date: Tue Jun 23 00:25:38 2015 +0530

    Revert "quota/marker: fix mem-leak, free contribution node"

    This reverts commit 8ab6608accb62d6320d1fc1fbe651fcafd376270.

    This patch is resulting in memory corruption:
    http://build.gluster.org/job/rackspace-regression-2GB-triggered/11193/consoleFull

    contribution object might be being used by some other transaction when we free it. The correct way to handle this is to have a reference based scheme to manage the contribution object.
    Change-Id: Idf9993ed8268029073a3e2d699865587f20d9aea
    BUG: 1207735
    Signed-off-by: Raghavendra Bhat <raghavendra>
    Reviewed-on: http://review.gluster.org/11362
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>

REVIEW: http://review.gluster.org/11361 (quota/marker: fix mem-leak in marker) posted (#7) for review on master by Vijaikumar Mallikarjuna (vmallika)

Patch merged: http://review.gluster.org/#/c/11361/

REVIEW: http://review.gluster.org/11457 (quota/marker: improve locking in create_xattr_txn) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: improve locking in create_xattr_txn) posted (#2) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: improve locking in create_xattr_txn) posted (#3) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#4) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#5) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11499 (quota/marker: use smaller stacksize in synctask for marker updation) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11499 (quota/marker: use smaller stacksize in synctask for marker updation) posted (#2) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#6) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#7) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11522 (posix: fix mem-leak in posix_get_ancestry error path) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#8) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11526 (quota: fix mem leak in quota enforcer) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11499 (quota/marker: use smaller stacksize in synctask for marker updation) posted (#3) for review on master by Vijaikumar Mallikarjuna (vmallika)

COMMIT: http://review.gluster.org/11522 committed in master by Raghavendra G (rgowdapp)
------
commit a95f5651b8e2159eedb2ab87e2253a233d3ecfe7
Author: vmallika <vmallika>
Date: Fri Jul 3 15:16:57 2015 +0530

    posix: fix mem-leak in posix_get_ancestry error path

    Change-Id: I47c8a8f170151f6374fc0420278aedf3ff5443ee
    BUG: 1207735
    Signed-off-by: vmallika <vmallika>
    Reviewed-on: http://review.gluster.org/11522
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Sachin Pandit <spandit>
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>

REVIEW: http://review.gluster.org/11499 (quota/marker: use smaller stacksize in synctask for marker updation) posted (#4) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#9) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11499 (quota/marker: use smaller stacksize in synctask for marker updation) posted (#5) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#10) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11588 (posix: check for NULL inode in posix do_xattrop) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#12) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#13) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#14) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#15) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11457 (quota/marker: fix mem leak in marker) posted (#16) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11617 (quota/marker: fix mem-leak in marker) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11616 (quota/marker: inspect file/dir invoked without having quota xattrs requested) posted (#2) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11617 (quota/marker: fix mem-leak in marker) posted (#2) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11617 (quota/marker: fix mem-leak in marker) posted (#3) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11617 (quota/marker: fix mem-leak in marker) posted (#4) for review on master by Vijaikumar Mallikarjuna (vmallika)

COMMIT: http://review.gluster.org/11617 committed in master by Raghavendra G (rgowdapp)
------
commit e73db5e7fe1dba5a071725ef3480a4a1d5c7bef7
Author: vmallika <vmallika>
Date: Sun Jul 12 21:03:54 2015 +0530

    quota/marker: fix mem-leak in marker

    Free local in error paths

    Change-Id: I76f69e7d746af8eedea34354ff5a6bf50234e50e
    BUG: 1207735
    Signed-off-by: vmallika <vmallika>
    Reviewed-on: http://review.gluster.org/11617
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>

REVIEW: http://review.gluster.org/11044 (fd: Do fd_bind on successful open) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

REVIEW: http://review.gluster.org/11044 (fd: Do fd_bind on successful open) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

COMMIT: http://review.gluster.org/11044 committed in master by Raghavendra G (rgowdapp)
------
commit e55579bdb1d04cca29f3e87427de5f2a5ab5e9b4
Author: Pranith Kumar K <pkarampu>
Date: Tue Jun 2 16:39:35 2015 +0530

    fd: Do fd_bind on successful open

    - fd_unref should decrement fd->inode->fd_count only if it is present in the inode's fd list.
    - successful open/opendir should perform fd_bind.

    Change-Id: I81dd04f330e2fee86369a6dc7147af44f3d49169
    BUG: 1207735
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/11044
    Reviewed-by: Anoop C S <anoopcs>
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>

Fix for this BZ is already present in a GlusterFS release. You can find clone of this BZ, fixed in a GlusterFS release and closed. Hence closing this mainline BZ as well.

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
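For readers following the fd_bind commit (review 11044) quoted above, the two rules in its message can be sketched in C as follows. This is a toy model, not the libglusterfs implementation: the `toy_inode_t` and `toy_fd_t` types and the `toy_fd_*` functions are hypothetical, and a simple `bound` flag is assumed to stand in for membership in the inode's fd list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy inode: only tracks how many fds are currently bound to it. */
typedef struct toy_inode {
    int fd_count;
} toy_inode_t;

/* Toy fd: `bound` models "present in the inode's fd list". */
typedef struct toy_fd {
    toy_inode_t *inode;
    int          refcount;
    bool         bound;
} toy_fd_t;

/* Create-time state: one reference held, not yet bound to the inode. */
void toy_fd_init(toy_fd_t *fd, toy_inode_t *inode)
{
    fd->inode    = inode;
    fd->refcount = 1;
    fd->bound    = false;
}

/* Called after a successful open/opendir: the fd joins the inode's list. */
void toy_fd_bind(toy_fd_t *fd)
{
    if (!fd->bound) {
        fd->bound = true;
        fd->inode->fd_count++;
    }
}

/* Drop a reference; on the last one, decrement fd_count only if bound. */
void toy_fd_unref(toy_fd_t *fd)
{
    if (--fd->refcount > 0)
        return;
    if (fd->bound) {
        fd->bound = false;
        fd->inode->fd_count--;
    }
}
```

In this model an fd whose open failed is never bound, so releasing it leaves `fd_count` untouched, while a bound fd is counted exactly once between bind and its final unref.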