Bug 1311441 - Fix mem leaks related to gfapi applications
Summary: Fix mem leaks related to gfapi applications
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: libgfapi
Version: 3.7.9
Hardware: All
OS: All
unspecified
medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On: 1295107
Blocks: 1240172 1300924
TreeView+ depends on / blocked
 
Reported: 2016-02-24 09:28 UTC by Soumya Koduri
Modified: 2016-04-19 07:22 UTC (History)
2 users (show)

Fixed In Version: glusterfs-3.7.9
Doc Type: Bug Fix
Doc Text:
Clone Of: 1295107
Environment:
Last Closed: 2016-04-19 06:58:43 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Soumya Koduri 2016-02-24 09:28:56 UTC
+++ This bug was initially created as a clone of Bug #1295107 +++

Description of problem:

This bug is to track the changes required to fix mem leaks related to gfapi applications.

--- Additional comment from Vijay Bellur on 2016-01-04 09:22:51 EST ---

REVIEW: http://review.gluster.org/13125 (inode: Retire the inodes from the lru list in inode_table_destroy) posted (#2) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-01-05 00:51:37 EST ---

REVIEW: http://review.gluster.org/13096 (gfapi: Fix inode reference/nlookup counts) posted (#3) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-01-11 05:23:23 EST ---

REVIEW: http://review.gluster.org/13096 (gfapi: Fix inode nlookup counts) posted (#4) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-01-13 01:13:14 EST ---

REVIEW: http://review.gluster.org/13232 (upcall: free the xdr* allocations) posted (#1) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-01-14 06:39:27 EST ---

COMMIT: http://review.gluster.org/13232 committed in master by Kaleb KEITHLEY (kkeithle) 
------
commit 7d6e5dad15b7ecf2e8abce468aea9fd84b876052
Author: Soumya Koduri <skoduri>
Date:   Wed Jan 13 11:34:27 2016 +0530

    upcall: free the xdr* allocations
    
    Free the xdr string allocations after decoding the upcall
    cache_invalidation request.
    
    Change-Id: I0ffc64f587d6c8566cba76cf08148f937a735926
    BUG: 1295107
    Signed-off-by: Soumya Koduri <skoduri>
    Reviewed-on: http://review.gluster.org/13232
    Reviewed-by: Niels de Vos <ndevos>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>

--- Additional comment from Vijay Bellur on 2016-02-01 02:33:26 EST ---

REVIEW: http://review.gluster.org/13125 (inode: Retire the inodes from the lru list in inode_table_destroy) posted (#3) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-02-08 06:03:48 EST ---

REVIEW: http://review.gluster.org/13125 (inode: Retire the inodes from the lru list in inode_table_destroy) posted (#4) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-02-08 12:26:39 EST ---

REVIEW: http://review.gluster.org/13125 (inode: Retire the inodes from the lru list in inode_table_destroy) posted (#5) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-02-08 12:26:42 EST ---

REVIEW: http://review.gluster.org/13096 (gfapi: Use inode_forget in case of handle objects) posted (#5) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-02-16 08:25:11 EST ---

REVIEW: http://review.gluster.org/13125 (inode: Retire the inodes from the lru list in inode_table_destroy) posted (#6) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-02-16 08:25:14 EST ---

REVIEW: http://review.gluster.org/13096 (gfapi: Use inode_forget in case of handle objects) posted (#6) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-02-16 08:25:17 EST ---

REVIEW: http://review.gluster.org/13456 (rpc: Fix for rpc_transport_t  leak) posted (#1) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-02-20 07:47:36 EST ---

REVIEW: http://review.gluster.org/13456 (rpc: Fix for rpc_transport_t leak) posted (#2) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-02-22 01:59:45 EST ---

REVIEW: http://review.gluster.org/13456 (rpc: Fix for rpc_transport_t leak) posted (#3) for review on master by soumya k (skoduri)

--- Additional comment from Vijay Bellur on 2016-02-23 21:51:17 EST ---

COMMIT: http://review.gluster.org/13096 committed in master by Kaleb KEITHLEY (kkeithle) 
------
commit db747eba347e2909f4fb4fc28a67adb8f188586a
Author: Soumya Koduri <skoduri>
Date:   Mon Feb 8 22:33:56 2016 +0530

    gfapi: Use inode_forget in case of handle objects
    
    Currently with gfapi, even if applications release their
    reference on paritular inode entries, those entries never
    get destroyed as they shall always have positive nlookup
    count unless inode_table exceeds lru limit.
    
    Since inodes and their contexts can consume considerable
    amount of memory, applications may end up consuming lot of
    memory and may even get OOM killed.
    
    To avoid that, have considered below approaches (for handle based access)-
    a) Do 'inode_lookup' only if the corresponding inode is created
    for the first time and forget it during close of the handle
    
    This shall not work as multiple handle objects can refer to
    same inode entry and inode_forget from second handle object
    onwards shall result in assert.
    
    b) Do 'inode_lookup' during a handle or fd creation
    
    But this approach shall affect the performance of the fops which
    operate on neither handle nor fd.
    
    c) current approach taken-
    Applications using glfs handle objects hold a reference
    on corresponding inode entries. Hence it is safe to forget
    those inodes and make nlookup count to '0' while trying to
    delete those handle objects. That way the last unref (i.e,
    from the last handle close) shall result in inode_destroy().
    
    Change-Id: Id2c7ab178894a10c0030c143ba71e7813df8d18c
    Signed-off-by: Soumya Koduri <skoduri>
    BUG: 1295107
    Reviewed-on: http://review.gluster.org/13096
    Reviewed-by: jiffin tony Thottan <jthottan>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>
    CentOS-regression: Gluster Build System <jenkins.com>
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>

--- Additional comment from Vijay Bellur on 2016-02-23 23:15:29 EST ---

COMMIT: http://review.gluster.org/13456 committed in master by Jeff Darcy (jdarcy) 
------
commit 8cf29a4207c162be8e2993ae36f49090851cbbfc
Author: Soumya Koduri <skoduri>
Date:   Tue Feb 16 18:50:23 2016 +0530

    rpc: Fix for rpc_transport_t leak
    
    The transport object needs to get unref'ed when the rpc clnt
    object is getting destroyed. But currently in rpc_clnt_disable()
    we set conn->trans to NULL before it gets unref'ed leading to
    transport object leak.
    
    This change is to fix it by setting conn-tran to NULL only when
    it is being unref'ed.
    
    Change-Id: I79ba34e28ae19eb616035f36bbed1c2f47875b94
    BUG: 1295107
    Signed-off-by: Soumya Koduri <skoduri>
    Reviewed-on: http://review.gluster.org/13456
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: Jeff Darcy <jdarcy>

--- Additional comment from Vijay Bellur on 2016-02-23 23:53:04 EST ---

REVIEW: http://review.gluster.org/13125 (inode: Retire the inodes from the lru list in inode_table_destroy) posted (#7) for review on master by soumya k (skoduri)

Comment 1 Vijay Bellur 2016-02-24 09:48:46 UTC
REVIEW: http://review.gluster.org/13506 (gfapi: Use inode_forget in case of handle objects) posted (#1) for review on release-3.7 by soumya k (skoduri)

Comment 2 Vijay Bellur 2016-02-24 09:48:49 UTC
REVIEW: http://review.gluster.org/13507 (rpc: Fix for rpc_transport_t leak) posted (#1) for review on release-3.7 by soumya k (skoduri)

Comment 3 Vijay Bellur 2016-02-26 05:18:42 UTC
REVIEW: http://review.gluster.org/13506 (gfapi: Use inode_forget in case of handle objects) posted (#2) for review on release-3.7 by soumya k (skoduri)

Comment 4 Vijay Bellur 2016-02-26 05:18:45 UTC
REVIEW: http://review.gluster.org/13507 (rpc: Fix for rpc_transport_t leak) posted (#2) for review on release-3.7 by soumya k (skoduri)

Comment 5 Vijay Bellur 2016-02-26 05:18:52 UTC
REVIEW: http://review.gluster.org/13527 (inode: Retire the inodes from the lru list in inode_table_destroy) posted (#1) for review on release-3.7 by soumya k (skoduri)

Comment 6 Vijay Bellur 2016-02-27 19:04:31 UTC
COMMIT: http://review.gluster.org/13506 committed in release-3.7 by Kaleb KEITHLEY (kkeithle) 
------
commit 6f12a5767f6912f6913cd9dfa4c6a5484a766000
Author: Soumya Koduri <skoduri>
Date:   Mon Feb 8 22:33:56 2016 +0530

    gfapi: Use inode_forget in case of handle objects
    
    Currently with gfapi, even if applications release their
    reference on paritular inode entries, those entries never
    get destroyed as they shall always have positive nlookup
    count unless inode_table exceeds lru limit.
    
    Since inodes and their contexts can consume considerable
    amount of memory, applications may end up consuming lot of
    memory and may even get OOM killed.
    
    To avoid that, have considered below approaches (for handle based access)-
    a) Do 'inode_lookup' only if the corresponding inode is created
    for the first time and forget it during close of the handle
    
    This shall not work as multiple handle objects can refer to
    same inode entry and inode_forget from second handle object
    onwards shall result in assert.
    
    b) Do 'inode_lookup' during a handle or fd creation
    
    But this approach shall affect the performance of the fops which
    operate on neither handle nor fd.
    
    c) current approach taken-
    Applications using glfs handle objects hold a reference
    on corresponding inode entries. Hence it is safe to forget
    those inodes and make nlookup count to '0' while trying to
    delete those handle objects. That way the last unref (i.e,
    from the last handle close) shall result in inode_destroy().
    
    This is backport of the below patch
    	- http://review.gluster.org/13096
    
    Change-Id: Id2c7ab178894a10c0030c143ba71e7813df8d18c
    Signed-off-by: Soumya Koduri <skoduri>
    BUG: 1311441
    Reviewed-on: http://review.gluster.org/13096
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Niels de Vos <ndevos>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/13506
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>

Comment 7 Vijay Bellur 2016-02-28 10:37:48 UTC
REVIEW: http://review.gluster.org/13527 (inode: Retire the inodes from the lru list in inode_table_destroy) posted (#2) for review on release-3.7 by soumya k (skoduri)

Comment 8 Vijay Bellur 2016-02-28 10:37:56 UTC
REVIEW: http://review.gluster.org/13507 (rpc: Fix for rpc_transport_t leak) posted (#3) for review on release-3.7 by soumya k (skoduri)

Comment 9 Vijay Bellur 2016-02-28 22:17:43 UTC
COMMIT: http://review.gluster.org/13527 committed in release-3.7 by Kaleb KEITHLEY (kkeithle) 
------
commit 29adf166aa5f15202c5fe49369ad4f11df799c5b
Author: Soumya Koduri <skoduri>
Date:   Thu Dec 31 13:53:54 2015 +0530

    inode: Retire the inodes from the lru list in inode_table_destroy
    
    Inodes from the lru list are not moved to purge list unless they
    are retired. Also process the lru list first to unset their parent
    as we need to unset their dentry entries (the ones which may not be
    unset during '__inode_passivate' as they were hashed) which in turn
    shall unref their parent inodes which could be in active list.
    
    These parent inodes when unref'ed may well again fall into lru list
    and if we are at the end of traversing the list, we may miss to
    delete/retire that entry. Hence traverse the lru list till it
    gets empty.
    
    This is backport of the below patch
      - http://review.gluster.org/13125
    
    Change-Id: Ib7666e235e9b9644144a7c7933afb5e407e506ca
    BUG: 1311441
    Signed-off-by: Soumya Koduri <skoduri>
    Reviewed-on: http://review.gluster.org/13125
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/13527
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>

Comment 10 Vijay Bellur 2016-03-03 04:07:00 UTC
REVIEW: http://review.gluster.org/13507 (rpc: Fix for rpc_transport_t leak) posted (#4) for review on release-3.7 by soumya k (skoduri)

Comment 11 Vijay Bellur 2016-03-03 04:07:03 UTC
REVIEW: http://review.gluster.org/13584 (rpc: Fix regression cause by #13507) posted (#1) for review on release-3.7 by soumya k (skoduri)

Comment 12 Vijay Bellur 2016-03-24 07:13:41 UTC
REVIEW: http://review.gluster.org/13507 (rpc: Fix for rpc_transport_t leak) posted (#5) for review on release-3.7 by soumya k (skoduri)

Comment 13 Vijay Bellur 2016-03-24 07:13:44 UTC
REVIEW: http://review.gluster.org/13824 (rpc: Connect back only if rpc is not disabled) posted (#1) for review on release-3.7 by soumya k (skoduri)

Comment 14 Vijay Bellur 2016-03-24 15:02:26 UTC
COMMIT: http://review.gluster.org/13507 committed in release-3.7 by Jeff Darcy (jdarcy) 
------
commit 10d7b9d45a90d345f6ce4ec0e7192e0226602d80
Author: Soumya Koduri <skoduri>
Date:   Tue Feb 16 18:50:23 2016 +0530

    rpc: Fix for rpc_transport_t leak
    
    The transport object needs to get unref'ed when the rpc clnt
    object is getting destroyed. But currently in rpc_clnt_disable()
    we set conn->trans to NULL before it gets unref'ed leading to
    transport object leak.
    
    This change is to fix it by setting conn-tran to NULL only when
    it is being unref'ed.
    
    This is backport of the below patch
    	- http://review.gluster.org/13456
    
    Change-Id: I79ba34e28ae19eb616035f36bbed1c2f47875b94
    BUG: 1311441
    Signed-off-by: Soumya Koduri <skoduri>
    Reviewed-on: http://review.gluster.org/13456
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: Jeff Darcy <jdarcy>
    Reviewed-on: http://review.gluster.org/13507
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>

Comment 15 Vijay Bellur 2016-03-28 10:30:33 UTC
REVIEW: http://review.gluster.org/13824 (rpc: Connect back only if rpc is not disabled) posted (#2) for review on release-3.7 by soumya k (skoduri)

Comment 16 Vijay Bellur 2016-03-31 04:36:10 UTC
COMMIT: http://review.gluster.org/13824 committed in release-3.7 by Kaushal M (kaushal) 
------
commit ac6899c7eaae9983f00645109a569e75f1d0a72a
Author: Kotresh HR <khiremat>
Date:   Wed Mar 2 11:40:24 2016 +0530

    rpc: Connect back only if rpc is not disabled
    
    This is a backport of the below fix -
     http://review.gluster.org/13592
    
    As discussed over http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/14284,
    patch #13456 caused a regression where in if there are any pending
    rpc invocations, we end up accessing freed object. This patch
    fixes it by allowing reconnect during rpc submit only if rpc
    is not disabled.
    
    This is to fix regression caused by below patch -
    http://review.gluster.org/#/c/13507/
    
    Change-Id: I4ef4dd52bd42368bb89129f98bc973e46c6a39f4
    BUG: 1311441
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/13592
    Reviewed-on: http://review.gluster.org/13824
    Tested-by: soumya k <skoduri>
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jdarcy>

Comment 17 Kaushal 2016-04-19 06:58:43 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.10, please open a new bug report.

glusterfs-3.7.10 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-April/026164.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 18 Kaushal 2016-04-19 07:22:46 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 19 Kaushal 2016-04-19 07:22:46 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


Note You need to log in before you can comment on or make changes to this bug.