Bug 1417606 - OOM kill of glusterfsd during continuous add-bricks
Summary: OOM kill of glusterfsd during continuous add-bricks
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: upcall
Version: 3.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On: 1412917 1417622
Blocks:
Reported: 2017-01-30 12:08 UTC by Niels de Vos
Modified: 2017-03-08 12:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1412917
Environment:
Last Closed: 2017-03-08 12:35:31 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Niels de Vos 2017-01-30 12:08:52 UTC
+++ This bug was initially created as a clone of Bug #1412917 +++

Hi,


We have found one code path that leaks a dict (the same applies to up_fsetxattr and up_removexattr/up_fremovexattr), but there could be other paths that leak as well.

>>>>>>>>>>>>>>>>>>>>>>

0x7ff423dac373 : mem_get0+0x13/0x90 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423d7d355 : get_new_dict_full+0x25/0x120 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423d7dbab : dict_new+0xb/0x20 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423d7fa0a : dict_copy_with_ref+0x3a/0xe0 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff41419733a : up_setxattr+0x3a/0x450 [/usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so]
 0x7ff423e16684 : default_setxattr_resume+0x1d4/0x250 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423da86ed : call_resume+0x7d/0xd0 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff40fdf9957 : iot_worker+0x117/0x220 [/usr/lib64/glusterfs/3.8.4/xlator/performance/io-threads.so]
 0x7ff422be6dc5 : 0x7ff422be6dc5 [/usr/lib64/libpthread-2.17.so+0x7dc5/0x218000]


>>>>>>>>>>>>>>>>>>>>>>>>>>

I am trying to find the other paths as well, and will send a patch after spending some more time on this.


Regards
Mohit Agrawal

--- Additional comment from Worker Ant on 2017-01-13 07:48:07 CET ---

REVIEW: http://review.gluster.org/16392 (upcall: Resolve dict leak in up_removexattr/up_setxattr code path.) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

--- Additional comment from Worker Ant on 2017-01-16 09:45:03 CET ---

REVIEW: http://review.gluster.org/16392 (upcall: Resolve leak from up_(f)removexattr in upcall code path) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

--- Additional comment from Worker Ant on 2017-01-16 09:57:32 CET ---

REVIEW: http://review.gluster.org/16392 (upcall: Resolve dict leak from up_(f)removexattr in upcall code path) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

--- Additional comment from Worker Ant on 2017-01-16 10:32:17 CET ---

COMMIT: http://review.gluster.org/16392 committed in master by Niels de Vos (ndevos@redhat.com) 
------
commit afdd83a9b69573b854e732795c0bcba0a00d6c0f
Author: Mohit Agrawal <moagrawa@redhat.com>
Date:   Fri Jan 13 12:17:05 2017 +0530

    upcall: Resolve dict leak from up_(f)removexattr in upcall code path
    
    Problem: In up_(f)removexattr() dict_for_key_value() is used to create a
             new dict. This dict is not correctly unref'd and gets leaked.
    
    Solution: To avoid the leak up_(f)removexattr() now also does a
              dict_unref() on the newly created dict.
    
    While reviewing the code in up_(f)setxattr() for a similar problem, it
    was noticed that there is an extra dict created. There is no need for
    this copy, upcall_local_init() can just take the dict that was passed as
    argument to the FOP.
    
    BUG: 1412917
    Change-Id: I5bb9a7d99f5087af11c19ae722de62bdb5ad1498
    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
    Reviewed-on: http://review.gluster.org/16392
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
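
The two changes described in the commit message above can be illustrated with a small, self-contained C sketch. It does not use libglusterfs: toy_dict_t, toy_local_t and every toy_* function are hypothetical stand-ins, and the reference-counting rules used here (creation returns one reference owned by the creator, and the per-call "local" takes its own reference) are assumptions made for illustration, not the actual upcall code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted dict. Hypothetical; NOT the libglusterfs dict_t. */
typedef struct toy_dict {
    int refcount;
} toy_dict_t;

/* Number of dicts that have been allocated and not yet freed. */
static int live_dicts = 0;

/* Assumption: a new dict starts with one reference owned by the creator. */
static toy_dict_t *toy_dict_new(void)
{
    toy_dict_t *d = calloc(1, sizeof(*d));
    if (d == NULL)
        abort();
    d->refcount = 1;
    live_dicts++;
    return d;
}

static toy_dict_t *toy_dict_ref(toy_dict_t *d)
{
    d->refcount++;
    return d;
}

/* Drops one reference; frees the dict when the last one goes away. */
static void toy_dict_unref(toy_dict_t *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        free(d);
        live_dicts--;
    }
}

/* Hypothetical per-call state that keeps its own reference to the dict. */
typedef struct toy_local {
    toy_dict_t *xattr;
} toy_local_t;

static void toy_local_init(toy_local_t *local, toy_dict_t *xattr)
{
    local->xattr = toy_dict_ref(xattr);
}

static void toy_local_wipe(toy_local_t *local)
{
    toy_dict_unref(local->xattr);
    local->xattr = NULL;
}

/* Leaky pattern: the creator's reference is never dropped. The stray dict
 * is returned only so that a caller of this demo can inspect and free it. */
static toy_dict_t *removexattr_leaky(void)
{
    toy_dict_t *xattr = toy_dict_new();   /* refcount 1 (creator) */
    toy_local_t local;

    toy_local_init(&local, xattr);        /* refcount 2 */
    toy_local_wipe(&local);               /* refcount 1, never reaches 0 */
    return xattr;
}

/* Fixed pattern: the creator also unrefs the dict it created. */
static void removexattr_fixed(void)
{
    toy_dict_t *xattr = toy_dict_new();   /* refcount 1 (creator) */
    toy_local_t local;

    toy_local_init(&local, xattr);        /* refcount 2 */
    toy_dict_unref(xattr);                /* refcount 1 (local only) */
    toy_local_wipe(&local);               /* refcount 0, dict is freed */
}

/* setxattr-style pattern without an extra copy: the local simply takes a
 * reference on the dict that the caller passed in. */
static void setxattr_no_copy(toy_dict_t *dict)
{
    toy_local_t local;

    toy_local_init(&local, dict);
    toy_local_wipe(&local);
}
```

In this toy model, every call to removexattr_leaky() leaves one dict allocated, while removexattr_fixed() and setxattr_no_copy() leave live_dicts unchanged. A leak of this kind, repeated on every call, would be consistent with memory growing steadily until the OOM kill reported in this bug.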

Comment 1 Worker Ant 2017-01-30 12:47:22 UTC
REVIEW: https://review.gluster.org/16480 (upcall: Resolve dict leak from up_(f)removexattr in upcall code path) posted (#1) for review on release-3.9 by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 2 Worker Ant 2017-01-31 15:23:27 UTC
REVIEW: https://review.gluster.org/16480 (upcall: Resolve dict leak from up_(f)removexattr in upcall code path) posted (#2) for review on release-3.9 by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 3 Worker Ant 2017-01-31 19:52:02 UTC
COMMIT: https://review.gluster.org/16480 committed in release-3.9 by Niels de Vos (ndevos@redhat.com) 
------
commit 4852ca54db76ed36a5b68d4b492b8165bff403bd
Author: Mohit Agrawal <moagrawa@redhat.com>
Date:   Fri Jan 13 12:17:05 2017 +0530

    upcall: Resolve dict leak from up_(f)removexattr in upcall code path
    
    Problem: In up_(f)removexattr() dict_for_key_value() is used to create a
             new dict. This dict is not correctly unref'd and gets leaked.
    
    Solution: To avoid the leak up_(f)removexattr() now also does a
              dict_unref() on the newly created dict.
    
    While reviewing the code in up_(f)setxattr() for a similar problem, it
    was noticed that there is an extra dict created. There is no need for
    this copy, upcall_local_init() can just take the dict that was passed as
    argument to the FOP.
    
    > BUG: 1412917
    > Change-Id: I5bb9a7d99f5087af11c19ae722de62bdb5ad1498
    > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
    > Reviewed-on: http://review.gluster.org/16392
    > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: Niels de Vos <ndevos@redhat.com>
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > (cherry picked from commit afdd83a9b69573b854e732795c0bcba0a00d6c0f)
    
    Change-Id: I0a53545528c43c09b88d360d3a12c460476647ba
    BUG: 1417606
    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
    Reviewed-on: https://review.gluster.org/16480
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>

Comment 4 Kaushal 2017-03-08 12:35:31 UTC
This bug is getting closed because GlusterFS-3.9 has reached its end-of-life [1].

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please open a new bug against the newer release.

[1]: https://www.gluster.org/community/release-schedule/

