Bug 1417606

Summary: OOM kill of glusterfsd during continuous add-bricks
Product: [Community] GlusterFS
Reporter: Niels de Vos <ndevos>
Component: upcall
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED EOL
Severity: high
Priority: high
Version: 3.9
CC: amukherj, ashah, bugs, moagrawa, nbalacha, pgurusid, rcyriac, rgowdapp, rhs-bugs, skoduri, storage-qa-internal, sunnikri, tdesala
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Clone Of: 1412917
Last Closed: 2017-03-08 12:35:31 UTC
Type: Bug
Bug Depends On: 1412917, 1417622

Description Niels de Vos 2017-01-30 12:08:52 UTC
+++ This bug was initially created as a clone of Bug #1412917 +++

Hi,


We have found one code(the same is for up_fsetxxatR and up_removexattr/up_fremovexattr) path that holds dict leak but there could be other path also those are having leak .

>>>>>>>>>>>>>>>>>>>>>>

 0x7ff423dac373 : mem_get0+0x13/0x90 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423d7d355 : get_new_dict_full+0x25/0x120 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423d7dbab : dict_new+0xb/0x20 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423d7fa0a : dict_copy_with_ref+0x3a/0xe0 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff41419733a : up_setxattr+0x3a/0x450 [/usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so]
 0x7ff423e16684 : default_setxattr_resume+0x1d4/0x250 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423da86ed : call_resume+0x7d/0xd0 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff40fdf9957 : iot_worker+0x117/0x220 [/usr/lib64/glusterfs/3.8.4/xlator/performance/io-threads.so]
 0x7ff422be6dc5 : 0x7ff422be6dc5 [/usr/lib64/libpthread-2.17.so+0x7dc5/0x218000]


>>>>>>>>>>>>>>>>>>>>>>>>>>

I am still looking for other leaking paths and will send a patch after spending some more time on this.


Regards
Mohit Agrawal
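The backtrace shows where the leaked dict was allocated: dict_copy_with_ref() called from up_setxattr(), which allocates through dict_new() and mem_get0(). The C sketch below illustrates, under stated assumptions, how a dict created inside a FOP can outlive every call. It is not GlusterFS code: toy_dict, toy_dict_new(), toy_dict_ref(), toy_dict_unref(), toy_local_init(), toy_local_wipe() and leaked_after() are hypothetical stand-ins. It assumes that a newly created or copied dict carries one reference owned by its creator, and that the per-call local state takes a reference of its own which it releases when the call completes. If the FOP never drops the creator's reference, one dict is left behind per call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted libglusterfs dict_t. */
typedef struct toy_dict {
    int refcount;
} toy_dict;

/* Dicts currently allocated; a leak shows up as steady growth here. */
static int g_live_dicts = 0;

/* Models dict_new()/dict_copy_with_ref() (assumption): creator owns one ref. */
static toy_dict *toy_dict_new(void)
{
    toy_dict *d = malloc(sizeof *d);
    if (d) {
        d->refcount = 1;
        g_live_dicts++;
    }
    return d;
}

static toy_dict *toy_dict_ref(toy_dict *d)
{
    d->refcount++;
    return d;
}

/* Drops one ref; frees the dict when the last ref is gone. */
static void toy_dict_unref(toy_dict *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        free(d);
        g_live_dicts--;
    }
}

/* Hypothetical per-call state that keeps its own ref on the xattr dict. */
typedef struct toy_local {
    toy_dict *xattr;
} toy_local;

static void toy_local_init(toy_local *local, toy_dict *xattr)
{
    local->xattr = toy_dict_ref(xattr);
}

/* Runs when the call completes; releases only the local's ref. */
static void toy_local_wipe(toy_local *local)
{
    toy_dict_unref(local->xattr);
    local->xattr = NULL;
}

/* Leaky pattern: the ref obtained from toy_dict_new() is never dropped. */
static void leaky_fop(void)
{
    toy_local local;
    toy_dict *copy = toy_dict_new();   /* stands in for a copy of the FOP's dict */

    if (!copy)
        return;
    toy_local_init(&local, copy);      /* refcount 1 -> 2 */
    toy_local_wipe(&local);            /* refcount 2 -> 1: never freed */
}

/* Returns how many dicts remain allocated after `calls` leaky calls. */
int leaked_after(int calls)
{
    int before = g_live_dicts;

    for (int i = 0; i < calls; i++)
        leaky_fop();
    return g_live_dicts - before;
}
```

Calling leaked_after(1000) returns 1000 (assuming every malloc succeeds): each call ends with the dict's refcount at 1 instead of 0. Under continuous load this is the kind of steady per-operation growth that can end in an OOM kill.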

--- Additional comment from Worker Ant on 2017-01-13 07:48:07 CET ---

REVIEW: http://review.gluster.org/16392 (upcall: Resolve dict leak in up_removexattr/up_setxattr code path.) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-01-16 09:45:03 CET ---

REVIEW: http://review.gluster.org/16392 (upcall: Resolve leak from up_(f)removexattr in upcall code path) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-01-16 09:57:32 CET ---

REVIEW: http://review.gluster.org/16392 (upcall: Resolve dict leak from up_(f)removexattr in upcall code path) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-01-16 10:32:17 CET ---

COMMIT: http://review.gluster.org/16392 committed in master by Niels de Vos (ndevos) 
------
commit afdd83a9b69573b854e732795c0bcba0a00d6c0f
Author: Mohit Agrawal <moagrawa>
Date:   Fri Jan 13 12:17:05 2017 +0530

    upcall: Resolve dict leak from up_(f)removexattr in upcall code path
    
    Problem: In up_(f)removexattr() dict_for_key_value() is used to create a
             new dict. This dict is not correctly unref'd and gets leaked.
    
    Solution: To avoid the leak up_(f)removexattr() now also does a
              dict_unref() on the newly created dict.
    
    While reviewing the code in up_(f)setxattr() for a similar problem, it
    was noticed that there is an extra dict created. There is no need for
    this copy, upcall_local_init() can just take the dict that was passed as
    argument to the FOP.
    
    BUG: 1412917
    Change-Id: I5bb9a7d99f5087af11c19ae722de62bdb5ad1498
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: http://review.gluster.org/16392
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>
    Smoke: Gluster Build System <jenkins.org>
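The commit describes two changes: up_(f)removexattr() now drops the reference on the dict it creates with dict_for_key_value(), and up_(f)setxattr() hands the FOP's own dict to upcall_local_init() instead of making a copy. The following C sketch models both patterns with a hypothetical refcounted toy_dict; toy_dict_new(), toy_dict_ref(), toy_dict_unref(), toy_local_init(), toy_local_wipe() and live_after_fixed_calls() are illustrative names, not libglusterfs APIs. It assumes, as an approximation, that a new dict starts with one reference owned by its creator and that the per-call local state takes its own reference, which it releases when the call completes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted libglusterfs dict_t. */
typedef struct toy_dict {
    int refcount;
} toy_dict;

/* Dicts currently allocated; should return to its old value after each call. */
static int g_live_dicts = 0;

/* Assumption: a new dict starts with one ref owned by its creator. */
static toy_dict *toy_dict_new(void)
{
    toy_dict *d = malloc(sizeof *d);
    if (d) {
        d->refcount = 1;
        g_live_dicts++;
    }
    return d;
}

static toy_dict *toy_dict_ref(toy_dict *d)
{
    d->refcount++;
    return d;
}

static void toy_dict_unref(toy_dict *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        free(d);
        g_live_dicts--;
    }
}

/* Hypothetical per-call state holding its own ref on the xattr dict. */
typedef struct toy_local {
    toy_dict *xattr;
} toy_local;

static void toy_local_init(toy_local *local, toy_dict *xattr)
{
    local->xattr = toy_dict_ref(xattr);
}

static void toy_local_wipe(toy_local *local)
{
    toy_dict_unref(local->xattr);
    local->xattr = NULL;
}

/* Fixed removexattr-style pattern: the creator drops its own ref. */
static void fixed_created_dict_fop(void)
{
    toy_local local;
    toy_dict *xattr = toy_dict_new();  /* stands in for dict_for_key_value() */

    if (!xattr)
        return;
    toy_local_init(&local, xattr);     /* 1 -> 2: local holds its own ref */
    toy_dict_unref(xattr);             /* 2 -> 1: the fix, creator's ref released */
    toy_local_wipe(&local);            /* 1 -> 0: dict freed */
}

/* setxattr-style pattern without the extra copy: share the FOP's dict. */
static void no_copy_fop(toy_dict *fop_dict)
{
    toy_local local;

    toy_local_init(&local, fop_dict);  /* extra ref on the caller's dict */
    toy_local_wipe(&local);            /* caller's own ref is untouched */
}

/* Returns the change in live dicts after `calls` rounds of both FOPs. */
int live_after_fixed_calls(int calls)
{
    int before = g_live_dicts;
    toy_dict *fop_dict = toy_dict_new();  /* owned by the hypothetical caller */

    if (!fop_dict)
        return -1;
    for (int i = 0; i < calls; i++) {
        fixed_created_dict_fop();
        no_copy_fop(fop_dict);
    }
    toy_dict_unref(fop_dict);
    return g_live_dicts - before;
}
```

live_after_fixed_calls(1000) returns 0: every dict created during a call is freed once the call completes, and the no-copy path allocates nothing. The underlying rule is that whoever obtains a reference must release it; taking an extra reference on the caller's dict is also cheaper than copying it.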

Comment 1 Worker Ant 2017-01-30 12:47:22 UTC
REVIEW: https://review.gluster.org/16480 (upcall: Resolve dict leak from up_(f)removexattr in upcall code path) posted (#1) for review on release-3.9 by MOHIT AGRAWAL (moagrawa)

Comment 2 Worker Ant 2017-01-31 15:23:27 UTC
REVIEW: https://review.gluster.org/16480 (upcall: Resolve dict leak from up_(f)removexattr in upcall code path) posted (#2) for review on release-3.9 by MOHIT AGRAWAL (moagrawa)

Comment 3 Worker Ant 2017-01-31 19:52:02 UTC
COMMIT: https://review.gluster.org/16480 committed in release-3.9 by Niels de Vos (ndevos) 
------
commit 4852ca54db76ed36a5b68d4b492b8165bff403bd
Author: Mohit Agrawal <moagrawa>
Date:   Fri Jan 13 12:17:05 2017 +0530

    upcall: Resolve dict leak from up_(f)removexattr in upcall code path
    
    Problem: In up_(f)removexattr() dict_for_key_value() is used to create a
             new dict. This dict is not correctly unref'd and gets leaked.
    
    Solution: To avoid the leak up_(f)removexattr() now also does a
              dict_unref() on the newly created dict.
    
    While reviewing the code in up_(f)setxattr() for a similar problem, it
    was noticed that there is an extra dict created. There is no need for
    this copy, upcall_local_init() can just take the dict that was passed as
    argument to the FOP.
    
    > BUG: 1412917
    > Change-Id: I5bb9a7d99f5087af11c19ae722de62bdb5ad1498
    > Signed-off-by: Mohit Agrawal <moagrawa>
    > Reviewed-on: http://review.gluster.org/16392
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Niels de Vos <ndevos>
    > Smoke: Gluster Build System <jenkins.org>
    > (cherry picked from commit afdd83a9b69573b854e732795c0bcba0a00d6c0f)
    
    Change-Id: I0a53545528c43c09b88d360d3a12c460476647ba
    BUG: 1417606
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/16480
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>
    Smoke: Gluster Build System <jenkins.org>

Comment 4 Kaushal 2017-03-08 12:35:31 UTC
This bug is getting closed because GlusterFS-3.9 has reached its end-of-life [1].

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please open a new bug against the newer release.

[1]: https://www.gluster.org/community/release-schedule/