Bug 1412917

Summary: OOM kill of glusterfsd during continuous add-bricks
Product: [Community] GlusterFS Reporter: Mohit Agrawal <moagrawa>
Component: upcallAssignee: Mohit Agrawal <moagrawa>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainlineCC: amukherj, ashah, bugs, moagrawa, nbalacha, ndevos, pgurusid, rcyriac, rgowdapp, rhs-bugs, skoduri, storage-qa-internal, sunnikri, tdesala
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.10.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1411329
: 1417606 1417622 (view as bug list) Environment:
Last Closed: 2017-03-06 17:43:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1411329, 1417606, 1417622    

Comment 1 Mohit Agrawal 2017-01-13 06:31:03 UTC
Hi,


We have found one code(the same is for up_fsetxxatR and up_removexattr/up_fremovexattr) path that holds dict leak but there could be other path also those are having leak .

>>>>>>>>>>>>>>>>>>>>>>

0x7ff423dac373 : mem_get0+0x13/0x90 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423d7d355 : get_new_dict_full+0x25/0x120 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423d7dbab : dict_new+0xb/0x20 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423d7fa0a : dict_copy_with_ref+0x3a/0xe0 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff41419733a : up_setxattr+0x3a/0x450 [/usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so]
 0x7ff423e16684 : default_setxattr_resume+0x1d4/0x250 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff423da86ed : call_resume+0x7d/0xd0 [/usr/lib64/libglusterfs.so.0.0.1]
 0x7ff40fdf9957 : iot_worker+0x117/0x220 [/usr/lib64/glusterfs/3.8.4/xlator/performance/io-threads.so]
 0x7ff422be6dc5 : 0x7ff422be6dc5 [/usr/lib64/libpthread-2.17.so+0x7dc5/0x218000]


>>>>>>>>>>>>>>>>>>>>>>>>>>

I am trying to find other path also, will send a patch after spend some more time on this.


Regards
Mohit Agrawal

Comment 2 Worker Ant 2017-01-13 06:48:07 UTC
REVIEW: http://review.gluster.org/16392 (upcall: Resolve dict leak in up_removexattr/up_setxattr code path.) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 3 Worker Ant 2017-01-16 08:45:03 UTC
REVIEW: http://review.gluster.org/16392 (upcall: Resolve leak from up_(f)removexattr in upcall code path) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 4 Worker Ant 2017-01-16 08:57:32 UTC
REVIEW: http://review.gluster.org/16392 (upcall: Resolve dict leak from up_(f)removexattr in upcall code path) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 5 Worker Ant 2017-01-16 09:32:17 UTC
COMMIT: http://review.gluster.org/16392 committed in master by Niels de Vos (ndevos) 
------
commit afdd83a9b69573b854e732795c0bcba0a00d6c0f
Author: Mohit Agrawal <moagrawa>
Date:   Fri Jan 13 12:17:05 2017 +0530

    upcall: Resolve dict leak from up_(f)removexattr in upcall code path
    
    Problem: In up_(f)removexattr() dict_for_key_value() is used to create a
             new dict. This dict is not correctly unref'd and gets leaked.
    
    Solution: To avoid the leak up_(f)removexattr() now also does a
              dict_unref() on the newly created dict.
    
    While reviewing the code in up_(f)setxattr() for a similar problem, it
    was noticed that there is an extra dict created. There is no need for
    this copy, upcall_local_init() can just take the dict that was passed as
    argument to the FOP.
    
    BUG: 1412917
    Change-Id: I5bb9a7d99f5087af11c19ae722de62bdb5ad1498
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: http://review.gluster.org/16392
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Niels de Vos <ndevos>
    Smoke: Gluster Build System <jenkins.org>

Comment 6 Niels de Vos 2017-01-30 12:19:54 UTC
Mohit, bug 1417606 is for backporting this change to release-3.9. Because there are some differences in the code with respect to 3.8, I'd like your opinion on backporting it to 3.8 as well. Could you consider doing that, and if it makes sense, please clone this bug for 3.8 as well.

Thanks!

Comment 7 Mohit Agrawal 2017-01-30 12:58:34 UTC
Niels,

  I have raised a request to backport the same in release-3.9.After checked the code of 
  release-3.8 i think we don't need to backport it in release-3.8 because in 3.8 code upcall
  does not support up_setxattr or up_removexattr .



Regards
Mohit Agrawal

Comment 8 Shyamsundar 2017-03-06 17:43:33 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/