Bug 1425623 - Free all xlator specific resources when xlator->fini() gets called
Summary: Free all xlator specific resources when xlator->fini() gets called
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Niels de Vos
QA Contact:
URL:
Whiteboard:
Duplicates: 1072854
Depends On: 1442411 1443145 1444023 1470170
Blocks: 1196020 1370417 1397177 1438817
Reported: 2017-02-21 22:52 UTC by Niels de Vos
Modified: 2019-07-02 04:03 UTC (History)
CC List: 4 users

Fixed In Version: glusterfs-3.11.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-02 04:03:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Niels de Vos 2017-02-21 22:52:15 UTC
Description of problem:
With gfapi, long-lived processes repeatedly construct and tear down xlator graphs. A major contributor is calling glfs_init() and glfs_fini() to (re)mount a Gluster volume. On each cycle all the xlators for the volume are initialized, and many do not free all their resources (memory allocations, threads, open files, ...) in their fini() call.
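
To illustrate why this matters for long-lived processes, here is a minimal, self-contained sketch of the init()/fini() pattern. It does not use the GlusterFS API: struct demo_xlator, tracked_calloc(), tracked_free() and the demo_* functions are hypothetical names invented for this example. It only shows how an incomplete fini() leaks one allocation per cycle, while a complete one leaves nothing behind.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an xlator's state; this is NOT GlusterFS' xlator_t. */
struct demo_xlator {
    void *priv;    /* allocated in demo_init() */
    char *buffer;  /* allocated in demo_init() */
};

/* Number of tracked allocations that have not been released yet. */
static int live_allocations = 0;

static void *tracked_calloc(size_t count, size_t size)
{
    void *ptr = calloc(count, size);
    if (ptr != NULL)
        live_allocations++;
    return ptr;
}

static void tracked_free(void *ptr)
{
    if (ptr != NULL) {
        live_allocations--;
        free(ptr);
    }
}

/* Acquire every resource the (hypothetical) translator needs. */
static int demo_init(struct demo_xlator *xl)
{
    assert(xl != NULL);
    xl->priv = tracked_calloc(1, 64);
    if (xl->priv == NULL)
        return -1;
    xl->buffer = tracked_calloc(1, 128);
    if (xl->buffer == NULL) {
        tracked_free(xl->priv);
        xl->priv = NULL;
        return -1;
    }
    return 0;
}

/* Complete teardown: releases everything demo_init() acquired. */
static void demo_fini(struct demo_xlator *xl)
{
    tracked_free(xl->buffer);
    xl->buffer = NULL;
    tracked_free(xl->priv);
    xl->priv = NULL;
}

/* Incomplete teardown: forgets xl->buffer, so each cycle leaks it. */
static void demo_fini_leaky(struct demo_xlator *xl)
{
    tracked_free(xl->priv);
    xl->priv = NULL;
}

/* Run init/fini cycles and return how many allocations were left behind. */
static int demo_cycle(int cycles, void (*fini)(struct demo_xlator *))
{
    struct demo_xlator xl = { NULL, NULL };
    int before = live_allocations;

    for (int i = 0; i < cycles; i++) {
        if (demo_init(&xl) != 0)
            return -1;
        fini(&xl);
    }
    return live_allocations - before;
}
```

Assuming no allocation fails, demo_cycle(100, demo_fini) returns 0 and demo_cycle(10, demo_fini_leaky) returns 10: the leak grows linearly with the number of (re)mount cycles.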

Additional info:
There is a tool, "gfapi-load-volfile", that can be used to check for resource leaks in single xlators. It is currently available from https://github.com/nixpanic/gluster-debug and may be moved to https://github.com/gluster/gluster-debug

This bug will be used to track the progress of individual bugs that fix resource leaks in specific xlators.

Comment 1 Worker Ant 2017-03-01 19:47:32 UTC
REVIEW: https://review.gluster.org/16796 (libglusterfs: accept random volname in glusterfs_graph_prepare()) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 2 Worker Ant 2017-03-01 20:17:22 UTC
REVIEW: https://review.gluster.org/16806 (template: add a "is-sink" option to allow no subvolumes) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 3 Worker Ant 2017-03-01 20:17:25 UTC
REVIEW: https://review.gluster.org/16807 (template: send EVENT_CHILD_UP on EVENT_PARENT_UP notification) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 4 Worker Ant 2017-03-01 20:17:29 UTC
REVIEW: https://review.gluster.org/16808 (template: add sink_lookup() to make fake mounting work) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 5 Worker Ant 2017-03-01 20:21:53 UTC
REVIEW: https://review.gluster.org/16809 (xlator: do not call dlclose() when debugging) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 6 Worker Ant 2017-03-16 17:27:49 UTC
REVIEW: https://review.gluster.org/16806 (debug/sink: add xlator to aid in resource leak debugging) posted (#2) for review on master by Niels de Vos (ndevos)

Comment 7 Worker Ant 2017-03-28 15:07:22 UTC
REVIEW: https://review.gluster.org/16806 (debug/sink: add xlator to aid in resource leak debugging) posted (#3) for review on master by Niels de Vos (ndevos)

Comment 8 Worker Ant 2017-04-03 12:45:13 UTC
REVIEW: https://review.gluster.org/16809 (xlator: do not call dlclose() when debugging) posted (#2) for review on master by Niels de Vos (ndevos)

Comment 9 Worker Ant 2017-04-03 12:48:25 UTC
REVIEW: https://review.gluster.org/16806 (debug/sink: add xlator to aid in resource leak debugging) posted (#4) for review on master by Niels de Vos (ndevos)

Comment 10 Worker Ant 2017-04-03 13:55:52 UTC
REVIEW: https://review.gluster.org/16809 (xlator: do not call dlclose() when debugging) posted (#3) for review on master by Niels de Vos (ndevos)

Comment 11 Worker Ant 2017-04-04 11:50:55 UTC
REVIEW: https://review.gluster.org/16806 (debug/sink: add xlator to aid in resource leak debugging) posted (#5) for review on master by Niels de Vos (ndevos)

Comment 12 Worker Ant 2017-04-04 12:39:55 UTC
REVIEW: https://review.gluster.org/16806 (debug/sink: add xlator to aid in resource leak debugging) posted (#6) for review on master by Niels de Vos (ndevos)

Comment 13 Worker Ant 2017-04-06 09:27:57 UTC
REVIEW: https://review.gluster.org/16809 (xlator: do not call dlclose() when debugging) posted (#4) for review on master by Niels de Vos (ndevos)

Comment 14 Worker Ant 2017-04-06 13:45:15 UTC
REVIEW: https://review.gluster.org/16809 (xlator: do not call dlclose() when debugging) posted (#5) for review on master by Niels de Vos (ndevos)

Comment 15 Worker Ant 2017-04-07 16:00:50 UTC
REVIEW: https://review.gluster.org/16796 (libglusterfs: accept random volname in glusterfs_graph_prepare()) posted (#2) for review on master by Niels de Vos (ndevos)

Comment 16 Worker Ant 2017-04-07 17:17:15 UTC
COMMIT: https://review.gluster.org/16809 committed in master by Jeff Darcy (jeff.us) 
------
commit ef36ac0d1b72ab2c07ed6e0a3116b7265c3c0164
Author: Niels de Vos <ndevos>
Date:   Mon Feb 27 22:37:00 2017 -0800

    xlator: do not call dlclose() when debugging
    
    Valgrind cannot show the symbols of a .so after dlclose() has been
    called on it. The unhelpful ??? in the output gets resolved properly
    with this change:
    
      ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324
      ==25170==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
      ==25170==    by 0x52C7C0B: __gf_calloc (mem-pool.c:117)
      ==25170==    by 0x12B0638A: ???
      ==25170==    by 0x528FCE6: __xlator_init (xlator.c:472)
      ==25170==    by 0x528FE16: xlator_init (xlator.c:498)
      ==25170==    by 0x52DA8D6: glusterfs_graph_init (graph.c:321)
      ==25170==    by 0x52DB587: glusterfs_graph_activate (graph.c:695)
      ==25170==    by 0x5046407: glfs_process_volfp (glfs-mgmt.c:79)
      ==25170==    by 0x5043B9E: glfs_volumes_init (glfs.c:281)
      ==25170==    by 0x5044FEC: glfs_init_common (glfs.c:986)
      ==25170==    by 0x50451A7: glfs_init@@GFAPI_3.4.0 (glfs.c:1031)
    
    By not calling dlclose(), the dynamically loaded .so is still available
    upon program exit, and Valgrind is able to resolve the symbols. This
    will add an additional leak, so dlclose() is called for normal builds,
    but skipped when configuring with "./configure --enable-valgrind" or
    passing the "run-with-valgrind" xlator option.
    
    URL: http://valgrind.org/docs/manual/faq.html#faq.unhelpful
    Change-Id: I2044e21b1b8fcce32ad1a817fdd795218f967731
    BUG: 1425623
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/16809
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
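
The idea described in the commit message above can be sketched roughly as follows. This is not the actual GlusterFS change: demo_unload() and its keep_for_valgrind parameter are hypothetical, standing in for whatever the build-time or xlator option check looks like in the real code. Only dlclose() is a real (POSIX) call.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: unload a dynamically loaded object unless we are
 * debugging with Valgrind. Skipping dlclose() deliberately leaks the
 * handle so that the .so stays mapped until exit and its symbols can
 * still be resolved in the leak report.
 *
 * Returns 0 on success (or when the unload was skipped), and the
 * non-zero result of dlclose() on failure.
 */
static int demo_unload(void *handle, bool keep_for_valgrind)
{
    if (handle == NULL)
        return 0;           /* nothing was loaded */

    if (keep_for_valgrind)
        return 0;           /* intentionally keep the object loaded */

    return dlclose(handle); /* normal builds release the object */
}
```

On some systems this needs to be linked with -ldl. The trade-off is the one the commit message names: the skipped dlclose() is itself a small leak, so it is only acceptable in debugging runs.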

Comment 17 Worker Ant 2017-04-13 14:47:41 UTC
REVIEW: https://review.gluster.org/16796 (libglusterfs: accept random volname in glusterfs_graph_prepare()) posted (#3) for review on master by Niels de Vos (ndevos)

Comment 18 Worker Ant 2017-04-13 14:59:56 UTC
REVIEW: https://review.gluster.org/16796 (libglusterfs: accept random volname in glusterfs_graph_prepare()) posted (#4) for review on master by Niels de Vos (ndevos)

Comment 19 Worker Ant 2017-04-14 15:43:43 UTC
REVIEW: https://review.gluster.org/16806 (debug/sink: add xlator to aid in resource leak debugging) posted (#7) for review on master by Niels de Vos (ndevos)

Comment 20 Worker Ant 2017-04-18 13:04:58 UTC
REVIEW: https://review.gluster.org/16806 (debug/sink: add xlator to aid in resource leak debugging) posted (#8) for review on master by Niels de Vos (ndevos)

Comment 21 Worker Ant 2017-04-25 23:10:42 UTC
COMMIT: https://review.gluster.org/16806 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit 0451909e0533d357a45dd427226028e095240dac
Author: Niels de Vos <ndevos>
Date:   Tue Feb 21 14:35:52 2017 +0100

    debug/sink: add xlator to aid in resource leak debugging
    
    This new xlator does not allocate any resources on init(). This makes it
    a good option to use for debugging xlator related resource leaks on
    fini().
    
    By putting the sink xlator as single xlator in a .vol file, and loading
    it through gfapi, we can investigate the resource leaks that are
    happening through gfapi (and the Gluster core). By extending the .vol
    file with additional xlators, it is possible to analyze resource leaks
    of single xlators.
    
    Change-Id: Idb5faa861b623dd5b2a988b181e669b0d52c2a0e
    BUG: 1425623
    Fixes: #176
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/16806
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 22 Worker Ant 2017-04-26 01:18:58 UTC
COMMIT: https://review.gluster.org/16796 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit 1538c98f5e33e0794830d5153f17a96ff28c9914
Author: Niels de Vos <ndevos>
Date:   Mon Feb 27 18:45:16 2017 -0800

    libglusterfs: accept random volname in glusterfs_graph_prepare()
    
    When the call to glfs_new("volname") passes a name for the volume and it
    does not match the name of the subvolume in the graph, glfs_init() will
    fail. This is easily reproducible by a gfapi program that loads the
    volume from a .vol file, and not from a GlusterD server.
    
    Change-Id: I33e77fbee7d12eaefe7c384fad6aecfa3582ea5a
    BUG: 1425623
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/16796
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: Prashanth Pai <ppai>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 23 Niels de Vos 2017-05-02 11:21:54 UTC
*** Bug 1072854 has been marked as a duplicate of this bug. ***

Comment 24 Shyamsundar 2017-05-30 18:44:46 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 25 Niels de Vos 2017-05-31 06:01:19 UTC
Reopening, this is still a work in progress.

Comment 26 Amar Tumballi 2019-07-02 04:03:42 UTC
Considering the significant work we have done on freeing memory in 'fini()' (driven by the brick-mux and self-heal-mux features), I am inclined to close this. We will open specific bugs per component if anything is still pending.
