Bug 1359363 - changelog/rpc: Memory leak- rpc_clnt_t object is never freed
Summary: changelog/rpc: Memory leak- rpc_clnt_t object is never freed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.7.13
Hardware: Unspecified
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On: 1316178
Blocks: 1359364
TreeView+ depends on / blocked
 
Reported: 2016-07-23 05:57 UTC by Kotresh HR
Modified: 2016-08-02 11:23 UTC (History)
1 user (show)

Fixed In Version: glusterfs-3.7.14
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1316178
: 1359364 (view as bug list)
Environment:
Last Closed: 2016-08-02 07:24:47 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Kotresh HR 2016-07-23 05:57:19 UTC
+++ This bug was initially created as a clone of Bug #1316178 +++

Description of problem:
changelog rpc-clnt object is never freed and the memory is leaked.
Changelog should unref the rpc-clnt object after it's usage on DISCONNECT,
since the proper ref and unref of rpc-clnt object was not built up in rpc
layer, this was not done.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Setup geo-replication 
2. stop and start geo-replication multiple times will leak rpc_clnt object on brick process.


Actual results:
Memory leak

Expected results:
Process should not leak memory

Comment 1 Vijay Bellur 2016-07-23 05:58:53 UTC
REVIEW: http://review.gluster.org/14993 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#1) for review on release-3.7 by Kotresh HR (khiremat)

Comment 2 Vijay Bellur 2016-07-26 10:07:32 UTC
REVIEW: http://review.gluster.org/14993 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#2) for review on release-3.7 by Kotresh HR (khiremat)

Comment 3 Vijay Bellur 2016-07-27 10:13:38 UTC
COMMIT: http://review.gluster.org/14993 committed in release-3.7 by Raghavendra G (rgowdapp) 
------
commit 5a32c26bb59bdd20bdfc9ea00ce90c7d1c64aa04
Author: Kotresh HR <khiremat>
Date:   Mon Mar 7 11:45:07 2016 +0530

    changelog/rpc: Fix rpc_clnt_t mem leaks
    
    Backport of http://review.gluster.org/13658
    
    PROBLEM:
       1. Freeing up rpc_clnt object might lead to crashes. Well,
          it was not a necessity to free rpc-clnt object till now
          because all the existing use cases needs to reconnect
          back on disconnects. Hence timer code was not taking
          ref on rpc-clnt object.
    
          Glusterd had some use-cases that led to crash due to
          ping-timer and they fixed only those code paths that
          involve ping-timer.
    
          Now, since changelog has an use-case where rpc-clnt
          need to be freed up, we need to fix timer code to take
          refs
    
       2. In changelog, because of issue 1, only mydata was being
          freed which is incorrect. And there are races where
          rpc-clnt object would access the freed mydata which
          would lead to crashes.
    
          Since changelog xlator resides on brick side and is long
          living process, if multiple libgfchangelog consumers
          register to changelog and disconnect/reconnect mulitple
          times, it would result in leak of 'rpc-clnt' object
          for every connect/disconnect.
    
    SOLUTION:
       1. Handle ref/unref of 'rpc_clnt' structure in timer
          functions properly.
       2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT
          after disabling timers and free mydata on RPC_CLNT_DESTROY.
    
    RPC SETUP IN CHANGELOG:
       1. changelog xlator initiates rpc server say 'changelog_rpc_server'
       2. libgfchangelog initiates one rpc server say 'libgfchangelog_rpc_server'
       3. libgfchangelog initiates rpc client and connects to 'changelog_rpc_server'
       4. In return changelog_rpc_server initiates a rpc client and connects back
          to 'libgfchangelog_rpc_server'
    
    REF/UNREF HANDLING IN TIMER FUNCTIONS:
    Let's say rpc clnt refcount = 1
       1. Take the ref before reigstering callback to timer queue
               >>>>  rpc_clnt_ref (say ref count becomes = 2)
       2. Register a callback to timer say 'callback1'
       3. If register fails:
               >>>> rpc_clnt_unref (ref count = 1)
       4. On timer expiration, 'callback1' gets called. So unref rpc clnt at the end
          in 'callback1'. This is corresponding to ref taken in step 1
               >>>> rpc_clnt_unref (ref count = 1)
       5. The cycle from step-1 to step-4 continues....until timer cancel event happens
       6. timer cancel of say 'callback1'
               If timer cancel fails:
                     Do nothing, Step-4 would have unrefd
               If timer cancel succeeds:
                     >>>> rpc_clnt_unref (ref count = 1)
    
    Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09
    BUG: 1359363
    Signed-off-by: Kotresh HR <khiremat>
    (cherry picked from commit 637ce9e2e27e9f598a4a6c5a04cd339efaa62076)
    Reviewed-on: http://review.gluster.org/14993
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

Comment 4 Kaushal 2016-08-02 07:24:47 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.14, please open a new bug report.

glusterfs-3.7.14 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-August/050319.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


Note You need to log in before you can comment on or make changes to this bug.