This service will be undergoing maintenance at 00:00 UTC, 2017-10-23 It is expected to last about 30 minutes
Bug 1316178 - changelog/rpc: Memory leak- rpc_clnt_t object is never freed
changelog/rpc: Memory leak- rpc_clnt_t object is never freed
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: rpc (Show other bugs)
mainline
Unspecified Linux
unspecified Severity unspecified
: ---
: ---
Assigned To: Kotresh HR
:
Depends On:
Blocks: 1359363 1359364
  Show dependency treegraph
 
Reported: 2016-03-09 10:11 EST by Kotresh HR
Modified: 2017-03-08 04:19 EST (History)
1 user (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1359363 (view as bug list)
Environment:
Last Closed: 2017-03-08 04:19:22 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Kotresh HR 2016-03-09 10:11:59 EST
Description of problem:
changelog rpc-clnt object is never freed and the memory is leaked.
Changelog should unref the rpc-clnt object after it's usage on DISCONNECT,
since the proper ref and unref of rpc-clnt object was not built up in rpc
layer, this was not done.

Version-Release number of selected component (if applicable):
3.7.9

How reproducible:
Always

Steps to Reproduce:
1. Setup geo-replication 
2. stop and start geo-replication multiple times will leak rpc_clnt object on brick process.


Actual results:
Memory leak

Expected results:
Process should not leak memory

Additional info:
Comment 1 Vijay Bellur 2016-03-09 10:40:52 EST
REVIEW: http://review.gluster.org/13658 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#1) for review on master by Kotresh HR (khiremat@redhat.com)
Comment 2 Vijay Bellur 2016-03-15 08:46:33 EDT
REVIEW: http://review.gluster.org/13658 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#2) for review on master by Kotresh HR (khiremat@redhat.com)
Comment 3 Vijay Bellur 2016-03-20 08:59:14 EDT
REVIEW: http://review.gluster.org/13658 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#3) for review on master by Kotresh HR (khiremat@redhat.com)
Comment 4 Mike McCune 2016-03-28 18:53:35 EDT
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune@redhat.com with any questions
Comment 5 Vijay Bellur 2016-07-11 05:18:33 EDT
REVIEW: http://review.gluster.org/13658 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#4) for review on master by Kotresh HR (khiremat@redhat.com)
Comment 6 Vijay Bellur 2016-07-13 08:25:25 EDT
REVIEW: http://review.gluster.org/13658 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#5) for review on master by Kotresh HR (khiremat@redhat.com)
Comment 7 Vijay Bellur 2016-07-15 11:08:44 EDT
REVIEW: http://review.gluster.org/13658 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#6) for review on master by Kotresh HR (khiremat@redhat.com)
Comment 8 Vijay Bellur 2016-07-21 03:12:42 EDT
REVIEW: http://review.gluster.org/13658 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#7) for review on master by Kotresh HR (khiremat@redhat.com)
Comment 9 Vijay Bellur 2016-07-21 08:53:18 EDT
REVIEW: http://review.gluster.org/13658 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#8) for review on master by Kotresh HR (khiremat@redhat.com)
Comment 10 Vijay Bellur 2016-07-22 02:05:28 EDT
REVIEW: http://review.gluster.org/13658 (changelog/rpc: Fix rpc_clnt_t mem leaks) posted (#9) for review on master by Kotresh HR (khiremat@redhat.com)
Comment 11 Vijay Bellur 2016-07-22 11:12:55 EDT
COMMIT: http://review.gluster.org/13658 committed in master by Jeff Darcy (jdarcy@redhat.com) 
------
commit 637ce9e2e27e9f598a4a6c5a04cd339efaa62076
Author: Kotresh HR <khiremat@redhat.com>
Date:   Mon Mar 7 11:45:07 2016 +0530

    changelog/rpc: Fix rpc_clnt_t mem leaks
    
    PROBLEM:
       1. Freeing up rpc_clnt object might lead to crashes. Well,
          it was not a necessity to free rpc-clnt object till now
          because all the existing use cases needs to reconnect
          back on disconnects. Hence timer code was not taking
          ref on rpc-clnt object.
    
          Glusterd had some use-cases that led to crash due to
          ping-timer and they fixed only those code paths that
          involve ping-timer.
    
          Now, since changelog has an use-case where rpc-clnt
          need to be freed up, we need to fix timer code to take
          refs
    
       2. In changelog, because of issue 1, only mydata was being
          freed which is incorrect. And there are races where
          rpc-clnt object would access the freed mydata which
          would lead to crashes.
    
          Since changelog xlator resides on brick side and is long
          living process, if multiple libgfchangelog consumers
          register to changelog and disconnect/reconnect mulitple
          times, it would result in leak of 'rpc-clnt' object
          for every connect/disconnect.
    
    SOLUTION:
       1. Handle ref/unref of 'rpc_clnt' structure in timer
          functions properly.
       2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT
          after disabling timers and free mydata on RPC_CLNT_DESTROY.
    
    RPC SETUP IN CHANGELOG:
       1. changelog xlator initiates rpc server say 'changelog_rpc_server'
       2. libgfchangelog initiates one rpc server say 'libgfchangelog_rpc_server'
       3. libgfchangelog initiates rpc client and connects to 'changelog_rpc_server'
       4. In return changelog_rpc_server initiates a rpc client and connects back
          to 'libgfchangelog_rpc_server'
    
    REF/UNREF HANDLING IN TIMER FUNCTIONS:
    Let's say rpc clnt refcount = 1
       1. Take the ref before reigstering callback to timer queue
               >>>>  rpc_clnt_ref (say ref count becomes = 2)
       2. Register a callback to timer say 'callback1'
       3. If register fails:
               >>>> rpc_clnt_unref (ref count = 1)
       4. On timer expiration, 'callback1' gets called. So unref rpc clnt at the end
          in 'callback1'. This is corresponding to ref taken in step 1
               >>>> rpc_clnt_unref (ref count = 1)
       5. The cycle from step-1 to step-4 continues....until timer cancel event happens
       6. timer cancel of say 'callback1'
               If timer cancel fails:
                     Do nothing, Step-4 would have unrefd
               If timer cancel succeeds:
                     >>>> rpc_clnt_unref (ref count = 1)
    
    Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09
    BUG: 1316178
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
    Reviewed-on: http://review.gluster.org/13658
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Comment 12 Kotresh HR 2017-03-08 04:19:22 EST
v3.10.0 Contains the Fix

Note You need to log in before you can comment on or make changes to this bug.