Bug 1481600 - rpc: client_t and related objects leaked due to incorrect ref counts
Summary: rpc: client_t and related objects leaked due to incorrect ref counts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Milind Changire
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1487033 1487036
Reported: 2017-08-15 07:13 UTC by Milind Changire
Modified: 2017-12-08 17:38 UTC (History)

Fixed In Version: glusterfs-3.13.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-08 17:38:25 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Milind Changire 2017-08-15 07:13:59 UTC
Description of problem:
Problem:
1. Asymmetrical ref counting
The first call to gf_client_get() creates a new client_t object and sets both
the bind count and the ref count to 1. Subsequent gf_client_get() calls
increment only the bind count, not the ref count. This makes it unclear when
the ref count should be decremented on gf_client_put(), since gf_client_put()
currently decrements only the bind count.
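
The asymmetry in item 1 can be illustrated with a minimal C sketch. The toy_client type and the toy_client_get()/toy_client_put() functions are hypothetical stand-ins, not the GlusterFS implementation; they model only the counting behaviour described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for client_t: only the two counters matter here. */
struct toy_client {
    int bind_count; /* number of outstanding binds (get calls) */
    int ref_count;  /* references that keep the object alive */
};

/* Models the described get: the first call (existing == NULL) creates the
 * object with both counts at 1; later calls bump only the bind count. */
struct toy_client *toy_client_get(struct toy_client *existing)
{
    if (existing == NULL) {
        struct toy_client *c = calloc(1, sizeof(*c));
        if (c == NULL)
            return NULL;
        c->bind_count = 1;
        c->ref_count = 1;
        return c;
    }
    existing->bind_count++;
    return existing;
}

/* Models the described put: only the bind count is decremented. */
void toy_client_put(struct toy_client *c)
{
    assert(c != NULL && c->bind_count > 0);
    c->bind_count--;
}
```

In this model, two gets followed by two puts leave the bind count at 0 while the ref count is still 1, so nothing in the get/put pair says who should drop the last reference.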

2. Missing unref on some handshake and glusterfs program actors
server_submit_reply() is called with a NULL frame pointer for the following
actors:
Handshake actors:
GETSPEC:
    A ref can't be added on this request since the client object hasn't been
    created when this request hits the server, so no unref is required either.
    The tests, however, request the spec with an explicit
    'gluster system getspec' command after a SETVOLUME, so in that case there
    is a ref that needs to be accounted for.

SETVOLUME:
    Calls gf_client_get(), which initializes the bind count and the ref count
    to 1; this case is fine.

SET_LK_VER:
    Actually the actor function does a gf_client_get() as well as a 
    gf_client_put(). But the problem is rpcsvc_request_create() path adds a ref 
    to the client_t on receiving this request but fails to drop the ref in 
    server_submit_reply() since the frame pointer passed is NULL.

PING:
    rpcsvc_request_init() adds a ref but can't unref the request since frame is 
    NULL in server_submit_reply()

Glusterfs actors:
RELEASE:
    rpcsvc_request_init() adds a ref but can't unref the request since frame is 
    NULL in server_submit_reply()

RELEASE_DIR:
    rpcsvc_request_init() adds a ref but can't unref the request since frame is 
    NULL in server_submit_reply()

NULL:
    rpcsvc_request_init() adds a ref but can't unref the request since frame is 
    NULL in server_submit_reply() 
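
The pattern shared by PING, RELEASE, RELEASE_DIR and NULL can be sketched as follows. All names here (toy_client, toy_request, toy_frame, toy_request_init(), toy_submit_reply()) are hypothetical and chosen for illustration; the sketch assumes only what the description states, namely that a ref is taken when the request is set up and that the reply path can drop it only when it has a frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the GlusterFS structures. */
struct toy_client  { int ref_count; };
struct toy_request { struct toy_client *client; };
struct toy_frame   { struct toy_client *client; };

/* Models request setup taking a ref on the client for every request. */
void toy_request_init(struct toy_request *req, struct toy_client *c)
{
    req->client = c;
    c->ref_count++;
}

/* Models the reply path: the client is reachable only through the frame,
 * so a NULL frame means the ref taken at setup is never dropped. */
void toy_submit_reply(struct toy_frame *frame)
{
    if (frame == NULL)
        return; /* leaked ref */
    assert(frame->client->ref_count > 0);
    frame->client->ref_count--;
}
```

Each frameless request handled this way leaves the modelled ref count one higher than before, which matches the leak pattern reported above.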

Version-Release number of selected component (if applicable):


How reproducible:
always (with valgrind)

Comment 1 Worker Ant 2017-08-16 14:08:37 UTC
REVIEW: https://review.gluster.org/17982 (rpc: destroy transport after client_t) posted (#9) for review on master by Milind Changire (mchangir)

Comment 2 Worker Ant 2017-08-22 10:18:15 UTC
REVIEW: https://review.gluster.org/17982 (rpc: destroy transport after client_t) posted (#10) for review on master by Milind Changire (mchangir)

Comment 3 Worker Ant 2017-08-22 10:22:15 UTC
REVIEW: https://review.gluster.org/17982 (rpc: destroy transport after client_t) posted (#11) for review on master by Milind Changire (mchangir)

Comment 4 Worker Ant 2017-08-22 10:23:30 UTC
REVIEW: https://review.gluster.org/17982 (rpc: destroy transport after client_t) posted (#12) for review on master by Milind Changire (mchangir)

Comment 5 Worker Ant 2017-08-30 03:23:20 UTC
REVIEW: https://review.gluster.org/17982 (rpc: destroy transport after client_t) posted (#13) for review on master by Milind Changire (mchangir)

Comment 6 Worker Ant 2017-08-30 03:26:32 UTC
REVIEW: https://review.gluster.org/17982 (rpc: destroy transport after client_t) posted (#14) for review on master by Milind Changire (mchangir)

Comment 7 Worker Ant 2017-08-30 03:31:12 UTC
REVIEW: https://review.gluster.org/17982 (rpc: destroy transport after client_t) posted (#15) for review on master by Milind Changire (mchangir)

Comment 8 Worker Ant 2017-08-30 05:52:37 UTC
REVIEW: https://review.gluster.org/17982 (rpc: destroy transport after client_t) posted (#16) for review on master by Milind Changire (mchangir)

Comment 9 Worker Ant 2017-08-30 05:56:02 UTC
REVIEW: https://review.gluster.org/17982 (rpc: destroy transport after client_t) posted (#17) for review on master by Milind Changire (mchangir)

Comment 10 Worker Ant 2017-08-31 03:45:13 UTC
COMMIT: https://review.gluster.org/17982 committed in master by Raghavendra G (rgowdapp) 
------
commit 24b95089a18a6a40e7703cb344e92025d67f3086
Author: Milind Changire <mchangir>
Date:   Wed Aug 30 11:25:29 2017 +0530

    rpc: destroy transport after client_t
    
    Problem:
    1. Ref counting increment on the client_t object is done in
       rpcsvc_request_init() which is incorrect.
    2. Ref not taken when delegating to grace_time_handler()
    
    Solution:
    1. Only fop requests which require processing down the graph via
       stack 'frames' now ref count the request in get_frame_from_request()
    2. Take ref on client_t object in server_rpc_notify() but avoid
       dropping in RPCSVC_EVENT_TRANSPORT_DESTROY. Drop the ref
       unconditionally when exiting out of grace_time_handler().
       Also, avoid dropping ref on client_t in
       RPCSVC_EVENT_TRANSPORT_DESTROY when ref management has been
       delegated to grace_time_handler()
    
    Change-Id: Ic16246bebc7ea4490545b26564658f4b081675e4
    BUG: 1481600
    Reported-by: Raghavendra G <rgowdapp>
    Signed-off-by: Milind Changire <mchangir>
    Reviewed-on: https://review.gluster.org/17982
    Tested-by: Raghavendra G <rgowdapp>
    Reviewed-by: Raghavendra G <rgowdapp>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
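
A rough sketch of the idea in point 1 of the solution: the ref moves from request setup to frame creation, so it is taken exactly where a matching unref is possible. The names below (toy_request_init(), toy_get_frame(), toy_submit_reply()) are hypothetical and do not reproduce the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the GlusterFS structures. */
struct toy_client  { int ref_count; };
struct toy_request { struct toy_client *client; };
struct toy_frame   { struct toy_client *client; };

/* Request setup no longer touches the ref count. */
void toy_request_init(struct toy_request *req, struct toy_client *c)
{
    req->client = c;
}

/* Only requests that get a frame take a ref on the client. */
void toy_get_frame(struct toy_frame *frame, const struct toy_request *req)
{
    frame->client = req->client;
    frame->client->ref_count++;
}

/* The reply path drops the ref that frame creation took; a NULL frame
 * has no ref to drop, so nothing leaks. */
void toy_submit_reply(struct toy_frame *frame)
{
    if (frame == NULL)
        return;
    assert(frame->client->ref_count > 0);
    frame->client->ref_count--;
    frame->client = NULL;
}
```

With this arrangement, frameless requests leave the count untouched and framed requests are balanced by construction.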

Comment 11 Milind Changire 2017-08-31 05:59:16 UTC
Correction to the problem description:

Description of problem:
Problem:
1. incorrectly placed gf_client_ref() in rpcsvc_request_init()
   the gf_client_ref() in rpcsvc_request_init() should be moved to
   get_frame_from_request()

2. incorrect ref handling in server_rpc_notify() and grace_time_handler()
   2.1 last ref count on client_t should be dropped in 
       RPCSVC_EVENT_TRANSPORT_DESTROY only for non-grace-time-handling case
   2.2 ref should be taken on client_t before being delegated to 
       grace_time_handler()
   2.3 ref should be dropped from client_t in server_setvolume() when the 
       grace_time_handler() is successfully canceled for a re-connected client
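
Rules 2.1 to 2.3 can be modelled with a small sketch. Everything here is an assumption made for illustration: the toy_client fields, the function names, and the use of two flags to track delegation are hypothetical and are not taken from the GlusterFS sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified client; not the GlusterFS client_t. */
struct toy_client {
    int  ref_count;
    bool delegated;   /* ref management handed to the grace handler */
    bool timer_armed; /* grace timer holds a ref until it fires or is cancelled */
};

/* 2.2: take a ref before handing the client to the grace timer. */
void toy_delegate_to_grace_timer(struct toy_client *c)
{
    c->ref_count++;
    c->delegated = true;
    c->timer_armed = true;
}

/* 2.1: transport destroy drops the ref only in the non-grace case. */
void toy_transport_destroy(struct toy_client *c)
{
    if (!c->delegated)
        c->ref_count--;
}

/* The grace handler drops the timer's ref unconditionally on exit. */
void toy_grace_timer_expired(struct toy_client *c)
{
    assert(c->timer_armed);
    c->timer_armed = false;
    c->ref_count--;
}

/* 2.3: a reconnect that successfully cancels the timer drops the ref. */
bool toy_reconnect_cancel_timer(struct toy_client *c)
{
    if (!c->timer_armed)
        return false; /* nothing to cancel, nothing to drop */
    c->timer_armed = false;
    c->ref_count--;
    return true;
}
```

The point of the model is that the ref taken at delegation is dropped exactly once, either by the handler when it exits or by the reconnect path when the cancel succeeds.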

Comment 12 Shyamsundar 2017-12-08 17:38:25 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/

