Bug 764065 (GLUSTER-2333)

Summary: make glusterd more rpc friendly
Product: [Community] GlusterFS Reporter: Amar Tumballi <amarts>
Component: glusterdAssignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: urgent    
Version: 3.1.2CC: gluster-bugs, rabhat, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: RTNR Mount Type: ---
Documentation: DNR CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Amar Tumballi 2011-01-29 05:21:39 UTC
Current way of handling rpc programs in glusterd is not ideal with backward/forward compatibility issues. With RPC, every procedure should be having a separate 'actor', so same procedure in different versions can be forwarded to different 'actor' functions.

Current way of having switch case based on procedure number (procnum) won't work if the same procedure needs some XDR modification, and we are bumping the version number of the program.

For example, the patch http://patches.gluster.com/patch/5988/ does modification of XDR structure, but this can't be handled properly with current implementation.

Comment 1 Anand Avati 2011-02-22 07:28:32 UTC
PATCH: http://patches.gluster.com/patch/6222 in master (glusterd: make it more RPC friendly)

Comment 2 Raghavendra Bhat 2011-02-23 03:25:55 UTC
Before "glusterd_handle_rpc_msg" was the actor for most of the procedures, and glusterd_handle_rpc_msg used to call different functions specific for the procedure number in swich case based mechanish.

This is the structure which shows the actor for different procedure numbers.



rpcsvc_actor_t glusterd1_mgmt_actors[] = {
        [GD_MGMT_NULL]        = { "NULL",       GD_MGMT_NULL, glusterd_null, NULL, NULL},
        [GD_MGMT_PROBE_QUERY] = { "PROBE_QUERY", GD_MGMT_PROBE_QUERY, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_FRIEND_ADD] = { "FRIEND_ADD", GD_MGMT_FRIEND_ADD, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_FRIEND_REMOVE] = { "FRIEND_REMOVE", GD_MGMT_FRIEND_REMOVE, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_FRIEND_UPDATE] = { "FRIEND_UPDATE", GD_MGMT_FRIEND_UPDATE, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLUSTER_LOCK] = { "CLUSTER_LOCK", GD_MGMT_CLUSTER_LOCK, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLUSTER_UNLOCK] = { "CLUSTER_UNLOCK", GD_MGMT_CLUSTER_UNLOCK, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_STAGE_OP] = { "STAGE_OP", GD_MGMT_STAGE_OP, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_COMMIT_OP] = { "COMMIT_OP", GD_MGMT_COMMIT_OP, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_PROBE] = { "CLI_PROBE", GD_MGMT_CLI_PROBE, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_CREATE_VOLUME] = { "CLI_CREATE_VOLUME", GD_MGMT_CLI_CREATE_VOLUME, glusterd_handle_rpc_msg, NULL,NULL},
        [GD_MGMT_CLI_DEFRAG_VOLUME] = { "CLI_DEFRAG_VOLUME", GD_MGMT_CLI_DEFRAG_VOLUME, glusterd_handle_defrag_volume, NULL,NULL},
        [GD_MGMT_CLI_DEPROBE] = { "FRIEND_REMOVE", GD_MGMT_CLI_DEPROBE, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_LIST_FRIENDS] = { "LIST_FRIENDS", GD_MGMT_CLI_LIST_FRIENDS, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_START_VOLUME] = { "START_VOLUME", GD_MGMT_CLI_START_VOLUME, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_STOP_VOLUME] = { "STOP_VOLUME", GD_MGMT_CLI_STOP_VOLUME, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_DELETE_VOLUME] = { "DELETE_VOLUME", GD_MGMT_CLI_DELETE_VOLUME, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_GET_VOLUME] = { "GET_VOLUME", GD_MGMT_CLI_GET_VOLUME, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_ADD_BRICK] = { "ADD_BRICK", GD_MGMT_CLI_ADD_BRICK, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_REPLACE_BRICK] = { "REPLACE_BRICK", GD_MGMT_CLI_REPLACE_BRICK, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_REMOVE_BRICK] = { "REMOVE_BRICK", GD_MGMT_CLI_REMOVE_BRICK, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_LOG_FILENAME] = { "LOG FILENAME", GD_MGMT_CLI_LOG_FILENAME, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_LOG_LOCATE] = { "LOG LOCATE", GD_MGMT_CLI_LOG_LOCATE, glusterd_handle_log_locate, NULL, NULL},
        [GD_MGMT_CLI_LOG_ROTATE] = { "LOG FILENAME", GD_MGMT_CLI_LOG_ROTATE, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_SET_VOLUME] = { "SET_VOLUME", GD_MGMT_CLI_SET_VOLUME, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_SYNC_VOLUME] = { "SYNC_VOLUME", GD_MGMT_CLI_SYNC_VOLUME, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_RESET_VOLUME] = { "RESET_VOLUME", GD_MGMT_CLI_RESET_VOLUME, glusterd_handle_rpc_msg, NULL, NULL},
        [GD_MGMT_CLI_FSM_LOG] = { "FSM_LOG", GD_MGMT_CLI_FSM_LOG, glusterd_handle_rpc_msg, NULL, NULL}
};




But now abobe method has been changed. And we have different actors for different procedures (instead of having single actor which calls different procedure specific functions on switch case). This is the new table showing the actors for different procedures.



rpcsvc_actor_t glusterd1_mgmt_actors[] = {
        [GD_MGMT_NULL]        = { "NULL",       GD_MGMT_NULL, glusterd_null, NULL, NULL},
        [GD_MGMT_PROBE_QUERY] = { "PROBE_QUERY", GD_MGMT_PROBE_QUERY, glusterd_handle_probe_query, NULL, NULL},
        [GD_MGMT_FRIEND_ADD] = { "FRIEND_ADD", GD_MGMT_FRIEND_ADD, glusterd_handle_incoming_friend_req, NULL, NULL},
        [GD_MGMT_FRIEND_REMOVE] = { "FRIEND_REMOVE", GD_MGMT_FRIEND_REMOVE, glusterd_handle_incoming_unfriend_req, NULL, NULL},
        [GD_MGMT_FRIEND_UPDATE] = { "FRIEND_UPDATE", GD_MGMT_FRIEND_UPDATE, glusterd_handle_friend_update, NULL, NULL},
        [GD_MGMT_CLUSTER_LOCK] = { "CLUSTER_LOCK", GD_MGMT_CLUSTER_LOCK, glusterd_handle_cluster_lock, NULL, NULL},
        [GD_MGMT_CLUSTER_UNLOCK] = { "CLUSTER_UNLOCK", GD_MGMT_CLUSTER_UNLOCK, glusterd_handle_cluster_unlock, NULL, NULL},
        [GD_MGMT_STAGE_OP] = { "STAGE_OP", GD_MGMT_STAGE_OP, glusterd_handle_stage_op, NULL, NULL},
        [GD_MGMT_COMMIT_OP] = { "COMMIT_OP", GD_MGMT_COMMIT_OP, glusterd_handle_commit_op, NULL, NULL},
        [GD_MGMT_CLI_PROBE] = { "CLI_PROBE", GD_MGMT_CLI_PROBE, glusterd_handle_cli_probe, NULL, NULL},
        [GD_MGMT_CLI_CREATE_VOLUME] = { "CLI_CREATE_VOLUME", GD_MGMT_CLI_CREATE_VOLUME, glusterd_handle_create_volume, NULL,NULL},
        [GD_MGMT_CLI_DEFRAG_VOLUME] = { "CLI_DEFRAG_VOLUME", GD_MGMT_CLI_DEFRAG_VOLUME, glusterd_handle_defrag_volume, NULL,NULL},
        [GD_MGMT_CLI_DEPROBE] = { "FRIEND_REMOVE", GD_MGMT_CLI_DEPROBE, glusterd_handle_cli_deprobe, NULL, NULL},
        [GD_MGMT_CLI_LIST_FRIENDS] = { "LIST_FRIENDS", GD_MGMT_CLI_LIST_FRIENDS, glusterd_handle_cli_list_friends, NULL, NULL},
        [GD_MGMT_CLI_START_VOLUME] = { "START_VOLUME", GD_MGMT_CLI_START_VOLUME, glusterd_handle_cli_start_volume, NULL, NULL},
        [GD_MGMT_CLI_STOP_VOLUME] = { "STOP_VOLUME", GD_MGMT_CLI_STOP_VOLUME, glusterd_handle_cli_stop_volume, NULL, NULL},
        [GD_MGMT_CLI_DELETE_VOLUME] = { "DELETE_VOLUME", GD_MGMT_CLI_DELETE_VOLUME, glusterd_handle_cli_delete_volume, NULL, NULL},
        [GD_MGMT_CLI_GET_VOLUME] = { "GET_VOLUME", GD_MGMT_CLI_GET_VOLUME, glusterd_handle_cli_get_volume, NULL, NULL},
        [GD_MGMT_CLI_ADD_BRICK] = { "ADD_BRICK", GD_MGMT_CLI_ADD_BRICK, glusterd_handle_add_brick, NULL, NULL},
        [GD_MGMT_CLI_REPLACE_BRICK] = { "REPLACE_BRICK", GD_MGMT_CLI_REPLACE_BRICK, glusterd_handle_replace_brick, NULL, NULL},
        [GD_MGMT_CLI_REMOVE_BRICK] = { "REMOVE_BRICK", GD_MGMT_CLI_REMOVE_BRICK, glusterd_handle_remove_brick, NULL, NULL},
        [GD_MGMT_CLI_LOG_FILENAME] = { "LOG FILENAME", GD_MGMT_CLI_LOG_FILENAME, glusterd_handle_log_filename, NULL, NULL},
        [GD_MGMT_CLI_LOG_LOCATE] = { "LOG LOCATE", GD_MGMT_CLI_LOG_LOCATE, glusterd_handle_log_locate, NULL, NULL},
        [GD_MGMT_CLI_LOG_ROTATE] = { "LOG FILENAME", GD_MGMT_CLI_LOG_ROTATE, glusterd_handle_log_rotate, NULL, NULL},
        [GD_MGMT_CLI_SET_VOLUME] = { "SET_VOLUME", GD_MGMT_CLI_SET_VOLUME, glusterd_handle_set_volume, NULL, NULL},
        [GD_MGMT_CLI_SYNC_VOLUME] = { "SYNC_VOLUME", GD_MGMT_CLI_SYNC_VOLUME, glusterd_handle_sync_volume, NULL, NULL},
        [GD_MGMT_CLI_RESET_VOLUME] = { "RESET_VOLUME", GD_MGMT_CLI_RESET_VOLUME, glusterd_handle_reset_volume, NULL, NULL},
        [GD_MGMT_CLI_FSM_LOG] = { "FSM_LOG", GD_MGMT_CLI_FSM_LOG, glusterd_handle_fsm_log, NULL, NULL},

        [GD_MGMT_CLI_GSYNC_SET] = {"GSYNC_SET", GD_MGMT_CLI_GSYNC_SET, glusterd_handle_gsync_set, NULL, NULL},
};

Comment 3 Amar Tumballi 2011-02-25 11:01:33 UTC
I will be reopening this bug to accomplish few more things:


* Get away with current implementation of having common actor for different procedures and having a switch case --> This patch got in master branch, for which bug is verified too.

* Get away with incremental enums for MGMT and CLI procedure values : fix looks something like -> http://patches.gluster.com/patch/5985/ but I will be sending in some thing more cleaner.

* Bring in clear distinction between 'cli <---> glusterd' and 'glusterd <---> glusterd' procedures/programs.
  -> the above point also will be handled with this fix.

* Keep the RPC related code separate from the translator/transport/module specific code.  (check for client.c using procedure function pointing to specific RPC function). 
  -> fix looks something like http://patches.gluster.com/patch/6263/

* Clean up the glusterd op state machine code and  make sure the XDR structures are not passed between functions, instead make use of 'dict_t *'

Comment 4 Amar Tumballi 2011-02-25 11:15:46 UTC
(In reply to comment #3)
> 
> * Get away with incremental enums for MGMT and CLI procedure values : fix looks
> something like -> http://patches.gluster.com/patch/5985/ but I will be sending
> in some thing more cleaner.

http://patches.gluster.com/patch/6270/

> 
> * Bring in clear distinction between 'cli <---> glusterd' and 'glusterd <--->
> glusterd' procedures/programs.
>   -> the above point also will be handled with this fix.
>
 
http://patches.gluster.com/patch/6270/

> * Keep the RPC related code separate from the translator/transport/module
> specific code.  (check for client.c using procedure function pointing to
> specific RPC function). 
>   -> fix looks something like http://patches.gluster.com/patch/6263/

http://patches.gluster.com/patch/6268/ and http://patches.gluster.com/patch/6269/

> 
> * Clean up the glusterd op state machine code and  make sure the XDR structures
> are not passed between functions, instead make use of 'dict_t *'

This can be treated bit lower priority compared to above 3 patches. 

These patches are based out of git head

commit 020c175f61f0727f59add7c41acad6871c3474de
Author: Lakshmipathi G <lakshmipathi>
Date:   Tue Feb 22 16:29:07 2011 +0800

This may break some of already submitted patches. But I still think this fix is critical and should go in first, and other fixes in glusterd should be made above this. I will be sending rebalance fix patches above these patches.

Comment 5 Anand Avati 2011-03-01 07:05:43 UTC
PATCH: http://patches.gluster.com/patch/6268 in master (glusterd: keep mgmt program peerinfo specific)

Comment 6 Anand Avati 2011-03-01 07:05:50 UTC
PATCH: http://patches.gluster.com/patch/6269 in master (glusterd: dependency on 'priv->mgmt' completely removed)

Comment 7 Anand Avati 2011-03-01 07:05:54 UTC
PATCH: http://patches.gluster.com/patch/6270 in master (glusterd: separate out cli specific programs and mgmt specific programs)

Comment 8 Anand Avati 2011-03-01 07:05:59 UTC
PATCH: http://patches.gluster.com/patch/6295 in master (glusterd: remove rpc code from internals of glusterd)

Comment 9 Amar Tumballi 2011-03-01 07:10:41 UTC
(In reply to comment #4)

> > 
> > * Clean up the glusterd op state machine code and  make sure the XDR structures
> > are not passed between functions, instead make use of 'dict_t *'
> 
> This can be treated bit lower priority compared to above 3 patches. 
> 

This is also done now with http://patches.gluster.com/patch/6295 making glusterd RPC friendly.

-Amar

Comment 10 Anand Avati 2011-03-01 20:10:49 UTC
PATCH: http://patches.gluster.com/patch/6317 in master (gluster rebalance: send the proper 'procnum' to glusterd)