Current way of handling rpc programs in glusterd is not ideal with backward/forward compatibility issues. With RPC, every procedure should be having a separate 'actor', so same procedure in different versions can be forwarded to different 'actor' functions. Current way of having switch case based on procedure number (procnum) won't work if the same procedure needs some XDR modification, and we are bumping the version number of the program. For example, the patch http://patches.gluster.com/patch/5988/ does modification of XDR structure, but this can't be handled properly with current implementation.
PATCH: http://patches.gluster.com/patch/6222 in master (glusterd: make it more RPC friendly)
Before "glusterd_handle_rpc_msg" was the actor for most of the procedures, and glusterd_handle_rpc_msg used to call different functions specific for the procedure number in swich case based mechanish. This is the structure which shows the actor for different procedure numbers. rpcsvc_actor_t glusterd1_mgmt_actors[] = { [GD_MGMT_NULL] = { "NULL", GD_MGMT_NULL, glusterd_null, NULL, NULL}, [GD_MGMT_PROBE_QUERY] = { "PROBE_QUERY", GD_MGMT_PROBE_QUERY, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_FRIEND_ADD] = { "FRIEND_ADD", GD_MGMT_FRIEND_ADD, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_FRIEND_REMOVE] = { "FRIEND_REMOVE", GD_MGMT_FRIEND_REMOVE, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_FRIEND_UPDATE] = { "FRIEND_UPDATE", GD_MGMT_FRIEND_UPDATE, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLUSTER_LOCK] = { "CLUSTER_LOCK", GD_MGMT_CLUSTER_LOCK, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLUSTER_UNLOCK] = { "CLUSTER_UNLOCK", GD_MGMT_CLUSTER_UNLOCK, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_STAGE_OP] = { "STAGE_OP", GD_MGMT_STAGE_OP, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_COMMIT_OP] = { "COMMIT_OP", GD_MGMT_COMMIT_OP, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_PROBE] = { "CLI_PROBE", GD_MGMT_CLI_PROBE, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_CREATE_VOLUME] = { "CLI_CREATE_VOLUME", GD_MGMT_CLI_CREATE_VOLUME, glusterd_handle_rpc_msg, NULL,NULL}, [GD_MGMT_CLI_DEFRAG_VOLUME] = { "CLI_DEFRAG_VOLUME", GD_MGMT_CLI_DEFRAG_VOLUME, glusterd_handle_defrag_volume, NULL,NULL}, [GD_MGMT_CLI_DEPROBE] = { "FRIEND_REMOVE", GD_MGMT_CLI_DEPROBE, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_LIST_FRIENDS] = { "LIST_FRIENDS", GD_MGMT_CLI_LIST_FRIENDS, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_START_VOLUME] = { "START_VOLUME", GD_MGMT_CLI_START_VOLUME, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_STOP_VOLUME] = { "STOP_VOLUME", GD_MGMT_CLI_STOP_VOLUME, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_DELETE_VOLUME] = { "DELETE_VOLUME", GD_MGMT_CLI_DELETE_VOLUME, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_GET_VOLUME] = { "GET_VOLUME", GD_MGMT_CLI_GET_VOLUME, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_ADD_BRICK] = { "ADD_BRICK", GD_MGMT_CLI_ADD_BRICK, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_REPLACE_BRICK] = { "REPLACE_BRICK", GD_MGMT_CLI_REPLACE_BRICK, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_REMOVE_BRICK] = { "REMOVE_BRICK", GD_MGMT_CLI_REMOVE_BRICK, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_LOG_FILENAME] = { "LOG FILENAME", GD_MGMT_CLI_LOG_FILENAME, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_LOG_LOCATE] = { "LOG LOCATE", GD_MGMT_CLI_LOG_LOCATE, glusterd_handle_log_locate, NULL, NULL}, [GD_MGMT_CLI_LOG_ROTATE] = { "LOG FILENAME", GD_MGMT_CLI_LOG_ROTATE, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_SET_VOLUME] = { "SET_VOLUME", GD_MGMT_CLI_SET_VOLUME, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_SYNC_VOLUME] = { "SYNC_VOLUME", GD_MGMT_CLI_SYNC_VOLUME, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_RESET_VOLUME] = { "RESET_VOLUME", GD_MGMT_CLI_RESET_VOLUME, glusterd_handle_rpc_msg, NULL, NULL}, [GD_MGMT_CLI_FSM_LOG] = { "FSM_LOG", GD_MGMT_CLI_FSM_LOG, glusterd_handle_rpc_msg, NULL, NULL} }; But now abobe method has been changed. And we have different actors for different procedures (instead of having single actor which calls different procedure specific functions on switch case). This is the new table showing the actors for different procedures. rpcsvc_actor_t glusterd1_mgmt_actors[] = { [GD_MGMT_NULL] = { "NULL", GD_MGMT_NULL, glusterd_null, NULL, NULL}, [GD_MGMT_PROBE_QUERY] = { "PROBE_QUERY", GD_MGMT_PROBE_QUERY, glusterd_handle_probe_query, NULL, NULL}, [GD_MGMT_FRIEND_ADD] = { "FRIEND_ADD", GD_MGMT_FRIEND_ADD, glusterd_handle_incoming_friend_req, NULL, NULL}, [GD_MGMT_FRIEND_REMOVE] = { "FRIEND_REMOVE", GD_MGMT_FRIEND_REMOVE, glusterd_handle_incoming_unfriend_req, NULL, NULL}, [GD_MGMT_FRIEND_UPDATE] = { "FRIEND_UPDATE", GD_MGMT_FRIEND_UPDATE, glusterd_handle_friend_update, NULL, NULL}, [GD_MGMT_CLUSTER_LOCK] = { "CLUSTER_LOCK", GD_MGMT_CLUSTER_LOCK, glusterd_handle_cluster_lock, NULL, NULL}, [GD_MGMT_CLUSTER_UNLOCK] = { "CLUSTER_UNLOCK", GD_MGMT_CLUSTER_UNLOCK, glusterd_handle_cluster_unlock, NULL, NULL}, [GD_MGMT_STAGE_OP] = { "STAGE_OP", GD_MGMT_STAGE_OP, glusterd_handle_stage_op, NULL, NULL}, [GD_MGMT_COMMIT_OP] = { "COMMIT_OP", GD_MGMT_COMMIT_OP, glusterd_handle_commit_op, NULL, NULL}, [GD_MGMT_CLI_PROBE] = { "CLI_PROBE", GD_MGMT_CLI_PROBE, glusterd_handle_cli_probe, NULL, NULL}, [GD_MGMT_CLI_CREATE_VOLUME] = { "CLI_CREATE_VOLUME", GD_MGMT_CLI_CREATE_VOLUME, glusterd_handle_create_volume, NULL,NULL}, [GD_MGMT_CLI_DEFRAG_VOLUME] = { "CLI_DEFRAG_VOLUME", GD_MGMT_CLI_DEFRAG_VOLUME, glusterd_handle_defrag_volume, NULL,NULL}, [GD_MGMT_CLI_DEPROBE] = { "FRIEND_REMOVE", GD_MGMT_CLI_DEPROBE, glusterd_handle_cli_deprobe, NULL, NULL}, [GD_MGMT_CLI_LIST_FRIENDS] = { "LIST_FRIENDS", GD_MGMT_CLI_LIST_FRIENDS, glusterd_handle_cli_list_friends, NULL, NULL}, [GD_MGMT_CLI_START_VOLUME] = { "START_VOLUME", GD_MGMT_CLI_START_VOLUME, glusterd_handle_cli_start_volume, NULL, NULL}, [GD_MGMT_CLI_STOP_VOLUME] = { "STOP_VOLUME", GD_MGMT_CLI_STOP_VOLUME, glusterd_handle_cli_stop_volume, NULL, NULL}, [GD_MGMT_CLI_DELETE_VOLUME] = { "DELETE_VOLUME", GD_MGMT_CLI_DELETE_VOLUME, glusterd_handle_cli_delete_volume, NULL, NULL}, [GD_MGMT_CLI_GET_VOLUME] = { "GET_VOLUME", GD_MGMT_CLI_GET_VOLUME, glusterd_handle_cli_get_volume, NULL, NULL}, [GD_MGMT_CLI_ADD_BRICK] = { "ADD_BRICK", GD_MGMT_CLI_ADD_BRICK, glusterd_handle_add_brick, NULL, NULL}, [GD_MGMT_CLI_REPLACE_BRICK] = { "REPLACE_BRICK", GD_MGMT_CLI_REPLACE_BRICK, glusterd_handle_replace_brick, NULL, NULL}, [GD_MGMT_CLI_REMOVE_BRICK] = { "REMOVE_BRICK", GD_MGMT_CLI_REMOVE_BRICK, glusterd_handle_remove_brick, NULL, NULL}, [GD_MGMT_CLI_LOG_FILENAME] = { "LOG FILENAME", GD_MGMT_CLI_LOG_FILENAME, glusterd_handle_log_filename, NULL, NULL}, [GD_MGMT_CLI_LOG_LOCATE] = { "LOG LOCATE", GD_MGMT_CLI_LOG_LOCATE, glusterd_handle_log_locate, NULL, NULL}, [GD_MGMT_CLI_LOG_ROTATE] = { "LOG FILENAME", GD_MGMT_CLI_LOG_ROTATE, glusterd_handle_log_rotate, NULL, NULL}, [GD_MGMT_CLI_SET_VOLUME] = { "SET_VOLUME", GD_MGMT_CLI_SET_VOLUME, glusterd_handle_set_volume, NULL, NULL}, [GD_MGMT_CLI_SYNC_VOLUME] = { "SYNC_VOLUME", GD_MGMT_CLI_SYNC_VOLUME, glusterd_handle_sync_volume, NULL, NULL}, [GD_MGMT_CLI_RESET_VOLUME] = { "RESET_VOLUME", GD_MGMT_CLI_RESET_VOLUME, glusterd_handle_reset_volume, NULL, NULL}, [GD_MGMT_CLI_FSM_LOG] = { "FSM_LOG", GD_MGMT_CLI_FSM_LOG, glusterd_handle_fsm_log, NULL, NULL}, [GD_MGMT_CLI_GSYNC_SET] = {"GSYNC_SET", GD_MGMT_CLI_GSYNC_SET, glusterd_handle_gsync_set, NULL, NULL}, };
I will be reopening this bug to accomplish few more things: * Get away with current implementation of having common actor for different procedures and having a switch case --> This patch got in master branch, for which bug is verified too. * Get away with incremental enums for MGMT and CLI procedure values : fix looks something like -> http://patches.gluster.com/patch/5985/ but I will be sending in some thing more cleaner. * Bring in clear distinction between 'cli <---> glusterd' and 'glusterd <---> glusterd' procedures/programs. -> the above point also will be handled with this fix. * Keep the RPC related code separate from the translator/transport/module specific code. (check for client.c using procedure function pointing to specific RPC function). -> fix looks something like http://patches.gluster.com/patch/6263/ * Clean up the glusterd op state machine code and make sure the XDR structures are not passed between functions, instead make use of 'dict_t *'
(In reply to comment #3) > > * Get away with incremental enums for MGMT and CLI procedure values : fix looks > something like -> http://patches.gluster.com/patch/5985/ but I will be sending > in some thing more cleaner. http://patches.gluster.com/patch/6270/ > > * Bring in clear distinction between 'cli <---> glusterd' and 'glusterd <---> > glusterd' procedures/programs. > -> the above point also will be handled with this fix. > http://patches.gluster.com/patch/6270/ > * Keep the RPC related code separate from the translator/transport/module > specific code. (check for client.c using procedure function pointing to > specific RPC function). > -> fix looks something like http://patches.gluster.com/patch/6263/ http://patches.gluster.com/patch/6268/ and http://patches.gluster.com/patch/6269/ > > * Clean up the glusterd op state machine code and make sure the XDR structures > are not passed between functions, instead make use of 'dict_t *' This can be treated bit lower priority compared to above 3 patches. These patches are based out of git head commit 020c175f61f0727f59add7c41acad6871c3474de Author: Lakshmipathi G <lakshmipathi> Date: Tue Feb 22 16:29:07 2011 +0800 This may break some of already submitted patches. But I still think this fix is critical and should go in first, and other fixes in glusterd should be made above this. I will be sending rebalance fix patches above these patches.
PATCH: http://patches.gluster.com/patch/6268 in master (glusterd: keep mgmt program peerinfo specific)
PATCH: http://patches.gluster.com/patch/6269 in master (glusterd: dependency on 'priv->mgmt' completely removed)
PATCH: http://patches.gluster.com/patch/6270 in master (glusterd: separate out cli specific programs and mgmt specific programs)
PATCH: http://patches.gluster.com/patch/6295 in master (glusterd: remove rpc code from internals of glusterd)
(In reply to comment #4) > > > > * Clean up the glusterd op state machine code and make sure the XDR structures > > are not passed between functions, instead make use of 'dict_t *' > > This can be treated bit lower priority compared to above 3 patches. > This is also done now with http://patches.gluster.com/patch/6295 making glusterd RPC friendly. -Amar
PATCH: http://patches.gluster.com/patch/6317 in master (gluster rebalance: send the proper 'procnum' to glusterd)