Description of problem:
The fuse mount logs are flooded with messages where the client keeps trying to connect to an already removed brick.

Version-Release number of selected component (if applicable):

How reproducible:
Consistently

Steps to Reproduce:
1. Create a 3 brick pure distribute volume.
2. gluster v set <volname> client-log-level DEBUG
3. FUSE mount the volume.
4. gluster v remove-brick <volname> <path to first brick> force (reply yes when prompted)
5. Check the FUSE mount logs.

Actual results:
The following messages keep getting logged:

[2018-02-26 05:42:54.371392] D [MSGID: 0] [common-utils.c:347:gf_resolve_ip6] 0-resolver: returning ip-192.168.122.6 (port-24007) for hostname: 192.168.122.6 and port: 24007
[2018-02-26 05:42:54.371854] D [socket.c:3051:socket_fix_ssl_opts] 0-vol3-client-0: disabling SSL for portmapper connection
[2018-02-26 05:42:54.373246] D [MSGID: 0] [client.c:2240:client_rpc_notify] 0-vol3-client-0: got RPC_CLNT_CONNECT
[2018-02-26 05:42:54.374799] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7fcf33cb1f77] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0xc4)[0x7fcf33a7ae65] (--> /usr/local/lib/libgfrpc.so.0(+0x1967e)[0x7fcf33a7b67e] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_check_and_start_ping+0x54)[0x7fcf33a7b87e] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_submit+0x4c6)[0x7fcf33a76cdd] ))))) 0-: 192.168.122.6:24007: ping timer event already removed
[2018-02-26 05:42:54.375191] D [rpc-clnt-ping.c:336:rpc_clnt_start_ping] 0-vol3-client-0: returning as transport is already disconnected OR there are no frames (1 || 1)
[2018-02-26 05:42:54.382344] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7fcf33cb1f77] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0xc4)[0x7fcf33a7ae65] (--> /usr/local/lib/libgfrpc.so.0(+0x1967e)[0x7fcf33a7b67e] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_check_and_start_ping+0x54)[0x7fcf33a7b87e] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_submit+0x4c6)[0x7fcf33a76cdd] ))))) 0-: 192.168.122.6:24007: ping timer event already removed
[2018-02-26 05:42:54.382825] D [rpc-clnt-ping.c:336:rpc_clnt_start_ping] 0-vol3-client-0: returning as transport is already disconnected OR there are no frames (1 || 1)
[2018-02-26 05:42:54.384708] D [logging.c:1979:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk
[2018-02-26 05:42:54.380516] D [MSGID: 0] [client-handshake.c:1512:server_has_portmap] 0-vol3-client-0: detected portmapper on server
[2018-02-26 05:42:54.384704] D [MSGID: 0] [client-handshake.c:1571:client_query_portmap_cbk] 0-vol3-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-02-26 05:42:54.385029] D [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-02-26 05:42:54.385101] D [MSGID: 0] [client.c:2269:client_rpc_notify] 0-vol3-client-0: got RPC_CLNT_DISCONNECT
[2018-02-26 05:42:54.385184] D [MSGID: 0] [client.c:2293:client_rpc_notify] 0-vol3-client-0: disconnected from vol3-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-02-26 05:42:54.387099] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7fcf33cb1f77] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0xc4)[0x7fcf33a7ae65] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc9)[0x7fcf33a74631] (--> /usr/local/lib/libgfrpc.so.0(+0x13044)[0x7fcf33a75044] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x193)[0x7fcf33a752c9] ))))) 0-: 192.168.122.6:24007: ping timer event already removed
^C

0-vol3-client-0 is the brick that was removed.

Expected results:
The old graph should not retry connections to the removed brick.

Additional info:
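For reference, a minimal shell sketch of the steps above. The server name (server1), brick paths, mount point and log file path are assumptions for illustration; only the volume name vol3 and the subvolume vol3-client-0 come from the logs.

# Create a 3-brick pure distribute volume and mount it (names are illustrative)
gluster volume create vol3 server1:/bricks/b1 server1:/bricks/b2 server1:/bricks/b3 force
gluster volume start vol3
gluster volume set vol3 client-log-level DEBUG
mount -t glusterfs server1:/vol3 /mnt/vol3

# Remove the first brick (answer "y" when prompted), then watch the fuse mount
# log for the repeating reconnect attempts against the removed brick
gluster volume remove-brick vol3 server1:/bricks/b1 force
grep 'vol3-client-0' /var/log/glusterfs/mnt-vol3.log | tail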
Status?
(In reply to Yaniv Kaul from comment #1)
> Status?

This is an rpc issue. @Raghavendra, Can you please update?
(In reply to Nithya Balachandran from comment #2)
> (In reply to Yaniv Kaul from comment #1)
> > Status?
>
> This is an rpc issue. @Raghavendra, Can you please update?

Is the removed brick removed from the graph even before remove-brick is committed? Can you explain what the effects of remove-brick and remove-brick commit are on the graph?
(In reply to Raghavendra G from comment #3)
> (In reply to Nithya Balachandran from comment #2)
> > (In reply to Yaniv Kaul from comment #1)
> > > Status?
> >
> > This is an rpc issue. @Raghavendra, Can you please update?
>
> Is the removed brick removed from graph even before remove-brick is
> committed? Can you explain what are the effects of remove-brick and
> remove-brick commit on the graph?

A remove-brick force or remove-brick commit will update the graph to remove the bricks. This was a while ago, but IIRC the old graph kept trying to connect to the brick even though it was no longer active. The new graph connections worked as expected.
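A hedged sketch of how this can be observed from the client side, reusing the assumed names from the sketch above (volume vol3, mount log /var/log/glusterfs/mnt-vol3.log): after remove-brick force the volume definition no longer lists the brick and a new client graph is generated, yet the old graph's vol3-client-0 keeps retrying.

# The brick should be gone from the volume definition after remove-brick force
gluster volume info vol3
gluster volume status vol3

# The mount log should show a new graph being set up; the exact wording of the
# graph-switch message may vary by version, so this grep pattern is an assumption
grep -i 'switched to graph' /var/log/glusterfs/mnt-vol3.log

# Meanwhile the old graph's client translator keeps failing to resolve the
# removed brick's port
grep -c 'failed to get the port number for remote subvolume' /var/log/glusterfs/mnt-vol3.log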
I tried the test myself. I did see the client connecting to the removed brick even after "remove-brick force". However, those messages went away after I did an ls on the mountpoint.

This behaviour is expected: fuse-bridge only switches to the new graph (the one without the removed brick) on the first fop from the mount point after the new graph is constructed. Since the old graph is still active until then, rpc continues trying to connect to the removed brick.

I am closing this as "Not a Bug", as it is working as designed. However, if you feel this behaviour of mounts is not acceptable, please re-open the bug and change the component to "fuse" (fuse-bridge), as that is the component responsible for tracking which graph is active.
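A short sketch of the behaviour described in this comment, with the mount point and log path assumed as before: any fop on the mount point makes fuse-bridge switch to the new graph, after which the reconnect attempts to the removed brick should stop.

# Trigger the first fop on the mount point so fuse-bridge switches graphs
ls /mnt/vol3 > /dev/null

# The reconnect messages for the removed brick (vol3-client-0) should no longer
# appear in new entries of the mount log
tail -n 50 /var/log/glusterfs/mnt-vol3.log | grep 'vol3-client-0'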