Bug 1548953 - Client keeps trying to connect to removed brick
Summary: Client keeps trying to connect to removed brick
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-02-26 05:47 UTC by Nithya Balachandran
Modified: 2019-07-02 09:25 UTC
2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-07-02 09:25:12 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nithya Balachandran 2018-02-26 05:47:39 UTC
Description of problem:

The FUSE mount logs are flooded with messages where the client keeps trying to connect to an already removed brick.

Version-Release number of selected component (if applicable):


How reproducible:
Consistently


Steps to Reproduce:
1. Create a 3 brick pure distribute volume.
2. gluster v set <volname> client-log-level DEBUG
3. FUSE mount the volume
4. gluster v remove-brick <volname> <path to first brick> force (reply yes when prompted)
5. Check the FUSE mount logs.
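The steps above can be consolidated into a shell sketch. The volume name, server hostname, and brick/mount paths below are placeholders; the commands are printed rather than executed (via the `run` wrapper) so the sketch can be dry-run without a Gluster cluster:

```shell
# Dry-run sketch of the reproduction steps. VOLNAME, server1, and the
# brick/mount paths are placeholders. On a real cluster, replace the
# "run" wrapper with plain execution.
run() { echo "+ $*"; }    # print each step instead of executing it

VOLNAME=vol3
MNT=/mnt/glusterfs

run gluster volume create "$VOLNAME" server1:/bricks/b{1,2,3}        # 1. 3-brick pure distribute
run gluster volume start "$VOLNAME"
run gluster volume set "$VOLNAME" client-log-level DEBUG             # 2. debug client logs
run mount -t glusterfs server1:/"$VOLNAME" "$MNT"                    # 3. FUSE mount
run gluster volume remove-brick "$VOLNAME" server1:/bricks/b1 force  # 4. reply yes when prompted
run tail -f /var/log/glusterfs/mnt-glusterfs.log                     # 5. watch the mount log
```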

Actual results:


The following messages keep getting logged:


[2018-02-26 05:42:54.371392] D [MSGID: 0] [common-utils.c:347:gf_resolve_ip6] 0-resolver: returning ip-192.168.122.6 (port-24007) for hostname: 192.168.122.6 and port: 24007
[2018-02-26 05:42:54.371854] D [socket.c:3051:socket_fix_ssl_opts] 0-vol3-client-0: disabling SSL for portmapper connection
[2018-02-26 05:42:54.373246] D [MSGID: 0] [client.c:2240:client_rpc_notify] 0-vol3-client-0: got RPC_CLNT_CONNECT
[2018-02-26 05:42:54.374799] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7fcf33cb1f77] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0xc4)[0x7fcf33a7ae65] (--> /usr/local/lib/libgfrpc.so.0(+0x1967e)[0x7fcf33a7b67e] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_check_and_start_ping+0x54)[0x7fcf33a7b87e] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_submit+0x4c6)[0x7fcf33a76cdd] ))))) 0-: 192.168.122.6:24007: ping timer event already removed
[2018-02-26 05:42:54.375191] D [rpc-clnt-ping.c:336:rpc_clnt_start_ping] 0-vol3-client-0: returning as transport is already disconnected OR there are no frames (1 || 1)
[2018-02-26 05:42:54.382344] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7fcf33cb1f77] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0xc4)[0x7fcf33a7ae65] (--> /usr/local/lib/libgfrpc.so.0(+0x1967e)[0x7fcf33a7b67e] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_check_and_start_ping+0x54)[0x7fcf33a7b87e] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_submit+0x4c6)[0x7fcf33a76cdd] ))))) 0-: 192.168.122.6:24007: ping timer event already removed
[2018-02-26 05:42:54.382825] D [rpc-clnt-ping.c:336:rpc_clnt_start_ping] 0-vol3-client-0: returning as transport is already disconnected OR there are no frames (1 || 1)
[2018-02-26 05:42:54.384708] D [logging.c:1979:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk
[2018-02-26 05:42:54.380516] D [MSGID: 0] [client-handshake.c:1512:server_has_portmap] 0-vol3-client-0: detected portmapper on server
[2018-02-26 05:42:54.384704] D [MSGID: 0] [client-handshake.c:1571:client_query_portmap_cbk] 0-vol3-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-02-26 05:42:54.385029] D [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-02-26 05:42:54.385101] D [MSGID: 0] [client.c:2269:client_rpc_notify] 0-vol3-client-0: got RPC_CLNT_DISCONNECT
[2018-02-26 05:42:54.385184] D [MSGID: 0] [client.c:2293:client_rpc_notify] 0-vol3-client-0: disconnected from vol3-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-02-26 05:42:54.387099] D [rpc-clnt-ping.c:99:rpc_clnt_remove_ping_timer_locked] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7fcf33cb1f77] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0xc4)[0x7fcf33a7ae65] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc9)[0x7fcf33a74631] (--> /usr/local/lib/libgfrpc.so.0(+0x13044)[0x7fcf33a75044] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x193)[0x7fcf33a752c9] ))))) 0-: 192.168.122.6:24007: ping timer event already removed
^C



0-vol3-client-0 is the brick that was removed.


Expected results:

The older graph should not retry connections to the removed brick. 

Additional info:

Comment 1 Yaniv Kaul 2019-07-01 06:05:16 UTC
Status?

Comment 2 Nithya Balachandran 2019-07-01 06:59:59 UTC
(In reply to Yaniv Kaul from comment #1)
> Status?

This is an rpc issue. @Raghavendra, Can you please update?

Comment 3 Raghavendra G 2019-07-02 05:58:12 UTC
(In reply to Nithya Balachandran from comment #2)
> (In reply to Yaniv Kaul from comment #1)
> > Status?
> 
> This is an rpc issue. @Raghavendra, Can you please update?

Is the removed brick dropped from the graph even before remove-brick is committed? Can you explain what effects remove-brick and remove-brick commit have on the graph?

Comment 4 Nithya Balachandran 2019-07-02 06:06:44 UTC
(In reply to Raghavendra G from comment #3)
> (In reply to Nithya Balachandran from comment #2)
> > (In reply to Yaniv Kaul from comment #1)
> > > Status?
> > 
> > This is an rpc issue. @Raghavendra, Can you please update?
> 
> Is the removed brick removed from graph even before remove-brick is
> committed? Can you explain what are the effects of remove-brick and
> remove-brick commit on the graph?

A remove-brick force or remove-brick commit will update the graph to remove the bricks.

This was a while ago but IIRC, the old graph kept trying to connect to the brick even though it was no longer active. The new graph connections worked as expected.

Comment 5 Raghavendra G 2019-07-02 09:25:12 UTC
I tried the test myself. I did see the client connecting to the removed brick even after "remove-brick force". However, those messages went away after I ran an ls on the mountpoint. This behaviour is expected: fuse-bridge only switches to the new graph (the one without the removed brick) on the first fop from the mount point after the new graph is constructed. Since the old graph is still active until then, rpc keeps trying to connect to the removed brick.

I am closing this as "Not a Bug" since it is working as designed. However, if you feel this behaviour of mounts is not acceptable, please re-open the bug and change the component to "fuse" (fuse-bridge), as that is the component responsible for maintaining the active graph.

