Created attachment 1519119 [details]
Glusterfs client logs

Description of problem:

After turning the Read ahead translator "off", some of the FUSE clients got disconnected (unmounted) from one of the data volumes. The glusterfs logs from the client that experienced the disconnect are attached.

The following is an excerpt of the messages in the glusterfs/data.log.* logfiles:
---
[2019-01-07 07:40:44.625789] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: switched to graph 8
[2019-01-07 07:40:44.629594] I [MSGID: 114021] [client.c:2369:notify] 6-el_data-client-0: current graph is no longer active, destroying rpc_client
[2019-01-07 07:40:44.629651] I [MSGID: 114021] [client.c:2369:notify] 6-el_data-client-1: current graph is no longer active, destroying rpc_client
[2019-01-07 07:40:44.629668] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 6-el_data-client-0: disconnected from el_data-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-01-07 07:40:44.629724] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 6-el_data-client-1: disconnected from el_data-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2019-01-07 07:40:44.629732] E [MSGID: 108006] [afr-common.c:5118:__afr_handle_child_down_event] 6-el_data-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-01-07 07:40:44.869481] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2019-01-07 07:40:44.916540] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec] 0-mgmt: Volume file changed
----

Version-Release number of selected component (if applicable):
3.12.13

How reproducible:
Turn the Read ahead translator "off" on the server side.
(In reply to Amgad from comment #0)
> After turning the Read ahead translator "off", some of the FUSE clients got
> disconnected (unmounted) from one of the data volumes.
> [...]
> How reproducible:
> Turn the Read ahead translator "off" on the server side.

Do you mean read-ahead or readdir-ahead? They are 2 different translators, and the memory leak was in readdir-ahead.

Do clients lose access to the volume, and do you see errors on the mount point? The graph switch messages in the logs are expected.

@Amar, please assign the BZ to the appropriate person.
Thanks Nithya:

It is the "readdir-ahead" translator. Yes, the client loses access to the volume, and we saw the graph switch messages. For instance, in the attached logs you can see:
-------
[2019-01-04 04:55:51.453074] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: switched to graph 4
[2019-01-04 04:55:51.454990] I [MSGID: 114021] [client.c:2369:notify] 2-el_data-client-0: current graph is no longer active, destroying rpc_client
[2019-01-04 04:55:51.455028] I [MSGID: 114021] [client.c:2369:notify] 2-el_data-client-1: current graph is no longer active, destroying rpc_client
[2019-01-04 04:55:51.455037] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 2-el_data-client-0: disconnected from el_data-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-01-04 04:55:51.455080] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 2-el_data-client-1: disconnected from el_data-client-1. Client process will keep trying to connect to glusterd until brick's port is available
--------

Regards,
Amgad
> [2019-01-04 04:55:51.453074] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: switched to graph 4

Notice the graph number is 4 here.

> [2019-01-04 04:55:51.454990] I [MSGID: 114021] [client.c:2369:notify] 2-el_data-client-0: current graph is no longer active, destroying rpc_client
> [2019-01-04 04:55:51.455028] I [MSGID: 114021] [client.c:2369:notify] 2-el_data-client-1: current graph is no longer active, destroying rpc_client
> [2019-01-04 04:55:51.455037] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 2-el_data-client-0: disconnected from el_data-client-0. Client process will keep trying to connect to glusterd until brick's port is available
> [2019-01-04 04:55:51.455080] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 2-el_data-client-1: disconnected from el_data-client-1. Client process will keep trying to connect to glusterd until brick's port is available

All of the above logs are from graph 2 ("2-el_data-client-0" means: graph 2, volume el_data, and client-0, i.e. the protocol/client translator). So this is expected: once the client switches to the new graph 4, the connections belonging to the old graph 2 are destroyed. I don't think this is the reason for the unmount. These logs can show up for any option change that triggers a graph switch (like toggling write-behind, read-ahead, etc.). I am not sure these logs are enough to debug the unmount issue.
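[Editorial note: for anyone triaging similar logs, below is a minimal sketch, not part of the original comments, that groups gluster client log lines by graph id using the "<graph>-<volume>-<translator>" prefix convention described above. The regex and the log path are assumptions based on the log excerpts in this bug.]

import re
from collections import defaultdict

# Matches the xlator prefix in gluster client logs, e.g.
# "... [client.c:2285:client_rpc_notify] 2-el_data-client-0: disconnected ..."
# -> graph 2, xlator "el_data-client-0"
PREFIX_RE = re.compile(r"\]\s+(\d+)-([\w.-]+):")

def messages_by_graph(log_path):
    """Group log lines by the graph id embedded in the xlator prefix."""
    graphs = defaultdict(list)
    with open(log_path, errors="replace") as f:
        for line in f:
            m = PREFIX_RE.search(line)
            if m:
                graphs[int(m.group(1))].append(line.rstrip())
    return graphs

if __name__ == "__main__":
    # "data.log" is a placeholder; point it at the client log attached to this bug.
    for graph_id, lines in sorted(messages_by_graph("data.log").items()):
        print(f"graph {graph_id}: {len(lines)} messages")

Running this against the attached log would show the "disconnected" and "destroying rpc_client" messages confined to the old graph, consistent with an ordinary graph switch rather than a real loss of connectivity.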
Thanks Amar:

The full log set from the client that experienced the disconnect is attached to the ticket for your investigation. Please let us know if any additional tracing can be turned on when it happens again, to help with debugging on your side.

Regards,
Amgad
Amgad, we recommend upgrading to glusterfs-6.1 (or 6.2, which comes out in another week) so you can pick up some of the latest fixes.
Hi Amgad,

I tried to reproduce this issue on 3.12.13 but could not see any problem when toggling the readdir-ahead translator: none of the clients had their volumes unmounted, and there was nothing suspicious on my setup. The same goes for the client logs you provided; as Amar already mentioned, the graph switch messages are expected when any translator is toggled.

Are you still using 3.12.13 and experiencing the disconnect?
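[Editorial note: for reference, a minimal sketch of the kind of reproduction loop described above, not taken from the original comment. It toggles performance.readdir-ahead via the gluster CLI and checks that a FUSE mount is still reachable; the volume name, mount path, iteration count, and single-node setup (CLI and mount on the same test host) are all assumptions.]

import os
import subprocess
import time

VOLUME = "el_data"       # hypothetical volume name, matching the logs above
MOUNT = "/mnt/el_data"   # hypothetical client-side FUSE mount point

def set_readdir_ahead(state):
    """Toggle the readdir-ahead translator on the server side ('on' or 'off')."""
    subprocess.run(
        ["gluster", "volume", "set", VOLUME, "performance.readdir-ahead", state],
        check=True,
    )

def mount_is_alive():
    """A dead FUSE mount typically fails statvfs with 'Transport endpoint is not connected'."""
    try:
        os.statvfs(MOUNT)
        return True
    except OSError:
        return False

for i in range(10):
    state = "off" if i % 2 == 0 else "on"
    set_readdir_ahead(state)
    time.sleep(5)  # give the client time to process the graph switch
    print(f"iteration {i}: readdir-ahead={state}, mount alive={mount_is_alive()}")

In the reproduction attempts described in this comment, the mount stayed alive across every toggle; the bug reported here would correspond to mount_is_alive() returning False after one of the switches.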
Hi Amgad,

As 3.12 is already EOL, we recommend moving to a release that is still maintained. If you are still facing this issue, please let us know; otherwise we would like to close it.
I tried to reproduce this with several cases and did not run into a scenario where toggling readdir-ahead causes clients to unmount. Closing this bug as WORKSFORME.
We upgraded to release 5.5, then 6.5+, to avoid the issue.