Description of problem:

I have had a two-node cluster running for more than six months. The volume is mounted on the F27 server with:

mount -t glusterfs serverA:/gv0 /opt/root/home

Only one server has been upgraded to F27; the other is still on F26 (latest updates installed). The cluster worked well while both nodes were on F26, but now it fails on its own after a few hours.

On the F27 server: I mounted the volume again this morning, as I did yesterday and several other times since the upgrade to F27. As the logfile shows, there was an NFS library error at 7 o'clock. I have no idea why, since I never had the NFS component installed before. Just to be on the safe side, I installed it on F27 (ServerA). It then appears to have worked for only 7 hours before ServerA failed and unmounted the glusterfs home partition.

This needs to run stably before I upgrade any other server. Any idea why it suddenly fails after the switch to F27?

Version-Release number of selected component (if applicable):
glusterfs-server-3.12.9-1.fc27.x86_64
glusterfs-libs-3.12.9-1.fc27.x86_64
glusterfs-3.12.9-1.fc27.x86_64
glusterfs-fuse-3.12.9-1.fc27.x86_64
glusterfs-client-xlators-3.12.9-1.fc27.x86_64
glusterfs-cli-3.12.9-1.fc27.x86_64
glusterfs-api-3.12.9-1.fc27.x86_64
glusterfs-gnfs-3.12.9-1.fc27.x86_64

How reproducible: repeatedly since the upgrade to Fedora 27

############## LOGFILE:

[2018-05-06 07:00:34.802143] W [MSGID: 101095] [xlator.c:162:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.12.9/xlator/nfs/server.so: cannot open shared object file: No such file or directory
The message "W [MSGID: 101095] [xlator.c:162:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.12.9/xlator/nfs/server.so: cannot open shared object file: No such file or directory" repeated 30 times between [2018-05-06 07:00:34.802143] and [2018-05-06 07:00:34.802933]
[2018-05-06 11:48:56.655721] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.9 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level
INFO) [2018-05-06 11:48:57.369668] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536 [2018-05-06 11:48:57.369746] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory [2018-05-06 11:48:57.369768] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory [2018-05-06 11:48:57.426644] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.12.9/rpc-transport/rdma.so: cannot open shared object file: No such file or directory [2018-05-06 11:48:57.426690] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine [2018-05-06 11:48:57.426786] W [rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed [2018-05-06 11:48:57.426807] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport [2018-05-06 11:48:57.727272] I [MSGID: 106228] [glusterd.c:499:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory] [2018-05-06 11:48:57.824477] I [MSGID: 106513] [glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004 [2018-05-06 11:48:58.143837] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: a75a171e-2799-4e02-b0da-596828b04355 [2018-05-06 11:48:58.596488] I [MSGID: 106498] [glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 [2018-05-06 11:48:58.596653] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout [2018-05-06 11:48:58.596702] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2018-05-06 11:48:58.596877] W [MSGID: 
101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction [2018-05-06 11:48:58.781336] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600 [2018-05-06 11:48:58.781548] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped [2018-05-06 11:48:58.781579] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped Final graph: +------------------------------------------------------------------------------+ 1: volume management 2: type mgmt/glusterd 3: option rpc-auth.auth-glusterfs on 4: option rpc-auth.auth-unix on 5: option rpc-auth.auth-null on 6: option rpc-auth-allow-insecure on 7: option transport.listen-backlog 10 8: option event-threads 1 9: option ping-timeout 0 10: option transport.socket.read-fail-log off 11: option transport.socket.keepalive-interval 2 12: option transport.socket.keepalive-time 10 13: option transport-type rdma 14: option working-directory /var/lib/glusterd 15: end-volume 16: +------------------------------------------------------------------------------+ [2018-05-06 11:48:58.791922] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2018-05-06 11:48:58.824784] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600 [2018-05-06 11:48:58.835238] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped [2018-05-06 11:48:58.835273] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped [2018-05-06 11:48:58.835306] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd service [2018-05-06 11:48:59.840046] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600 [2018-05-06 
11:48:59.840355] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped [2018-05-06 11:48:59.840397] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad service is stopped [2018-05-06 11:48:59.840445] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600 [2018-05-06 11:48:59.840624] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped [2018-05-06 11:48:59.840649] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped [2018-05-06 11:48:59.840689] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600 [2018-05-06 11:48:59.840844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped [2018-05-06 11:48:59.840869] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped [2018-05-06 11:48:59.840938] I [glusterd-utils.c:6047:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/brick1/gv0 [2018-05-06 11:48:59.848912] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2018-05-06 11:49:00.587434] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600 [2018-05-06 11:49:00.635892] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2018-05-06 11:49:00.637297] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick ServerA:/data/brick1/gv0 has disconnected from glusterd. 
[2018-05-06 11:49:00.789226] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31004 [2018-05-06 11:49:01.079940] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1, host: ServerB, port: 0 [2018-05-06 11:49:01.461366] I [glusterd-utils.c:5953:glusterd_brick_start] 0-management: discovered already-running brick /data/brick1/gv0 [2018-05-06 11:49:01.461418] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152 [2018-05-06 11:49:01.461524] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 11:49:01.461555] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend [2018-05-06 11:49:01.461646] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 11:49:01.461746] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 11:49:02.846303] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to ServerB (0), ret: 0, op_ret: 0 [2018-05-06 11:49:03.149341] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 11:49:03.149391] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend [2018-05-06 11:49:03.149473] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 
[2018-05-06 11:49:03.670573] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152 [2018-05-06 13:17:36.791406] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.9 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) [2018-05-06 13:17:37.281142] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536 [2018-05-06 13:17:37.281213] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory [2018-05-06 13:17:37.281236] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory [2018-05-06 13:17:37.420527] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.12.9/rpc-transport/rdma.so: cannot open shared object file: No such file or directory [2018-05-06 13:17:37.420567] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine [2018-05-06 13:17:37.420676] W [rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed [2018-05-06 13:17:37.420689] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport [2018-05-06 13:17:37.739849] I [MSGID: 106228] [glusterd.c:499:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory] [2018-05-06 13:17:37.812542] I [MSGID: 106513] [glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004 [2018-05-06 13:17:37.905139] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: a75a171e-2799-4e02-b0da-596828b04355 [2018-05-06 13:17:38.488994] I [MSGID: 106498] 
[glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 [2018-05-06 13:17:38.489145] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout [2018-05-06 13:17:38.489205] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2018-05-06 13:17:38.489422] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction [2018-05-06 13:17:38.874413] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600 [2018-05-06 13:17:38.874709] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped [2018-05-06 13:17:38.874754] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped Final graph: +------------------------------------------------------------------------------+ 1: volume management 2: type mgmt/glusterd 3: option rpc-auth.auth-glusterfs on 4: option rpc-auth.auth-unix on 5: option rpc-auth.auth-null on 6: option rpc-auth-allow-insecure on 7: option transport.listen-backlog 10 8: option event-threads 1 9: option ping-timeout 0 10: option transport.socket.read-fail-log off 11: option transport.socket.keepalive-interval 2 12: option transport.socket.keepalive-time 10 13: option transport-type rdma 14: option working-directory /var/lib/glusterd 15: end-volume 16: +------------------------------------------------------------------------------+ [2018-05-06 13:17:38.881539] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2018-05-06 13:17:39.026688] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600 [2018-05-06 13:17:39.027684] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped 
[2018-05-06 13:17:39.027723] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped [2018-05-06 13:17:39.027756] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd service [2018-05-06 13:17:40.032724] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600 [2018-05-06 13:17:40.033035] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped [2018-05-06 13:17:40.033077] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad service is stopped [2018-05-06 13:17:40.033123] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600 [2018-05-06 13:17:40.033338] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped [2018-05-06 13:17:40.033364] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped [2018-05-06 13:17:40.033403] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600 [2018-05-06 13:17:40.033560] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped [2018-05-06 13:17:40.033584] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped [2018-05-06 13:17:40.033651] I [glusterd-utils.c:6047:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/brick1/gv0 [2018-05-06 13:17:40.042423] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2018-05-06 13:17:40.149586] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600 [2018-05-06 13:17:40.327280] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2018-05-06 13:17:40.354649] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick 
ServerA:/data/brick1/gv0 has disconnected from glusterd. [2018-05-06 13:17:40.453435] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31004 [2018-05-06 13:17:40.586867] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 13:17:42.561652] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to ServerB (0), ret: 0, op_ret: 0 [2018-05-06 13:17:42.566492] I [glusterd-utils.c:5953:glusterd_brick_start] 0-management: discovered already-running brick /data/brick1/gv0 [2018-05-06 13:17:42.566526] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152 [2018-05-06 13:17:42.566676] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1, host: ServerB, port: 0 [2018-05-06 13:17:42.571582] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 13:17:42.571612] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend [2018-05-06 13:17:42.571723] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152 [2018-05-06 13:17:42.571814] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 13:17:42.571854] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 13:17:42.571874] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my 
uuid as Friend [2018-05-06 13:17:42.573342] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 14:57:23.197351] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.9 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) [2018-05-06 14:57:23.807082] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536 [2018-05-06 14:57:23.807156] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory [2018-05-06 14:57:23.807179] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory [2018-05-06 14:57:23.883044] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.12.9/rpc-transport/rdma.so: cannot open shared object file: No such file or directory [2018-05-06 14:57:23.883084] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine [2018-05-06 14:57:23.883144] W [rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed [2018-05-06 14:57:23.883158] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport [2018-05-06 14:57:24.159501] I [MSGID: 106228] [glusterd.c:499:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory] [2018-05-06 14:57:24.309030] I [MSGID: 106513] [glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004 [2018-05-06 14:57:24.425746] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: a75a171e-2799-4e02-b0da-596828b04355 [2018-05-06 14:57:25.182967] I [MSGID: 106498] 
[glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 [2018-05-06 14:57:25.183078] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout [2018-05-06 14:57:25.183109] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2018-05-06 14:57:25.183217] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction [2018-05-06 14:57:25.394753] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600 [2018-05-06 14:57:25.394930] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped [2018-05-06 14:57:25.394957] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped Final graph: +------------------------------------------------------------------------------+ 1: volume management 2: type mgmt/glusterd 3: option rpc-auth.auth-glusterfs on 4: option rpc-auth.auth-unix on 5: option rpc-auth.auth-null on 6: option rpc-auth-allow-insecure on 7: option transport.listen-backlog 10 8: option event-threads 1 9: option ping-timeout 0 10: option transport.socket.read-fail-log off 11: option transport.socket.keepalive-interval 2 12: option transport.socket.keepalive-time 10 13: option transport-type rdma 14: option working-directory /var/lib/glusterd 15: end-volume 16: +------------------------------------------------------------------------------+ [2018-05-06 14:57:25.398286] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2018-05-06 14:57:25.496473] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600 [2018-05-06 14:57:25.513176] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped 
[2018-05-06 14:57:25.513238] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped [2018-05-06 14:57:25.513364] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd service [2018-05-06 14:57:26.519720] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600 [2018-05-06 14:57:26.520040] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped [2018-05-06 14:57:26.520082] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad service is stopped [2018-05-06 14:57:26.520131] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600 [2018-05-06 14:57:26.520311] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped [2018-05-06 14:57:26.520336] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped [2018-05-06 14:57:26.520379] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600 [2018-05-06 14:57:26.520724] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped [2018-05-06 14:57:26.520752] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped [2018-05-06 14:57:26.520827] I [glusterd-utils.c:6047:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/brick1/gv0 [2018-05-06 14:57:26.531661] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2018-05-06 14:57:27.075567] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600 [2018-05-06 14:57:27.188774] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2018-05-06 14:57:27.193467] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick 
ServerA:/data/brick1/gv0 has disconnected from glusterd. [2018-05-06 14:57:27.223127] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31004 [2018-05-06 14:57:27.366381] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 14:57:29.109957] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to ServerB (0), ret: 0, op_ret: 0 [2018-05-06 14:57:29.702712] I [glusterd-utils.c:5953:glusterd_brick_start] 0-management: discovered already-running brick /data/brick1/gv0 [2018-05-06 14:57:29.702774] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152 [2018-05-06 14:57:29.702894] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1, host: ServerB, port: 0 [2018-05-06 14:57:29.904897] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 14:57:29.904962] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend [2018-05-06 14:57:29.905130] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152 [2018-05-06 14:57:29.905246] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 14:57:29.905290] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1 [2018-05-06 14:57:29.905314] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my 
uuid as Friend [2018-05-06 14:57:29.905482] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
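A note on reading the log above: glusterd reports "Started running" three times (11:48:56, 13:17:36 and 14:57:23), so the daemon was started from scratch three times within roughly three hours. The only E- and W-level entries in each startup block concern the missing rdma.so transport, the tcp-user-timeout lookup and a deprecated option name, and they recur identically in every block. The following is a minimal sketch, not part of the original report, of how such a log could be summarized. It assumes each entry starts with a bracketed timestamp followed by a one-letter severity, as in the excerpt; the function names are hypothetical.

```python
import re
from collections import Counter

# Start of a glusterd log entry: "[YYYY-MM-DD HH:MM:SS.ffffff] <severity letter> [".
ENTRY = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) \[")
# MSGID 100030 is the ID attached to the "Started running" entries in the log above.
STARTED = re.compile(r"\[([^\]]+)\] I \[MSGID: 100030\]")


def severity_counts(text):
    """Count log entries per severity letter (I, W, E, ...)."""
    return Counter(m.group(2) for m in ENTRY.finditer(text))


def startup_times(text):
    """Return the timestamps of all 'Started running' entries."""
    return STARTED.findall(text)


# Four entries copied from the log above; the real file would be read from disk.
SAMPLE = " ".join([
    "[2018-05-06 11:48:56.655721] I [MSGID: 100030] [glusterfsd.c:2511:main] "
    "0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.9 "
    "(args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)",
    "[2018-05-06 11:48:57.426644] E [rpc-transport.c:283:rpc_transport_load] "
    "0-rpc-transport: /usr/lib64/glusterfs/3.12.9/rpc-transport/rdma.so: "
    "cannot open shared object file: No such file or directory",
    "[2018-05-06 11:48:57.426690] W [rpc-transport.c:287:rpc_transport_load] "
    "0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not "
    "valid or not found on this machine",
    "[2018-05-06 13:17:36.791406] I [MSGID: 100030] [glusterfsd.c:2511:main] "
    "0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.9 "
    "(args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)",
])

if __name__ == "__main__":
    print(severity_counts(SAMPLE))
    print(startup_times(SAMPLE))
```

Repeated daemon starts with no shutdown messages in between suggest that the host itself was restarting, rather than glusterd failing on its own.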
Cause found:

root     pts/0        79.239.204.244   Sun May  6 23:29   still logged in
root     pts/0        79.239.204.244   Sun May  6 20:43 - 21:51  (01:07)
reboot   system boot  4.16.5-200.fc27. Sun May  6 19:29   still running
root     pts/0        79.239.204.244   Sun May  6 17:41 - crash  (01:48)
root     pts/0        79.239.204.244   Sun May  6 17:29 - 17:29  (00:00)
reboot   system boot  4.16.5-200.fc27. Sun May  6 16:57   still running
reboot   system boot  4.16.5-200.fc27. Sun May  6 15:17   still running
reboot   system boot  4.16.5-200.fc27. Sun May  6 13:48   still running
root     pts/0        79.239.204.244   Sun May  6 09:00 - 09:15  (00:14)
reboot   system boot  4.16.5-200.fc27. Sat May  5 21:53   still running
root     pts/2        79.239.207.236   Sat May  5 15:10 - 15:18  (00:08)
root     pts/1        79.239.207.236   Sat May  5 15:01 - 15:21  (00:20)
root     pts/0        79.239.207.236   Sat May  5 14:56 - 15:18  (00:21)
reboot   system boot  4.16.5-200.fc27. Sat May  5 11:59   still running
root     pts/0        79.249.249.11    Fri May  4 12:43 - 13:19  (00:36)
reboot   system boot  4.16.5-200.fc27. Fri May  4 12:24   still running
root     pts/1        79.249.249.11    Fri May  4 12:05 - 12:08  (00:02)
root     pts/0        79.249.249.11    Fri May  4 11:54 - crash  (00:29)
reboot   system boot  4.16.5-200.fc27. Fri May  4 11:54   still running
root     pts/1        79.249.249.11    Fri May  4 11:15 - 11:53  (00:37)
root     pts/0        79.249.249.11    Fri May  4 11:05 - 11:53  (00:47)

The system is crashing. Mounting the glusterfs volumes at system start does not work, because glusterfsd starts only after systemd has tried to mount the fstab entries, so the volumes are unmounted when I check them after a reboot.

A) Please change this bug report's component to KERNEL.
B) The server crashes without leaving traces. I only found out because /var/log/messages was half-written at the time of the crash and showed some binary content. After that I found the kernel boot sequence.

I will try the last working F26 kernel as a temporary fix.
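For readers who want to repeat this check on their own machine, here is a minimal sketch, not part of the original report, that pulls the boot records and the sessions that ended in "crash" out of `last` output. It assumes the whitespace-separated column layout shown above; the function name `summarize` and the embedded sample (the May 6 entries from this comment) are illustrative only.

```python
# May 6 entries copied from the `last` output in this comment.
LAST_OUTPUT = """\
root     pts/0        79.239.204.244   Sun May  6 23:29   still logged in
root     pts/0        79.239.204.244   Sun May  6 20:43 - 21:51  (01:07)
reboot   system boot  4.16.5-200.fc27. Sun May  6 19:29   still running
root     pts/0        79.239.204.244   Sun May  6 17:41 - crash  (01:48)
root     pts/0        79.239.204.244   Sun May  6 17:29 - 17:29  (00:00)
reboot   system boot  4.16.5-200.fc27. Sun May  6 16:57   still running
reboot   system boot  4.16.5-200.fc27. Sun May  6 15:17   still running
reboot   system boot  4.16.5-200.fc27. Sun May  6 13:48   still running
root     pts/0        79.239.204.244   Sun May  6 09:00 - 09:15  (00:14)
"""


def summarize(last_text):
    """Return (boot_times, crashed_session_starts) from `last` output."""
    boots, crashed = [], []
    for line in last_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "reboot":
            # reboot system boot <kernel> <weekday> <month> <day> <hh:mm> ...
            boots.append(" ".join(fields[4:8]))
        elif "crash" in fields:
            # <user> <tty> <host> <weekday> <month> <day> <hh:mm> - crash (...)
            crashed.append(" ".join(fields[3:7]))
    return boots, crashed


if __name__ == "__main__":
    boots, crashed = summarize(LAST_OUTPUT)
    print(len(boots), "boots:", boots)
    print(len(crashed), "sessions ended by a crash:", crashed)
```

On the sample this finds four boots on May 6 alone. The boots at 13:48, 15:17 and 16:57 are each two hours later than the glusterd start times in the log (11:48, 13:17, 14:57), which would be consistent with the glusterd log using UTC and `last` using local time.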
Confirmation: Since rebooting with the 4.15.17-200 kernel, the server has been running as smoothly as ever. Kernel 4.16.5-200 is buggy!
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs. Fedora 27 has now been rebased to 4.17.7-100.fc27. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28. If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE ************** This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.