Bug 1575403 - Mount dismounts every few hours by itself [NEEDINFO]
Summary: Mount dismounts every few hours by itself
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 27
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kaleb KEITHLEY
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-06 15:56 UTC by customercare
Modified: 2018-08-29 15:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-29 15:09:49 UTC
Type: Bug
Embargoed:
jforbes: needinfo?



Description customercare 2018-05-06 15:56:50 UTC
Description of problem:

I have had a two-node cluster running for more than 6 months now, mounted on the F27 server with:

 mount -t glusterfs serverA:/gv0 /opt/root/home

Only one server was updated to F27; the other is still on F26 (latest updates installed). The cluster worked quite well with both being on F26,
but now the mount fails by itself after some hours.

On F27 Server:

I mounted it again this morning, as I did yesterday and several other times since the update to F27.

As you can see in the logfile, I got an NFS library error at 7 o'clock. I have no idea why, as I never had the NFS component installed before. Just to be on the safe side, I installed it on F27 (ServerA).

As it looks, it just worked for 7 hours, then ServerA failed and unmounted the glusterfs home partition.

Before I upgrade any other server, it needs to run stably. Any idea why it suddenly fails after the switch to F27?


Version-Release number of selected component (if applicable):

glusterfs-server-3.12.9-1.fc27.x86_64
glusterfs-libs-3.12.9-1.fc27.x86_64
glusterfs-3.12.9-1.fc27.x86_64
glusterfs-fuse-3.12.9-1.fc27.x86_64
glusterfs-client-xlators-3.12.9-1.fc27.x86_64
glusterfs-cli-3.12.9-1.fc27.x86_64
glusterfs-api-3.12.9-1.fc27.x86_64
glusterfs-gnfs-3.12.9-1.fc27.x86_64



How reproducible:

Repeatedly, since the upgrade to Fedora 27

############## LOGFILE:

[2018-05-06 07:00:34.802143] W [MSGID: 101095] [xlator.c:162:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.12.9/xlator/nfs/server.so: cannot open shared object file: No such file or directory
The message "W [MSGID: 101095] [xlator.c:162:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.12.9/xlator/nfs/server.so: cannot open shared object file: No such file or directory" repeated 30 times between [2018-05-06 07:00:34.802143] and [2018-05-06 07:00:34.802933]
[2018-05-06 11:48:56.655721] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.9 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2018-05-06 11:48:57.369668] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2018-05-06 11:48:57.369746] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2018-05-06 11:48:57.369768] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2018-05-06 11:48:57.426644] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.12.9/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2018-05-06 11:48:57.426690] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2018-05-06 11:48:57.426786] W [rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2018-05-06 11:48:57.426807] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2018-05-06 11:48:57.727272] I [MSGID: 106228] [glusterd.c:499:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2018-05-06 11:48:57.824477] I [MSGID: 106513] [glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[2018-05-06 11:48:58.143837] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: a75a171e-2799-4e02-b0da-596828b04355
[2018-05-06 11:48:58.596488] I [MSGID: 106498] [glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2018-05-06 11:48:58.596653] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2018-05-06 11:48:58.596702] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-05-06 11:48:58.596877] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-05-06 11:48:58.781336] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2018-05-06 11:48:58.781548] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2018-05-06 11:48:58.781579] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 10
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:  
+------------------------------------------------------------------------------+
[2018-05-06 11:48:58.791922] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-05-06 11:48:58.824784] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2018-05-06 11:48:58.835238] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2018-05-06 11:48:58.835273] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped
[2018-05-06 11:48:58.835306] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd service
[2018-05-06 11:48:59.840046] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2018-05-06 11:48:59.840355] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2018-05-06 11:48:59.840397] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad service is stopped
[2018-05-06 11:48:59.840445] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2018-05-06 11:48:59.840624] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2018-05-06 11:48:59.840649] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped
[2018-05-06 11:48:59.840689] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2018-05-06 11:48:59.840844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2018-05-06 11:48:59.840869] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped
[2018-05-06 11:48:59.840938] I [glusterd-utils.c:6047:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/brick1/gv0
[2018-05-06 11:48:59.848912] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-05-06 11:49:00.587434] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2018-05-06 11:49:00.635892] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-05-06 11:49:00.637297] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick ServerA:/data/brick1/gv0 has disconnected from glusterd.
[2018-05-06 11:49:00.789226] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31004
[2018-05-06 11:49:01.079940] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1, host: ServerB, port: 0
[2018-05-06 11:49:01.461366] I [glusterd-utils.c:5953:glusterd_brick_start] 0-management: discovered already-running brick /data/brick1/gv0
[2018-05-06 11:49:01.461418] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152
[2018-05-06 11:49:01.461524] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 11:49:01.461555] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2018-05-06 11:49:01.461646] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 11:49:01.461746] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 11:49:02.846303] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to ServerB (0), ret: 0, op_ret: 0
[2018-05-06 11:49:03.149341] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 11:49:03.149391] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2018-05-06 11:49:03.149473] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 11:49:03.670573] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152
[2018-05-06 13:17:36.791406] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.9 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2018-05-06 13:17:37.281142] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2018-05-06 13:17:37.281213] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2018-05-06 13:17:37.281236] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2018-05-06 13:17:37.420527] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.12.9/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2018-05-06 13:17:37.420567] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2018-05-06 13:17:37.420676] W [rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2018-05-06 13:17:37.420689] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2018-05-06 13:17:37.739849] I [MSGID: 106228] [glusterd.c:499:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2018-05-06 13:17:37.812542] I [MSGID: 106513] [glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[2018-05-06 13:17:37.905139] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: a75a171e-2799-4e02-b0da-596828b04355
[2018-05-06 13:17:38.488994] I [MSGID: 106498] [glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2018-05-06 13:17:38.489145] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2018-05-06 13:17:38.489205] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-05-06 13:17:38.489422] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-05-06 13:17:38.874413] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2018-05-06 13:17:38.874709] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2018-05-06 13:17:38.874754] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 10
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:  
+------------------------------------------------------------------------------+
[2018-05-06 13:17:38.881539] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-05-06 13:17:39.026688] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2018-05-06 13:17:39.027684] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2018-05-06 13:17:39.027723] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped
[2018-05-06 13:17:39.027756] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd service
[2018-05-06 13:17:40.032724] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2018-05-06 13:17:40.033035] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2018-05-06 13:17:40.033077] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad service is stopped
[2018-05-06 13:17:40.033123] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2018-05-06 13:17:40.033338] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2018-05-06 13:17:40.033364] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped
[2018-05-06 13:17:40.033403] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2018-05-06 13:17:40.033560] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2018-05-06 13:17:40.033584] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped
[2018-05-06 13:17:40.033651] I [glusterd-utils.c:6047:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/brick1/gv0
[2018-05-06 13:17:40.042423] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-05-06 13:17:40.149586] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2018-05-06 13:17:40.327280] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-05-06 13:17:40.354649] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick ServerA:/data/brick1/gv0 has disconnected from glusterd.
[2018-05-06 13:17:40.453435] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31004
[2018-05-06 13:17:40.586867] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 13:17:42.561652] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to ServerB (0), ret: 0, op_ret: 0
[2018-05-06 13:17:42.566492] I [glusterd-utils.c:5953:glusterd_brick_start] 0-management: discovered already-running brick /data/brick1/gv0
[2018-05-06 13:17:42.566526] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152
[2018-05-06 13:17:42.566676] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1, host: ServerB, port: 0
[2018-05-06 13:17:42.571582] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 13:17:42.571612] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2018-05-06 13:17:42.571723] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152
[2018-05-06 13:17:42.571814] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 13:17:42.571854] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 13:17:42.571874] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2018-05-06 13:17:42.573342] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 14:57:23.197351] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.9 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2018-05-06 14:57:23.807082] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2018-05-06 14:57:23.807156] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2018-05-06 14:57:23.807179] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2018-05-06 14:57:23.883044] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.12.9/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2018-05-06 14:57:23.883084] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2018-05-06 14:57:23.883144] W [rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2018-05-06 14:57:23.883158] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2018-05-06 14:57:24.159501] I [MSGID: 106228] [glusterd.c:499:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2018-05-06 14:57:24.309030] I [MSGID: 106513] [glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[2018-05-06 14:57:24.425746] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: a75a171e-2799-4e02-b0da-596828b04355
[2018-05-06 14:57:25.182967] I [MSGID: 106498] [glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2018-05-06 14:57:25.183078] W [MSGID: 106062] [glusterd-handler.c:3400:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2018-05-06 14:57:25.183109] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-05-06 14:57:25.183217] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-management: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-05-06 14:57:25.394753] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2018-05-06 14:57:25.394930] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2018-05-06 14:57:25.394957] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 10
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:  
+------------------------------------------------------------------------------+
[2018-05-06 14:57:25.398286] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-05-06 14:57:25.496473] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2018-05-06 14:57:25.513176] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2018-05-06 14:57:25.513238] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped
[2018-05-06 14:57:25.513364] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd service
[2018-05-06 14:57:26.519720] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2018-05-06 14:57:26.520040] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2018-05-06 14:57:26.520082] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad service is stopped
[2018-05-06 14:57:26.520131] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2018-05-06 14:57:26.520311] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2018-05-06 14:57:26.520336] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped
[2018-05-06 14:57:26.520379] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2018-05-06 14:57:26.520724] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2018-05-06 14:57:26.520752] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped
[2018-05-06 14:57:26.520827] I [glusterd-utils.c:6047:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/brick1/gv0
[2018-05-06 14:57:26.531661] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-05-06 14:57:27.075567] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2018-05-06 14:57:27.188774] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-05-06 14:57:27.193467] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick ServerA:/data/brick1/gv0 has disconnected from glusterd.
[2018-05-06 14:57:27.223127] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31004
[2018-05-06 14:57:27.366381] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 14:57:29.109957] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to ServerB (0), ret: 0, op_ret: 0
[2018-05-06 14:57:29.702712] I [glusterd-utils.c:5953:glusterd_brick_start] 0-management: discovered already-running brick /data/brick1/gv0
[2018-05-06 14:57:29.702774] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152
[2018-05-06 14:57:29.702894] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1, host: ServerB, port: 0
[2018-05-06 14:57:29.904897] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 14:57:29.904962] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2018-05-06 14:57:29.905130] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0 on port 49152
[2018-05-06 14:57:29.905246] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 14:57:29.905290] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
[2018-05-06 14:57:29.905314] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2018-05-06 14:57:29.905482] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f14fb600-d0a3-4e7d-bd1e-ec04165a16f1
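
The log above contains three "Started running /usr/sbin/glusterd" lines (11:48:56, 13:17:36 and 14:57:23), so glusterd was started afresh three times within about three hours. A minimal Python sketch for pulling those start times out of such a log could look like the following. The function names are hypothetical, and the regular expression is only an assumption based on the line format shown in this report, not an official glusterfs log parser.

```python
import re
from datetime import datetime

# Assumed format, taken from the log lines in this report:
# [2018-05-06 11:48:56.655721] I [MSGID: 100030] [...] Started running /usr/sbin/glusterd version ...
START_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] I \[MSGID: 100030\]"
    r".*Started running /usr/sbin/glusterd "
)


def glusterd_starts(lines):
    """Return a datetime for every glusterd start line found in `lines`."""
    starts = []
    for line in lines:
        match = START_RE.match(line)
        if match:
            starts.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return starts


def gaps_between(starts):
    """Return the timedelta between each pair of consecutive starts."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

Fed with the lines of this logfile, `gaps_between(glusterd_starts(lines))` would give the uptime between daemon starts; frequent fresh starts like these point at the whole machine restarting rather than at the mount alone.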

Comment 1 customercare 2018-05-06 21:46:10 UTC
Cause found:

root     pts/0        79.239.204.244   Sun May  6 23:29   still logged in
root     pts/0        79.239.204.244   Sun May  6 20:43 - 21:51  (01:07)
reboot   system boot  4.16.5-200.fc27. Sun May  6 19:29   still running
root     pts/0        79.239.204.244   Sun May  6 17:41 - crash  (01:48)
root     pts/0        79.239.204.244   Sun May  6 17:29 - 17:29  (00:00)
reboot   system boot  4.16.5-200.fc27. Sun May  6 16:57   still running
reboot   system boot  4.16.5-200.fc27. Sun May  6 15:17   still running
reboot   system boot  4.16.5-200.fc27. Sun May  6 13:48   still running
root     pts/0        79.239.204.244   Sun May  6 09:00 - 09:15  (00:14)
reboot   system boot  4.16.5-200.fc27. Sat May  5 21:53   still running
root     pts/2        79.239.207.236   Sat May  5 15:10 - 15:18  (00:08)
root     pts/1        79.239.207.236   Sat May  5 15:01 - 15:21  (00:20)
root     pts/0        79.239.207.236   Sat May  5 14:56 - 15:18  (00:21)
reboot   system boot  4.16.5-200.fc27. Sat May  5 11:59   still running
root     pts/0        79.249.249.11    Fri May  4 12:43 - 13:19  (00:36)
reboot   system boot  4.16.5-200.fc27. Fri May  4 12:24   still running
root     pts/1        79.249.249.11    Fri May  4 12:05 - 12:08  (00:02)
root     pts/0        79.249.249.11    Fri May  4 11:54 - crash  (00:29)
reboot   system boot  4.16.5-200.fc27. Fri May  4 11:54   still running
root     pts/1        79.249.249.11    Fri May  4 11:15 - 11:53  (00:37)
root     pts/0        79.249.249.11    Fri May  4 11:05 - 11:53  (00:47)
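
The listing above shows eight "reboot" records between Fri May 4 and Sun May 6, plus two login sessions that ended in "crash". A small Python sketch for counting these from `last` output, assuming the column layout shown here (user, tty, host, then the date fields), might be the following. The function name is hypothetical, and lines without a host column would need different field offsets.

```python
def summarize_last(lines):
    """Collect reboot records and crashed login sessions from `last` output.

    Returns (reboots, crashed_sessions), where reboots is a list of
    (kernel, timestamp) tuples and crashed_sessions is a list of
    (user, login_timestamp) tuples.
    """
    reboots = []
    crashed_sessions = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "reboot":
            # reboot system boot <kernel> <weekday> <month> <day> <time> ...
            reboots.append((fields[3], " ".join(fields[4:8])))
        elif "crash" in fields:
            # <user> <tty> <host> <weekday> <month> <day> <time> - crash (...)
            crashed_sessions.append((fields[0], " ".join(fields[3:7])))
    return reboots, crashed_sessions
```

Every reboot record in this listing names the same kernel, 4.16.5-200.fc27, so grouping the first tuple element by kernel is one quick way to see which kernel the unexpected reboots happened on.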

The system is crashing. Mounting the glusterfs volumes at system start does not work, because glusterfsd starts after systemd tries to mount the fstab entries, so the volumes are already unmounted when I check them after a crash and reboot.
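
The boot-ordering problem described here is often addressed with mount options in /etc/fstab such as `_netdev` and `x-systemd.requires=glusterd.service` (both documented in systemd.mount(5)). Whether they resolve the ordering on this particular server is untested and only an assumption. The sketch below is a hypothetical Python helper that flags glusterfs fstab entries lacking such options; the actual fstab entry is not shown in this report, so the example line in the usage note is made up from the mount command in the description.

```python
# Options assumed to be helpful here; adjust to the local setup.
DEFAULT_REQUIRED = ("_netdev", "x-systemd.requires=glusterd.service")


def glusterfs_entries_missing_options(fstab_lines, required=DEFAULT_REQUIRED):
    """Return (mount_point, missing_options) for each glusterfs fstab entry
    whose option field lacks one or more of the `required` options."""
    flagged = []
    for line in fstab_lines:
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "glusterfs":
            continue
        options = fields[3].split(",")
        missing = [opt for opt in required if opt not in options]
        if missing:
            flagged.append((fields[1], missing))
    return flagged
```

For a hypothetical entry `serverA:/gv0 /opt/root/home glusterfs defaults 0 0`, the helper reports `/opt/root/home` with both options missing; an entry that already carries them is not flagged.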


A) Please change this bug report's component to KERNEL.

B) The server crashes without leaving traces. I only found out because /var/log/messages was half written at the time of the crash and showed some binary content. After that, I found the kernel boot sequence.

I will try the last working F26 kernel as a temporary fix.

Comment 2 customercare 2018-05-09 09:00:10 UTC
Confirmation: 

Since the reboot with the 4.15.17-200 kernel, the server has been running as smoothly as ever.

kernel: 4.16.5-200 is buggy!

Comment 3 Justin M. Forbes 2018-07-23 15:14:46 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 4 Justin M. Forbes 2018-08-29 15:09:49 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
