Description of problem:

We have a 6-node cluster where, on only one node, glusterd shows as failed in systemctl:

* glusterd.service loaded failed failed GlusterFS, a clustered file-system server

In messages we see:

Apr 23 10:36:49 rv-gluster-node03 systemd: Stopping GlusterFS, a clustered file-system server...
Apr 23 10:36:49 rv-gluster-node03 systemd: glusterd.service: main process exited, code=exited, status=15/n/a
Apr 23 10:36:49 rv-gluster-node03 systemd: Unit glusterd.service entered failed state.
Apr 23 10:36:49 rv-gluster-node03 systemd: glusterd.service failed.
Apr 23 10:36:49 rv-gluster-node03 systemd: Starting GlusterFS, a clustered file-system server...

In the glusterd log:

[2018-04-24 18:12:02.021650] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-04-24 18:12:02.021690] I [MSGID: 106499] [glusterd-handler.c:4370:__glusterd_handle_status_volume] 0-management: Received status volume req for volume rv-gluster-ctdb
[2018-04-24 18:13:25.124344] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-04-24 18:13:25.124373] I [MSGID: 106488] [glusterd-handler.c:1540:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-04-24 18:13:25.124534] I [socket.c:3659:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-04-24 18:13:25.124545] E [rpcsvc.c:1333:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
[2018-04-24 18:13:25.124560] E [MSGID: 106430] [glusterd-utils.c:539:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2018-04-24 18:14:43.203322] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-04-24 18:14:43.203483] I [socket.c:3659:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-04-24 18:14:43.203499] E [rpcsvc.c:1333:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
[2018-04-24 18:14:43.203344] I [MSGID: 106488] [glusterd-handler.c:1540:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-04-24 18:14:43.203513] E [MSGID: 106430] [glusterd-utils.c:539:glusterd_submit_reply] 0-glusterd: Reply submission failed

I am pretty sure NFS is failing because /sbin/rpc.statd is a child of glusterd.

Version-Release number of selected component (if applicable):

glusterfs-3.8.4-54.el7rhgs.x86_64 Fri Jan 12 16:23:58 2018

How reproducible:

Randomly, but often.

Steps to Reproduce:
1. Normal operation
2. Check systemctl status glusterd
3. Check messages

Actual results:

glusterd dies and NFS clients lose access for a short period of time.

Expected results:

Normal operation.

Additional info:

This is very impactful, as automated jobs running against Gluster NFS mounts fail.
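
For anyone triaging the same symptom, the error-level entries in a glusterd log like the one quoted above can be tallied with a short script. This is only a sketch: the names LOG_LINE, parse_line and count_errors are made up for illustration, and the regular expression assumes every entry follows the "[timestamp] LEVEL [optional MSGID] [file:line:function] domain: message" layout seen in this report. Lines that do not match that layout are skipped.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpt in this report:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
# The MSGID group is optional because some entries omit it.
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_errors(lines):
    """Count error-level (E) entries, keyed by the function that logged them."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "E":
            # source looks like "rpcsvc.c:1333:rpcsvc_submit_generic"
            function = entry["source"].split(":")[-1]
            counts[function] += 1
    return counts
```

Passing the lines of the glusterd log to count_errors gives a quick view of which functions report failures most often, which can help correlate the "Reply submission failed" errors with the times glusterd was restarted.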