1571477 – GlusterD dies which interrupts access to gNFS based mounts.

Summary: GlusterD dies which interrupts access to gNFS based mounts.

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.3
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Atin Mukherjee
QA Contact:	Bala Konda Reddy M
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-24 23:34 UTC by Ben Turner
Modified:	2021-09-09 13:01 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-28 02:07:59 UTC
Embargoed:
Dependent Products:

Description Ben Turner 2018-04-24 23:34:44 UTC

Description of problem:

We have a 6-node cluster where glusterd keeps dying on only one node. systemctl shows:

* glusterd.service loaded failed failed GlusterFS, a clustered file-system server

In messages we see:

Apr 23 10:36:49 rv-gluster-node03 systemd: Stopping GlusterFS, a clustered file-system server...
Apr 23 10:36:49 rv-gluster-node03 systemd: glusterd.service: main process exited, code=exited, status=15/n/a
Apr 23 10:36:49 rv-gluster-node03 systemd: Unit glusterd.service entered failed state.
Apr 23 10:36:49 rv-gluster-node03 systemd: glusterd.service failed.
Apr 23 10:36:49 rv-gluster-node03 systemd: Starting GlusterFS, a clustered file-system server...
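
For anyone triaging similar logs, here is a minimal sketch that pulls glusterd exit events out of /var/log/messages-style lines. The regular expression and the function name find_glusterd_exits are illustrative assumptions based only on the excerpt above. They are not part of any Gluster or systemd tooling, and other systemd versions may word the message differently.

```python
import re

# Pattern inferred from the excerpt in this report (an assumption, not a
# documented format): syslog timestamp, host, then systemd's exit message.
EXIT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+systemd: "
    r"glusterd\.service: main process exited, "
    r"code=(?P<code>\w+), status=(?P<status>\d+)"
)


def find_glusterd_exits(lines):
    """Return (timestamp, host, code, status) for each glusterd exit line."""
    events = []
    for line in lines:
        m = EXIT_RE.match(line)
        if m:
            events.append((m["ts"], m["host"], m["code"], int(m["status"])))
    return events


# Sample input copied from the messages excerpt in this report.
sample = [
    "Apr 23 10:36:49 rv-gluster-node03 systemd: Stopping GlusterFS, a clustered file-system server...",
    "Apr 23 10:36:49 rv-gluster-node03 systemd: glusterd.service: main process exited, code=exited, status=15/n/a",
    "Apr 23 10:36:49 rv-gluster-node03 systemd: Unit glusterd.service entered failed state.",
    "Apr 23 10:36:49 rv-gluster-node03 systemd: glusterd.service failed.",
]

for event in find_glusterd_exits(sample):
    print(event)
```

Running the same function over the full messages file on each node would give a list of exit timestamps that can be lined up against the glusterd log.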

In glusterd log:

[2018-04-24 18:12:02.021650] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-04-24 18:12:02.021690] I [MSGID: 106499] [glusterd-handler.c:4370:__glusterd_handle_status_volume] 0-management: Received status volume req for volume rv-gluster-ctdb
[2018-04-24 18:13:25.124344] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-04-24 18:13:25.124373] I [MSGID: 106488] [glusterd-handler.c:1540:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-04-24 18:13:25.124534] I [socket.c:3659:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-04-24 18:13:25.124545] E [rpcsvc.c:1333:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
[2018-04-24 18:13:25.124560] E [MSGID: 106430] [glusterd-utils.c:539:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2018-04-24 18:14:43.203322] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-04-24 18:14:43.203483] I [socket.c:3659:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-04-24 18:14:43.203499] E [rpcsvc.c:1333:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 5) to rpc-transport (socket.management)
[2018-04-24 18:14:43.203344] I [MSGID: 106488] [glusterd-handler.c:1540:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-04-24 18:14:43.203513] E [MSGID: 106430] [glusterd-utils.c:539:glusterd_submit_reply] 0-glusterd: Reply submission failed

I am pretty sure NFS is failing because /sbin/rpc.statd is a child of glusterd, so it goes away when glusterd dies.
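
One way to confirm that parent/child relationship is to look up the parent of rpc.statd in `ps -eo pid,ppid,comm` output. The sketch below is only an illustration: the function name parent_command is made up, and the sample ps output is hypothetical, not captured from the affected node.

```python
def parent_command(ps_output, child_name):
    """Return the command name of the parent of the first process named
    child_name, or None if the child or its parent is not listed."""
    procs = {}
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, comm = line.split(None, 2)
        procs[int(pid)] = (int(ppid), comm.strip())
    for pid, (ppid, comm) in procs.items():
        if comm == child_name:
            parent = procs.get(ppid)
            return parent[1] if parent else None
    return None


# Hypothetical output in the shape of `ps -eo pid,ppid,comm`;
# the PIDs are invented for illustration only.
sample_ps = """
  PID  PPID COMMAND
    1     0 systemd
 1200     1 glusterd
 1350  1200 rpc.statd
"""

print(parent_command(sample_ps, "rpc.statd"))
```

On a live node the input could come from subprocess.run(["ps", "-eo", "pid,ppid,comm"], capture_output=True, text=True).stdout. If the result is something other than glusterd, the theory above would need revisiting.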

Version-Release number of selected component (if applicable):

glusterfs-3.8.4-54.el7rhgs.x86_64                           Fri Jan 12 16:23:58 2018

How reproducible:

Randomly but often.

Steps to Reproduce:
1.  Normal operation
2.  Check systemctl status glusterd
3.  Check messages

Actual results:

Glusterd dies / NFS clients lose access for a short period of time.

Expected results:

Normal operation.

Additional info:

This is very impactful, as automated jobs running against Gluster NFS mounts fail.
