Bug 1528641
| Summary: | Brick processes fail to start | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | robdewit <rob> |
| Component: | rpc | Assignee: | bugs <bugs> |
| Status: | CLOSED WORKSFORME | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | bugs, mchangir, rob, rob.dewit |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-01-07 09:17:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
robdewit
2017-12-22 14:07:14 UTC
Release 3.12 has been EOLd and this bug was still found to be in the NEW state, hence moving the version to mainline to triage it and take appropriate action.

This might be a case of an insufficient transport.listen-backlog. Rob, could you set the vol file option transport.listen-backlog to 1024 in the /etc/glusterfs/glusterd.vol file on both nodes, restart the nodes, and report back on the status? In the meantime, a dump of the volume info for all the volumes would help provide insight into the state of affairs.

We've expanded the cluster with another node since then, and as far as I recall this behavior has not occurred since. Could this have been caused by the number of volumes, or rather by some latency in I/O (disk or network)? I'd rather not mess with the settings, since the cluster has now been running fine for several months.

Glad to hear things are working for you. My hypothesis is that glusterd starting a large number of bricks causes a rush of brick processes attempting to connect back to glusterd. This causes SYN flooding and eventually dropped connection requests, resulting in loss of service, because there are insufficient resources to hold connection requests until they are acknowledged. Hence the suggestion to tweak the glusterd vol file option transport.listen-backlog. You could take a look at /var/log/messages and "grep -i" for "SYN flooding" to see whether that is the case. If things are working for you, you could close this BZ as WORKSFORME.
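For reference, a rough sketch of what the suggested edit to /etc/glusterfs/glusterd.vol could look like; the surrounding options are typical defaults from a stock install and may differ on your nodes:

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    # Raise the accept-queue depth so a burst of brick processes
    # reconnecting to glusterd is not dropped (value suggested above)
    option transport.listen-backlog 1024
end-volume
```

On systemd-based systems, restarting glusterd on each node (systemctl restart glusterd) picks up the new value.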
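Likewise, a sketch of the log check from the last comment; the exact kernel message wording varies by kernel version, but it normally names the flooded port (glusterd listens on 24007 by default):

```
# Search the kernel log for SYN-flood warnings around the time the bricks failed to start
grep -i "syn flooding" /var/log/messages

# A hit typically looks something like:
#   kernel: TCP: request_sock_TCP: Possible SYN flooding on port 24007. Sending cookies.
```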