Bug 1543708 - glusterd fails to attach brick during restart of the node
Summary: glusterd fails to attach brick during restart of the node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard: brick-multiplexing
Depends On: 1540607
Blocks: 1535732 1540600 1543706 glusterfs-3.12.8
 
Reported: 2018-02-09 03:40 UTC by Atin Mukherjee
Modified: 2018-04-24 06:53 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.12.8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1540607
Environment:
Last Closed: 2018-04-24 06:53:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Worker Ant 2018-02-09 04:06:47 UTC
REVIEW: https://review.gluster.org/19532 (glusterd: import volumes in separate synctask) posted (#1) for review on release-3.12 by Atin Mukherjee

Comment 2 Atin Mukherjee 2018-02-28 04:57:15 UTC
Description of problem:
In a 3-node cluster with brick multiplexing enabled, if one node is down while volumes go through option changes via volume set, then on reboot of that node all the bricks fail to attach to the existing brick process and brick multiplexing is effectively lost. In addition, the entire handshake process becomes extremely slow (it can take hours), and if glusterd is brought down in the middle of it, some volume info files can be lost.


Version-Release number of selected component (if applicable):
3.12.2

How reproducible:
Always

Steps to Reproduce:
1. Create a 3-node cluster, enable brick multiplexing, then create and start 20 1x3 volumes.
2. Bring down glusterd on the first node and perform a volume set operation on all 20 volumes from one of the other nodes.
3. Bring glusterd back up on the first node.
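
Roughly, the above steps map to the following CLI sequence (brick paths, volume names and the option changed in the loop are placeholders, not the exact ones used):

  # on any node, with all three peers already probed
  gluster volume set all cluster.brick-multiplex on
  for i in $(seq 1 20); do
      gluster volume create vol$i replica 3 \
          node1:/bricks/vol$i node2:/bricks/vol$i node3:/bricks/vol$i force
      gluster volume start vol$i
  done

  # on node1
  systemctl stop glusterd

  # on node2 (or node3), while glusterd on node1 is down
  for i in $(seq 1 20); do
      gluster volume set vol$i performance.write-behind off
  done

  # back on node1
  systemctl start glusterd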

Actual results:
Bricks fail to attach and brick multiplexing is lost. The handshake process also becomes extremely slow.

Expected results:
Bricks should come up in multiplexed mode, attached to the existing brick process.
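
One quick check for the expected state (assuming the stock packaging, where the brick process is glusterfsd): with multiplexing intact, the rebooted node should run a single brick process hosting all of its bricks.

  # on the rebooted node
  pgrep -c glusterfsd      # expected: 1 when bricks are multiplexed
  gluster volume status    # every brick should report Online = Y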

Comment 3 Worker Ant 2018-03-23 05:34:00 UTC
REVIEW: https://review.gluster.org/19532 (glusterd: import volumes in separate synctask) posted (#5) for review on release-3.12 by Atin Mukherjee

Comment 4 Worker Ant 2018-04-06 12:46:59 UTC
COMMIT: https://review.gluster.org/19532 committed in release-3.12 by "jiffin tony Thottan" <jthottan> with a commit message- glusterd: import volumes in separate synctask

With brick multiplexing, to attach a brick to an existing brick process
the prerequisite is that the compatible brick finishes its initialization
and portmap sign-in, so the thread may have to go to sleep and
context-switch the synctask to let the brick process communicate with
glusterd. In the normal code path this works fine, as
glusterd_restart_bricks () is launched through a separate synctask.

When there is a volume mismatch while glusterd restarts,
glusterd_import_friend_volume is invoked and in turn calls
glusterd_start_bricks () from the main thread, which can land in the
same situation. Since this is not done through a separate synctask, the
1st brick never gets its turn to finish all of its handshaking, and as a
consequence all the bricks fail to get attached to it.

Solution: Execute import volume and glusterd restart bricks in a separate
synctask. Importing snaps also had to be done through a synctask, since
the parent volume needs to be available for the snap import to work.

>mainline patch : https://review.gluster.org/#/c/19357/
                  https://review.gluster.org/#/c/19536/

Change-Id: I290b244d456afcc9b913ab30be4af040d340428c
BUG: 1543708
Signed-off-by: Atin Mukherjee <amukherj>
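
To make the mechanism described in the commit message concrete, here is a minimal, illustrative C sketch of the approach: the import/restart work is handed to a synctask (so it can yield while the first brick finishes its portmap sign-in) instead of running inline on glusterd's main thread. It uses synctask_new() from libglusterfs' syncop.h; the worker and callback names below are hypothetical, not the exact symbols introduced by the patch.

  #include "syncop.h"     /* synctask_new(), synctask_fn_t, synctask_cbk_t */

  /* Worker: runs inside a synctask, so it may block/yield while waiting for
   * the first multiplexed brick to complete its handshake, without stalling
   * glusterd's main thread (rpc/epoll handling keeps running). */
  static int
  import_volumes_worker (void *opaque)
  {
          dict_t *peer_data = opaque;

          /* ... the glusterd_import_friend_volumes () and
           * glusterd_restart_bricks () equivalents run from here ... */
          (void) peer_data;
          return 0;
  }

  /* Completion callback: release whatever was handed in as opaque. */
  static int
  import_volumes_done (int ret, call_frame_t *frame, void *opaque)
  {
          if (opaque)
                  dict_unref ((dict_t *) opaque);
          return ret;
  }

  /* Instead of calling the import path directly on the main thread,
   * queue it on the shared syncenv. */
  static int
  launch_import_volumes (xlator_t *this, dict_t *peer_data)
  {
          return synctask_new (this->ctx->env, import_volumes_worker,
                               import_volumes_done, NULL,
                               dict_ref (peer_data));
  }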

Comment 5 Jiffin 2018-04-24 06:53:38 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.8, please open a new bug report.

glusterfs-3.12.8 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-devel/2018-April/054749.html
[2] https://www.gluster.org/pipermail/gluster-users/

