REVIEW: https://review.gluster.org/19532 (glusterd: import volumes in separate synctask) posted (#1) for review on release-3.12 by Atin Mukherjee
Description of problem:
In a 3-node cluster with brick multiplexing enabled, if one node is down while volumes undergo option changes through volume set, then when that node comes back up all of its bricks fail to attach and brick multiplexing is lost. In addition, the entire handshake process becomes very slow and can take hours; if someone brings down glusterd in the meantime, some volume info files can be lost.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a 3-node cluster, enable brick multiplexing, then set up 20 1 X 3 volumes and start them.
2. Bring down glusterd on the first node and perform a volume set operation on all 20 volumes from any of the other nodes.
3. Bring glusterd back up on the first node.
Actual results:
Bricks fail to attach and multiplexing mode is lost. The handshake also becomes very slow.
Expected results:
Bricks should come up in multiplexed mode.
REVIEW: https://review.gluster.org/19532 (glusterd: import volumes in separate synctask) posted (#5) for review on release-3.12 by Atin Mukherjee
COMMIT: https://review.gluster.org/19532 committed in release-3.12 by "jiffin tony Thottan" <firstname.lastname@example.org> with a commit message- glusterd: import volumes in separate synctask
With brick multiplexing, the prerequisite for attaching a brick to an
existing brick process is that the compatible brick has finished its
initialization and portmap sign-in. The thread might therefore have to
sleep and context switch the synctask to allow the brick process to
communicate with glusterd. In the normal code path, this works fine as
glusterd_restart_bricks () is launched through a separate synctask.
If there is a volume mismatch when glusterd restarts,
glusterd_import_friend_volume is invoked and then tries to call
glusterd_start_bricks () from the main thread, which may eventually land
in a similar situation. Since this is not done through a separate
synctask, the first brick never gets its turn to finish its handshaking
and, as a consequence, all the other bricks fail to get attached to it.
Solution: Execute import volume and glusterd restart bricks in a separate
synctask. Importing snaps also had to be done through a synctask, as
the parent volume needs to be available for the snap import
functionality to work.
>mainline patch : https://review.gluster.org/#/c/19357/
Signed-off-by: Atin Mukherjee <email@example.com>
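The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not glusterd code: Python generators stand in for synctasks, a queue of callbacks stands in for the brick's pending portmap sign-in, and every name below (attach_brick, run_inline, run_scheduled, simulate) is hypothetical.

```python
from collections import deque


def attach_brick(state, max_waits=5):
    """Toy attach: wait (bounded) for the first brick to sign in.

    Each `yield` is a point where a scheduler may run other work.
    """
    for _ in range(max_waits):
        if state["signed_in"]:
            return True
        yield
    return state["signed_in"]


def run_inline(task):
    """Drive the task to completion without running anything else.

    Models calling the attach path directly from the main thread.
    """
    try:
        while True:
            next(task)
    except StopIteration as stop:
        return stop.value


def run_scheduled(task, pending_events):
    """Alternate between the task and queued events.

    Models running the attach path in its own task, so that the
    sign-in event can be processed while the task is waiting.
    """
    try:
        while True:
            next(task)
            if pending_events:
                pending_events.popleft()()
    except StopIteration as stop:
        return stop.value


def simulate(separate_task):
    """Return True if the attach succeeds in this toy model."""
    state = {"signed_in": False}
    # The sign-in is queued, but only takes effect once it is processed.
    events = deque([lambda: state.update(signed_in=True)])
    task = attach_brick(state)
    if separate_task:
        return run_scheduled(task, events)
    return run_inline(task)
```

In this model, simulate(False) exhausts its waits without ever seeing the sign-in, because nothing else gets to run, while simulate(True) succeeds as soon as the queued sign-in is processed between two steps of the task.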
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.8, please open a new bug report.
glusterfs-3.12.8 has been announced on the Gluster mailing lists; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.