REVIEW: https://review.gluster.org/19532 (glusterd: import volumes in separate synctask) posted (#1) for review on release-3.12 by Atin Mukherjee
Description of problem:
In a 3-node cluster with brick multiplexing enabled, when one of the nodes is down and a volume goes through option changes via volume set, all the bricks fail to attach when that node reboots, and hence the brick multiplexing feature is lost. Another observation is that the entire handshake process becomes very slow and can even take hours; if someone brings down glusterd in the meantime, certain volume info files are lost.

Version-Release number of selected component (if applicable):
3.12.2

How reproducible:
Always

Steps to Reproduce:
1. Create a 3-node cluster, enable brick multiplexing, set up 20 1 x 3 volumes and start them.
2. Bring down glusterd on the first node and perform a volume set operation on all 20 volumes from any of the other nodes.
3. Bring the glusterd instance back up on the first node.

Actual results:
Bricks fail to attach and multiplexing mode is lost. The handshake also becomes extremely slow.

Expected results:
Bricks should come up in multiplexed mode.
REVIEW: https://review.gluster.org/19532 (glusterd: import volumes in separate synctask) posted (#5) for review on release-3.12 by Atin Mukherjee
COMMIT: https://review.gluster.org/19532 committed in release-3.12 by "jiffin tony Thottan" <jthottan> with a commit message-

glusterd: import volumes in separate synctask

With brick multiplexing, the prerequisite for attaching a brick to an existing brick process is that the compatible brick has finished its initialization and portmap sign in. The thread might therefore have to sleep and context switch the synctask to allow the brick process to communicate with glusterd. In the normal code path this works fine, as glusterd_restart_bricks () is launched through a separate synctask.

If there's a volume mismatch when glusterd restarts, glusterd_import_friend_volume is invoked, and it then tries to call glusterd_start_bricks () from the main thread, which may eventually land in a similar situation. Since this is not done through a separate synctask, the 1st brick never gets its turn to finish all of its handshaking, and as a consequence all the other bricks fail to attach to it.

Solution: Execute import volume and glusterd restart bricks in a separate synctask. Importing snaps also had to be done through a synctask, as the parent volume needs to be available for the snap import functionality to work.

>mainline patch : https://review.gluster.org/#/c/19357/
                  https://review.gluster.org/#/c/19536/

Change-Id: I290b244d456afcc9b913ab30be4af040d340428c
BUG: 1543708
Signed-off-by: Atin Mukherjee <amukherj>
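
The starvation described in the commit message can be illustrated with a toy model. This is only a sketch and not glusterd code: Python's asyncio stands in for the synctask framework, and every name in it (Brick, portmap_signin, attach_blocking, attach_cooperative, run) is hypothetical. It assumes the first brick's sign in is handled on the same scheduler as the attach logic, so an attach that polls without yielding never sees the sign in complete, while an attach that yields between retries does.

```python
import asyncio


class Brick:
    """Hypothetical stand-in for the first (compatible) brick process."""

    def __init__(self):
        self.signed_in = False


async def portmap_signin(brick):
    # Models the brick reporting back to the daemon. It only makes
    # progress when the scheduler gets a chance to run it.
    await asyncio.sleep(0)
    brick.signed_in = True


def attach_blocking(first, retries=5):
    # Polls without ever yielding, like work done directly on the main
    # thread: the sign in handler cannot run in the meantime.
    for _ in range(retries):
        if first.signed_in:
            return True
    return False


async def attach_cooperative(first, retries=5):
    # Yields between retries, like work done inside a separate task:
    # the sign in handler gets its turn on the scheduler.
    for _ in range(retries):
        if first.signed_in:
            return True
        await asyncio.sleep(0)
    return False


async def run(cooperative, bricks=3):
    first = Brick()
    signin = asyncio.create_task(portmap_signin(first))
    if cooperative:
        results = [await attach_cooperative(first) for _ in range(bricks)]
    else:
        results = [attach_blocking(first) for _ in range(bricks)]
    await signin
    return results


if __name__ == "__main__":
    print("blocking attach:   ", asyncio.run(run(False)))
    print("cooperative attach:", asyncio.run(run(True)))
```

In this model every blocking attach attempt fails because the sign in is still pending when the retries run out, whereas every cooperative attempt succeeds once the sign in task has had a chance to run. The retry count and the number of bricks are arbitrary illustration values.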
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.8, please open a new bug report.

glusterfs-3.12.8 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-devel/2018-April/054749.html
[2] https://www.gluster.org/pipermail/gluster-users/