Bug 1703343 - Bricks fail to come online after node reboot on a scaled setup
Summary: Bricks fail to come online after node reboot on a scaled setup
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard: brick-multiplexing
Depends On:
Blocks: 1638192
 
Reported: 2019-04-26 07:28 UTC by Mohit Agrawal
Modified: 2020-03-12 14:30 UTC
CC List: 10 users

Fixed In Version:
Clone Of: 1638192
Environment:
Last Closed: 2020-03-12 14:30:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 22635 0 None Abandoned glusterd: Multiple bricks are spawned if a node is reboot 2019-08-13 19:30:36 UTC

Comment 1 Mohit Agrawal 2019-04-26 07:48:16 UTC
Multiple bricks are spawned on a node if the node is rebooted while volumes
are being started from another node in the cluster.

Reproducer steps (a combined script sketch follows the list)
1) Set up a cluster of 3 nodes
2) Enable brick_mux, then create and start 50 volumes from node 1
3) Stop all the volumes from any node
4) Start all the volumes from node 2 with a 1-second delay between them
   for i in {1..50}; do gluster v start testvol$i --mode=script; sleep 1; done
5) While the volumes are starting on node 2, run the following on node 1
   pkill -f gluster; glusterd
6) Wait for the volume startups to finish and check how many glusterfsd
   processes are running on node 1.
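
The whole procedure can be condensed into the shell sketch below. The hostnames
node1/node2/node3, the brick paths under /bricks, and the replica-3 volume type
are assumptions not stated in the report; adapt them to the actual setup.

   # On node 1: enable brick multiplexing, then create and start 50 volumes (steps 1-2)
   gluster volume set all cluster.brick-multiplex on
   for i in {1..50}; do
       gluster volume create testvol$i replica 3 \
           node1:/bricks/testvol$i node2:/bricks/testvol$i node3:/bricks/testvol$i --mode=script
       gluster volume start testvol$i --mode=script
   done

   # On any node: stop all the volumes (step 3)
   for i in {1..50}; do gluster volume stop testvol$i --mode=script; done

   # On node 2: start the volumes again with a 1-second delay (step 4)
   for i in {1..50}; do gluster volume start testvol$i --mode=script; sleep 1; done

   # On node 1, while the starts on node 2 are still in progress (step 5)
   pkill -f gluster; glusterd

   # On node 1, after the starts finish (step 6): with brick multiplexing a
   # single glusterfsd process is expected; more than one indicates the bug.
   pgrep -c glusterfsd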

RCA: When glusterd starts, it receives a friend update request from a peer
     node carrying version changes for the volumes that were started while the
     node was down. glusterd deletes the volfiles and references for the
     old-version volumes from its internal data structures and creates new
     volfiles. glusterd was not able to attach the volumes because these data
     structure changes happened after the brick had started, so the data sent
     in the RPC attach request was not correct; the brick process then sent a
     disconnect to glusterd, glusterd tried to spawn a new brick, and as a
     result multiple brick processes were spawned.
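
To observe the symptom on the rebooted node, a quick check (a sketch only; the
exact glusterfsd command-line format is an assumption and may differ between
releases) is to look for volfile-ids served by more than one glusterfsd process:

   # Each brick is identified by a --volfile-id argument on the glusterfsd
   # command line; any id printed here is served by duplicate processes,
   # i.e. the bug has reproduced.
   ps -ef | grep '[g]lusterfsd' | grep -oE -- '--volfile-id[= ][^ ]+' | sort | uniq -cd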

Regards,
Mohit Agrawal

Comment 2 Worker Ant 2019-04-26 07:52:56 UTC
REVIEW: https://review.gluster.org/22635 (glusterd: Multiple bricks are spawned if a node is reboot) posted (#1) for review on master by MOHIT AGRAWAL

Comment 3 Worker Ant 2020-03-12 14:30:20 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/1055, and will be tracked there from now on. Visit GitHub issues URL for further details

