Bug 1703343 - Bricks fail to come online after node reboot on a scaled setup
Summary: Bricks fail to come online after node reboot on a scaled setup
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard: brick-multiplexing
Depends On:
Blocks: 1638192
 
Reported: 2019-04-26 07:28 UTC by Mohit Agrawal
Modified: 2020-03-12 14:30 UTC
CC List: 10 users

Fixed In Version:
Clone Of: 1638192
Environment:
Last Closed: 2020-03-12 14:30:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 22635 0 None Abandoned glusterd: Multiple bricks are spawned if a node is reboot 2019-08-13 19:30:36 UTC

Comment 1 Mohit Agrawal 2019-04-26 07:48:16 UTC
Multiple bricks are spawned on a node if the node is rebooted while volumes
are being started from another node in the cluster.

Reproducer steps (a combined script sketch follows the list)
1) Set up a cluster of 3 nodes
2) Enable brick_mux, then create and start 50 volumes from node 1
3) Stop all the volumes from any node
4) Start all the volumes from node 2 with a 1-second delay between them
   for i in {1..50}; do gluster v start testvol$i --mode=script; sleep 1; done
5) While the volumes are starting on node 2, run the following on node 1
   pkill -f gluster; glusterd
6) Wait for the volume startups to finish and check how many glusterfsd
   processes are running on node 1.
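
The whole procedure can be condensed into the shell sketch below. The hostnames
node1/node2/node3, the brick paths under /bricks, and the replica-3 volume type
are assumptions not stated in the report; adapt them to the actual setup.

   # On node 1: enable brick multiplexing, then create and start 50 volumes (steps 1-2)
   gluster volume set all cluster.brick-multiplex on
   for i in {1..50}; do
       gluster volume create testvol$i replica 3 \
           node1:/bricks/testvol$i node2:/bricks/testvol$i node3:/bricks/testvol$i --mode=script
       gluster volume start testvol$i --mode=script
   done

   # On any node: stop all the volumes (step 3)
   for i in {1..50}; do gluster volume stop testvol$i --mode=script; done

   # On node 2: start the volumes again with a 1-second delay (step 4)
   for i in {1..50}; do gluster volume start testvol$i --mode=script; sleep 1; done

   # On node 1, while the starts on node 2 are still in progress (step 5)
   pkill -f gluster; glusterd

   # On node 1, after the starts finish (step 6): with brick multiplexing a
   # single glusterfsd process is expected; more than one indicates the bug.
   pgrep -c glusterfsd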

RCA: When glusterd starts, it receives a friend update request from a peer
     node carrying version changes for the volumes that were started while the
     node was down. glusterd deletes the volfiles and references for the
     old-version volumes from its internal data structures and creates new
     volfiles. glusterd was not able to attach the volumes because these data
     structure changes happened after the brick had started, so the data sent
     in the RPC attach request was not correct; the brick process then sent a
     disconnect to glusterd, glusterd tried to spawn a new brick, and as a
     result multiple brick processes were spawned.
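
To observe the symptom on the rebooted node, a quick check (a sketch only; the
exact glusterfsd command-line format is an assumption and may differ between
releases) is to look for volfile-ids served by more than one glusterfsd process:

   # Each brick is identified by a --volfile-id argument on the glusterfsd
   # command line; any id printed here is served by duplicate processes,
   # i.e. the bug has reproduced.
   ps -ef | grep '[g]lusterfsd' | grep -oE -- '--volfile-id[= ][^ ]+' | sort | uniq -cd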

Regards,
Mohit Agrawal

Comment 2 Worker Ant 2019-04-26 07:52:56 UTC
REVIEW: https://review.gluster.org/22635 (glusterd: Multiple bricks are spawned if a node is reboot) posted (#1) for review on master by MOHIT AGRAWAL

Comment 3 Worker Ant 2020-03-12 14:30:20 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/1055, and will be tracked there from now on. Visit GitHub issues URL for further details

