Description of problem:
##############################################################################
On a three-node cluster, enable brick-mux and create a replica 3 volume. Stop glusterd on node 3 and perform replace-brick on node 1. Replace-brick succeeds; now start glusterd on node 3. Perform add-brick (3 bricks) on the volume. Add-brick succeeds, but a brick on node 2 goes offline.

Version-Release number of selected component (if applicable): 3.8.4-54.3

How reproducible: 2/2

Steps to Reproduce:
1. Create a replica 3 volume and mount it; start I/O.
2. Stop glusterd on one node (N3).
3. Perform a replace-brick operation on node N1.
4. Start glusterd on the node where it was stopped (N3).
5. Add 3 bricks to the volume; perform this operation on node N1.
6. One brick on node N2 is offline.

Actual results: Brick on node N2 is offline

Expected results: All bricks should be online in the volume

Additional info:
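The steps above can be sketched as a CLI sequence. This is a hedged illustration, not commands copied from the report: the hostnames (n1/n2/n3), brick paths, and volume name `repvol` are placeholders.

```shell
# On n1: enable brick multiplexing and create a replica 3 volume
# (hostnames and brick paths are illustrative)
gluster volume set all cluster.brick-multiplex on
gluster volume create repvol replica 3 \
    n1:/bricks/b1 n2:/bricks/b1 n3:/bricks/b1
gluster volume start repvol

# On n3: stop glusterd
systemctl stop glusterd

# On n1: replace a brick while n3's glusterd is down
gluster volume replace-brick repvol \
    n1:/bricks/b1 n1:/bricks/b1_new commit force

# On n3: bring glusterd back
systemctl start glusterd

# On n1: add three more bricks (one per node)
gluster volume add-brick repvol replica 3 \
    n1:/bricks/b2 n2:/bricks/b2 n3:/bricks/b2

# Before the fix, a brick on n2 could now show as offline here
gluster volume status repvol
```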
RCA: glusterd maintains a boolean flag 'port_registered' which is used to determine whether a brick has completed its portmap sign-in process. This flag is set and reset on the pmap_signin and pmap_signout events. With brick multiplexing, this flag is the identifier for whether the very first brick, with which the process was spawned, has completed its sign-in. However, on glusterd restart, when a brick is already identified as running, glusterd performs a pmap_registry_bind to ensure its portmap table is updated, but this flag is not set. That is harmless in the non-brick-multiplex case, but causes a problem because a subsequent brick attach can depend on this flag. The replace-brick operation makes this more visible: the replacement brick is first attached and only then is the old brick brought down, so there is no opportunity for a pmap_signin here, since with brick multiplexing pmap_signin happens only for the very first brick.
Facing a similar issue with the remove-brick operation while a node is down. Not reproducible every time.

Steps to reproduce:
------------------
-> Create a 3-node cluster: n1, n2, n3.
-> Create a 2x3 distributed-replicate volume Vol1.
-> Add one more replica set to the same volume Vol1 using the add-brick command.
-> Shut down node n2.
-> Perform a remove-brick operation on node n1 (remove-brick fails, as expected).
-> Shut down node n3.
-> Perform a remove-brick operation on node n1 (remove-brick fails).
-> Power on both nodes n2 and n3.
-> Run 'gluster vol status Vol1'.
-> Some of the bricks go offline; which ones is random.
None of the above steps reproduce this 100% of the time. Instead, it can be reproduced easily with the following steps:
1. Create and start a volume (with more than one brick).
2. Remove the first brick.
3. Add one more brick; this operation takes a very long time (because of this bug).
4. Check volume status: all bricks barring the newly added one report an N/A status.
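The simplified reproducer above, as a hedged CLI sketch; the volume name `testvol` and the brick paths are illustrative, not taken from the report.

```shell
# Create and start a two-brick volume (names and paths are placeholders)
gluster volume create testvol n1:/bricks/b1 n1:/bricks/b2 force
gluster volume start testvol

# Remove the first brick (force is acceptable on a throwaway test volume)
gluster volume remove-brick testvol n1:/bricks/b1 force

# Add one more brick -- before the fix, this step hangs for a long time
gluster volume add-brick testvol n1:/bricks/b3 force

# Before the fix, all bricks except the newly added one report N/A
gluster volume status testvol
```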
upstream patch : https://review.gluster.org/19800
downstream patch : https://code.engineering.redhat.com/gerrit/134827
Build: 3.12.2-8. Performed the steps mentioned in comment 5 and comment 6. All the bricks are online after performing the add-brick operation. Hence marking it as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607