Description of problem:
On a three node cluster (N1, N2, N3), create a volume using bricks from only two nodes (N1 and N2). While the volume create is in progress, restart glusterd on the third node. The volume create actually succeeds, but the CLI reports the error below saying the create failed, peer status shows the third node in Peer Rejected state, and `gluster vol list | wc -l` differs across the cluster.

###############################################################
gluster vol create testvol6 replica 2 10.70.37.213:/bricks/brick1/test6 10.70.37.75:/bricks/brick1/test6 10.70.37.213:/bricks/brick2/test7 10.70.37.75:/bricks/brick2/test7
Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid this. Do you still want to continue?
 (y/n) y
volume create: testvol6: failed: Commit failed on dhcp37-94.lab.eng.blr.redhat.com. Please check log file for details.
################################################################
[root@dhcp37-213 ~]# gluster peer status
Number of Peers: 2

Hostname: dhcp37-75.lab.eng.blr.redhat.com
Uuid: d8e9a211-1a54-4288-b790-ea13a603c93b
State: Peer in Cluster (Connected)

Hostname: dhcp37-94.lab.eng.blr.redhat.com
Uuid: 56bb1e66-264b-4b2a-97d3-9fd8125ec57a
State: Peer Rejected (Connected)
##################################################################

Version-Release number of selected component (if applicable):
3.12.2-18.1

How reproducible:
1/1

Steps to Reproduce:
1. Form a cluster with three nodes
2. Create a volume with bricks from only two nodes
3. While the create is in progress, restart glusterd on node N3 (the two operations must run at the same time)

Actual results:
Volume create reports a commit failure, yet the volume is created. gluster peer status shows Peer Rejected for the third node, and a volume count mismatch is seen on the third node.

Expected results:
Volume create should succeed and sync with the third node once it is back up. Peer status should be in Connected state.

Additional info:
1. Volume count mismatch - not at all a surprise in this case, as glusterd doesn't have any transaction rollback mechanism. So on N1 & N2 the volume create went through, but on N3 it didn't, as glusterd was restarted during the commit phase.
2. Peer going into Rejected state - which peer was showing Rejected? Did we check the glusterd log of the node that was rejected?
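To make the first point concrete, here is a minimal sketch (plain Python, not glusterd code; the Node class and method names are hypothetical) of why a per-node commit with no rollback leaves the cluster diverged when one peer's glusterd restarts mid-commit:

```python
# Hypothetical model: each node holds its own volume list (standing in for
# glusterd's persisted volume configuration). A node whose glusterd is down
# fails its commit, but earlier commits on other nodes are NOT undone.

class Node:
    def __init__(self, name):
        self.name = name
        self.volumes = []   # local view of the cluster's volumes
        self.up = True      # whether glusterd is running on this node

    def commit(self, volume):
        if not self.up:
            raise ConnectionError(f"{self.name}: glusterd not running")
        self.volumes.append(volume)

def create_volume(nodes, volume):
    """Commit the new volume on every node in turn, with no rollback:
    a failure on one node leaves the already-committed nodes as they are."""
    failed = []
    for node in nodes:
        try:
            node.commit(volume)
        except ConnectionError:
            failed.append(node.name)  # analogous to "Commit failed on <host>"
    return failed

n1, n2, n3 = Node("N1"), Node("N2"), Node("N3")
n3.up = False  # glusterd restarted on N3 during the commit phase

failed = create_volume([n1, n2, n3], "testvol6")
counts = [len(n.volumes) for n in (n1, n2, n3)]
print(failed)   # ['N3']
print(counts)   # [1, 1, 0] -> the volume count mismatch seen in the report
```

The mismatch persists until the third node resyncs its configuration from a peer, which matches the observed `gluster vol list | wc -l` discrepancy.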
It seems to me that the root cause of this is the same as that of BZ 1635136 & 1637459. If so, can you please update the bug status with the patch link?
upstream patch: https://review.gluster.org/#/c/glusterfs/+/21336/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3827