Bug 1516313

Summary:	Bringing down data bricks in cyclic order results in arbiter brick becoming the source for heal.
Product:	[Community] GlusterFS	Reporter:	Karthik U S <ksubrahm>
Component:	arbiter	Assignee:	Karthik U S <ksubrahm>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	high
Version:	3.13	CC:	amukherj, asrivast, bugs, knarra, nchilaka, ravishankar, rcyriac, rhinduja, rhs-bugs, sabose, sasundar, storage-qa-internal
Target Milestone:	---	Keywords:	Reopened
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.13.2	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1482064	Environment:
Last Closed:	2018-01-23 21:37:19 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1482064, 1566131
Bug Blocks:	1401969, 1530334

Comment 1 Worker Ant 2017-11-22 13:05:02 UTC

REVIEW: https://review.gluster.org/18843 (cluster/afr: Fix for arbiter becoming source) posted (#1) for review on release-3.13 by Karthik U S

Comment 2 Worker Ant 2017-11-27 18:13:57 UTC

COMMIT: https://review.gluster.org/18843 committed in release-3.13 by \"Karthik U S\" <ksubrahm> with a commit message- cluster/afr: Fix for arbiter becoming source

Problem:
When eager-lock is on, and two writes happen in parallel on a FD
we were observing the following behaviour:
- First write fails on one data brick
- Since the post-op is not yet happened, the inode refresh will get
  both the data bricks as readable and set it in the inode context
- In flight split brain check see both the data bricks as readable
  and allows the second write
- Second write fails on the other data brick
- Now the post-op happens and marks both the data bricks as bad and
  arbiter will become source for healing

Fix:
Adding one more variable called write_suvol in inode context and it
will have the in memory representation of the writable subvols. Inode
refresh will not update this value and its lifetime is pre-op through
unlock in the afr transaction. Initially the pre-op will set this
value same as read_subvol in inode context and then in the in flight
split brain check we will use this value instead of read_subvol.
After all the checks we will update the value of this and set the
read_subvol same as this to avoid having incorrect value in that.

Change-Id: I2ef6904524ab91af861d59690974bbc529ab1af3
BUG: 1516313
Signed-off-by: karthik-us <ksubrahm>
(cherry picked from commit 19f9bcff4aada589d4321356c2670ed283f02c03)

Comment 3 Shyamsundar 2017-12-08 17:46:04 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 4 Karthik U S 2018-01-03 06:01:17 UTC

Found some issues in the patch which can again lead to the same problem of arbiter becoming source. Changing the state back to Post.

Comment 5 Worker Ant 2018-01-15 06:39:16 UTC

REVIEW: https://review.gluster.org/19192 (cluster/afr: Fixing the flaws in arbiter becoming source patch) posted (#1) for review on release-3.13 by Karthik U S

Comment 6 Worker Ant 2018-01-18 18:20:31 UTC

COMMIT: https://review.gluster.org/19192 committed in release-3.13 by \"Karthik U S\" <ksubrahm> with a commit message- cluster/afr: Fixing the flaws in arbiter becoming source patch

Problem:
Setting the write_subvol value to read_subvol in case of metadata
transaction during pre-op (commit 19f9bcff4aada589d4321356c2670ed283f02c03)
might lead to the original problem of arbiter becoming source.

Scenario:
1) All bricks are up and good
2) 2 writes w1 and w2 are in progress in parallel
3) ctx->read_subvol is good for all the subvolumes
4) w1 succeeds on brick0 and fails on brick1, yet to do post-op on
   the disk
5) read/lookup comes on the same file and refreshes read_subvols back
   to all good
6) metadata transaction happens which makes ctx->write_subvol to be
   assigned with ctx->read_subvol which is all good
7) w2 succeeds on brick1 and fails on brick0 and this will update the
   brick in reverse order leading to arbiter becoming source

Fix:
Instead of setting the ctx->write_subvol to ctx->read_subvol in the
pre-op statge, if there is a metadata transaction, check in the
function __afr_set_in_flight_sb_status() if it is a data/metadata
transaction. Use the value of ctx->write_subvol if it is a data
transactions and ctx->read_subvol value for other transactions.

With this patch we assign the value of ctx->write_subvol in the
afr_transaction_perform_fop() with the on disk value, instead of
assigning it in the afr_changelog_pre_op() with the in memory value.

Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4
BUG: 1516313
Signed-off-by: karthik-us <ksubrahm>
(cherry picked from commit ba149bac92d169ae2256dbc75202dc9e5d06538e)

Comment 7 Shyamsundar 2018-01-23 21:37:19 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.2, please open a new bug report.

glusterfs-3.13.2 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-January/000089.html
[2] https://www.gluster.org/pipermail/gluster-users/