Bug 1581548

Summary: writes succeed when only good brick is down in 1x3 volume
Product: [Community] GlusterFS
Component: replicate
Version: 4.1
Reporter: Ravishankar N <ravishankar>
Assignee: Ravishankar N <ravishankar>
CC: bugs
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Fixed In Version: glusterfs-v4.1.0
Doc Type: If docs needed, set a value
Clone Of: 1581057
Last Closed: 2018-06-20 18:06:39 UTC

Description Ravishankar N 2018-05-23 04:22:41 UTC
Description of problem:
Writes succeed when the only good brick is down in a 1x3 volume.

Version-Release number of selected component (if applicable):
rhgs-3.4.0

How reproducible:
Always

Steps to Reproduce:
* Create a file in a 1x3 replicate volume mounted via FUSE.
* Disable the self-heal daemon (shd) and client-side data-self-heal.
* Open an fd on the file for writing.
* Kill B1 (brick1) and write to the file.
* Bring B1 back up using volume start force, then bring B2 down.
* Write to the file.
* Bring B2 back up and kill B3.
* The next write should fail (since the only good brick, B3, is down), but it succeeds; see the shell sketch below.
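
The steps above translate roughly into the shell commands below. This is a hedged sketch, not taken from the report or the .t test: the volume name (r3), host (H0), brick paths, mount point and the kill_brick helper are hypothetical placeholders.

# Hedged reproduction sketch; names and paths below are placeholders.
H0=server1
gluster volume create r3 replica 3 \
    $H0:/bricks/b1 $H0:/bricks/b2 $H0:/bricks/b3 force
gluster volume start r3
mkdir -p /mnt/r3
mount -t glusterfs $H0:/r3 /mnt/r3
touch /mnt/r3/file

# Disable the self-heal daemon and client-side data self-heal so that the
# stale bricks are never healed during the test.
gluster volume set r3 cluster.self-heal-daemon off
gluster volume set r3 cluster.data-self-heal off

# Hypothetical helper: a brick is served by a glusterfsd process whose command
# line contains the brick path, so kill that process.
kill_brick() { pkill -9 -f "glusterfsd.*$1"; }

# Keep a single fd open for writing across all brick up/down events.
exec 5>>/mnt/r3/file

kill_brick /bricks/b1              # kill B1
echo "w1" >&5                      # write: B1 becomes stale, B2/B3 are good
gluster volume start r3 force      # bring B1 back (it stays stale; shd is off)

kill_brick /bricks/b2              # kill B2
echo "w2" >&5                      # write: B2 becomes stale too, only B3 is good
gluster volume start r3 force      # bring B2 back

kill_brick /bricks/b3              # kill B3, the only good copy
echo "w3" >&5                      # this write must fail with EIO; before the
                                   # fix it wrongly succeeded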

Actual results:
Writes succeed.

Expected results:
Writes must fail with EIO.

Additional info:
The fix for this issue, https://review.gluster.org/#/c/20036, is merged in master via BZ 1577672, which tracks brick-mux failures.

Comment 1 Worker Ant 2018-05-23 04:25:52 UTC
REVIEW: https://review.gluster.org/20066 (afr: fix bug-1363721.t failure) posted (#1) for review on release-4.1 by Ravishankar N

Comment 2 Worker Ant 2018-05-25 12:58:11 UTC
COMMIT: https://review.gluster.org/20066 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message - afr: fix bug-1363721.t failure

Problem:
In the .t, when the only good brick was brought down, writes on the fd were
still succeeding on the bad bricks. The inflight split-brain check was
marking the write as a failure, but since the write succeeded on all the
bad bricks, afr_txn_nothing_failed() returned true and we were
unwinding the writev with success to DHT, catching the failure only in
the background post-op.

Fix:
Don't wind the FOP phase if the write_subvol (which is populated with the readable
subvols obtained in the pre-op cbk) does not have at least one good brick that was up
when the transaction started.

Note: This fix is not related to brick multiplexing. I ran the .t
10 times with this fix and brick-mux enabled without any failures.

Change-Id: I915c9c366aa32cd342b1565827ca2d83cb02ae85
updates: bz#1581548
Signed-off-by: Ravishankar N <ravishankar>
(cherry picked from commit 985a1d15db910e012ddc1dcdc2e333cc28a9968b)
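
For illustration only, the behaviour the fix enforces can be checked from the mount point roughly as below, continuing the hypothetical reproduction sketch from the description (B1 and B2 stale, fd 5 still open, B3 just killed). This is a hedged sketch, not the content of bug-1363721.t.

if echo "w3" >&5 2>/tmp/write.err; then
    # Pre-fix behaviour: the write is wound to the stale bricks, succeeds there,
    # and AFR unwinds success to DHT, so the application never sees the error.
    echo "BUG: write succeeded although the only good brick (B3) is down"
else
    # Post-fix behaviour: AFR refuses to wind the FOP phase because write_subvol
    # contains no good brick that was up when the transaction started, and the
    # write fails with EIO.
    grep -qi "Input/output error" /tmp/write.err && echo "OK: write failed with EIO"
fi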

Comment 3 Shyamsundar 2018-06-20 18:06:39 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/