Bug 1581548

Summary: writes succeed when only good brick is down in 1x3 volume
Product: [Community] GlusterFS
Component: replicate
Version: 4.1
Reporter: Ravishankar N <ravishankar>
Assignee: Ravishankar N <ravishankar>
CC: bugs
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Fixed In Version: glusterfs-v4.1.0
Doc Type: If docs needed, set a value
Clone Of: 1581057
Last Closed: 2018-06-20 18:06:39 UTC

Description Ravishankar N 2018-05-23 04:22:41 UTC
Description of problem:
Writes succeed when the only good brick is down in a 1x3 volume.

Version-Release number of selected component (if applicable):
rhgs-3.4.0

How reproducible:
Always

Steps to Reproduce:
* Create a file in a 1x3 replicate volume mounted via FUSE.
* Disable the self-heal daemon (shd) and client-side data-self-heal.
* Open an fd on the file for writing.
* Kill B1 (brick1) and write to the file.
* Bring B1 back up using volume start force, then bring B2 down.
* Write to the file.
* Bring B2 back up and kill B3.
* The next write should fail (since the only good brick, B3, is down), but it succeeds; see the shell sketch below.
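
The steps above translate roughly into the shell commands below. This is a hedged sketch, not taken from the report or the .t test: the volume name (r3), host (H0), brick paths, mount point and the kill_brick helper are hypothetical placeholders.

# Hedged reproduction sketch; names and paths below are placeholders.
H0=server1
gluster volume create r3 replica 3 \
    $H0:/bricks/b1 $H0:/bricks/b2 $H0:/bricks/b3 force
gluster volume start r3
mkdir -p /mnt/r3
mount -t glusterfs $H0:/r3 /mnt/r3
touch /mnt/r3/file

# Disable the self-heal daemon and client-side data self-heal so that the
# stale bricks are never healed during the test.
gluster volume set r3 cluster.self-heal-daemon off
gluster volume set r3 cluster.data-self-heal off

# Hypothetical helper: a brick is served by a glusterfsd process whose command
# line contains the brick path, so kill that process.
kill_brick() { pkill -9 -f "glusterfsd.*$1"; }

# Keep a single fd open for writing across all brick up/down events.
exec 5>>/mnt/r3/file

kill_brick /bricks/b1              # kill B1
echo "w1" >&5                      # write: B1 becomes stale, B2/B3 are good
gluster volume start r3 force      # bring B1 back (it stays stale; shd is off)

kill_brick /bricks/b2              # kill B2
echo "w2" >&5                      # write: B2 becomes stale too, only B3 is good
gluster volume start r3 force      # bring B2 back

kill_brick /bricks/b3              # kill B3, the only good copy
echo "w3" >&5                      # this write must fail with EIO; before the
                                   # fix it wrongly succeeded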

Actual results:
Writes succeed.

Expected results:
Writes must fail with EIO.

Additional info:
The fix for this issue, https://review.gluster.org/#/c/20036, is merged in master via BZ 1577672, which tracks brick-mux failures.

Comment 1 Worker Ant 2018-05-23 04:25:52 UTC
REVIEW: https://review.gluster.org/20066 (afr: fix bug-1363721.t failure) posted (#1) for review on release-4.1 by Ravishankar N

Comment 2 Worker Ant 2018-05-25 12:58:11 UTC
COMMIT: https://review.gluster.org/20066 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message - afr: fix bug-1363721.t failure

Problem:
In the .t, when the only good brick was brought down, writes on the fd were
still succeeding on the bad bricks. The inflight split-brain check was
marking the write as a failure, but since the write succeeded on all the
bad bricks, afr_txn_nothing_failed() returned true and we were
unwinding the writev with success to DHT, catching the failure only in
the background post-op.

Fix:
Don't wind the FOP phase if the write_subvol (which is populated with the readable
subvols obtained in the pre-op cbk) does not have at least one good brick that was up
when the transaction started.

Note: This fix is not related to brick multiplexing. I ran the .t
10 times with this fix and brick-mux enabled without any failures.

Change-Id: I915c9c366aa32cd342b1565827ca2d83cb02ae85
updates: bz#1581548
Signed-off-by: Ravishankar N <ravishankar>
(cherry picked from commit 985a1d15db910e012ddc1dcdc2e333cc28a9968b)
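
For illustration only, the behaviour the fix enforces can be checked from the mount point roughly as below, continuing the hypothetical reproduction sketch from the description (B1 and B2 stale, fd 5 still open, B3 just killed). This is a hedged sketch, not the content of bug-1363721.t.

if echo "w3" >&5 2>/tmp/write.err; then
    # Pre-fix behaviour: the write is wound to the stale bricks, succeeds there,
    # and AFR unwinds success to DHT, so the application never sees the error.
    echo "BUG: write succeeded although the only good brick (B3) is down"
else
    # Post-fix behaviour: AFR refuses to wind the FOP phase because write_subvol
    # contains no good brick that was up when the transaction started, and the
    # write fails with EIO.
    grep -qi "Input/output error" /tmp/write.err && echo "OK: write failed with EIO"
fi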

Comment 3 Shyamsundar 2018-06-20 18:06:39 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/