Bug 1581548 - writes succeed when only good brick is down in 1x3 volume
Summary: writes succeed when only good brick is down in 1x3 volume
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-23 04:22 UTC by Ravishankar N
Modified: 2018-06-20 18:06 UTC
CC List: 1 user

Fixed In Version: glusterfs-v4.1.0
Clone Of: 1581057
Environment:
Last Closed: 2018-06-20 18:06:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ravishankar N 2018-05-23 04:22:41 UTC
Description of problem:
writes succeed when only good brick is down in 1x3 volume

Version-Release number of selected component (if applicable):
rhgs-3.4.0

How reproducible:
Always

Steps to Reproduce:
*Create a file in a 1x3 replicate volume mounted using fuse.
*Disable shd and client-side data-self-heal.
*Open an fd on the file for writing.
*Kill B1 (brick1) and write to the file.
*Bring B1 back up using volume start force, then bring B2 down.
*Write to the file.
*Bring B2 back up, then kill B3.
*The next write should fail (since the only good brick, B3, is down), but it succeeds (a client-side sketch of the write sequence follows this list).
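
To make the write sequence concrete, here is a minimal client-side sketch, assuming the volume is fuse-mounted at /mnt/glusterfs and the file is named testfile (both hypothetical); the brick kill/start steps from the list above are performed externally with the gluster CLI while the program pauses.

/* repro-sketch.c: hold an fd open on a fuse-mounted 1x3 replica volume and
 * issue a write after each externally performed brick kill/start step.
 * The mount path and file name are hypothetical; adjust them to your setup.
 * Build: gcc -o repro-sketch repro-sketch.c
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void pause_for(const char *step)
{
        printf("Perform this step, then press Enter: %s\n", step);
        getchar();
}

static void do_write(int fd, const char *label)
{
        ssize_t ret = write(fd, "data", 4);
        printf("%s: write returned %zd (errno: %s)\n",
               label, ret, ret < 0 ? strerror(errno) : "none");
}

int main(void)
{
        /* Hypothetical fuse mount point of the 1x3 replicate volume. */
        int fd = open("/mnt/glusterfs/testfile", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        pause_for("kill brick B1");
        do_write(fd, "B1 down");   /* succeeds; B2 and B3 are still good */

        pause_for("gluster volume start <vol> force, then kill brick B2");
        do_write(fd, "B2 down");   /* succeeds; B3 is still good */

        pause_for("gluster volume start <vol> force, then kill brick B3");
        do_write(fd, "B3 (only good brick) down");   /* expected to fail with EIO */

        close(fd);
        return 0;
}

With the bug present, the last write also reports success instead of failing with EIO.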

Actual results:
Writes succeed.

Expected results:
Writes must fail with EIO.

Additional info:
The fix for this issue, https://review.gluster.org/#/c/20036, was merged in master via BZ 1577672, which tracks brick-mux failures.

Comment 1 Worker Ant 2018-05-23 04:25:52 UTC
REVIEW: https://review.gluster.org/20066 (afr: fix bug-1363721.t failure) posted (#1) for review on release-4.1 by Ravishankar N

Comment 2 Worker Ant 2018-05-25 12:58:11 UTC
COMMIT: https://review.gluster.org/20066 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message- afr: fix bug-1363721.t failure

Problem:
In the .t, when the only good brick was brought down, writes on the fd were
still succeeding on the bad bricks. The inflight split-brain check was
marking the write as a failure, but since the write succeeded on all the
bad bricks, afr_txn_nothing_failed() evaluated to true and we were
unwinding writev with success to DHT, only catching the failure later in
the background post-op.

Fix:
Don't wind the FOP phase if the write_subvol (which is populated with the readable
subvols obtained in the pre-op cbk) does not have at least one good brick that was up
when the transaction started.

Note: This fix is not related to brick multiplexing. I ran the .t
10 times with this fix and brick-mux enabled without any failures.

Change-Id: I915c9c366aa32cd342b1565827ca2d83cb02ae85
updates: bz#1581548
Signed-off-by: Ravishankar N <ravishankar>
(cherry picked from commit 985a1d15db910e012ddc1dcdc2e333cc28a9968b)
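
To make the Problem/Fix description in the commit message above concrete, here is a schematic sketch of the decision it describes; the struct, the field up_at_start, and the helper afr_can_wind_fop() are hypothetical simplifications and do not match the actual afr sources (only write_subvol and the EIO behaviour come from this report).

/* Schematic illustration only; not the real GlusterFS code. */
#include <errno.h>

struct afr_txn {
        unsigned int write_subvol;  /* bitmask of readable subvols populated in
                                       the pre-op cbk (name from the commit msg) */
        unsigned int up_at_start;   /* hypothetical bitmask of subvols that were
                                       up when the transaction started           */
};

/* Before the fix (as described above): the write was wound to the available
 * bad bricks and succeeded on all of them, so afr_txn_nothing_failed()
 * evaluated to true and success was unwound to DHT; the real failure was only
 * caught in the background post-op.
 *
 * After the fix: the FOP phase is not wound at all unless at least one brick
 * is both readable and was up when the transaction started; otherwise the
 * write fails immediately (EIO, per the expected results above).
 */
static int afr_can_wind_fop(const struct afr_txn *txn)
{
        if (txn->write_subvol & txn->up_at_start)
                return 0;       /* at least one good, up brick: wind the FOP */
        return -EIO;            /* no good brick available: fail the write   */
}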

Comment 3 Shyamsundar 2018-06-20 18:06:39 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/

