Bug 1550896

Summary: No rollback of renames on succeeded subvols during failure
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Raghavendra G <rgowdapp>
Component: distribute
Assignee: Csaba Henk <csaba>
Status: CLOSED ERRATA
QA Contact: Prasad Desala <tdesala>
Severity: unspecified
Priority: unspecified
Version: rhgs-3.3
CC: bugs, rhinduja, rhs-bugs, storage-qa-internal
Keywords: Triaged
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.12.2-8
Doc Type: If docs needed, set a value
Clone Of: 1412069
Last Closed: 2018-09-04 06:44:11 UTC
Type: Bug
Bug Depends On: 1412069
Bug Blocks: 1503137, 1550771

Description Raghavendra G 2018-03-02 08:27:14 UTC
+++ This bug was initially created as a clone of Bug #1412069 +++

Description of problem:
As with DHT directories are present on all subvolumes, renaming a directory is a compound operation, so a partial-success/partial-failure outcome is possible, leaving the volume in an inconsistent state. Such a scenario is easy to reproduce: stop the volume, edit the volfile of one subvolume to enable an "option read-only on" setting, and restart the volume. Every operation that would modify the affected subvolume then fails with EROFS.
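
For illustration, such a read-only brick can be produced by stacking the features/read-only translator in that brick's volfile. A minimal sketch follows; the volume and brick names are hypothetical, and a real brick graph contains many more translators:

    volume testvol-posix
        type storage/posix
        # hypothetical brick path
        option directory /bricks/brick1
    end-volume

    volume testvol-ro
        type features/read-only
        # every fop that would modify the brick now fails with EROFS
        option read-only on
        subvolumes testvol-posix
    end-volume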

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Stop the volume and edit the brick volfile of one subvolume to enable "option read-only on" (see the volfile sketch above).
2. Restart the volume and FUSE mount it on a client.
3. From the mount point, rename a directory present on all subvolumes (e.g. dir1 to dir2).

Actual results:
The rename fails with EROFS on the read-only subvolume but succeeds on the others, and no rollback is performed: both the old and the new directory name remain on the backend with the same gfid, leaving the volume inconsistent.

Expected results:
On a partial failure, the rename is rolled back on the subvolumes where it succeeded, so all subvolumes stay consistent.

Additional info:

--- Additional comment from Worker Ant on 2017-01-11 02:01:30 EST ---

REVIEW: http://review.gluster.org/15739 (feature/dht: undo partially successful dir rename) posted (#7) for review on master by Raghavendra G (rgowdapp)

--- Additional comment from Worker Ant on 2017-01-11 10:40:29 EST ---

COMMIT: http://review.gluster.org/15739 committed in master by Raghavendra G (rgowdapp) 
------
commit bb438d849a4a3941c1a9b525213f695f0a2c961b
Author: Csaba Henk <csaba>
Date:   Thu Oct 27 07:30:48 2016 +0200

    feature/dht: undo partially successful dir rename
    
    As with dht, dirs are present on all subvolumes,
    renaming them is a compound operation and thus a
    partial success + partial failure scenario is
    possible, resulting in an inconsistent state.
    
    For purposes of reproduction, such a scenario can
    easily be produced by stopping the volume, edit the
    volfile of a certain subvolume to get at an
    "option read-only on" setting, and then restart
    the volume. Thus those operations that are to make change
    on the affected subvolume will fail with EROFS.
    
    To handle such scenarios, we introduce an in-memory cache
    where we record the return values obtained from the
    subvolumes. At the final stage of the dir rename operation
    we check if it's a partial success/fail situation. If yes,
    then we perform a reverse rename op on those subvolumes
    where the operation succeeded.
    
    Change-Id: I3d05f74f53932cb984a918d252a7309c1009a51d
    BUG: 1412069
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/15739
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
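
To make the cache-and-undo approach described in the commit message concrete, here is a minimal standalone C sketch. It is not the actual dht code; every name in it is hypothetical, and it only models the control flow: wind the rename to all subvolumes, record each return value in an in-memory cache, and on a mixed outcome replay the reverse rename on the subvolumes where the operation had succeeded.

    /* Minimal sketch of the record-and-undo pattern; NOT the actual
     * dht implementation. Subvolume 2 simulates a read-only brick. */
    #include <stdio.h>
    #include <errno.h>

    #define NSUBVOLS 4

    /* in-memory cache of per-subvolume return values */
    static int ret_cache[NSUBVOLS];

    /* stand-in for winding a rename to one subvolume */
    static int subvol_rename(int subvol, const char *src, const char *dst)
    {
        if (subvol == 2)
            return -EROFS; /* simulated "option read-only on" brick */
        printf("subvol %d: rename %s -> %s\n", subvol, src, dst);
        return 0;
    }

    static void rename_dir(const char *src, const char *dst)
    {
        int nsuccess = 0, nfailure = 0;

        /* wind the rename to every subvolume, caching each result */
        for (int i = 0; i < NSUBVOLS; i++) {
            ret_cache[i] = subvol_rename(i, src, dst);
            if (ret_cache[i] == 0)
                nsuccess++;
            else
                nfailure++;
        }

        /* partial success + partial failure: undo where it succeeded */
        if (nsuccess > 0 && nfailure > 0) {
            for (int i = 0; i < NSUBVOLS; i++)
                if (ret_cache[i] == 0)
                    subvol_rename(i, dst, src); /* reverse rename */
        }
    }

    int main(void)
    {
        rename_dir("dir1", "dir2");
        return 0;
    }

A real implementation would also have to handle the reverse rename itself failing; the sketch ignores that case.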

--- Additional comment from Shyamsundar on 2017-03-06 12:43:33 EST ---

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 7 Prasad Desala 2018-04-23 09:37:07 UTC
Reproduced this issue on 3.3.1 and followed the same steps to verify this BZ on 3.4.0 (3.12.2-8.el7rhgs.x86_64).

1) Created a distributed-replicate volume and started it.
2) FUSE-mounted it on a client.
3) On the mount point, created a directory "dir1".
4) Selected a replica pair and, for all the bricks in that pair, set the read-only option to on by editing the brick volfiles (as in the volfile sketch in the description above).
5) Stopped and started the volume.
6) From the mount point, renamed the directory from dir1 to dir2.

Before the fix, dir1 was not renamed on the read-only bricks while the rename succeeded on the other bricks, leading to inconsistency across the nodes: both dir1 and dir2 existed with the same gfid.

After the fix, all the backend bricks have the same directory.
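
As a quick backend check of that claim, one can compare the directory's trusted.gfid xattr across bricks. A hedged helper follows; the brick paths are hypothetical, and it has to run as root on a storage node:

    /* Compare the trusted.gfid xattr of a directory on two bricks.
     * Brick paths are hypothetical; adjust for the actual layout. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/xattr.h>

    static int read_gfid(const char *path, unsigned char gfid[16])
    {
        /* gluster stores the 16-byte gfid in the trusted.gfid xattr */
        if (getxattr(path, "trusted.gfid", gfid, 16) != 16) {
            perror(path);
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        unsigned char a[16], b[16];

        /* hypothetical backend paths for the renamed directory */
        if (read_gfid("/bricks/brick1/dir2", a) ||
            read_gfid("/bricks/brick2/dir2", b))
            return 1;

        puts(memcmp(a, b, 16) == 0 ? "gfids match" : "gfid mismatch!");
        return 0;
    }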

Moving this BZ to Verified.

Comment 9 errata-xmlrpc 2018-09-04 06:44:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607