Bug 1123950

Summary: Rename of a file from 2 clients racing and resulting in an error on both clients
Product: [Community] GlusterFS Reporter: Shyamsundar <srangana>
Component: distribute Assignee: Shyamsundar <srangana>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: gluster-bugs, nbalacha
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.7.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1138390 1139999 Environment:
Last Closed: 2015-05-14 17:26:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1138390, 1139999    

Description Shyamsundar 2014-07-28 17:35:24 UTC
Description of problem:
This problem is hit while running the test case tests/bugs/bug-1117851.t, about once every 100 files, depending on the backend disk for the volume (RAM disk, SSD, or other).

The issue is seen when the hashed and cached subvols for a file are the same and the file is being renamed to a name whose hashed subvol is different.

The root cause is that both clients race to create the link and linkto files in the above scenario, and the losing client deletes the linkto file during its cleanup. The actual rename attempted by the winning client then fails, so both clients fail to rename the file.
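
The interleaving described above can be replayed in a toy model. This is only a sketch and not GlusterFS code: the `replay` function, the step names, and the rule that the "winner" is whichever client creates the hard link first are all assumptions made for illustration.

```python
# Toy model of the race; none of these names exist in GlusterFS.
# Assumptions: the winner is the client that creates the hard link first,
# and the winner's final rename step needs the linkto file to still exist.

def replay(schedule):
    """Replay (client, step) pairs and return {client: succeeded}."""
    link_owner = None      # client that created the hard link for the new name
    linkto_exists = False  # linkto file on the new name's hashed subvol
    result = {}
    for client, step in schedule:
        if step == "create":          # link and linkto creation, wound together
            linkto_exists = True
            if link_owner is None:
                link_owner = client
        elif step == "finish":
            if client != link_owner:  # lost the link race: clean up, then fail
                linkto_exists = False
                result[client] = False
            else:                     # won the link race: rename needs the linkto
                result[client] = linkto_exists
    return result

# Loser cleans up before the winner finishes: both clients fail.
bad = [("A", "create"), ("B", "create"), ("B", "finish"), ("A", "finish")]
# Winner finishes before the loser cleans up: only the loser fails.
good = [("A", "create"), ("B", "create"), ("A", "finish"), ("B", "finish")]

print(replay(bad))   # {'B': False, 'A': False}
print(replay(good))  # {'A': True, 'B': False}
```

Only the ordering of the loser's cleanup relative to the winner's final step differs between the two schedules, which is consistent with the failure being intermittent.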

Fixing the client that fails to create the linkto file so that it does not delete the linkto file will not be sufficient, as the losing client could have won that race (the link and linkto creations are wound in parallel). That change is present in this review: http://review.gluster.org/#/c/8338/

The additional fix to handle this failure is to make the winds that create the link and the linkto sequential, so that whichever client wins the link race can then go on to create the linkto file. This leaves one clear client proceeding while the other client gets the required errors.
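
A minimal sketch of the sequential ordering proposed in this description follows. `ToyVolume`, `rename_sequential`, and `race` are hypothetical names, and the lock merely stands in for each create being atomic; the real change would live in the DHT translator, not in Python.

```python
import threading

class ToyVolume:
    """Hypothetical stand-in for the subvols involved in the rename."""

    def __init__(self):
        self._lock = threading.Lock()  # assumption: each create is atomic
        self.link_owner = None
        self.linkto_owner = None

    def create_link(self, client):
        with self._lock:
            if self.link_owner is not None:
                return False           # another client already created the link
            self.link_owner = client
            return True

    def create_linkto(self, client):
        with self._lock:
            self.linkto_owner = client

def rename_sequential(vol, client, results):
    # Step 1: only the link is attempted; the loser stops here with an error
    # and has created no linkto file, so it has nothing to clean up.
    if not vol.create_link(client):
        results[client] = False
        return
    # Step 2: only the winner of the link race creates the linkto file.
    vol.create_linkto(client)
    results[client] = True

def race(clients=("A", "B")):
    vol, results = ToyVolume(), {}
    threads = [threading.Thread(target=rename_sequential, args=(vol, c, results))
               for c in clients]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return vol, results

vol, results = race()
print(results, "winner:", vol.link_owner)
```

Whichever thread is scheduled first, exactly one client reports success, and that same client owns both the link and the linkto.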

Version-Release number of selected component (if applicable):
Gluster master

How reproducible:
1 in 100 renames if run on bricks on SSD or RAM disk

Steps to Reproduce:
Test case tests/bugs/bug-1117851.t, treating warnings on file rename failures as errors (see the comment in the test case file)
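
For readers without the test harness, the shape of the race can be approximated with a small script. This is not the contents of bug-1117851.t; it is a hedged sketch in which `race_rename` and the file names are made up. To exercise GlusterFS, the source and destination paths would have to point at the same file through two separate mounts of one volume (an assumption about the setup); on a local directory, as in the demo below, it only shows the expected behaviour where at least one rename succeeds.

```python
import os
import tempfile
import threading

def race_rename(src_views, dst_views):
    """Rename the same file through several paths at once.

    Returns one boolean per client: True if its rename succeeded.
    """
    outcomes = [None] * len(src_views)

    def worker(i):
        try:
            os.rename(src_views[i], dst_views[i])
            outcomes[i] = True
        except OSError:
            outcomes[i] = False

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(src_views))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes

if __name__ == "__main__":
    # Local demo: both "clients" use the same directory.
    with tempfile.TemporaryDirectory() as d:
        all_failed = 0
        for n in range(100):
            src = os.path.join(d, f"file{n}")
            dst = os.path.join(d, f"newfile{n}")
            open(src, "w").close()
            if not any(race_rename([src, src], [dst, dst])):
                all_failed += 1
        print("renames where every client failed:", all_failed)
```

A non-zero count on a volume would correspond to the symptom in the summary, where the rename fails on both clients.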

Also, this should be a fork of bug #1117851, but as this is not data loss, the original bug is only referenced here.

Comment 1 Anand Avati 2014-07-28 19:37:34 UTC
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#1) for review on master by Shyamsundar Ranganathan (srangana)

Comment 2 Anand Avati 2014-07-30 11:30:09 UTC
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#2) for review on master by Shyamsundar Ranganathan (srangana)

Comment 3 Anand Avati 2014-07-30 14:22:25 UTC
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#3) for review on master by Shyamsundar Ranganathan (srangana)

Comment 4 Anand Avati 2014-07-31 13:54:35 UTC
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#4) for review on master by Shyamsundar Ranganathan (srangana)

Comment 5 Anand Avati 2014-08-13 17:49:49 UTC
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#5) for review on master by Shyamsundar Ranganathan (srangana)

Comment 6 Shyamsundar 2014-09-02 15:00:56 UTC
Abandoned: http://review.gluster.org/8382

The change was made differently, because the linkto creation had to be handled first due to FUSE behavior.

These changes can be found here:

    http://review.gluster.org/#/c/8563/
    http://review.gluster.org/#/c/8570/

These changes now keep the winning client from failing a rename when it fails to rename the linkto file. Hence, when one client wins the link race and the other still deletes the linkto file, the rename failure seen by the winning client is not critical, which resolves the issue.
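
The behaviour described above, where a failed linkto rename no longer fails the whole operation, can be sketched as follows. `finish_rename` and its two callables are hypothetical, and the order of the two steps is an assumption; the actual logic is in the reviews listed above.

```python
def finish_rename(rename_data_file, rename_linkto):
    """Final steps for the client that won the link race (toy model).

    Both arguments are hypothetical callables that raise OSError on failure.
    Returns a list of warnings; an empty list means a fully clean rename.
    """
    warnings = []
    rename_data_file()          # a failure here still fails the whole rename
    try:
        rename_linkto()
    except OSError as err:      # e.g. the losing client already deleted the linkto
        warnings.append(f"linkto rename failed, ignored: {err}")
    return warnings

def missing_linkto():
    raise FileNotFoundError("linkto file is gone")

print(finish_rename(lambda: None, missing_linkto))
# ['linkto rename failed, ignored: linkto file is gone']
```

The winner now reports success even when the loser's cleanup has removed the linkto file, while a failure to rename the data file itself still propagates to the caller.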

The test case modified as a part of this commit will be posted as a separate commit for inclusion, after which this bug can be marked for verification.

Comment 7 Anand Avati 2014-09-02 16:42:40 UTC
REVIEW: http://review.gluster.org/8579 (cluster/dht: Modified test case to note rename failures as errors) posted (#1) for review on master by Shyamsundar Ranganathan (srangana)

Comment 8 Anand Avati 2014-09-02 18:48:55 UTC
COMMIT: http://review.gluster.org/8579 committed in master by Vijay Bellur (vbellur) 
------
commit 4adfb6fb7c371c6bc03acdaf61f1cca496388356
Author: Shyam <srangana>
Date:   Tue Sep 2 12:37:07 2014 -0400

    cluster/dht: Modified test case to note rename failures as errors
    
    The bug referenced in this change, had an race condition that is now
    fixed by the following commits that are posted for review.
    
        http://review.gluster.org/#/c/8563/
        http://review.gluster.org/#/c/8570/
    
    These changes would now make the winning client not fail a rename,
    in case it failed to rename the linkto file. Hence when one client
    wins the link race, and the other still deletes the linkto file,
    the rename failure by the winning client is not a critical failure,
    hence it resolves the issue posted in the bug.
    
    As a result modifying the test case to treat the rename failures
    as errors, to catch any future issues.
    
    Change-Id: Ibe9caac7ee87dcbc4f581cfbd36173b734859ccb
    BUG: 1123950
    Signed-off-by: Shyam <srangana>
    Reviewed-on: http://review.gluster.org/8579
    Reviewed-by: Jeff Darcy <jdarcy>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 9 Anand Avati 2014-09-15 07:31:41 UTC
REVIEW: http://review.gluster.org/8729 (cluster/dht: Modified test case to note rename failures as errors) posted (#1) for review on release-3.5 by N Balachandran (nbalacha)

Comment 10 Anand Avati 2014-09-23 08:54:11 UTC
REVIEW: http://review.gluster.org/8729 (cluster/dht: Modified test case to note rename failures as errors) posted (#2) for review on release-3.5 by N Balachandran (nbalacha)

Comment 11 Nithya Balachandran 2015-04-22 08:15:36 UTC
http://review.gluster.org/8729 was incorrectly posted against this BZ.

Moving this to Modified based on Comment#8

Comment 12 Nithya Balachandran 2015-04-22 08:15:49 UTC
http://review.gluster.org/8729 was incorrectly posted against this BZ. 

Moving this to Modified based on Comment#8

Comment 13 Niels de Vos 2015-05-14 17:26:50 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
