Bug 1117851

Summary: DHT :- data loss - file is missing on renaming same file from multiple client at same time
Product: [Community] GlusterFS
Reporter: Jeff Darcy <jdarcy>
Component: distribute
Assignee: Jeff Darcy <jdarcy>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: mainline
CC: bugs, gluster-bugs, nsathyan, racpatel, ssamanta
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Clone Of: 1117135
Last Closed: 2014-11-11 08:36:53 UTC
Type: Bug
Bug Depends On: 1117135, 1146889
Bug Blocks: 1129527, 1138387, 1139988

Comment 1 Anand Avati 2014-07-09 13:59:36 UTC
REVIEW: http://review.gluster.org/8269 (dht: fix rename race) posted (#2) for review on master by Jeff Darcy (jdarcy)

Comment 2 Jeff Darcy 2014-07-09 14:43:53 UTC
From the original report (editing out one piece of Red Hat internal information).

+++ This bug was initially created as a clone of Bug #1117135 +++

Description of problem:
=======================
On a distributed volume, the same file was renamed from more than one client at the same time, and the file was missing afterwards. Neither the source nor the destination file is present on the mount point or on the bricks.


Version-Release number :
=========================
3.6.0.24-1.el6rhs.x86_64


How reproducible:
=================
Intermittent (reproduced two out of four times)


Steps to Reproduce:
====================
1. Create and mount a distributed volume (mount it on multiple clients).
2. Create a few files and verify them from the mount point.
mount :-
[root@OVM1 ren]# ls
b1  b10  b2  b3  b4  b5  b6  b7  b8  b9

3. Rename the files from more than one mount at the same time.

mount 1:-
[root@OVM3 ren]# for i in {1..10} ; do mv b$i c$i; done
mv: cannot move `b2' to `c2': File exists
mv: cannot move `b3' to `c3': File exists
mv: cannot move `b9' to `c9': No such file or directory


mount 2:-

[root@OVM1 ren]# for i in {1..10} ; do mv b$i c$i; done
mv: cannot move `b3' to `c3': No such file or directory
mv: cannot move `b4' to `c4': No such file or directory
mv: cannot move `b5' to `c5': No such file or directory
mv: cannot move `b7' to `c7': No such file or directory
mv: cannot move `b10' to `c10': No such file or directory


4. Verify data on mount point.
mount:-

[root@OVM1 ren]# ls
b3  c1  c10  c4  c5  c6  c7  c8  c9

5. Both b2 and c2 are missing. Either the source or the destination file should be present on the mount.
Verified on the bricks: the file is not present there either.

brick:-
[root@OVM3 ren]# ls -l /brick2/*
/brick2/r1:
total 0
---------T 3 root root 0 Jul  7 22:13 b1
---------T 3 root root 0 Jul  7 22:13 c1
-rw-r--r-- 2 root root 0 Jul  7 22:09 c6
-rw-r--r-- 2 root root 0 Jul  7 22:09 c7
---------T 2 root root 0 Jul  7 22:10 c8
-rw-r--r-- 2 root root 0 Jul  7 22:09 c9

/brick2/r2:
total 0
-rw-r--r-- 2 root root 0 Jul  7 22:09 b3
-rw-r--r-- 2 root root 0 Jul  7 22:09 c1
---------T 2 root root 0 Jul  7 22:10 c2
-rw-r--r-- 2 root root 0 Jul  7 22:09 c4
-rw-r--r-- 2 root root 0 Jul  7 22:09 c8

/brick2/r3:
total 0
---------T 2 root root 0 Jul  7 22:13 b4
-rw-r--r-- 2 root root 0 Jul  7 22:09 c10
-rw-r--r-- 2 root root 0 Jul  7 22:09 c5
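
The reproduction steps above can be approximated with a small script. This is a minimal sketch and not part of the original report: Python threads stand in for the two clients, and the helper names (make_files, rename_all, race) are hypothetical. To exercise the bug, the paths passed to race() would have to be two separate mounts of the same volume. On a local filesystem, where rename is atomic, no file should be lost. Note also that os.rename replaces an existing destination silently, so the "File exists" errors seen above are not expected from this script on a local filesystem.

```python
import os
import tempfile
import threading


def make_files(directory, count=10):
    """Create empty files b1..bN, as in step 2."""
    for i in range(1, count + 1):
        with open(os.path.join(directory, f"b{i}"), "w"):
            pass


def rename_all(directory, count, errors):
    """Rename b1..bN to c1..cN, recording failures instead of stopping."""
    for i in range(1, count + 1):
        src = os.path.join(directory, f"b{i}")
        dst = os.path.join(directory, f"c{i}")
        try:
            os.rename(src, dst)
        except OSError as exc:
            errors.append((src, dst, exc.errno))


def race(mount_dirs, count=10):
    """Run rename_all on every path concurrently; return (errors, lost)."""
    errors = []
    barrier = threading.Barrier(len(mount_dirs))

    def worker(directory):
        barrier.wait()  # release all renamers as close together as possible
        rename_all(directory, count, errors)

    threads = [threading.Thread(target=worker, args=(d,)) for d in mount_dirs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # A file counts as lost if neither bN nor cN is visible afterwards.
    check = mount_dirs[0]
    lost = [
        i for i in range(1, count + 1)
        if not os.path.exists(os.path.join(check, f"b{i}"))
        and not os.path.exists(os.path.join(check, f"c{i}"))
    ]
    return errors, lost


if __name__ == "__main__":
    # Local demonstration only: the same directory is used for both "clients".
    with tempfile.TemporaryDirectory() as d:
        make_files(d)
        errors, lost = race([d, d])
        print("failed renames:", len(errors), "lost files:", lost)
```

A non-empty "lost" list corresponds to the situation in step 5, where neither the source nor the destination name exists.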


Actual results:
===============
The file is missing: data loss occurs when the same file is renamed from multiple mounts at the same time.


Expected results:
================
If the same rename operation is executed from multiple mounts, it should not end in data loss. Either the source or the destination file should exist (depending on whether the rename succeeded or failed).


Additional info :-

mount 1 log :-

[root@OVM3 ren]# grep '/c2' /var/log/glusterfs/mnt-ren.log 
[2014-07-07 16:43:54.489608] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/c2: failed to resolve (No such file or directory)
[2014-07-07 16:43:54.493494] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/c2: failed to resolve (No such file or directory)
[2014-07-07 16:43:54.497569] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/c2: failed to resolve (No such file or directory)
[2014-07-07 16:43:54.501028] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/c2: failed to resolve (No such file or directory)
[2014-07-07 16:43:54.501445] W [client-rpc-fops.c:2604:client3_3_link_cbk] 0-ren-client-2: remote operation failed: File exists (/b2 -> /c2)
[2014-07-07 16:43:54.501878] W [fuse-bridge.c:1727:fuse_rename_cbk] 0-glusterfs-fuse: 149: /b2 -> /c2 => -1 (File exists)
[root@OVM3 ren]# grep '/b2' /var/log/glusterfs/mnt-ren.log 
[2014-07-07 16:40:25.735551] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/b2: failed to resolve (No such file or directory)
[2014-07-07 16:40:25.738732] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/b2: failed to resolve (No such file or directory)
[2014-07-07 16:40:25.741072] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/b2: failed to resolve (No such file or directory)
[2014-07-07 16:40:25.742584] D [MSGID: 0] [dht-common.c:1087:dht_lookup_everywhere_cbk] 0-ren-dht: found on ren-client-2 file /b2
[2014-07-07 16:40:25.742615] D [MSGID: 0] [dht-common.c:972:dht_lookup_everywhere_done] 0-ren-dht: Linking file /b2 on ren-client-2 to ren-client-1 (hash)(gfid = 00000000-0000-0000-0000-000000000000)
[2014-07-07 16:40:25.742865] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-ren-client-1: remote operation failed: File exists. Path: /b2
[2014-07-07 16:43:54.501445] W [client-rpc-fops.c:2604:client3_3_link_cbk] 0-ren-client-2: remote operation failed: File exists (/b2 -> /c2)
[2014-07-07 16:43:54.501878] W [fuse-bridge.c:1727:fuse_rename_cbk] 0-glusterfs-fuse: 149: /b2 -> /c2 => -1 (File exists)



mount 2 log:-

[root@OVM1 ren]# grep '/c2' /var/log/glusterfs/mnt-ren.log
[2014-07-07 16:43:54.816414] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/c2: failed to resolve (No such file or directory)
[2014-07-07 16:43:54.820530] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/c2: failed to resolve (No such file or directory)
[2014-07-07 16:43:54.824191] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/c2: failed to resolve (No such file or directory)
[2014-07-07 16:43:54.827681] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/c2: failed to resolve (No such file or directory)
[root@OVM1 ren]# grep '/b2' /var/log/glusterfs/mnt-ren.log
[2014-07-07 16:40:26.058748] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/b2: failed to resolve (No such file or directory)
[2014-07-07 16:40:26.062550] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/b2: failed to resolve (No such file or directory)
[2014-07-07 16:40:26.065694] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/b2: failed to resolve (No such file or directory)
[2014-07-07 16:40:26.068620] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/b2: failed to resolve (No such file or directory)


brick log:-

[root@OVM3 ren]# grep '/b2' /var/log/glusterfs/bricks/brick2-r*
/var/log/glusterfs/bricks/brick2-r2.log:[2014-07-07 16:40:25.742744] I [server-rpc-fops.c:557:server_mknod_cbk] 0-ren-server: 229: MKNOD (null) (00000000-0000-0000-0000-000000000001/b2) ==> (File exists)
[root@OVM3 ren]# grep '/c2' /var/log/glusterfs/bricks/brick2-r*
/var/log/glusterfs/bricks/brick2-r3.log:[2014-07-07 16:43:54.501210] I [server-rpc-fops.c:1185:server_link_cbk] 0-ren-server: 421: LINK /c2 (b4bc8b38-e17b-449d-8669-ff33a836edd6) -> 00000000-0000-0000-0000-000000000001/c2 ==> (File exists)

--- Additional comment from Rachana Patel on 2014-07-08 02:50:10 EDT ---

volume info :-

Volume Name: ren
Type: Distribute
Volume ID: ea9b5c23-6de6-4863-bb15-37e3ad57c226
Status: Started
Snap Volume: no
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.35.198:/brick2/r1
Brick2: 10.70.35.198:/brick2/r2
Brick3: 10.70.35.198:/brick2/r3
Options Reconfigured:
diagnostics.client-log-level: DEBUG
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable


mount info :-
mount  1:-
[root@OVM3 ~]# mount  | grep ren
10.70.35.198:/ren on /mnt/ren type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

mount 2 :-
[root@OVM1 ~]# mount | grep ren
10.70.35.198:/ren on /mnt/ren type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

Comment 3 Anand Avati 2014-07-09 17:06:04 UTC
REVIEW: http://review.gluster.org/8269 (dht: fix rename race) posted (#3) for review on master by Jeff Darcy (jdarcy)

Comment 4 Anand Avati 2014-07-09 18:28:25 UTC
REVIEW: http://review.gluster.org/8269 (dht: fix rename race) posted (#4) for review on master by Jeff Darcy (jdarcy)

Comment 5 Anand Avati 2014-07-11 13:24:47 UTC
REVIEW: http://review.gluster.org/8297 (dht: fix rename race) posted (#1) for review on master by Jeff Darcy (jdarcy)

Comment 6 Anand Avati 2014-07-11 13:42:32 UTC
REVIEW: http://review.gluster.org/8276 (dht: fix rename race more aggressively) posted (#2) for review on master by Jeff Darcy (jdarcy)

Comment 7 Anand Avati 2014-07-11 15:07:56 UTC
REVIEW: http://review.gluster.org/8276 (dht: fix rename race more aggressively) posted (#3) for review on master by Jeff Darcy (jdarcy)

Comment 8 Anand Avati 2014-07-11 16:36:11 UTC
REVIEW: http://review.gluster.org/8276 (dht: fix rename race more aggressively) posted (#4) for review on master by Jeff Darcy (jdarcy)

Comment 9 Anand Avati 2014-07-14 19:21:06 UTC
REVIEW: http://review.gluster.org/8269 (dht: fix rename race) posted (#5) for review on master by Jeff Darcy (jdarcy)

Comment 10 Anand Avati 2014-07-14 20:12:55 UTC
REVIEW: http://review.gluster.org/8276 (dht: fix rename race more aggressively) posted (#5) for review on master by Jeff Darcy (jdarcy)

Comment 11 Anand Avati 2014-07-14 20:53:06 UTC
REVIEW: http://review.gluster.org/8276 (dht: fix rename race more aggressively) posted (#6) for review on master by Jeff Darcy (jdarcy)

Comment 12 Anand Avati 2014-07-15 00:23:16 UTC
REVIEW: http://review.gluster.org/8276 (dht: fix rename race more aggressively) posted (#7) for review on master by Jeff Darcy (jdarcy)

Comment 13 Anand Avati 2014-07-15 01:06:03 UTC
REVIEW: http://review.gluster.org/8269 (dht: fix rename race) posted (#6) for review on master by Jeff Darcy (jdarcy)

Comment 14 Anand Avati 2014-07-16 12:55:42 UTC
REVIEW: http://review.gluster.org/8276 (dht: fix rename race more aggressively) posted (#8) for review on master by Jeff Darcy (jdarcy)

Comment 15 Anand Avati 2014-07-16 15:35:18 UTC
REVIEW: http://review.gluster.org/8276 (dht: fix rename race more aggressively) posted (#9) for review on master by Jeff Darcy (jdarcy)

Comment 16 Anand Avati 2014-07-17 16:39:04 UTC
REVIEW: http://review.gluster.org/8327 (storage/posix: removing deleting entries in case of creation failures) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 17 Anand Avati 2014-07-17 17:23:49 UTC
REVIEW: http://review.gluster.org/8327 (storage/posix: removing deleting entries in case of creation failures) posted (#2) for review on master by Vijay Bellur (vbellur)

Comment 18 Anand Avati 2014-07-17 17:31:02 UTC
COMMIT: http://review.gluster.org/8269 committed in master by Vijay Bellur (vbellur) 
------
commit 950f9d8abe714708ca62b86f304e7417127e1132
Author: Jeff Darcy <jdarcy>
Date:   Tue Jul 8 21:56:04 2014 -0400

    dht: fix rename race
    
    If two clients try to rename the same file at the same time, we
    sometimes end up with *no file at all* in either the old or new
    location.  That's kind of bad.  The culprit seems to be some overly
    aggressive cleanup code.  AFAICT, based on today's study of the code,
    the intent of the changed section is to remove any linkfile we might
    have created before the actual rename.  However, what we're removing
    might not be our extra link.  If we're racing with another client that's
    also doing a rename, it might be the only remaining link to the user's
    data.  The solution, which is good enough to pass this test but almost
    certainly still not complete, is to be more selective about when we do
    this unlink.  Now, we only do it if we know that, at some point, we did
    in fact create the link without error (notably ENOENT on the source or
    EEXIST on the destination) ourselves.
    
    Change-Id: I8d8cce150b6f8b372c9fb813c90be58d69f8eb7b
    BUG: 1117851
    Signed-off-by: Jeff Darcy <jdarcy>
    Reviewed-on: http://review.gluster.org/8269
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
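
The idea in the commit message above (unlink the extra link only if this client created it) can be illustrated with a small sketch. This is not the DHT code: it is a simplified model using local hard links, and the function name link_then_cleanup and its guarded flag are hypothetical.

```python
import errno
import os
import tempfile


def link_then_cleanup(src, dst, guarded=True):
    """Try to hard-link src to dst, then run a cleanup step on dst.

    Returns True if this caller created dst. With guarded=False the
    cleanup always unlinks dst, which models the overly aggressive
    behaviour; with guarded=True it unlinks only a link that this
    caller created without error.
    """
    created = False
    try:
        os.link(src, dst)
        created = True
    except OSError as exc:
        # ENOENT: the source was already moved by another client.
        # EEXIST: the destination was already created by another client.
        if exc.errno not in (errno.ENOENT, errno.EEXIST):
            raise

    if created or not guarded:
        try:
            os.unlink(dst)
        except FileNotFoundError:
            pass
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "b2"), os.path.join(d, "c2")
        # Model the state after another client finished b2 -> c2:
        # the source is gone and c2 is the only remaining link.
        open(dst, "w").close()
        link_then_cleanup(src, dst, guarded=True)
        print("c2 survives guarded cleanup:", os.path.exists(dst))
        link_then_cleanup(src, dst, guarded=False)
        print("c2 survives unguarded cleanup:", os.path.exists(dst))
```

In the unguarded case the cleanup removes a name it never created, which in the race described above may be the last link to the user's data.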

Comment 19 Anand Avati 2014-07-18 16:53:03 UTC
REVIEW: http://review.gluster.org/8327 (storage/posix: removing deleting entries in case of creation failures) posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 20 Anand Avati 2014-07-19 10:29:10 UTC
REVIEW: http://review.gluster.org/8327 (storage/posix: removing deleting entries in case of creation failures) posted (#4) for review on master by Raghavendra G (rgowdapp)

Comment 21 Jeff Darcy 2014-07-20 10:23:27 UTC
Still POST until http://review.gluster.org/#/c/8327/ gets merged.

Comment 22 Anand Avati 2014-07-21 13:25:11 UTC
REVIEW: http://review.gluster.org/8338 (dht: fix rename race Additional check to check if we created the linkto file before deleting it in the rename cleanup function) posted (#1) for review on master by N Balachandran (nbalacha)

Comment 23 Anand Avati 2014-07-21 15:10:17 UTC
REVIEW: http://review.gluster.org/8338 (dht: fix rename race) posted (#2) for review on master by N Balachandran (nbalacha)

Comment 24 Anand Avati 2014-07-21 20:44:27 UTC
REVIEW: http://review.gluster.org/8327 (storage/posix: removing deleting entries in case of creation failures) posted (#5) for review on master by Raghavendra G (rgowdapp)

Comment 25 Anand Avati 2014-07-28 19:37:25 UTC
REVIEW: http://review.gluster.org/8338 (dht: fix rename race) posted (#3) for review on master by Shyamsundar Ranganathan (srangana)

Comment 26 Anand Avati 2014-07-30 06:31:40 UTC
REVIEW: http://review.gluster.org/8327 (storage/posix: removing deleting entries in case of creation failures) posted (#6) for review on master by Raghavendra G (rgowdapp)

Comment 27 Anand Avati 2014-07-30 09:17:59 UTC
COMMIT: http://review.gluster.org/8327 committed in master by Vijay Bellur (vbellur) 
------
commit 45fbf99cb669e891a84a8228cef27973f5e774bf
Author: Raghavendra G <rgowdapp>
Date:   Thu Jul 17 21:59:26 2014 +0530

    storage/posix: removing deleting entries in case of creation failures
    
    The code is not atomic enough to avoid deleting a dentry created by a
    parallel dentry creation operation.
    
    Change-Id: I9bd6d2aa9e7a1c0688c0a937b02a4b4f56d7aa3d
    BUG: 1117851
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/8327
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
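
The hazard named in the commit message above can be shown with a simplified model. This sketch is not the storage/posix code: create_entry and its cleanup_on_failure flag are hypothetical, and a local exclusive create stands in for the brick-side entry creation.

```python
import os
import tempfile


def create_entry(path, cleanup_on_failure=False):
    """Create path exclusively; return True if this caller created it.

    With cleanup_on_failure=True the error path unlinks the name even
    though this caller did not create it, which models deleting a
    dentry made by a parallel creation. The default leaves it alone.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        if cleanup_on_failure:
            os.unlink(path)  # unsafe: removes someone else's entry
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "c2")
        create_entry(path)                           # first creator wins
        create_entry(path)                           # loser leaves the entry
        print("entry kept:", os.path.exists(path))
        create_entry(path, cleanup_on_failure=True)  # loser deletes the entry
        print("entry kept:", os.path.exists(path))
```

Because the check and the unlink are separate steps, the failing creator cannot tell whether the name it removes is its own partial result or an entry that a parallel operation created successfully.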

Comment 28 Anand Avati 2014-07-30 09:19:24 UTC
COMMIT: http://review.gluster.org/8338 committed in master by Vijay Bellur (vbellur) 
------
commit df770496ba5ed6d2c72bcfc76ca9e816a08c383a
Author: Nithya Balachandran <nbalacha>
Date:   Mon Jul 21 18:46:14 2014 +0530

    dht: fix rename race
    
    Additional check to check if we created the linkto
    file before deleting it in the rename cleanup function
    
    Change-Id: I919cd7cb24f948ba4917eb9cf50d5169bb730a67
    BUG: 1117851
    Signed-off-by: Nithya Balachandran <nbalacha>
    Reviewed-on: http://review.gluster.org/8338
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 29 Niels de Vos 2014-09-22 12:44:41 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.6.0beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 30 Niels de Vos 2014-11-11 08:36:53 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users