Bug 1142650

Summary: [DATA LOSS]- DHT- rename from multiple mount ends in data loss
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rachana Patel <racpatel>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED CURRENTRELEASE
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: mzywusko, nbalacha, rgowdapp, rhs-bugs, smohan, spalai, tdesala
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: dht-data-loss, dht-fixed, dht-pm-query
Fixed In Version: 3.7.9-10
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1157667
Environment:
Last Closed: 2016-09-14 02:47:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1141368, 1166570    
Bug Blocks: 1157667    

Description Rachana Patel 2014-09-17 07:20:56 UTC
Description of problem:
=======================
[DATA LOSS] DHT: renaming the same files concurrently from multiple mounts ends in data loss.

Version-Release number of selected component (if applicable):
=============================================================
3.6.0.28-1.el6rhs.x86_64

How reproducible:
=================
always

Steps to Reproduce:
===================
1. Created a 4x2 volume named multi1 on hosts snapshot09 through snapshot12:
Volume Name: multi1
Type: Distributed-Replicate
Volume ID: 630a2173-3d1f-4ddd-8529-b1c14a6d6a64
Status: Started
Snap Volume: no
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: snapshot09.lab.eng.blr.redhat.com:/brick5/multi10
Brick2: snapshot10.lab.eng.blr.redhat.com:/brick5/multi11
Brick3: snapshot11.lab.eng.blr.redhat.com:/brick5/multi12
Brick4: snapshot12.lab.eng.blr.redhat.com:/brick5/multi13
Brick5: snapshot09.lab.eng.blr.redhat.com:/brick5/multi14
Brick6: snapshot10.lab.eng.blr.redhat.com:/brick5/multi15
Brick7: snapshot11.lab.eng.blr.redhat.com:/brick5/multi16
Brick8: snapshot12.lab.eng.blr.redhat.com:/brick5/multi17
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

2. snapshot09 and snapshot10 went down.
3. Created directory fresh4 after that, so it has a complete layout.
4. Inside the fresh4 directory, created files a, b, c and d.
5. Started renaming from two mounts (one NFS, one FUSE), running the loop below on each:
while true; do cd /mnt/multi/fresh4/; mv -f a b; mv -f c d; mv -f b a; mv -f d c; cd / ; done

Actual results:
===============
 [root@snapshot11 fresh4]# ls a b c d -li
 ls: cannot access c: No such file or directory
 ls: cannot access d: No such file or directory
 10880713884975601753 -rw-r--r-- 1 root root 41943040 Sep 17 09:57 a
 10880713884975601753 -rw-r--r-- 1 root root 41943040 Sep 17 09:57 b
-> Verified on the backend bricks: the missing files are not present there either.
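
The outcome above can be checked mechanically once the rename loops have been stopped on both mounts. The following is a minimal sketch, not part of the original report: the function names pair_survives and check_fresh4 are hypothetical, and it assumes that in a healthy run at least one name of each rename pair (a/b and c/d) must still exist.

```shell
#!/bin/sh
# Hypothetical helper: succeed if at least one name of a rename pair
# still exists in the directory, otherwise report the loss.
# Usage: pair_survives DIR NAME1 NAME2
pair_survives() {
    dir=$1
    first=$2
    second=$3
    if [ -e "$dir/$first" ] || [ -e "$dir/$second" ]; then
        return 0
    fi
    echo "DATA LOSS: neither $first nor $second exists in $dir" >&2
    return 1
}

# Hypothetical helper: check both pairs used by the reproduction loop.
# Usage: check_fresh4 DIR
check_fresh4() {
    dir=$1
    status=0
    pair_survives "$dir" a b || status=1
    pair_survives "$dir" c d || status=1
    return "$status"
}
```

For the listing above, running check_fresh4 /mnt/multi/fresh4 would be expected to report the c/d pair as lost and return a non-zero status.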

Expected results:
=================

Comment 2 Rachana Patel 2014-09-17 09:22:18 UTC
sosreport @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1142650/

Comment 3 Susant Kumar Palai 2014-09-17 10:59:48 UTC
Reproduced the issue with TRACE enabled.

1. Created two brick setup.
2. Created two files (make sure they hash to different bricks). In my case they are tile and zile.
3. Ran "while true; do mv -f tile zile ; mv -f zile tile; done" from both the NFS and the FUSE mount.


Logs: [Captured last logs for either unlink or rename from both mount points]

1. from mnt.log [UNLINK was the last operation on zile ]

[2014-09-17 10:09:04.412720] T [fuse-bridge.c:435:fuse_entry_cbk] 0-glusterfs-fuse: 99191: LOOKUP() /zile => 11886673162260437651
[2014-09-17 10:09:04.412830] T [fuse-bridge.c:1570:fuse_unlink_resume] 0-glusterfs-fuse: 99192: UNLINK /zile
[2014-09-17 10:09:04.418667] T [fuse-bridge.c:1290:fuse_unlink_cbk] 0-glusterfs-fuse: 99192: UNLINK() /zile => 0


2. from nfs.log [rename tile->zile was the last operation on tile, zile]
[2014-09-17 10:09:04.397658] T [nfs-fops.c:1293:nfs_fop_rename] 0-nfs: Rename: /tile -> /zile
[2014-09-17 10:09:04.397696] I [dht-rename.c:1345:dht_rename] 0-test1-dht: renaming /tile (hash=test1-client-0/cache=test1-client-0) => /zile (hash=test1-client-1/cache=<nul>)
[2014-09-17 10:09:04.399864] T [MSGID: 0] [dht-rename.c:1051:dht_rename_create_links] 0-test1-dht: linkfile /tile @ test1-client-1 => test1-client-0
[2014-09-17 10:09:04.405688] T [MSGID: 0] [dht-rename.c:921:dht_rename_linkto_cbk] 0-test1-dht: link /tile => /zile (test1-client-0)
[2014-09-17 10:09:04.405983] T [MSGID: 0] [dht-rename.c:839:dht_do_rename] 0-test1-dht: renaming /tile => /zile (test1-client-1)
[2014-09-17 10:09:04.407583] T [MSGID: 0] [dht-rename.c:740:dht_rename_cbk] 0-test1-dht: deleting old src datafile /tile @ test1-client-0


Observations: The unlink of zile from /mnt ("412720") and the deletion of tile from /nfs ("407583" plus some delay, as it is not yet deleted at that point) are very close in time, and they are the last operations captured in the logs. This looks like what Shyam pointed out earlier.


1. The NFS mount tries to rename tile -> zile while the FUSE mount attempts zile -> tile.
2. In the "tile->zile" case, tile got unlinked from the NFS mount, but a lookup happened around the same time from the FUSE mount.
3. In the process of "zile->tile" on the FUSE mount, FUSE sent "unlink zile".
And we lose the file.

Regards,
Susant
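
The interleaving described in comment 3 can be replayed serially on a scratch directory to see why no name survives. This is a simplified sketch under stated assumptions, not the DHT code path: the NFS-side rename is modelled as a link of the data file under the new name followed by an unlink of the old name (matching the "link /tile => /zile" and "deleting old src datafile /tile" log lines), the FUSE-side step is modelled as a plain unlink of zile, and the function name replay_race is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replay the suspected interleaving one step at a time
# on a local scratch directory and print how many names survive.
replay_race() {
    dir=$(mktemp -d) || return 2
    echo data > "$dir/tile"

    # NFS client, rename tile -> zile, first half (assumed model):
    # the data file is linked under the new name.
    ln "$dir/tile" "$dir/zile"

    # FUSE client, during zile -> tile: it has looked up zile and
    # sends UNLINK zile.
    rm -f "$dir/zile"

    # NFS client, rename tile -> zile, second half:
    # the old source data file is deleted.
    rm -f "$dir/tile"

    # Count the directory entries that are left.
    survivors=$(ls -A "$dir" | wc -l | tr -d ' ')
    rm -rf "$dir"
    echo "$survivors"
}

replay_race
```

Each step is harmless on its own; it is only the ordering, with the unlink of zile landing between the link and the unlink of tile, that removes the last reference to the data.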

Comment 6 Raghavendra G 2016-07-01 06:15:08 UTC
Bz 1166570, which this bug depends on, is fixed in RHEL-7.2. Since we shipped rhgs-3.1.3 on RHEL-7.2, this bug should be fixed in 3.1.3.

<bz 1166570>

Status: ASSIGNED → MODIFIED
Fixed In Version: coreutils-8.22-13.el7

</1166570>

Comment 7 Prasad Desala 2016-09-13 11:17:22 UTC
Verified this bug on glusterfs build 3.7.9-12.el7rhgs.x86_64.
Here are the steps that were performed:
1. Created a distributed-replicate volume and started it.
2. Mounted the volume over NFS and FUSE on two different clients.
3. Created two files, file1 and file2.
4. Simultaneously from the NFS and FUSE mounts, continuously renamed the two files:
"while true; do mv -f file1 file2 ; mv -f file2 file1; done"

The issue is fixed and no data loss was seen. Hence, moving state of bug to Verified.
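
For repeatable verification, the infinite loop can be bounded and followed by an explicit check. The sketch below is an assumption-laden convenience, not the procedure QA used: rename_worker and verify_no_loss are hypothetical names, and it assumes both mount points are reachable from one host as two directory paths (for a local dry run the same directory can be passed twice).

```shell
#!/bin/sh
# Hypothetical helper: run the rename loop a fixed number of rounds.
# Usage: rename_worker DIR ROUNDS
rename_worker() {
    dir=$1
    rounds=$2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # mv may fail transiently when the other worker moved the file first;
        # that is expected, so errors are ignored here.
        mv -f "$dir/file1" "$dir/file2" 2>/dev/null || true
        mv -f "$dir/file2" "$dir/file1" 2>/dev/null || true
        i=$((i + 1))
    done
    return 0
}

# Hypothetical helper: start one worker per mount path, wait for both,
# then succeed only if at least one of the two names still exists.
# Usage: verify_no_loss MOUNT_A MOUNT_B ROUNDS
verify_no_loss() {
    mount_a=$1
    mount_b=$2
    rounds=$3
    rename_worker "$mount_a" "$rounds" &
    pid_a=$!
    rename_worker "$mount_b" "$rounds" &
    pid_b=$!
    wait "$pid_a" "$pid_b"
    [ -e "$mount_a/file1" ] || [ -e "$mount_a/file2" ]
}
```

A call such as verify_no_loss /mnt/nfs /mnt/fuse 1000 (both paths are placeholders) returns non-zero if both names have disappeared after the workers finish.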

Comment 8 Nithya Balachandran 2016-09-14 02:47:08 UTC
Thanks Prasad.

Closing this BZ as per comment#7.