Bug 1142650 - [DATA LOSS]- DHT- rename from multiple mount ends in data loss
Summary: [DATA LOSS]- DHT- rename from multiple mount ends in data loss
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: dht-data-loss, dht-fixed, dht-pm-query
Depends On: 1141368 1166570
Blocks: 1157667
 
Reported: 2014-09-17 07:20 UTC by Rachana Patel
Modified: 2016-09-14 02:47 UTC
CC List: 7 users

Fixed In Version: 3.7.9-10
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1157667
Environment:
Last Closed: 2016-09-14 02:47:08 UTC
Embargoed:



Description Rachana Patel 2014-09-17 07:20:56 UTC
Description of problem:
=======================
[DATA LOSS]- DHT- rename from multiple mount ends in data loss

Version-Release number of selected component (if applicable):
=============================================================
3.6.0.28-1.el6rhs.x86_64

How reproducible:
=================
always

Steps to Reproduce:
===================
1. Created a 4x2 volume using the snapshot{09..12} nodes; volume name: multi1
Volume Name: multi1
Type: Distributed-Replicate
Volume ID: 630a2173-3d1f-4ddd-8529-b1c14a6d6a64
Status: Started
Snap Volume: no
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: snapshot09.lab.eng.blr.redhat.com:/brick5/multi10
Brick2: snapshot10.lab.eng.blr.redhat.com:/brick5/multi11
Brick3: snapshot11.lab.eng.blr.redhat.com:/brick5/multi12
Brick4: snapshot12.lab.eng.blr.redhat.com:/brick5/multi13
Brick5: snapshot09.lab.eng.blr.redhat.com:/brick5/multi14
Brick6: snapshot10.lab.eng.blr.redhat.com:/brick5/multi15
Brick7: snapshot11.lab.eng.blr.redhat.com:/brick5/multi16
Brick8: snapshot12.lab.eng.blr.redhat.com:/brick5/multi17
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

2. Nodes snapshot09 and snapshot10 went down.
3. Created directory fresh4 after that, so it has a complete layout.
4. Inside fresh4, created files a, b, c, d.
5. Started renaming from two mounts, one NFS and one FUSE, as below (a scripted version follows this step):
while true; do cd /mnt/multi/fresh4/; mv -f a b; mv -f c d; mv -f b a; mv -f d c; cd / ; done
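A scripted sketch of step 5, assuming the FUSE mount at /mnt/multi (from the loop above) and an NFS mount at /mnt/multi-nfs (an assumed path; the report does not give the NFS mount point):

#!/bin/sh
# Concurrent-rename reproducer sketch. /mnt/multi (FUSE) comes from step 5;
# /mnt/multi-nfs (NFS) is an assumed path -- adjust to the actual mounts.
# In the original test the two loops ran on two different clients.
FUSE=/mnt/multi/fresh4
NFS=/mnt/multi-nfs/fresh4

rename_loop() {
    # Endlessly swap both file pairs on the given mount.
    while true; do
        mv -f "$1/a" "$1/b"; mv -f "$1/c" "$1/d"
        mv -f "$1/b" "$1/a"; mv -f "$1/d" "$1/c"
    done
}

rename_loop "$FUSE" &
rename_loop "$NFS" &
wait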

Actual results:
===============
 [root@snapshot11 fresh4]# ls a b c d -li
 ls: cannot access c: No such file or directory
 ls: cannot access d: No such file or directory
 10880713884975601753 -rw-r--r-- 1 root root 41943040 Sep 17 09:57 a
 10880713884975601753 -rw-r--r-- 1 root root 41943040 Sep 17 09:57 b
-> Verified on the backend: the missing files are not present there either.
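The backend check can be scripted on each brick host; a sketch, using the /brick5/multi1* brick paths from the volume info above (the linkto-file details are generic DHT behaviour, not taken from this report):

# Run on each brick host.
for b in /brick5/multi1*/fresh4; do
    echo "== $b =="
    ls -li "$b" 2>/dev/null
    # DHT linkto (pointer) files are zero-byte sticky-bit entries carrying
    # the trusted.glusterfs.dht.linkto xattr; stale ones hint at an
    # interrupted rename:
    find "$b" -maxdepth 1 -type f -perm -1000 \
        -exec getfattr -n trusted.glusterfs.dht.linkto -e text {} \; 2>/dev/null
done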

Expected results:
=================
No data loss: concurrent renames from multiple mounts must never leave a file inaccessible; each of the pairs a/b and c/d should survive under at least one name.

Comment 2 Rachana Patel 2014-09-17 09:22:18 UTC
sosreport @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1142650/

Comment 3 Susant Kumar Palai 2014-09-17 10:59:48 UTC
Reproduced the issue with TRACE enabled.

1. Created a two-brick setup.
2. Created two files, making sure they hash to different bricks; in my case they are tile and zile (a placement check is sketched below).
3. Ran "while true; do mv -f tile zile; mv -f zile tile; done" from both the NFS and FUSE mounts.
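The placement check for step 2 can be done from the FUSE mount; a sketch, assuming the FUSE mount is at /mnt as in the mnt.log excerpts below:

# trusted.glusterfs.pathinfo is a virtual xattr reporting the backend
# brick path(s) a file resolves to; confirm the two files land on
# different bricks before starting the loops.
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/tile
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/zile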


Logs: [Captured last logs for either unlink or rename from both mount points]

1. from mnt.log [UNLINK was the last operation on zile ]

[2014-09-17 10:09:04.412720] T [fuse-bridge.c:435:fuse_entry_cbk] 0-glusterfs-fuse: 99191: LOOKUP() /zile => 11886673162260437651
[2014-09-17 10:09:04.412830] T [fuse-bridge.c:1570:fuse_unlink_resume] 0-glusterfs-fuse: 99192: UNLINK /zile
[2014-09-17 10:09:04.418667] T [fuse-bridge.c:1290:fuse_unlink_cbk] 0-glusterfs-fuse: 99192: UNLINK() /zile => 0


2. from nfs.log [rename tile->zile was the last operation on tile, zile]
[2014-09-17 10:09:04.397658] T [nfs-fops.c:1293:nfs_fop_rename] 0-nfs: Rename: /tile -> /zile
[2014-09-17 10:09:04.397696] I [dht-rename.c:1345:dht_rename] 0-test1-dht: renaming /tile (hash=test1-client-0/cache=test1-client-0) => /zile (hash=test1-client-1/cache=<nul>)
[2014-09-17 10:09:04.399864] T [MSGID: 0] [dht-rename.c:1051:dht_rename_create_links] 0-test1-dht: linkfile /tile @ test1-client-1 => test1-client-0
[2014-09-17 10:09:04.405688] T [MSGID: 0] [dht-rename.c:921:dht_rename_linkto_cbk] 0-test1-dht: link /tile => /zile (test1-client-0)
[2014-09-17 10:09:04.405983] T [MSGID: 0] [dht-rename.c:839:dht_do_rename] 0-test1-dht: renaming /tile => /zile (test1-client-1)
[2014-09-17 10:09:04.407583] T [MSGID: 0] [dht-rename.c:740:dht_rename_cbk] 0-test1-dht: deleting old src datafile /tile @ test1-client-0


Observations: The unlink of zile from /mnt ("412720") and the deletion of tile from /nfs ("407583", plus some delay, as it is not yet deleted at that point) are very close, and they are the last operations captured in the logs. This matches what Shyam pointed out earlier.


1. The NFS mount tries to rename tile -> zile while the FUSE mount is attempting zile -> tile.
2. In the "tile -> zile" case, tile got unlinked from the NFS mount, but a lookup happened around the same time from the FUSE mount.
3. In the process of "zile -> tile" on the FUSE mount, FUSE sent "unlink zile".
And we lose the file.

Regards,
Susant

Comment 6 Raghavendra G 2016-07-01 06:15:08 UTC
Bug 1166570, which this bug depends on, is fixed in RHEL-7.2. Since we shipped rhgs-3.1.3 on RHEL-7.2, this bug should be fixed in 3.1.3.

<bz 1166570>

Status: ASSIGNED → MODIFIED
Fixed In Version: coreutils-8.22-13.el7

</bz 1166570>

Comment 7 Prasad Desala 2016-09-13 11:17:22 UTC
Verified this bug on glusterfs build 3.7.9-12.el7rhgs.x86_64.
Here are the steps that were performed:
1. Created a distributed-replicate volume and started it.
2. NFS and FUSE mounted the volume on two different clients.
3. Created two files, file1 and file2.
4. Simultaneously from the NFS and FUSE mounts, continuously renamed the two files (a bounded, scripted version follows this list):
"while true; do mv -f file1 file2; mv -f file2 file1; done"

The issue is fixed and no data loss was seen. Hence, moving the state of the bug to Verified.

Comment 8 Nithya Balachandran 2016-09-14 02:47:08 UTC
Thanks Prasad.

Closing this BZ as per comment#7.

