Bug 1142650
| Summary: | [DATA LOSS]- DHT- rename from multiple mount ends in data loss | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rachana Patel <racpatel> |
| Component: | distribute | Assignee: | Nithya Balachandran <nbalacha> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.0 | CC: | mzywusko, nbalacha, rgowdapp, rhs-bugs, smohan, spalai, tdesala |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | dht-data-loss, dht-fixed, dht-pm-query | | |
| Fixed In Version: | 3.7.9-10 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1157667 (view as bug list) | Environment: | |
| Last Closed: | 2016-09-14 02:47:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1141368, 1166570 | | |
| Bug Blocks: | 1157667 | | |
Description
Rachana Patel
2014-09-17 07:20:56 UTC
Reproduced the issue with TRACE enabled.

1. Created a two-brick setup.
2. Created two files, making sure they hash to different bricks. In my case they are tile and zile.
3. Ran `while true; do mv -f tile zile; mv -f zile tile; done` from both the NFS and FUSE mounts (see the environment sketch at the end of this report).

Logs (captured the last unlink/rename operations from both mount points):

1. From mnt.log, UNLINK was the last operation on zile:

```
[2014-09-17 10:09:04.412720] T [fuse-bridge.c:435:fuse_entry_cbk] 0-glusterfs-fuse: 99191: LOOKUP() /zile => 11886673162260437651
[2014-09-17 10:09:04.412830] T [fuse-bridge.c:1570:fuse_unlink_resume] 0-glusterfs-fuse: 99192: UNLINK /zile
[2014-09-17 10:09:04.418667] T [fuse-bridge.c:1290:fuse_unlink_cbk] 0-glusterfs-fuse: 99192: UNLINK() /zile => 0
```

2. From nfs.log, the rename tile -> zile was the last operation on tile and zile:

```
[2014-09-17 10:09:04.397658] T [nfs-fops.c:1293:nfs_fop_rename] 0-nfs: Rename: /tile -> /zile
[2014-09-17 10:09:04.397696] I [dht-rename.c:1345:dht_rename] 0-test1-dht: renaming /tile (hash=test1-client-0/cache=test1-client-0) => /zile (hash=test1-client-1/cache=<nul>)
[2014-09-17 10:09:04.399864] T [MSGID: 0] [dht-rename.c:1051:dht_rename_create_links] 0-test1-dht: linkfile /tile @ test1-client-1 => test1-client-0
[2014-09-17 10:09:04.405688] T [MSGID: 0] [dht-rename.c:921:dht_rename_linkto_cbk] 0-test1-dht: link /tile => /zile (test1-client-0)
[2014-09-17 10:09:04.405983] T [MSGID: 0] [dht-rename.c:839:dht_do_rename] 0-test1-dht: renaming /tile => /zile (test1-client-1)
[2014-09-17 10:09:04.407583] T [MSGID: 0] [dht-rename.c:740:dht_rename_cbk] 0-test1-dht: deleting old src datafile /tile @ test1-client-0
```

Observations:

The unlink of zile from /mnt (at .412720) and the deletion of tile from /nfs (at .407583, plus some delay, as it is not yet deleted at that point) are very close together, and they are the last operations captured in the logs. This looks like what Shyam pointed out earlier:

1. The NFS mount tries to rename tile -> zile while the FUSE mount is attempting zile -> tile.
2. In the "tile -> zile" case, tile got unlinked from the NFS mount, but a lookup happened at around the same time from the FUSE mount.
3. In the process of "zile -> tile" on the FUSE mount, FUSE sent "unlink zile", and we lose the file.

Regards,
Susant

Bug 1166570, which this bug depends on, is fixed in RHEL-7.2. Since we shipped rhgs-3.1.3 on RHEL-7.2, this bug should be fixed in 3.1.3.

From bug 1166570: Status: ASSIGNED → MODIFIED; Fixed In Version: coreutils-8.22-13.el7

Verified this bug on glusterfs build 3.7.9-12.el7rhgs.x86_64. Here are the steps that were performed:

1. Created a distributed-replicate volume and started it.
2. NFS- and FUSE-mounted the volume on two different clients.
3. Created two files, file1 and file2.
4. Simultaneously from the NFS and FUSE mounts, continuously renamed the two files with `while true; do mv -f file1 file2; mv -f file2 file1; done` (see the verification sketch at the end of this report).

The issue is fixed and no data loss was seen. Hence, moving the state of the bug to Verified.
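
For reference, here is a minimal sketch of an environment matching the trace above (volume name test1, a plain two-brick distribute volume, one FUSE and one NFS client). The host name, brick paths, and mount points are placeholders, and the exact commands are assumptions based on standard GlusterFS CLI usage rather than taken from this report:

```sh
# Server side: 2-brick distribute volume, client logging raised to TRACE.
# "server1", /bricks/b1 and /bricks/b2 are hypothetical.
gluster volume create test1 server1:/bricks/b1 server1:/bricks/b2
gluster volume start test1
gluster volume set test1 diagnostics.client-log-level TRACE

# FUSE client:
mount -t glusterfs server1:/test1 /mnt

# NFS client (Gluster NFS serves NFSv3):
mount -t nfs -o vers=3 server1:/test1 /nfs

# Create the two files and check which brick each one hashes to.
touch /mnt/tile /mnt/zile
getfattr -n trusted.glusterfs.pathinfo /mnt/tile /mnt/zile

# Opposing rename loops, started at the same time:
# on the FUSE client, inside /mnt:
while true; do mv -f tile zile; mv -f zile tile; done
# on the NFS client, inside /nfs:
while true; do mv -f tile zile; mv -f zile tile; done
```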
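
Along the same lines, a hedged sketch of the verification scenario on the fixed build (a 2x2 distributed-replicate volume is one plausible reading of "distributed replica volume"; the hosts, brick paths, and mount points are again hypothetical):

```sh
# Server side: 2x2 distributed-replicate volume across two hypothetical hosts.
gluster volume create testvol replica 2 \
    server1:/bricks/r1 server2:/bricks/r1 \
    server1:/bricks/r2 server2:/bricks/r2
gluster volume start testvol

# Client 1 (FUSE) and client 2 (NFSv3):
mount -t glusterfs server1:/testvol /mnt
mount -t nfs -o vers=3 server1:/testvol /nfs

touch /mnt/file1 /mnt/file2

# Run this concurrently from both clients for a while, then interrupt it:
while true; do mv -f file1 file2; mv -f file2 file1; done

# After stopping the loops, the data should still be reachable under one of
# the two names on both mounts; with the bug, the file vanished entirely.
ls -l /mnt /nfs
```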