Bug 1117214

Summary: DHT : data loss - after multiple renames of file, cached file is missing and have 2 DHT link files for that file on bricks
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rachana Patel <racpatel>
Component: distribute Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED DUPLICATE QA Contact: amainkar
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.0 CC: asrivast, nbalacha, nsathyan, srangana, ssamanta, vagarwal, vbellur
Target Milestone: ---   
Target Release: RHGS 3.0.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-07-22 06:22:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                          Flags
Test case for a single client        none
Another test from a single client    none

Description Rachana Patel 2014-07-08 10:03:38 UTC
Description of problem:
=======================
After multiple renames of the same file (one round of renames was done in parallel), the file is missing on the mount point, and the bricks hold 2 link files (DHT-specific link files) for the same file. The two link files point to different subvolumes for the cached file.

mount:-
[root@OVM1 snap]# ls -l e6
ls: cannot access e6: No such file or directory


brick:-
[root@OVM3 snap]# getfattr -d -m . /brick2/*/*
getfattr: Removing leading '/' from absolute path names
# file: brick2/b1/e6
trusted.gfid=0sope9fhKvSKOIEIHWMSTyWA==
trusted.glusterfs.dht.linkto="snap-client-2"

# file: brick2/b2/e6
trusted.gfid=0sope9fhKvSKOIEIHWMSTyWA==
trusted.glusterfs.dht.linkto="snap-client-0"



Version-Release number :
=========================
3.6.0.24-1.el6rhs.x86_64


How reproducible:
=================
Intermittent (reproduced in two out of four attempts)


Steps to Reproduce:
====================
1. Create a distributed volume and mount it on multiple clients.
2. Create a few files on the mount point.
3. Rename the files multiple times, with one round of renames done from multiple mount points, i.e. from file c$ to d$, then from d$ to e$ (this was done from multiple mount points).
4. The file is found to be missing; verified on the mount point and on the bricks.
The bricks have 2 link files for it, and the two refer to different subvolumes.
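The rename sequence in the steps above can be sketched as a small script. This is only an illustration of the c$ -> d$ -> e$ chain run concurrently from several mount points; the function names, the file count, and the choice to run the whole chain from every mount are assumptions, not the reporter's actual procedure.

```python
import os
import threading


def rename_chain(mount_dir, index, prefixes=("c", "d", "e")):
    """Rename <prefix><index> through each prefix in turn (c -> d -> e)."""
    for src_prefix, dst_prefix in zip(prefixes, prefixes[1:]):
        src = os.path.join(mount_dir, f"{src_prefix}{index}")
        dst = os.path.join(mount_dir, f"{dst_prefix}{index}")
        try:
            os.rename(src, dst)
        except FileNotFoundError:
            # Another client already renamed this file; expected in a race.
            pass


def run_rename_race(mount_dirs, count=10):
    """Create c1..c<count> via the first mount, then rename from all mounts.

    Returns the indexes of files that are no longer visible under any of
    the names c<i>, d<i> or e<i> (hypothetical "lost file" check).
    """
    for i in range(1, count + 1):
        with open(os.path.join(mount_dirs[0], f"c{i}"), "w"):
            pass

    def worker(mount_dir):
        for i in range(1, count + 1):
            rename_chain(mount_dir, i)

    threads = [threading.Thread(target=worker, args=(m,)) for m in mount_dirs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    visible = set(os.listdir(mount_dirs[0]))
    return [
        i for i in range(1, count + 1)
        if not {f"c{i}", f"d{i}", f"e{i}"} & visible
    ]
```

On a real setup, mount_dirs would be the same volume directory as seen from different client mounts; a non-empty return value would correspond to the missing-file symptom reported here.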



mount:-
[root@OVM1 snap]# ls -l e6
ls: cannot access e6: No such file or directory


brick:-
[root@OVM3 snap]# ls -l /brick2/*
/brick2/b1:
total 4
-rw-r--r-- 2 root root 0 Jul  7 20:30 d10
---------T 2 root root 0 Jul  7 21:07 d9
-rw-r--r-- 2 root root 0 Jul  7 20:30 e1
-rw-r--r-- 2 root root 0 Jul  7 20:30 e2
---------T 2 root root 0 Jul  7 20:34 e3
-rw-r--r-- 2 root root 0 Jul  7 20:30 e5
---------T 2 root root 0 Jul  7 20:33 e6    <--------------
---------T 2 root root 0 Jul  7 20:34 e8
-rw-r--r-- 2 root root 4 Jul  7 21:07 e9
-rw-r--r-- 2 root root 0 Jul  7 20:21 f{1.10000}

/brick2/b2:
total 0
-rw-r--r-- 2 root root 0 Jul  7 20:30 d4
---------T 2 root root 0 Jul  7 20:34 e1
---------T 2 root root 0 Jul  7 20:34 e2
---------T 2 root root 0 Jul  7 20:34 e6    <----------------
---------T 2 root root 0 Jul  7 20:34 e7
-rw-r--r-- 2 root root 0 Jul  7 20:30 e8
-rw-r--r-- 2 root root 0 Jul  7 20:25 new
-rw-r--r-- 2 root root 0 Jul  7 20:26 new1

/brick2/b3:
total 0
-rw-r--r-- 2 root root 0 Jul  7 20:30 d9
-rw-r--r-- 2 root root 0 Jul  7 20:30 e3
---------T 2 root root 0 Jul  7 20:34 e5
-rw-r--r-- 2 root root 0 Jul  7 20:30 e7
-rw-r--r-- 2 root root 0 Jul  7 20:26 new2
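The marked e6 entries in the listing above can also be found mechanically. The sketch below assumes that an entry with mode ---------T is a DHT link file (the linkto xattr is the authoritative check) and that file names contain no spaces; the function names and the shortened sample are illustrative only.

```python
SAMPLE_LISTING = """\
/brick2/b1:
total 4
---------T 2 root root 0 Jul  7 20:33 e6
---------T 2 root root 0 Jul  7 20:34 e8
-rw-r--r-- 2 root root 4 Jul  7 21:07 e9

/brick2/b2:
total 0
---------T 2 root root 0 Jul  7 20:34 e6
-rw-r--r-- 2 root root 0 Jul  7 20:30 e8
"""


def parse_brick_listing(text):
    """Parse `ls -l <brick dirs>` output into {brick: {name: is_link_file}}."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("total "):
            continue
        if line.endswith(":"):
            # Directory header such as "/brick2/b1:"
            current = line[:-1]
            bricks[current] = {}
            continue
        fields = line.split(None, 8)
        if current is None or len(fields) < 9:
            continue
        mode = fields[0]
        name = fields[8].split()[0]  # drop trailing annotations like "<----"
        # Assumption: sticky-bit-only mode marks a DHT link file.
        bricks[current][name] = mode.startswith("---------T")
    return bricks


def find_orphaned_links(bricks):
    """Return {name: [bricks]} for names that exist only as link files."""
    data_names = {
        name
        for entries in bricks.values()
        for name, is_link in entries.items()
        if not is_link
    }
    orphans = {}
    for brick, entries in bricks.items():
        for name, is_link in entries.items():
            if is_link and name not in data_names:
                orphans.setdefault(name, []).append(brick)
    return orphans
```

Calling find_orphaned_links(parse_brick_listing(SAMPLE_LISTING)) reports e6 on both /brick2/b1 and /brick2/b2, while e8 is not reported because a regular file with that name exists on b2. The match is by name only, so it is a quick triage aid rather than a proof that the data is gone.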


[root@OVM3 snap]# getfattr -d -m . /brick2/*/*
getfattr: Removing leading '/' from absolute path names
# file: brick2/b1/e6
trusted.gfid=0sope9fhKvSKOIEIHWMSTyWA==
trusted.glusterfs.dht.linkto="snap-client-2"

# file: brick2/b2/e6
trusted.gfid=0sope9fhKvSKOIEIHWMSTyWA==
trusted.glusterfs.dht.linkto="snap-client-0"
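The getfattr output above shows one GFID with two link files that name different subvolumes. A minimal sketch of detecting this from the text output follows; the function names are hypothetical, and it assumes the default getfattr text format, where a "0s" prefix marks a base64-encoded value.

```python
import base64
import uuid
from collections import defaultdict

SAMPLE_XATTRS = """\
# file: brick2/b1/e6
trusted.gfid=0sope9fhKvSKOIEIHWMSTyWA==
trusted.glusterfs.dht.linkto="snap-client-2"

# file: brick2/b2/e6
trusted.gfid=0sope9fhKvSKOIEIHWMSTyWA==
trusted.glusterfs.dht.linkto="snap-client-0"
"""


def parse_getfattr(text):
    """Parse `getfattr -d` text output into {path: {xattr_name: value}}."""
    files = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# file: "):
            current = line[len("# file: "):]
            files[current] = {}
        elif current is not None and "=" in line:
            # Split on the first "=" only; base64 padding may contain more.
            key, _, value = line.partition("=")
            files[current][key] = value.strip('"')
    return files


def decode_gfid(value):
    """Decode a base64 ('0s'-prefixed) trusted.gfid value into a UUID."""
    if not value.startswith("0s"):
        raise ValueError("expected a base64 value with the '0s' prefix")
    return uuid.UUID(bytes=base64.b64decode(value[2:]))


def conflicting_links(files):
    """Return {gfid: {path: linkto}} where link files disagree on the target."""
    by_gfid = defaultdict(dict)
    for path, xattrs in files.items():
        gfid = xattrs.get("trusted.gfid")
        target = xattrs.get("trusted.glusterfs.dht.linkto")
        if gfid is not None and target is not None:
            by_gfid[gfid][path] = target
    return {
        gfid: links
        for gfid, links in by_gfid.items()
        if len(set(links.values())) > 1
    }
```

For SAMPLE_XATTRS, conflicting_links(parse_getfattr(SAMPLE_XATTRS)) returns a single GFID whose two link files point to snap-client-2 and snap-client-0, which is the inconsistency reported in this bug. decode_gfid can be used to print the GFID in the usual UUID form.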


Actual results:
===============
2 link files (DHT link files) are present for the file, and the cached file is missing.

Expected results:
================
For any file, only one link file (DHT link file) should be present, and rename should not delete the cached file.

Comment 2 Shyamsundar 2014-07-14 18:17:21 UTC
Created attachment 917940 [details]
Test case for a single client

Tried the attached test case (in 2 forms) to reproduce the problem from a single client. It has passed without failures for over 10 runs.

We do have multi-client rename race issues, as observed in bug #1117135, and hence I would suggest running this test case in a multi-client fashion once those fixes are in.

For now I think this is dependent on, or another manifestation of, the same rename problem in the stated bug.

Testing was done on upstream code, including the fix from http://review.gluster.org/#/c/8269/

Comment 3 Shyamsundar 2014-07-14 18:18:28 UTC
Created attachment 917941 [details]
Another test from a single client

An additional case that attempts to leave link files behind.