Bug 1336320

Summary: [Tiering]: Unable to access file(s) from NFS client; gfid mismatch between cold and hot tier entries
Product: [Community] GlusterFS
Reporter: Nithya Balachandran <nbalacha>
Component: tiering
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED CURRENTRELEASE
QA Contact: bugs <bugs>
Severity: high
Priority: unspecified
Version: mainline
CC: bugs, kramdoss, nbalacha, sanandpa
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: glusterfs-4.1.3 (or later)
Doc Type: Bug Fix
Clone Of: 1334577
Last Closed: 2018-08-29 03:35:42 UTC
Type: Bug
Bug Depends On: 1334577    

Comment 1 Nithya Balachandran 2016-05-16 07:50:37 UTC
The issue is as follows:
The dht_linkfile_create_cbk() function does a lookup if the create call fails with EEXIST. The lookup uses the same frame as the create, so the op_ret and op_errno returned by the mknod call are overwritten by those of the lookup. Because the lookup succeeds, the linkfile creation appears to have succeeded, and tier proceeds to create the data file using the gfid-req value, which does not match the gfid on the already existing linkto file.
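To make the frame-sharing problem concrete, here is a minimal Python model. It assumes a call frame carries a single op_ret/op_errno pair; the Frame class and the mknod_linkfile, lookup, linkfile_create_cbk, and create_linkfile helpers are hypothetical stand-ins, not GlusterFS code. It compares reusing the create frame for the follow-up lookup with giving the lookup a frame of its own, which is what the patch title "Use a new frame for linkfile lookup" describes.

```python
import errno
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical stand-in for a call frame: one slot holding the result
    of the last operation recorded on it."""
    op_ret: int = 0
    op_errno: int = 0


def mknod_linkfile(frame: Frame, exists: bool) -> None:
    # Creating the linkto file fails with EEXIST if it is already on the brick.
    if exists:
        frame.op_ret, frame.op_errno = -1, errno.EEXIST
    else:
        frame.op_ret, frame.op_errno = 0, 0


def lookup(frame: Frame) -> None:
    # The lookup of the existing file succeeds and records its own result.
    frame.op_ret, frame.op_errno = 0, 0


def linkfile_create_cbk(frame: Frame, new_frame_for_lookup: bool) -> Frame:
    """Model of the create callback; returns the frame the caller inspects."""
    if frame.op_ret == -1 and frame.op_errno == errno.EEXIST:
        if new_frame_for_lookup:
            lookup(Frame())  # lookup result stays on its own frame
        else:
            lookup(frame)  # BUG: overwrites the mknod result on the shared frame
    return frame


def create_linkfile(exists: bool, new_frame_for_lookup: bool) -> Frame:
    frame = Frame()
    mknod_linkfile(frame, exists)
    return linkfile_create_cbk(frame, new_frame_for_lookup)


buggy = create_linkfile(exists=True, new_frame_for_lookup=False)
fixed = create_linkfile(exists=True, new_frame_for_lookup=True)
print(buggy.op_ret, fixed.op_ret, fixed.op_errno == errno.EEXIST)  # prints: 0 -1 True
```

With the shared frame, the caller sees op_ret 0 and, in the real code path, tier would go on to create the data file; with a separate frame, the EEXIST from mknod is preserved for the caller.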

Comment 2 Nithya Balachandran 2016-05-16 07:53:43 UTC
RCA:

I was able to reproduce this with gdb using the following steps:

1. Mount the tier volume using gluster-nfs.
2. Attach gdb to the NFS server process and set a breakpoint at tier_create_linkfile_create_cbk.
3. On the mount point, create a file using "touch f3".
4. Once the breakpoint is hit, run 'gluster v start <volname> force'. This restarts the NFS server, so the create call is aborted after the linkto file is created but before the data file is.
5. Once the touch command returns, check the gfids of the linkfile and the data file on the bricks. They will be different.


[root@nb-rhs3-srv1 bricks]# getx brick2/hot-*/f3
# file: brick2/hot-3/f3
trusted.gfid=0x4fabbe60d86b402cb064356db35cf798

[root@nb-rhs3-srv1 bricks]# getx brick1/gs1-*/f3
# file: brick1/gs1-3/f3
trusted.gfid=0xbf1f836f07174259b6332759cc58e867
trusted.tier.tier-dht.linkto=0x6773312d686f742d64687400
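The xattr values above can also be decoded to confirm the mismatch. This is a minimal Python sketch, assuming getx is a local wrapper that prints xattrs in getfattr's 0x-prefixed hex format; the helper names are hypothetical. It treats trusted.gfid as a 16-byte UUID and trusted.tier.tier-dht.linkto as a NUL-terminated subvolume name.

```python
import uuid

# Values copied from the getx output in this comment.
HOT_GFID = "0x4fabbe60d86b402cb064356db35cf798"   # brick2/hot-3/f3: data file
COLD_GFID = "0xbf1f836f07174259b6332759cc58e867"  # brick1/gs1-3/f3: linkto file
COLD_LINKTO = "0x6773312d686f742d64687400"        # linkto xattr on brick1/gs1-3/f3


def hex_to_bytes(value: str) -> bytes:
    """Convert a getfattr-style hex value ("0x...") to raw bytes."""
    return bytes.fromhex(value[2:] if value.startswith("0x") else value)


def decode_gfid(value: str) -> uuid.UUID:
    # A gfid is a 16-byte UUID.
    return uuid.UUID(bytes=hex_to_bytes(value))


def decode_linkto(value: str) -> str:
    # The linkto value is a NUL-terminated subvolume name.
    return hex_to_bytes(value).rstrip(b"\x00").decode("ascii")


hot = decode_gfid(HOT_GFID)
cold = decode_gfid(COLD_GFID)
print("data file gfid:  ", hot)
print("linkto file gfid:", cold)
print("linkto target:   ", decode_linkto(COLD_LINKTO))
print("gfids match:     ", hot == cold)
```

For the values in this comment, the two gfids differ and the linkto value decodes to gs1-hot-dht.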


It looks like the NFS client sends the create call again without a lookup.

This issue also exists in 3.1.2 (reproducible using the same steps), so I am removing the Regression keyword.

--- Additional comment from Nithya Balachandran on 2016-05-12 06:43:20 EDT ---

The same issue exists in dht.

Comment 3 Vijay Bellur 2016-05-16 07:58:02 UTC
REVIEW: http://review.gluster.org/14352 (cluster/dht : Use a new frame for linkfile lookup) posted (#1) for review on master by N Balachandran (nbalacha)

Comment 4 Amar Tumballi 2018-08-29 03:35:42 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.