The issue is as follows: The dht_linkfile_create_cbk () function does a lookup if the create call fails with EEXIST. The lookup uses the same frame as the create, so the op_ret and op_errno returned by the mknod call are overwritten by those of the lookup. Because the linkfile creation then appears to have succeeded, tier proceeds to create the data file, but with a gfid-req value that does not match the gfid on the linkto file that is already present.
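The overwrite described above can be illustrated with a minimal, self-contained sketch. This is not the GlusterFS code: struct frame, mknod_linkfile, lookup_existing and the two create_* functions are hypothetical stand-ins, and the only assumption carried over from the report is that a frame holds the op_ret/op_errno of whichever call last completed on it.

```c
#include <errno.h>

/* Hypothetical stand-in for a call frame: it holds the result of
 * whichever operation most recently completed on it. */
struct frame {
    int op_ret;
    int op_errno;
};

/* Simulated mknod of the linkto file, failing because it already exists. */
static void mknod_linkfile(struct frame *f)
{
    f->op_ret = -1;
    f->op_errno = EEXIST;
}

/* Simulated lookup of the existing linkto file. It succeeds and
 * records its own result in the frame it was handed. */
static void lookup_existing(struct frame *f)
{
    f->op_ret = 0;
    f->op_errno = 0;
}

/* Problem pattern: the lookup reuses the create frame, so the EEXIST
 * failure from mknod is replaced by the lookup's success. */
struct frame create_shared_frame(void)
{
    struct frame create = {0, 0};

    mknod_linkfile(&create);
    if (create.op_ret == -1 && create.op_errno == EEXIST)
        lookup_existing(&create);   /* overwrites op_ret/op_errno */
    return create;
}

/* Alternative pattern: the lookup runs on its own frame, so the
 * create frame still reports the original mknod result. */
struct frame create_separate_frame(void)
{
    struct frame create = {0, 0};

    mknod_linkfile(&create);
    if (create.op_ret == -1 && create.op_errno == EEXIST) {
        struct frame lookup = {0, 0};
        lookup_existing(&lookup);   /* result stays in the lookup frame */
    }
    return create;
}
```

In this sketch a caller inspecting the frame returned by create_shared_frame() sees success and has no way to tell that the linkto file already existed, while create_separate_frame() still reports EEXIST. What the caller should then do with the lookup result is outside the scope of the sketch.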
RCA: I was able to hit this with gdb, using the following steps:

1. Mount the tier volume using gluster-nfs.
2. gdb into the nfs process and set a breakpoint at tier_create_linkfile_create_cbk.
3. On the mount point, create a file using "touch f3".
4. Once the breakpoint is hit, run 'gluster v start <volname> force'. This will restart the NFS server. The create call is thus aborted after the linkto file is created but before the data file is.
5. Once the touch command returns, check the gfids of the linkfile and the data file on the brick. They will be different.

[root@nb-rhs3-srv1 bricks]# getx brick2/hot-*/f3
# file: brick2/hot-3/f3
trusted.gfid=0x4fabbe60d86b402cb064356db35cf798

[root@nb-rhs3-srv1 bricks]# getx brick1/gs1-*/f3
# file: brick1/gs1-3/f3
trusted.gfid=0xbf1f836f07174259b6332759cc58e867
trusted.tier.tier-dht.linkto=0x6773312d686f742d64687400

It looks like the NFS client sends the create call again without a lookup.

This issue also exists in 3.1.2 (reproducible using the same set of steps). So I am removing the Regression keyword.

--- Additional comment from Nithya Balachandran on 2016-05-12 06:43:20 EDT ---

The same issue exists in dht.
REVIEW: http://review.gluster.org/14352 (cluster/dht : Use a new frame for linkfile lookup) posted (#1) for review on master by N Balachandran (nbalacha)
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.