Bug 1416450

Summary: hardlink creation fails saying "file exists" during brick down scenario on fuse mount
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: md-cache
Assignee: Poornima G <pgurusid>
Status: CLOSED WONTFIX
QA Contact: Vivek Das <vdas>
Severity: low
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: rhs-bugs, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-19 06:16:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1444509

Description Nag Pavan Chilakam 2017-01-25 14:32:22 UTC
Description of problem:
======================
Note: I was working on reproducing split-brain situations for verifying automatic split-brain resolution.


1) I created a 1x2 volume
2) Set the favorite-child policy
3) Created a file f1 from the mount
4) Brought down brick b1
5) Created 1000 hardlinks of f1, say hlink.1, hlink.2, ..., hlink.1000
6) Killed b2 and brought b1 up (was trying to create split-brains)
7) Checked the fuse mount; it shows only f1
8) Tried to create hardlinks with the same names as in step 5,
i.e. ln f1 hlink.{1..1000}
(a rough command sketch of this sequence follows the error output below)

I get the following error:
ln: failed to create hard link ‘link.994’: File exists
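
For clarity, here is a rough command sketch of the steps above. The host names, brick paths, mount point and the exact kill ordering in step 6 are my assumptions; brick PIDs would come from 'gluster volume status'.

# steps 1-2: create a 1x2 replica volume and set the favorite-child policy
gluster volume create 1x2 replica 2 node1:/rhs/brick1/1x2 node2:/rhs/brick1/1x2
gluster volume start 1x2
gluster volume set 1x2 cluster.favorite-child-policy mtime

# step 3: mount via fuse and create the base file
mount -t glusterfs node1:/1x2 /mnt/1x2
cd /mnt/1x2 && touch f1

# step 4: bring down brick b1 by killing its brick process
kill -9 <pid-of-b1-brick-process>

# step 5: create the hardlinks while only b2 is up
ln f1 hlink.{1..1000}

# step 6: bring b1 back and kill b2 (one possible ordering)
gluster volume start 1x2 force      # restarts the offline brick b1
kill -9 <pid-of-b2-brick-process>

# steps 7-8: only f1 is listed, yet recreating the links fails with "File exists"
ls
ln f1 hlink.{1..1000}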

If I check the mount after the above command, only about 3 of the links were actually created, as shown below:
[root@dhcp35-196 dir2]# ll
total 0
-rw-r--r--. 4 root root 0 Jan 25 19:52 link.1
-rw-r--r--. 4 root root 0 Jan 25 19:52 link.166
-rw-r--r--. 4 root root 0 Jan 25 19:52 link.800
-rw-r--r--. 4 root root 0 Jan 25 19:52 y1
[root@dhcp35-196 dir2]# ln y1 link.1001



This issue is not seen on gNFS, i.e. the hardlink creation passes there.

Why is this happening? Is it some caching issue?

Also, no split-brains or other issues are observed otherwise.

Version-Release number of selected component (if applicable):
==========
glusterfs-server-3.8.4-13.el7rhgs.x86_64


How reproducible:
always

Comment 2 Nag Pavan Chilakam 2017-01-25 14:36:57 UTC
Volume Name: 1x2
Type: Replicate
Volume ID: ceecc137-d06f-438a-b43f-8836fa8a348d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick1/1x2
Brick2: 10.70.35.116:/rhs/brick1/1x2
Options Reconfigured:
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
cluster.favorite-child-policy: mtime
cluster.self-heal-daemon: enable
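
For reference, a minimal sketch of how the "Options Reconfigured" entries above map to gluster volume set commands (volume name taken from this output; some entries may simply reflect values recorded at create time):

gluster volume set 1x2 nfs.disable off
gluster volume set 1x2 performance.readdir-ahead on
gluster volume set 1x2 transport.address-family inet
gluster volume set 1x2 cluster.favorite-child-policy mtime
gluster volume set 1x2 cluster.self-heal-daemon enable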

Comment 9 Poornima G 2018-11-19 06:16:59 UTC
To fix this, we would have to clear the cache in md-cache every time an AFR brick goes down or comes back up. This would result in a large performance impact, and this use case is not something that needs to be fixed at the cost of performance. As a workaround, simply toggling stat-prefetch off and back on resolves the issue (see the sketch below). Hence closing as WONTFIX.
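
A minimal sketch of the workaround described above, assuming the volume name from Comment 2 (md-cache is controlled by the performance.stat-prefetch option, so toggling it off and back on drops the stale cached entries):

gluster volume set 1x2 performance.stat-prefetch off
gluster volume set 1x2 performance.stat-prefetch on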