Bug 761810 (GLUSTER-78)

Summary: Dbench fails with "File exists"
Product: [Community] GlusterFS
Reporter: Basavanagowda Kanur <gowda>
Component: replicate
Assignee: Vijay Bellur <vbellur>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: pre-2.0
CC: gluster-bugs, vijay
Hardware: All
OS: Linux
Doc Type: Bug Fix

Description Basavanagowda Kanur 2009-06-25 08:20:50 UTC
[Migrated from RT] - ticket 912 [http://support.gluster.com/rt/Ticket/Display.html?id=912]

Comment 1 Basavanagowda Kanur 2009-06-25 11:20:19 UTC
Thu Apr 09 02:42:34 2009 guru - Ticket created
Version: glusterfs 2.0.0 pre35

* 100 TB cluster
* Distribute over replicate, all servers except brick3 exporting over IB
and TCP
* I am not sure if this is related to #897 (tarball extraction failed,
possibly due to disk space getting filled)
* Free disk space does not seem to be an issue (since the maximum use
percentage is about 30% on the 7 servers)



# i=0; while true; do echo "===== $i ====="; ((i++));
/opt/benchmarks/dbench-4.0/bin/dbench -s -S -F 48 || break; done
..
48 3557 1.62 MB/sec execute 479 sec latency 7781.164 ms
48 3563 1.62 MB/sec execute 480 sec latency 8731.584 ms
[3626] rename ./clients/client15/~dmtmp/COREL/GRAPH1.CDR
./clients/client15/~dmtmp/COREL/@@@CDRW.TMP failed (File exists) -
expected NT_STATUS_OK
ERROR: child 15 failed at line 3626
Child failed with status 1


From the client log:
..
2009-04-08 10:36:00 E [fuse-bridge.c:1280:fuse_rename_cbk]
glusterfs-fuse: 26640699:
/dbench/clients/client15/~dmtmp/COREL/GRAPH1.CDR ->
/dbench/clients/client15/~dmtmp/COREL/@@@CDRW.TMP => -1 (File exists)

--------------------------------------------------------------------------------
Fri Apr 17 18:10:27 2009 gowda - Correspondence added

Please observe the following log:

2009-04-07 07:08:07 W
[afr-self-heal-entry.c:496:afr_sh_entry_expunge_unlink] afr2: unlinking
file /dbench/clients/client15/~dmtmp/COREL/@@@CDRW.TMP on brick2-ib

This file turns out to be a dht linkfile pointing to another dht
subvolume, afr1. The kernel then sends forget, and dht also forgets about
the above-mentioned file.

Later, a rename() fails with the following error logged at mount/fuse:

2009-04-08 10:36:00 E [fuse-bridge.c:1280:fuse_rename_cbk]
glusterfs-fuse: 26640699:
/dbench/clients/client15/~dmtmp/COREL/GRAPH1.CDR ->
/dbench/clients/client15/~dmtmp/COREL/@@@CDRW.TMP => -1 (File exists)

Before this rename(), dht would have received a fresh lookup() for
/dbench/clients/client15/~dmtmp/COREL/@@@CDRW.TMP and would have
returned -1 (ENOENT).

During rename(), dht thinks that the destination file does not exist
anywhere in its namespace, so it sends link() to afr1 to link the
destination path to the source. That link() fails with EEXIST. Note that
the existing file on afr1 is the one whose linkfile was deleted by afr2,
as logged in the first part.
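
To make the sequence above easier to follow, here is a rough toy model in
Python. It is only a sketch of the reasoning, not GlusterFS code: ToySubvol,
toy_rename and build_cluster are hypothetical names, a subvolume is reduced
to a dict of paths, and the "destination known" branch is a simplification
assumed for contrast. The only parts taken from this report are the order of
events and the errno values.

```python
import errno

SRC = "/dbench/clients/client15/~dmtmp/COREL/GRAPH1.CDR"
DST = "/dbench/clients/client15/~dmtmp/COREL/@@@CDRW.TMP"


class ToySubvol:
    """Hypothetical stand-in for one dht subvolume (afr1 or afr2)."""

    def __init__(self, name):
        self.name = name
        # path -> "data" or ("linkfile", name of the subvolume holding the data)
        self.entries = {}

    def lookup(self, path):
        if path not in self.entries:
            raise OSError(errno.ENOENT, "No such file or directory")
        return self.entries[path]

    def link(self, src, dst):
        # Like link(2): refuse to create dst if it already exists.
        if dst in self.entries:
            raise OSError(errno.EEXIST, "File exists")
        self.entries[dst] = self.entries[src]

    def unlink(self, path):
        del self.entries[path]


def toy_rename(hashed, cached, src, dst):
    """Return 0 on success or a negative errno, following the steps above."""
    try:
        hashed.lookup(dst)
        dst_known = True
    except OSError as err:
        if err.errno != errno.ENOENT:
            raise
        dst_known = False  # the toy dht now believes dst exists nowhere

    if dst_known:
        # Simplification (assumption): a known destination is removed first.
        cached.unlink(dst)
    try:
        cached.link(src, dst)
    except OSError as err:
        return -err.errno
    cached.unlink(src)
    return 0


def build_cluster():
    """afr1 holds both data files; afr2 holds the linkfile for DST."""
    afr1, afr2 = ToySubvol("afr1"), ToySubvol("afr2")
    afr1.entries[SRC] = "data"
    afr1.entries[DST] = "data"
    afr2.entries[DST] = ("linkfile", "afr1")
    return afr1, afr2


if __name__ == "__main__":
    afr1, afr2 = build_cluster()
    afr2.unlink(DST)  # models the self-heal expunge logged on brick2-ib
    print(toy_rename(afr2, afr1, SRC, DST) == -errno.EEXIST)
```

With the linkfile left in place, the toy rename returns 0. Once the linkfile
is removed from afr2, the lookup misses and the link() on afr1 collides with
the data file that is still there.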

-- 
gowda

Comment 2 Vikas Gorur 2009-07-09 11:01:52 UTC
Gowda, Avati:

What is the status of this bug? Has this been fixed in DHT?