Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 948087

Summary:	concurrent ls on NFS results in inconsistent views
Product:	[Community] GlusterFS	Reporter:	Anand Avati <aavati>
Component:	posix	Assignee:	Vijay Bellur <vbellur>
Status:	CLOSED DUPLICATE	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	mainline	CC:	chrisw, gluster-bugs
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-04-04 08:30:04 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Anand Avati 2013-04-03 23:51:38 UTC

I'm seeing a problem on my fairly fresh RHEL gluster install. Smells to me like a parallelism problem on the server.

If I mount a gluster volume via NFS (using glusterd's internal NFS server, nfs-kernel-server) and read a directory from multiple clients *in parallel*, I get inconsistent results across servers. Some files are missing from the directory listing, some may be present twice!

Exactly which files (or directories!) are missing/duplicated varies each time. But I can very consistently reproduce the behaviour.

You can see a screenshot here: http://imgur.com/JU8AFrt

The replication steps are:
* clusterssh to each NFS client
* unmount /gv0 (to clear cache)
* mount /gv0 [1]
* ls -al /gv0/common/apache-jmeter-2.9/bin (which is where I first noticed this)

Here's the rub: if, instead of doing the 'ls' in parallel, I do it in series, it works just fine (consistent correct results everywhere). But hitting the gluster server from multiple clients at the same time causes problems.

I can still stat() and open() the files missing from the directory listing, they just don't show up in an enumeration.

Mounting gv0 as a gluster client filesystem works just fine.

Details of my setup:
2 × gluster servers: 2×E5-2670, 128GB RAM, RHEL 6.4 64-bit, glusterfs-server-3.3.1-1.el6.x86_64 (from EPEL)
4 × NFS clients: 2×E5-2660, 128GB RAM, RHEL 5.7 64-bit, glusterfs-3.3.1-11.el5 (from kkeithley's repo, only used for testing)
gv0 volume information is below
bricks are 400GB SSDs with ext4[2]
common network is 10GbE, replication between servers happens over direct 10GbE link.

I will be testing on xfs/btrfs/zfs eventually, but for now I'm on ext4. 

Also attached is my chatlog from asking about this in #gluster

[1]: fstab line is: fearless1:/gv0 /gv0 nfs defaults,sync,tcp,wsize=8192,rsize=8192 0 0
[2]: yes, I've turned off dir_index to avoid That Bug. I've run the d_off test, results are here: http://pastebin.com/zQt5gZnZ

----
gluster> volume info gv0
 
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 20117b48-7f88-4f16-9490-a0349afacf71
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: fearless1:/export/bricks/500117310007a6d8/glusterdata
Brick2: fearless2:/export/bricks/500117310007a674/glusterdata
Brick3: fearless1:/export/bricks/500117310007a714/glusterdata
Brick4: fearless2:/export/bricks/500117310007a684/glusterdata
Brick5: fearless1:/export/bricks/500117310007a7dc/glusterdata
Brick6: fearless2:/export/bricks/500117310007a694/glusterdata
Brick7: fearless1:/export/bricks/500117310007a7e4/glusterdata
Brick8: fearless2:/export/bricks/500117310007a720/glusterdata
Brick9: fearless1:/export/bricks/500117310007a7ec/glusterdata
Brick10: fearless2:/export/bricks/500117310007a74c/glusterdata
Brick11: fearless1:/export/bricks/500117310007a838/glusterdata
Brick12: fearless2:/export/bricks/500117310007a814/glusterdata
Brick13: fearless1:/export/bricks/500117310007a850/glusterdata
Brick14: fearless2:/export/bricks/500117310007a84c/glusterdata
Brick15: fearless1:/export/bricks/500117310007a858/glusterdata
Brick16: fearless2:/export/bricks/500117310007a8f8/glusterdata
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: off

Comment 1 Vijay Bellur 2013-04-04 08:30:04 UTC


*** This bug has been marked as a duplicate of bug 948086 ***