I'm seeing a problem on my fairly fresh RHEL gluster install. Smells to me like a parallelism problem on the server.

If I mount a gluster volume via NFS (using glusterd's built-in NFS server, not the kernel's nfs-kernel-server) and read a directory from multiple clients *in parallel*, I get inconsistent results across the clients. Some files are missing from the directory listing, some may be present twice! Exactly which files (or directories!) are missing/duplicated varies each time, but I can very consistently reproduce the behaviour.

You can see a screenshot here: http://imgur.com/JU8AFrt

The reproduction steps are:

* clusterssh to each NFS client
* unmount /gv0 (to clear cache)
* mount /gv0 [1]
* ls -al /gv0/common/apache-jmeter-2.9/bin (which is where I first noticed this)

Here's the rub: if, instead of doing the 'ls' in parallel, I do it in series, it works just fine (consistent, correct results everywhere). But hitting the gluster server from multiple clients at the same time causes problems. I can still stat() and open() the files missing from the directory listing; they just don't show up in an enumeration. Mounting gv0 as a gluster client filesystem works just fine.

Details of my setup:

2 × gluster servers: 2×E5-2670, 128GB RAM, RHEL 6.4 64-bit, glusterfs-server-3.3.1-1.el6.x86_64 (from EPEL)
4 × NFS clients: 2×E5-2660, 128GB RAM, RHEL 5.7 64-bit, glusterfs-3.3.1-11.el5 (from kkeithley's repo, only used for testing)

gv0 volume information is below. Bricks are 400GB SSDs with ext4 [2]. The common network is 10GbE; replication between servers happens over a direct 10GbE link. I will be testing on xfs/btrfs/zfs eventually, but for now I'm on ext4.

Also attached is my chatlog from asking about this in #gluster.

[1]: fstab line is:
fearless1:/gv0 /gv0 nfs defaults,sync,tcp,wsize=8192,rsize=8192 0 0

[2]: yes, I've turned off dir_index to avoid That Bug. I've run the d_off test; results are here: http://pastebin.com/zQt5gZnZ

----

gluster> volume info gv0

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 20117b48-7f88-4f16-9490-a0349afacf71
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: fearless1:/export/bricks/500117310007a6d8/glusterdata
Brick2: fearless2:/export/bricks/500117310007a674/glusterdata
Brick3: fearless1:/export/bricks/500117310007a714/glusterdata
Brick4: fearless2:/export/bricks/500117310007a684/glusterdata
Brick5: fearless1:/export/bricks/500117310007a7dc/glusterdata
Brick6: fearless2:/export/bricks/500117310007a694/glusterdata
Brick7: fearless1:/export/bricks/500117310007a7e4/glusterdata
Brick8: fearless2:/export/bricks/500117310007a720/glusterdata
Brick9: fearless1:/export/bricks/500117310007a7ec/glusterdata
Brick10: fearless2:/export/bricks/500117310007a74c/glusterdata
Brick11: fearless1:/export/bricks/500117310007a838/glusterdata
Brick12: fearless2:/export/bricks/500117310007a814/glusterdata
Brick13: fearless1:/export/bricks/500117310007a850/glusterdata
Brick14: fearless2:/export/bricks/500117310007a84c/glusterdata
Brick15: fearless1:/export/bricks/500117310007a858/glusterdata
Brick16: fearless2:/export/bricks/500117310007a8f8/glusterdata
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: off
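The shared-cursor failure mode that the fix further down this report describes is easy to demonstrate outside Gluster. Below is a hypothetical standalone C demo (not GlusterFS code): two threads enumerate one directory through a single shared fd, each seeking back to "its" remembered offset before every getdents64() call, which is roughly what the NFS anonymous-fd path does. Run against a large directory, the two entry counts typically disagree with each other and with the real count.

/*
 * Hypothetical standalone demo -- NOT GlusterFS code -- of two readers
 * sharing one directory cursor. Each thread tries to enumerate the
 * directory independently, remembering "its" offset and seeking back
 * before every getdents64() call, but because the seek position lives in
 * the one shared fd, the threads drag the cursor out from under each
 * other, so individual listings can miss or duplicate entries.
 *
 * Build: gcc -O2 -pthread readdir-race.c -o readdir-race
 * Run:   ./readdir-race /some/large/directory   (compare the two counts)
 */
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

static int shared_fd;                     /* one fd shared by both threads */

static void *enumerate(void *name)
{
    off_t my_off = 0;                     /* this thread's idea of the cursor */
    char  buf[4096];
    long  count = 0;
    int   spins = 0;

    while (spins++ < 100000) {            /* guard against pathological ping-pong */
        /* Seek to where *we* think we left off ... */
        if (lseek(shared_fd, my_off, SEEK_SET) < 0)
            break;
        /* ... but the other thread may seek/read in between these two calls. */
        long n = syscall(SYS_getdents64, shared_fd, buf, sizeof(buf));
        if (n <= 0)
            break;
        for (long pos = 0; pos < n; count++) {
            struct dirent64 *d = (struct dirent64 *)(buf + pos);
            my_off = d->d_off;            /* remember the last entry's offset */
            pos += d->d_reclen;
        }
    }
    printf("thread %s saw %ld entries\n", (char *)name, count);
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc != 2 || (shared_fd = open(argv[1], O_RDONLY | O_DIRECTORY)) < 0) {
        fprintf(stderr, "usage: %s <directory>\n", argv[0]);
        return 1;
    }
    pthread_t a, b;
    pthread_create(&a, NULL, enumerate, "A");
    pthread_create(&b, NULL, enumerate, "B");
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    close(shared_fd);
    return 0;
}

Running the two loops one after the other (the serial 'ls' case above) gives correct, consistent results, because nothing else moves the cursor between iterations.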
*** Bug 948088 has been marked as a duplicate of this bug. ***
*** Bug 948087 has been marked as a duplicate of this bug. ***
REVIEW: http://review.gluster.org/4963 (posix: fix dangerous "sharing" of fd in readdir between two requests) posted (#1) for review on release-3.4 by Vijay Bellur (vbellur)
COMMIT: http://review.gluster.org/4963 committed in release-3.4 by Anand Avati (avati)
------
commit 5ac55756cd923e4bb1e5b5df50aeaf198d5531b7
Author: Anand Avati <avati>
Date:   Wed Apr 3 16:31:07 2013 -0700

    posix: fix dangerous "sharing" of fd in readdir between two requests

    posix_fill_readdir() is a multi-step function which performs many
    readdir() calls, and expects the directory cursor to have not "seeked
    away" elsewhere between two successive iterations.

    Usually this is not a problem as each opendir() from an application
    has its own backend fd, and there is nobody else to "seek away" the
    directory cursor.

    However in case of NFS's use of anonymous fd, the same fd_t is shared
    between all NFS readdir requests, and two readdir loops can be
    executing in parallel on the same dir dragging away the cursor in a
    chaotic manner.

    The fix in this patch is to lock on the fd around the loop. Another
    approach could be to reimplement posix_fill_readdir() with a single
    getdents() call, but that's for another day.

    Change-Id: Ia42e9c7fbcde43af4c0d08c20cc0f7419b98bd3f
    BUG: 948086
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/4774
    Reviewed-by: Jeff Darcy <jdarcy>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-on: http://review.gluster.org/4963
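In other words, the committed change holds the fd's lock across the whole multi-step readdir loop, so a second request arriving on the same (anonymous) fd cannot seek the shared directory cursor away between two iterations of the first. Below is a schematic sketch of that pattern only; fd_ctx_t, fill_readdir and the field names are illustrative, not the actual posix_fill_readdir() code.

/*
 * Schematic sketch of the fix pattern described above, not the actual
 * GlusterFS patch: the entire readdir loop is serialized on a per-fd
 * lock, so concurrent requests on the same fd run one after the other
 * instead of interleaving. Names are illustrative only.
 */
#include <dirent.h>
#include <pthread.h>
#include <string.h>

typedef struct {
    DIR             *dir;   /* backend directory stream behind this fd */
    pthread_mutex_t  lock;  /* per-fd lock, analogous to locking the fd_t */
} fd_ctx_t;

/* Collect up to max_names entries (roughly size bytes), resuming at offset. */
static int fill_readdir(fd_ctx_t *ctx, long offset, size_t size,
                        char *names[], int max_names)
{
    int    count  = 0;
    size_t filled = 0;

    pthread_mutex_lock(&ctx->lock);       /* hold the fd for the WHOLE loop */

    seekdir(ctx->dir, offset);            /* position the shared cursor once */

    while (filled < size && count < max_names) {
        struct dirent *entry = readdir(ctx->dir);
        if (entry == NULL)
            break;                        /* end of directory */
        /* Nobody else can seekdir() this stream while we iterate. */
        names[count++] = strdup(entry->d_name);
        filled += strlen(entry->d_name) + 1;
    }

    pthread_mutex_unlock(&ctx->lock);     /* cursor may move again now */
    return count;
}

The alternative mentioned in the commit message, fetching the whole chunk with a single getdents() call, would remove the window between iterations entirely rather than serialize around it.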