Created attachment 1090321 [details]
testparm output for samba configuration used

Description of problem:
Using samba-vfs-glusterfs with gluster set to replica 4, all read calls (multiple reads to/from multiple files) are served from a single brick. When the same test is run using a fuse mount instead, it reads from all replicas, resulting in higher ops/s.

samba-vfs-glusterfs:
gluster volume profile lzone info | egrep "Brick|READ$"
Brick: lzreptest001:/export/brick1/lzone
Brick: lzreptest001:/export/brick2/lzone
Brick: lzreptest002:/export/brick1/lzone
Brick: lzreptest002:/export/brick2/lzone
      1.57      51.21 us      21.00 us    2974.00 us          13815        READ

Fuse mount:
gluster volume profile lzone info | egrep "Brick|READ$"
Brick: lzreptest001:/export/brick1/lzone
      0.95     159.62 us      22.00 us   14080.00 us           7887        READ
Brick: lzreptest001:/export/brick2/lzone
      0.94     170.46 us      20.00 us    2985.00 us           7490        READ
Brick: lzreptest002:/export/brick1/lzone
      0.88     267.29 us      26.00 us   17699.00 us           7853        READ
Brick: lzreptest002:/export/brick2/lzone
      0.90     258.96 us      25.00 us   53134.00 us           7903        READ

Version-Release number of selected component (if applicable):
glusterfs-3.6.6-1.el6.x86_64
glusterfs-api-3.6.6-1.el6.x86_64
glusterfs-cli-3.6.6-1.el6.x86_64
glusterfs-fuse-3.6.6-1.el6.x86_64
glusterfs-geo-replication-3.6.6-1.el6.x86_64
glusterfs-libs-3.6.6-1.el6.x86_64
glusterfs-server-3.6.6-1.el6.x86_64
samba-4.1.17-4.el6rhs.x86_64
samba-client-4.1.17-4.el6rhs.x86_64
samba-common-4.1.17-4.el6rhs.x86_64
samba-libs-4.1.17-4.el6rhs.x86_64
samba-vfs-glusterfs-4.1.17-4.el6rhs.x86_64

How reproducible:
Every time.

Steps to Reproduce:
1. Create a cluster with 2 servers, 4 bricks - replica 4
2. Set up Samba with the gluster VFS module (a minimal share sketch is included under Additional info below)
3. Turn on gluster profiling for the volume
4. Mount the Samba share on a remote box and run "filebench" with the fileserver profile
5. Observe the results in the gluster profile output

Actual results:
Only a single brick is used for READ calls.

gluster volume profile lzone info | egrep "Brick|READ$"
Brick: lzreptest001:/export/brick1/lzone
Brick: lzreptest001:/export/brick2/lzone
Brick: lzreptest002:/export/brick1/lzone
Brick: lzreptest002:/export/brick2/lzone
      1.57      51.21 us      21.00 us    2974.00 us          13815        READ

Expected results:
All bricks used for READ calls.

gluster volume profile lzone info | egrep "Brick|READ$"
Brick: lzreptest001:/export/brick1/lzone
      0.95     159.62 us      22.00 us   14080.00 us           7887        READ
Brick: lzreptest001:/export/brick2/lzone
      0.94     170.46 us      20.00 us    2985.00 us           7490        READ
Brick: lzreptest002:/export/brick1/lzone
      0.88     267.29 us      26.00 us   17699.00 us           7853        READ
Brick: lzreptest002:/export/brick2/lzone
      0.90     258.96 us      25.00 us   53134.00 us           7903        READ

Additional info:
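For reference, a minimal vfs_glusterfs share definition along the lines of what was tested (the full configuration is in the attached testparm output; the share name, log path, and log level here are assumptions):

  [lzone]
      path = /
      vfs objects = glusterfs
      glusterfs:volume = lzone
      glusterfs:logfile = /var/log/samba/glusterfs-lzone.log
      glusterfs:loglevel = 7
      kernel share modes = no
      read only = no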
Samba does not do anything with bricks itself. This might be a libgfapi bug.
On a replicated volume, READs are served by the brick that replied to the LOOKUP first (decided by the AFR xlator). To explain the reported behaviour, it is important to know where the volume/share is mounted, and which server was used for mounting. For example, I guess you have a setup like this:

- 4 gluster storage servers
- one of these storage servers exports the volume over Samba
- 1 client system (not on a storage server)

AFR takes care of the replication and talks to the brick processes running on all the storage servers. This is all done on the Gluster client side (FUSE mount, or vfs_glusterfs/libgfapi). When a file is opened for reading, a LOOKUP is done as the first step; it is sent to all the bricks in the volume.

When mounting the Gluster volume over FUSE on a client system, AFR runs on the client side too. All storage servers are connected equally over the network. Whichever storage server replies to the LOOKUP first will be used to READ from. The first replies will come more or less randomly from the different storage servers, so the load that READ procedures cause is distributed relatively evenly.

If the Gluster client (vfs_glusterfs/libgfapi) is running locally on a Gluster server, the brick on 'localhost' will most of the time be quickest to reply to the LOOKUP. The other bricks are reached over a network connection and will normally need more time to reply. You will see in the "gluster volume profile" output that the server running Samba handles most of the READ requests.

I hope that this explains it well. Please let us know if my assumptions are incorrect. The developers working on AFR have been added as CC on this bug and they will be able to answer in more detail.

We might want to place a description like this in our documentation on https://gluster.readthedocs.org/ too, but I'm leaving that for others to do (feel free to copy/paste).
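To make it concrete that vfs_glusterfs embeds the whole Gluster client stack (including AFR and its read-child selection) in the smbd process, here is a minimal libgfapi read sketch; the volume and server names are taken from the report, and /somefile is a placeholder:

  #include <stdio.h>
  #include <fcntl.h>
  #include <glusterfs/api/glfs.h>

  int main(void)
  {
          /* Create a client context for the volume; the full client
           * graph, including AFR, runs inside this process. */
          glfs_t *fs = glfs_new("lzone");
          glfs_set_volfile_server(fs, "tcp", "lzreptest001", 24007);
          if (glfs_init(fs) != 0) {
                  perror("glfs_init");
                  return 1;
          }

          /* The open triggers the LOOKUP that AFR sends to all replicas. */
          glfs_fd_t *fd = glfs_open(fs, "/somefile", O_RDONLY);
          if (!fd) {
                  perror("glfs_open");
                  glfs_fini(fs);
                  return 1;
          }

          /* Subsequent READs go to the read-child AFR picked for this file. */
          char buf[4096];
          ssize_t n = glfs_read(fd, buf, sizeof(buf), 0);
          printf("read %zd bytes\n", n);

          glfs_close(fd);
          glfs_fini(fs);
          return 0;
  }

Compile with something like "gcc gfapi_read.c -lgfapi". When this runs on a storage server, the local bricks will usually win the LOOKUP race, which matches the single-brick READ profile above.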
"gluster volume set <volname> read-hash-mode 1" will serve reads based on gfid hash which will distribute the reads.