Bug 1557906

Summary: [EC] Read performance of EC volume exported over gNFS is significantly lower than write performance
Product: [Community] GlusterFS Reporter: Ashish Pandey <aspandey>
Component: disperse Assignee: Ashish Pandey <aspandey>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.0 CC: bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-4.0.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1554743
: 1559084 (view as bug list) Environment:
Last Closed: 2018-03-26 12:32:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1554743, 1558352    
Bug Blocks: 1557904, 1559084    

Description Ashish Pandey 2018-03-19 08:46:11 UTC
+++ This bug was initially created as a clone of Bug #1554743 +++

Description of problem:
Reads are only at 47MB/s while writes are at 219MB/s:



dd if=/dev/zero of=/media1/results/results/test-toberemoved/test.bin bs=1M count=1000 conv=fdatasync
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 4.785 s, 219 MB/s

echo 3 > /proc/sys/vm/drop_caches

dd if=/media1/results/results/test-toberemoved/test.bin of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 22.1433 s, 47.4 MB/s
================================================================================
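
As a quick sanity check on the figures above, the throughput dd reports can be recomputed from the byte count and elapsed time. This is a minimal sketch; the helper name `dd_throughput_mb_s` is made up for illustration, and it assumes dd's decimal megabytes (1 MB = 10**6 bytes).

```python
def dd_throughput_mb_s(num_bytes, seconds):
    """Throughput in decimal megabytes per second (1 MB = 10**6 bytes)."""
    return num_bytes / seconds / 1e6


# bs=1M count=1000 -> 1000 * 1 MiB = 1048576000 bytes, as printed by dd
SIZE = 1000 * 1024 * 1024

write_rate = dd_throughput_mb_s(SIZE, 4.785)    # elapsed time of the write run
read_rate = dd_throughput_mb_s(SIZE, 22.1433)   # elapsed time of the read run

print(f"write: {write_rate:.1f} MB/s")
print(f"read:  {read_rate:.1f} MB/s")
print(f"write/read ratio: {write_rate / read_rate:.1f}x")
```

The ratio makes the gap easier to compare across runs than the two raw rates.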


--- Additional comment from Worker Ant on 2018-03-13 05:47:14 EDT ---

REVIEW: https://review.gluster.org/19703 (cluster/ec: Change default read policy to gfid-hash) posted (#2) for review on master by Ashish Pandey

--- Additional comment from Worker Ant on 2018-03-14 06:10:44 EDT ---

COMMIT: https://review.gluster.org/19703 committed in master by "Ashish Pandey" <aspandey@redhat.com> with a commit message- cluster/ec: Change default read policy to gfid-hash

Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. Based on the
stat information, it checks whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume where all the bricks are on
different nodes.

In EC, with the round-robin read policy, reads are sent to
a different set of data bricks each time. This balances
the read fops across all the bricks and avoids heating up
(overloading) the same set of bricks.

Due to small differences in clock speed, it is possible
that atime, mtime or ctime differ slightly between
bricks. As a result, a different stat may be returned
to NFS, and NFS will then discard cached/pre-read data
that has not actually changed and could still be used.

Solution:
Change the default read policy for EC to gfid-hash. That
forces all reads of a given file to go to the same set of
bricks.

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1554743
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
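
The effect described in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not GlusterFS code: the per-brick timestamps are made-up values, the rule that the reported mtime is the newest one among the bricks that served the read is an assumption for illustration, SHA-1 stands in for whatever hash the translator really uses, and the names `RoundRobin`, `GfidHash` and `cache_invalidations` are hypothetical.

```python
import hashlib
import uuid

DATA, REDUNDANCY = 4, 2            # the 4 + 2 layout from the commit message
NODES = DATA + REDUNDANCY

# Hypothetical mtimes (ns) of one unchanged file, slightly skewed per brick.
BRICK_MTIME_NS = [1_000_000_000 + skew for skew in (0, 3, 1, 7, 2, 5)]


def bricks_for_read(first):
    """Indexes of the DATA bricks used for a read that starts at `first`."""
    return [(first + i) % NODES for i in range(DATA)]


def stat_mtime(bricks):
    """Assumed merge rule: report the newest mtime among the bricks read."""
    return max(BRICK_MTIME_NS[b] for b in bricks)


class RoundRobin:
    """Start each read at the next brick, spreading load over all bricks."""

    def __init__(self):
        self.idx = 0

    def first(self, gfid):
        first = self.idx % NODES
        self.idx += 1
        return first


class GfidHash:
    """Always start at the brick selected by a hash of the file's gfid."""

    def first(self, gfid):
        digest = hashlib.sha1(gfid.bytes).digest()
        return int.from_bytes(digest[:4], "big") % NODES


def cache_invalidations(policy, gfid, reads):
    """Count consecutive reads whose reported mtime differs from the last."""
    invalidations = 0
    previous = None
    for _ in range(reads):
        mtime = stat_mtime(bricks_for_read(policy.first(gfid)))
        if previous is not None and mtime != previous:
            invalidations += 1
        previous = mtime
    return invalidations


gfid = uuid.UUID("00000000-0000-0000-0000-000000000001")
print("round-robin:", cache_invalidations(RoundRobin(), gfid, 12))
print("gfid-hash:  ", cache_invalidations(GfidHash(), gfid, 12))
```

In this model the file never changes, yet round-robin reports a changing mtime whenever the set of bricks rotates, which an NFS cache would treat as a modification. With gfid-hash the same bricks answer every read, so the reported mtime stays constant.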

Comment 1 Worker Ant 2018-03-19 08:53:29 UTC
REVIEW: https://review.gluster.org/19739 (cluster/ec: Change default read policy to gfid-hash) posted (#1) for review on release-4.0 by Ashish Pandey

Comment 2 Worker Ant 2018-03-20 11:00:03 UTC
COMMIT: https://review.gluster.org/19739 committed in release-4.0 by "Ashish Pandey" <aspandey@redhat.com> with a commit message- cluster/ec: Change default read policy to gfid-hash

Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. Based on the
stat information, it checks whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume where all the bricks are on
different nodes.

In EC, with the round-robin read policy, reads are sent to
a different set of data bricks each time. This balances
the read fops across all the bricks and avoids heating up
(overloading) the same set of bricks.

Due to small differences in clock speed, it is possible
that atime, mtime or ctime differ slightly between
bricks. As a result, a different stat may be returned
to NFS, and NFS will then discard cached/pre-read data
that has not actually changed and could still be used.

Solution:
Change the default read policy for EC to gfid-hash. That
forces all reads of a given file to go to the same set of
bricks.

>Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
>BUG: 1554743
>Signed-off-by: Ashish Pandey <aspandey@redhat.com>

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1557906
Signed-off-by: Ashish Pandey <aspandey@redhat.com>

Comment 3 Shyamsundar 2018-03-26 12:32:11 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.1, please open a new bug report.

glusterfs-4.0.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000093.html
[2] https://www.gluster.org/pipermail/gluster-users/