1558352 – [EC] Read performance of EC volume exported over gNFS is significantly lower than write performance

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	disperse
Sub Component:
Version:	3.12
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	bugs@gluster.org
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1554743
Blocks:	1557904 1557906 1559084

Reported:	2018-03-20 05:54 UTC by Ashish Pandey
Modified:	2018-04-24 06:53 UTC
CC List:	1 user
Fixed In Version:	glusterfs-3.12.8
Clone Of:	1554743
Environment:
Last Closed:	2018-04-24 06:53:38 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Ashish Pandey 2018-03-20 05:54:15 UTC

+++ This bug was initially created as a clone of Bug #1554743 +++

Description of problem:
Reads are only at 47MB/s while writes are at 219MB/s:



dd if=/dev/zero of=/media1/results/results/test-toberemoved/test.bin bs=1M count=1000 conv=fdatasync
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 4.785 s, 219 MB/s

echo 3 > /proc/sys/vm/drop_caches

dd if=/media1/results/results/test-toberemoved/test.bin of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 22.1433 s, 47.4 MB/s
================================================================================
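
As a quick check of the figures above, the rates printed by dd can be recomputed from the byte counts and elapsed times. This is only an illustrative sketch: the helper name throughput_mb_s is made up here, and dd's "MB" is taken to mean 10^6 bytes.

```python
def throughput_mb_s(nbytes, seconds):
    """Return throughput in MB/s, with 1 MB = 10**6 bytes (dd's convention)."""
    return nbytes / seconds / 1e6


# Values copied from the two dd runs in the description.
write_rate = throughput_mb_s(1048576000, 4.785)
read_rate = throughput_mb_s(1048576000, 22.1433)

print(f"write: {write_rate:.1f} MB/s")
print(f"read:  {read_rate:.1f} MB/s")
print(f"write/read ratio: {write_rate / read_rate:.2f}x")
```

The ratio makes the gap explicit: the same 1000 MiB takes more than four times as long to read back as it took to write.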

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Worker Ant on 2018-03-13 05:47:14 EDT ---

REVIEW: https://review.gluster.org/19703 (cluster/ec: Change default read policy to gfid-hash) posted (#2) for review on master by Ashish Pandey

--- Additional comment from Worker Ant on 2018-03-14 06:10:44 EDT ---

COMMIT: https://review.gluster.org/19703 committed in master by "Ashish Pandey" <aspandey> with a commit message- cluster/ec: Change default read policy to gfid-hash

Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. Based on the
stat information, it checks whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume with all the bricks on
different nodes.

In EC, with the round-robin read policy, reads are sent to
a different set of data bricks each time. This balances the
read fops across all the bricks and avoids heating up
(overloading) the same set of bricks.

Due to small differences in clock speed, it is possible
that atime, mtime or ctime differ slightly between
bricks. This can cause a different stat to be
returned to NFS, based on which NFS will discard
cached/pre-read data that has not actually changed and
could still be used.

Solution:
Change the default read policy for EC to gfid-hash. That forces
all the reads for a given file to go to the same set of bricks.

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1554743
Signed-off-by: Ashish Pandey <aspandey>
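
The effect described in the commit message above can be illustrated with a small simulation. This is a simplified sketch, not the EC translator's code: the class and function names are hypothetical, CRC32 stands in for the real GFID hash, the per-brick mtimes are made-up values, and it assumes the stat handed to NFS carries the latest mtime among the bricks that served the read.

```python
import uuid
import zlib

NODES = 6      # 4 + 2 disperse volume: 6 bricks in total
FRAGMENTS = 4  # a read needs 4 of them


def bricks_from(first):
    """Return the FRAGMENTS consecutive brick indices starting at `first`."""
    return tuple((first + i) % NODES for i in range(FRAGMENTS))


class RoundRobin:
    """Move the starting brick forward on every read."""

    def __init__(self):
        self.idx = 0

    def select(self, gfid):
        self.idx = (self.idx + 1) % NODES
        return bricks_from(self.idx)


class GfidHash:
    """Derive the starting brick from the file's GFID, so it never moves."""

    def select(self, gfid):
        # CRC32 is a stand-in hash, not the one GlusterFS uses.
        return bricks_from(zlib.crc32(gfid.bytes) % NODES)


def count_stat_changes(policy, gfid, brick_mtimes_ns, reads):
    """Count reads whose reported mtime differs from the previous read's.

    Assumption: the reported mtime is the latest one among the selected
    bricks. Each change would make an NFS client drop its cached data.
    """
    changes = 0
    last = None
    for _ in range(reads):
        mtime = max(brick_mtimes_ns[b] for b in policy.select(gfid))
        if last is not None and mtime != last:
            changes += 1
        last = mtime
    return changes


# Made-up mtimes (nanoseconds) that differ slightly from brick to brick.
base_ns = 1521525255 * 10**9
brick_mtimes_ns = [base_ns + skew for skew in (1, 4, 2, 7, 3, 5)]
gfid = uuid.UUID("00000000-0000-0000-0000-000000000001")

rr_changes = count_stat_changes(RoundRobin(), gfid, brick_mtimes_ns, 100)
gh_changes = count_stat_changes(GfidHash(), gfid, brick_mtimes_ns, 100)
print("round-robin stat changes:", rr_changes)
print("gfid-hash stat changes:  ", gh_changes)
```

With gfid-hash the same bricks answer every read of a file, so the reported mtime stays constant and the client has no reason to discard its cache. On a real volume the policy should be visible as the disperse.read-policy option (values round-robin and gfid-hash); check the volume option help for your version to confirm.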

Comment 1 Worker Ant 2018-03-20 06:10:25 UTC

REVIEW: https://review.gluster.org/19743 (cluster/ec: Change default read policy to gfid-hash) posted (#1) for review on release-3.12 by Ashish Pandey

Comment 2 Worker Ant 2018-04-06 12:49:33 UTC

COMMIT: https://review.gluster.org/19743 committed in release-3.12 by "jiffin tony Thottan" <jthottan> with a commit message- cluster/ec: Change default read policy to gfid-hash

Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. Based on the
stat information, it checks whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume with all the bricks on
different nodes.

In EC, with the round-robin read policy, reads are sent to
a different set of data bricks each time. This balances the
read fops across all the bricks and avoids heating up
(overloading) the same set of bricks.

Due to small differences in clock speed, it is possible
that atime, mtime or ctime differ slightly between
bricks. This can cause a different stat to be
returned to NFS, based on which NFS will discard
cached/pre-read data that has not actually changed and
could still be used.

Solution:
Change the default read policy for EC to gfid-hash. That forces
all the reads for a given file to go to the same set of bricks.

>Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
>BUG: 1554743
>Signed-off-by: Ashish Pandey <aspandey>

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1558352
Signed-off-by: Ashish Pandey <aspandey>

Comment 3 Jiffin 2018-04-24 06:53:38 UTC

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.8, please open a new bug report.

glusterfs-3.12.8 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-devel/2018-April/054749.html
[2] https://www.gluster.org/pipermail/gluster-users/
