Bug 1665029
| Summary: | read-ahead and io-cache degrading performance on sequential read | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Manoj Pillai <mpillai> |
| Component: | read-ahead | Assignee: | Raghavendra G <rgowdapp> |
| Status: | CLOSED DUPLICATE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | mainline | CC: | atumball, bugs, cavassin, csaba, guillaume.pavese, rgowdapp, shberry |
| Target Milestone: | --- | Keywords: | Performance |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Cloned To: | 1676479 (view as bug list) | Environment: | |
| Last Closed: | 2019-07-05 08:52:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1676479 | | |
|
Description (Manoj Pillai, 2019-01-10 10:35:13 UTC)
Data showing that both read-ahead and io-cache cause performance degradation. I'm modifying the test in comment #0 here, substituting numjobs=2 instead of 4.

Test sequence:

```
fio --name=initialwrite --ioengine=sync --rw=write --direct=0 \
    --create_on_open=1 --end_fsync=1 --bs=128k --directory=/mnt/glustervol/ \
    --filename_format=f.\$jobnum.\$filenum --filesize=16g --size=16g --numjobs=2

[unmount and mount volume]

fio --name=readtest --ioengine=sync --rw=read --direct=0 --bs=128k \
    --directory=/mnt/glustervol/ --filename_format=f.\$jobnum.\$filenum \
    --filesize=16g --size=16g --numjobs=2
```

Result with default settings:

```
READ: bw=485MiB/s (509MB/s), 243MiB/s-243MiB/s (254MB/s-255MB/s), io=32.0GiB (34.4GB), run=67504-67522msec
```

Result with read-ahead turned off:

```
READ: bw=776MiB/s (813MB/s), 388MiB/s-388MiB/s (407MB/s-407MB/s), io=32.0GiB (34.4GB), run=42220-42237msec
```

Result with read-ahead and io-cache turned off:

```
READ: bw=1108MiB/s (1162MB/s), 554MiB/s-554MiB/s (581MB/s-581MB/s), io=32.0GiB (34.4GB), run=29565-29573msec
```

Comment 2 (Raghavendra G):

Some observations while debugging the performance degradation with gluster read-ahead:

* The kernel does read-ahead too, and it sends parallel read requests as part of this.
* client-io-threads is on in this configuration.

Together, these two points mean that parallel requests sent by the kernel can reach read-ahead out of order. Read-ahead then no longer sees read requests at sequential, contiguous offsets and hence thinks the reads are random. For random reads, it resets the read sequence; when requests again reach read-ahead in order, read-ahead is turned back on. Due to this intermittent toggling, much of the read-ahead data is wasted, regressing performance. With client-io-threads off, I can no longer see the regression for the given test case. If I run the test with a single fio job (--numjobs=1), gluster read-ahead on outperforms gluster read-ahead off on my setup.
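The toggling described above can be illustrated with a toy model (this is not GlusterFS code; the function name and the 128 KB block granularity are illustrative): a detector that expects each read at the offset immediately following the previous request, and treats anything else as a random read that resets the sequence.

```shell
#!/usr/bin/env bash
# Toy model of a sequential-read detector: the next read is expected at the
# offset right after the previous request; any other offset is treated as
# random and counted as a read-ahead reset.
count_resets() {
  local expected=0 resets=0 off
  for off in "$@"; do
    if [ "$off" -ne "$expected" ]; then
      resets=$((resets + 1))
    fi
    expected=$((off + 128))   # next contiguous 128KB block, in KB units
  done
  echo "$resets"
}

# Requests arriving in order (offsets in KB): detected as sequential.
count_resets 0 128 256 384 512        # prints 0

# The same sequential stream reordered by parallel delivery (as when kernel
# read-ahead requests pass through client-io-threads): looks random at
# nearly every request, so read-ahead keeps resetting.
count_resets 0 256 128 384 640 512    # prints 5
```

This matches the observation in comment #2: the stream is in fact sequential, but out-of-order arrival makes the heuristic toggle between sequential and random mode, wasting the prefetched data.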
[1] https://review.gluster.org/#/c/glusterfs/+/20981/

(In reply to Raghavendra G from comment #2)
> [...] If I run the test with single fio job (--numjobs=1), gluster
> read-ahead on outperforms gluster read-ahead off on my setup.

To be precise: with a single fio job (--numjobs=1), gluster read-ahead on with client-io-threads off outperforms gluster read-ahead off with client-io-threads off.

(In reply to Manoj Pillai from comment #1)
> Data showing that both read-ahead and io-cache cause performance
> degradation. [...]
> Result with read-ahead and io-cache turned off:
> READ: bw=1108MiB/s (1162MB/s), 554MiB/s-554MiB/s (581MB/s-581MB/s),
> io=32.0GiB (34.4GB), run=29565-29573msec

Result with ciot=off, io-cache=off, gluster ra=on, read-ahead-page-count=10 [these settings are based on comment #2 and comment #3]:

```
READ: bw=975MiB/s (1023MB/s), 488MiB/s-488MiB/s (511MB/s-512MB/s), io=32.0GiB (34.4GB)
```

Comparing the best results seen (1108 vs. 975), the gluster ra=off case is still slightly better.

Result with ciot=off, io-cache=on, gluster ra=on, read-ahead-page-count=10:

```
READ: bw=674MiB/s (706MB/s), 337MiB/s-339MiB/s (353MB/s-355MB/s), io=32.0GiB (34.4GB)
```

Please go ahead and disable it by default in upstream master. Let's get a run done with these values; if the performance is good without these two translators, we can backport the patch to the glusterfs-6 branch. Otherwise, it gives us another two months to validate it in upstream master before glusterfs-7.

*** This bug has been marked as a duplicate of bug 1676479 ***
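For reference, the configuration compared above (ciot=off, io-cache=off, gluster ra=on, read-ahead-page-count=10) maps to standard GlusterFS volume options; the volume name `glustervol` below is assumed from the mount path used in the fio commands:

```shell
# Assumed volume name; substitute your own.
gluster volume set glustervol performance.client-io-threads off
gluster volume set glustervol performance.io-cache off
gluster volume set glustervol performance.read-ahead on
gluster volume set glustervol performance.read-ahead-page-count 10
```

Remounting the volume between the write and read phases (as the test sequence does) ensures the reads are served cold rather than from client-side caches.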