Description of problem:
Turning on read-ahead results in poor random read performance from fio.

Version-Release number of selected component (if applicable): 3.4

How reproducible:

Steps to Reproduce:
1. Turn on read-ahead:
   gluster volume set vol performance.read-ahead-page-count 16
2. Run the fio random read workload:
   fio --rw=randread --bs=1m --size=4g --runtime=30 --numjobs=1 --group_reporting --name=/8 --ioengine=gfapi --volume=vol --brick=bd-vm
   The throughput is 186 MB/s.

Actual results:
After turning off read-ahead and running the workload again, I get 350 MB/s.

Expected results:
Read-ahead should not slow down random read performance.

Additional info:
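For completeness, a minimal sketch of the two configurations being compared (assuming the volume name "vol" from the steps above; performance.read-ahead is the standard volume option that toggles the read-ahead translator, and the exact sequence of commands is an assumption, not quoted from the original report):

# Slow case reported above: read-ahead translator enabled, larger page count.
gluster volume set vol performance.read-ahead on
gluster volume set vol performance.read-ahead-page-count 16
fio --rw=randread --bs=1m --size=4g --runtime=30 --numjobs=1 \
    --group_reporting --name=/8 --ioengine=gfapi --volume=vol --brick=bd-vm

# Fast case (~350 MB/s): same workload with the read-ahead translator disabled.
gluster volume set vol performance.read-ahead off
fio --rw=randread --bs=1m --size=4g --runtime=30 --numjobs=1 \
    --group_reporting --name=/8 --ioengine=gfapi --volume=vol --brick=bd-vm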
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#1) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#2) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7927 (response to Jeff Darcy's review. 1) change tuning param name to performance.read-ahead-enable-even-noncontiguous. When set to 1, read-ahead is enabled even for noncontiguous IO; when set to 0 (default), read-ahead is disabled when noncontiguity is detected. 2) the end of file offset is conditionally set) posted (#1) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#3) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7958 (response to Jeff Darcy's review. 1) change tuning param name to performance.read-ahead-enable-even-noncontiguous. When set to 1, read-ahead is enabled even for noncontiguous IO; when set to 0 (default), read-ahead is disabled when noncontiguity is detected. 2) the end of file offset is conditionally set) posted (#1) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#4) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#6) for review on master by Raghavendra G (rgowdapp)
This bug was accidentally moved from POST to MODIFIED by an error in automation; please see mmccune with any questions.
REVIEW: https://review.gluster.org/7676 (performance/read-ahead: Enable read-ahead for strided reads) posted (#7) for review on master by Raghavendra G (rgowdapp)
REVIEW: https://review.gluster.org/7676 (performance/read-ahead: Enable read-ahead for strided reads) posted (#8) for review on master by Raghavendra G (rgowdapp)
Below is the fio output without patch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
fio ./fio.jb
workload: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.2.8
Starting 1 thread
Jobs: 1 (f=4): [r(1)] [100.0% done] [4544KB/0KB/0KB /s] [1136/0/0 iops] [eta 00m:00s]
workload: (groupid=0, jobs=1): err= 0: pid=1505: Thu Jul 20 03:53:14 2017
  read : io=286452KB, bw=4773.5KB/s, iops=1193, runt= 60009msec
    slat (usec): min=15, max=6882, avg=101.54, stdev=68.74
    clat (usec): min=726, max=92017, avg=6599.76, stdev=3636.11
     lat (usec): min=793, max=92121, avg=6701.56, stdev=3636.42
    clat percentiles (usec):
     |  1.00th=[  924],  5.00th=[ 1048], 10.00th=[ 2160], 20.00th=[ 3376],
     | 30.00th=[ 4448], 40.00th=[ 5472], 50.00th=[ 6496], 60.00th=[ 7456],
     | 70.00th=[ 8512], 80.00th=[ 9408], 90.00th=[10432], 95.00th=[12224],
     | 99.00th=[17536], 99.50th=[19328], 99.90th=[25472], 99.95th=[28032],
     | 99.99th=[49408]
    bw (KB /s): min= 3944, max= 5160, per=100.00%, avg=4778.32, stdev=170.24
    lat (usec) : 750=0.01%, 1000=3.55%
    lat (msec) : 2=5.69%, 4=16.50%, 10=60.44%, 20=13.39%, 50=0.42%
    lat (msec) : 100=0.01%
  cpu       : usr=0.47%, sys=1.44%, ctx=135227, majf=0, minf=36
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued   : total=r=71613/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency  : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=286452KB, aggrb=4773KB/s, minb=4773KB/s, maxb=4773KB/s, mint=60009msec, maxt=60009msec

fio ./fio.jb
workload: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.2.8
Starting 1 thread
Jobs: 1 (f=4): [r(1)] [100.0% done] [4760KB/0KB/0KB /s] [1190/0/0 iops] [eta 00m:00s]
workload: (groupid=0, jobs=1): err= 0: pid=1522: Thu Jul 20 03:54:39 2017
  read : io=288976KB, bw=4815.5KB/s, iops=1203, runt= 60010msec
    slat (usec): min=9, max=3992, avg=98.81, stdev=60.45
    clat (usec): min=729, max=69517, avg=6544.02, stdev=3579.35
     lat (usec): min=798, max=69649, avg=6643.10, stdev=3579.86
    clat percentiles (usec):
     |  1.00th=[  908],  5.00th=[ 1048], 10.00th=[ 2128], 20.00th=[ 3344],
     | 30.00th=[ 4384], 40.00th=[ 5408], 50.00th=[ 6432], 60.00th=[ 7392],
     | 70.00th=[ 8384], 80.00th=[ 9408], 90.00th=[10304], 95.00th=[12096],
     | 99.00th=[17280], 99.50th=[19328], 99.90th=[25472], 99.95th=[27008],
     | 99.99th=[34560]
    bw (KB /s): min= 4309, max= 5160, per=100.00%, avg=4821.04, stdev=153.25
    lat (usec) : 750=0.01%, 1000=3.76%
    lat (msec) : 2=5.60%, 4=16.74%, 10=60.59%, 20=12.87%, 50=0.43%
    lat (msec) : 100=0.01%
  cpu       : usr=0.44%, sys=1.47%, ctx=136676, majf=0, minf=36
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued   : total=r=72244/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency  : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=288976KB, aggrb=4815KB/s, minb=4815KB/s, maxb=4815KB/s, mint=60010msec, maxt=60010msec
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Below is the data after applying the patch
>>>>>>>>>>>>>>>>>>>>
fio ./fio.jb
workload: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.2.8
Starting 1 thread
workload: Laying out IO file(s) (4 file(s) / 16384MB)
Jobs: 1 (f=4): [r(1)] [100.0% done] [5272KB/0KB/0KB /s] [1318/0/0 iops] [eta 00m:00s]
workload: (groupid=0, jobs=1): err= 0: pid=2506: Thu Jul 20 06:28:18 2017
  read : io=301872KB, bw=5030.7KB/s, iops=1257, runt= 60007msec
    slat (usec): min=10, max=517328, avg=112.60, stdev=1893.49
    clat (usec): min=708, max=1065.1K, avg=6246.28, stdev=14142.49
     lat (usec): min=769, max=1066.2K, avg=6359.34, stdev=14274.78
    clat percentiles (usec):
     |  1.00th=[  868],  5.00th=[  980], 10.00th=[ 1128], 20.00th=[ 2928],
     | 30.00th=[ 3920], 40.00th=[ 4960], 50.00th=[ 5920], 60.00th=[ 6880],
     | 70.00th=[ 7904], 80.00th=[ 8896], 90.00th=[ 9792], 95.00th=[10432],
     | 99.00th=[16192], 99.50th=[18816], 99.90th=[52992], 99.95th=[116224],
     | 99.99th=[937984]
    bw (KB /s): min= 11, max= 5616, per=100.00%, avg=5112.72, stdev=786.53
    lat (usec) : 750=0.02%, 1000=5.66%
    lat (msec) : 2=6.52%, 4=18.52%, 10=61.24%, 20=7.61%, 50=0.33%
    lat (msec) : 100=0.04%, 250=0.04%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2000=0.01%
  cpu       : usr=0.44%, sys=1.84%, ctx=141784, majf=0, minf=34
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued   : total=r=75468/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency  : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=301872KB, aggrb=5030KB/s, minb=5030KB/s, maxb=5030KB/s, mint=60007msec, maxt=60007msec

fio ./fio.jb
workload: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.2.8
Starting 1 thread
Jobs: 1 (f=4): [r(1)] [100.0% done] [5000KB/0KB/0KB /s] [1250/0/0 iops] [eta 00m:00s]
workload: (groupid=0, jobs=1): err= 0: pid=2562: Thu Jul 20 06:30:17 2017
  read : io=314556KB, bw=5241.2KB/s, iops=1310, runt= 60007msec
    slat (usec): min=14, max=2209, avg=91.95, stdev=65.24
    clat (usec): min=726, max=83821, avg=6009.90, stdev=3444.55
     lat (usec): min=776, max=83904, avg=6102.16, stdev=3445.70
    clat percentiles (usec):
     |  1.00th=[  884],  5.00th=[  988], 10.00th=[ 1144], 20.00th=[ 2928],
     | 30.00th=[ 3952], 40.00th=[ 4960], 50.00th=[ 5920], 60.00th=[ 6944],
     | 70.00th=[ 7904], 80.00th=[ 8896], 90.00th=[ 9792], 95.00th=[10432],
     | 99.00th=[16192], 99.50th=[18560], 99.90th=[25728], 99.95th=[30336],
     | 99.99th=[58624]
    bw (KB /s): min= 4771, max= 5636, per=100.00%, avg=5248.08, stdev=172.91
    lat (usec) : 750=0.01%, 1000=5.33%
    lat (msec) : 2=6.89%, 4=18.26%, 10=61.35%, 20=7.75%, 50=0.38%
    lat (msec) : 100=0.01%
  cpu       : usr=0.55%, sys=1.81%, ctx=147896, majf=0, minf=36
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued   : total=r=78639/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency  : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=314556KB, aggrb=5241KB/s, minb=5241KB/s, maxb=5241KB/s, mint=60007msec, maxt=60007msec
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

I used the configuration below as the job file:

[global]
ioengine=libaio
#unified_rw_reporting=1
randrepeat=1
norandommap=1
group_reporting
direct=1
runtime=60
thread
size=16g

[workload]
bs=4k
rw=randread
iodepth=8
numjobs=1
file_service_type=random
filename=/mnt/glusterfs/perf5/iotest/fio_5
filename=/mnt/glusterfs/perf6/iotest/fio_6
filename=/mnt/glusterfs/perf7/iotest/fio_7
filename=/mnt/glusterfs/perf8/iotest/fio_8
Hi,

I used below volume configuration to test fio for patch https://review.gluster.org/#/c/7676/

Volume Name: dist-repl
Type: Distributed-Replicate
Volume ID: bf0834c6-c315-456d-9ea1-d9bcd8c482a2
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: <host-name1>:/mnt/sdb1/brick1
Brick2: <host-name2>:/mnt/sdb1/brick1
Brick3: <host-name3>:/mnt/sdb1/brick1
Brick4: <host-name4>:/mnt/sdb1/brick2
Brick5: <host-name5>:/mnt/sdb1/brick2
Brick6: <host-name6>:/mnt/sdb1/brick2
Options Reconfigured:
performance.read-ahead-enable-strided: on
performance.strict-o-direct: on
transport.address-family: inet
nfs.disable: on

Regards
Mohit Agrawal
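For reference, a sketch (an assumption, not quoted from the test setup) of how the "Options Reconfigured" entries above would typically be applied with the gluster CLI; the option names are taken from the volume info output above, including the performance.read-ahead-enable-strided option introduced by the patch under review:

# Assumed commands used to reach the reconfigured state shown above.
gluster volume set dist-repl performance.read-ahead-enable-strided on
gluster volume set dist-repl performance.strict-o-direct on
gluster volume set dist-repl nfs.disable on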
I had a discussion with Manoj about the numbers Mohit got. The things puzzling to both of us are:

* fio is run in O_DIRECT mode, which means read-ahead is turned off.
* We have a bug where read-ahead won't work when open-behind is turned on [2]. Since open-behind is on by default, read-ahead should not be functioning at all.

Since we have two reasons to believe read-ahead is switched off, how are we seeing an improvement in the benchmarks?

Another point to note is that the fio jobfile Mohit used doesn't actually generate strided reads; it just generates random reads. Manoj suggested the following way to generate strided reads with fio (see the sketch after this comment):

<mpillai> raghug, there seems to be a straightforward way of generating a strided pattern with fio. you'd use sequential io instead of rand, but give a stride. e.g. the option rw=read means seq read. rw=read:128k means read with a stride of 128k. that's what a quick read of the howto is suggesting.

Also, note that patch [1] doesn't exactly implement an optimization for a strided read pattern (if it did, it would have to compare the deltas of previous reads against the current read). It just keeps the cache if the current read region is already cached by read-ahead. So we need to decide whether to implement a more stringent strided read-ahead, or whether it is beneficial to just keep the implementation of [1].

[1] https://review.gluster.org/#/c/7676
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1084508
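As a follow-up to Manoj's suggestion, here is a minimal fio jobfile sketch for generating strided reads. The stride value, section name, and filename are placeholders chosen for illustration, not taken from Mohit's job file; rw=read:128k is the stride syntax described in the fio HOWTO:

# Sketch only: strided sequential reads, as suggested by mpillai above.
# rw=read:128k issues sequential 4k reads but advances the offset by 128k
# after each block, producing a strided pattern rather than random offsets.
[global]
ioengine=libaio
direct=1
runtime=60
group_reporting
thread
size=16g

[strided-read]
bs=4k
# Hypothetical stride; tune to match the read-ahead page size under test.
rw=read:128k
iodepth=8
numjobs=1
# Placeholder path on the fuse mount.
filename=/mnt/glusterfs/perf5/iotest/fio_strided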
Status?
I'm suggesting to close it as a dupe of BZ 1676479. Asking Raghavendra for ack.
(In reply to Csaba Henk from comment #15)
> I'm suggesting to close it as a dupe of BZ 1676479. Asking Raghavendra for ack.

I agree. Gluster read-ahead is at best redundant; kernel read-ahead is more intelligent [1]. Since this bug is on a fuse mount, disabling read-ahead will introduce no regression.

[1] https://lwn.net/Articles/155510/
*** This bug has been marked as a duplicate of bug 1676479 ***