Description of problem:
Turning on read-ahead results in poor random read performance from fio.

Version-Release number of selected component (if applicable): 3.4

How reproducible:

Steps to Reproduce:
1. Turn on read-ahead:
   gluster volume set vol performance.read-ahead-page-count 16
2. Run the fio random read workload:
   fio --rw=randread --bs=1m --size=4g --runtime=30 --numjobs=1 --group_reporting --name=/8 --ioengine=gfapi --volume=vol --brick=bd-vm
   The throughput is 186 MB/s.

Actual results:
After turning off read-ahead and running the workload again, I get 350 MB/s.

Expected results:
Read-ahead should not slow down random read performance.

Additional info:
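For completeness, a minimal sketch of the two configurations being compared (assuming the volume name "vol" from the steps above; performance.read-ahead is the standard volume option that toggles the read-ahead translator, and the exact sequence of commands is an assumption, not quoted from the original report):

# Slow case reported above: read-ahead translator enabled, larger page count.
gluster volume set vol performance.read-ahead on
gluster volume set vol performance.read-ahead-page-count 16
fio --rw=randread --bs=1m --size=4g --runtime=30 --numjobs=1 \
    --group_reporting --name=/8 --ioengine=gfapi --volume=vol --brick=bd-vm

# Fast case (~350 MB/s): same workload with the read-ahead translator disabled.
gluster volume set vol performance.read-ahead off
fio --rw=randread --bs=1m --size=4g --runtime=30 --numjobs=1 \
    --group_reporting --name=/8 --ioengine=gfapi --volume=vol --brick=bd-vm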
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#1) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#2) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7927 (response to Jeff Darcy's review. 1) change tuning param name to performance.read-ahead-enable-even-noncontiguous. When set to 1, read-ahead is enabled even for noncontiguous IO; when set to 0 (default), read-ahead is disabled when noncontiguity is detected. 2) the end of file offset is conditionally set) posted (#1) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#3) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7958 (response to Jeff Darcy's review. 1) change tuning param name to performance.read-ahead-enable-even-noncontiguous. When set to 1, read-ahead is enabled even for noncontiguous IO; when set to 0 (default), read-ahead is disabled when noncontiguity is detected. 2) the end of file offset is conditionally set) posted (#1) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#4) for review on master by Huamin Chen (hchen)
REVIEW: http://review.gluster.org/7676 (add option to 1) keep readahead for straddle read; 2) disable readahead otherwise) posted (#6) for review on master by Raghavendra G (rgowdapp)
This bug was accidentally moved from POST to MODIFIED by an error in automation; please see mmccune with any questions.
REVIEW: https://review.gluster.org/7676 (performance/read-ahead: Enable read-ahead for strided reads) posted (#7) for review on master by Raghavendra G (rgowdapp)
REVIEW: https://review.gluster.org/7676 (performance/read-ahead: Enable read-ahead for strided reads) posted (#8) for review on master by Raghavendra G (rgowdapp)
Below is the fio output without patch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
fio ./fio.jb
workload: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.2.8
Starting 1 thread
Jobs: 1 (f=4): [r(1)] [100.0% done] [4544KB/0KB/0KB /s] [1136/0/0 iops] [eta 00m:00s]
workload: (groupid=0, jobs=1): err= 0: pid=1505: Thu Jul 20 03:53:14 2017
  read : io=286452KB, bw=4773.5KB/s, iops=1193, runt= 60009msec
    slat (usec): min=15, max=6882, avg=101.54, stdev=68.74
    clat (usec): min=726, max=92017, avg=6599.76, stdev=3636.11
     lat (usec): min=793, max=92121, avg=6701.56, stdev=3636.42
    clat percentiles (usec):
     |  1.00th=[  924],  5.00th=[ 1048], 10.00th=[ 2160], 20.00th=[ 3376],
     | 30.00th=[ 4448], 40.00th=[ 5472], 50.00th=[ 6496], 60.00th=[ 7456],
     | 70.00th=[ 8512], 80.00th=[ 9408], 90.00th=[10432], 95.00th=[12224],
     | 99.00th=[17536], 99.50th=[19328], 99.90th=[25472], 99.95th=[28032],
     | 99.99th=[49408]
    bw (KB /s): min= 3944, max= 5160, per=100.00%, avg=4778.32, stdev=170.24
    lat (usec) : 750=0.01%, 1000=3.55%
    lat (msec) : 2=5.69%, 4=16.50%, 10=60.44%, 20=13.39%, 50=0.42%
    lat (msec) : 100=0.01%
  cpu       : usr=0.47%, sys=1.44%, ctx=135227, majf=0, minf=36
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued   : total=r=71613/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency  : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=286452KB, aggrb=4773KB/s, minb=4773KB/s, maxb=4773KB/s, mint=60009msec, maxt=60009msec

fio ./fio.jb
workload: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.2.8
Starting 1 thread
Jobs: 1 (f=4): [r(1)] [100.0% done] [4760KB/0KB/0KB /s] [1190/0/0 iops] [eta 00m:00s]
workload: (groupid=0, jobs=1): err= 0: pid=1522: Thu Jul 20 03:54:39 2017
  read : io=288976KB, bw=4815.5KB/s, iops=1203, runt= 60010msec
    slat (usec): min=9, max=3992, avg=98.81, stdev=60.45
    clat (usec): min=729, max=69517, avg=6544.02, stdev=3579.35
     lat (usec): min=798, max=69649, avg=6643.10, stdev=3579.86
    clat percentiles (usec):
     |  1.00th=[  908],  5.00th=[ 1048], 10.00th=[ 2128], 20.00th=[ 3344],
     | 30.00th=[ 4384], 40.00th=[ 5408], 50.00th=[ 6432], 60.00th=[ 7392],
     | 70.00th=[ 8384], 80.00th=[ 9408], 90.00th=[10304], 95.00th=[12096],
     | 99.00th=[17280], 99.50th=[19328], 99.90th=[25472], 99.95th=[27008],
     | 99.99th=[34560]
    bw (KB /s): min= 4309, max= 5160, per=100.00%, avg=4821.04, stdev=153.25
    lat (usec) : 750=0.01%, 1000=3.76%
    lat (msec) : 2=5.60%, 4=16.74%, 10=60.59%, 20=12.87%, 50=0.43%
    lat (msec) : 100=0.01%
  cpu       : usr=0.44%, sys=1.47%, ctx=136676, majf=0, minf=36
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued   : total=r=72244/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency  : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=288976KB, aggrb=4815KB/s, minb=4815KB/s, maxb=4815KB/s, mint=60010msec, maxt=60010msec
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Below is the data after applying the patch
>>>>>>>>>>>>>>>>>>>>
fio ./fio.jb
workload: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.2.8
Starting 1 thread
workload: Laying out IO file(s) (4 file(s) / 16384MB)
Jobs: 1 (f=4): [r(1)] [100.0% done] [5272KB/0KB/0KB /s] [1318/0/0 iops] [eta 00m:00s]
workload: (groupid=0, jobs=1): err= 0: pid=2506: Thu Jul 20 06:28:18 2017
  read : io=301872KB, bw=5030.7KB/s, iops=1257, runt= 60007msec
    slat (usec): min=10, max=517328, avg=112.60, stdev=1893.49
    clat (usec): min=708, max=1065.1K, avg=6246.28, stdev=14142.49
     lat (usec): min=769, max=1066.2K, avg=6359.34, stdev=14274.78
    clat percentiles (usec):
     |  1.00th=[  868],  5.00th=[  980], 10.00th=[ 1128], 20.00th=[ 2928],
     | 30.00th=[ 3920], 40.00th=[ 4960], 50.00th=[ 5920], 60.00th=[ 6880],
     | 70.00th=[ 7904], 80.00th=[ 8896], 90.00th=[ 9792], 95.00th=[10432],
     | 99.00th=[16192], 99.50th=[18816], 99.90th=[52992], 99.95th=[116224],
     | 99.99th=[937984]
    bw (KB /s): min= 11, max= 5616, per=100.00%, avg=5112.72, stdev=786.53
    lat (usec) : 750=0.02%, 1000=5.66%
    lat (msec) : 2=6.52%, 4=18.52%, 10=61.24%, 20=7.61%, 50=0.33%
    lat (msec) : 100=0.04%, 250=0.04%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2000=0.01%
  cpu       : usr=0.44%, sys=1.84%, ctx=141784, majf=0, minf=34
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued   : total=r=75468/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency  : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=301872KB, aggrb=5030KB/s, minb=5030KB/s, maxb=5030KB/s, mint=60007msec, maxt=60007msec

fio ./fio.jb
workload: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.2.8
Starting 1 thread
Jobs: 1 (f=4): [r(1)] [100.0% done] [5000KB/0KB/0KB /s] [1250/0/0 iops] [eta 00m:00s]
workload: (groupid=0, jobs=1): err= 0: pid=2562: Thu Jul 20 06:30:17 2017
  read : io=314556KB, bw=5241.2KB/s, iops=1310, runt= 60007msec
    slat (usec): min=14, max=2209, avg=91.95, stdev=65.24
    clat (usec): min=726, max=83821, avg=6009.90, stdev=3444.55
     lat (usec): min=776, max=83904, avg=6102.16, stdev=3445.70
    clat percentiles (usec):
     |  1.00th=[  884],  5.00th=[  988], 10.00th=[ 1144], 20.00th=[ 2928],
     | 30.00th=[ 3952], 40.00th=[ 4960], 50.00th=[ 5920], 60.00th=[ 6944],
     | 70.00th=[ 7904], 80.00th=[ 8896], 90.00th=[ 9792], 95.00th=[10432],
     | 99.00th=[16192], 99.50th=[18560], 99.90th=[25728], 99.95th=[30336],
     | 99.99th=[58624]
    bw (KB /s): min= 4771, max= 5636, per=100.00%, avg=5248.08, stdev=172.91
    lat (usec) : 750=0.01%, 1000=5.33%
    lat (msec) : 2=6.89%, 4=18.26%, 10=61.35%, 20=7.75%, 50=0.38%
    lat (msec) : 100=0.01%
  cpu       : usr=0.55%, sys=1.81%, ctx=147896, majf=0, minf=36
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued   : total=r=78639/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency  : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=314556KB, aggrb=5241KB/s, minb=5241KB/s, maxb=5241KB/s, mint=60007msec, maxt=60007msec
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

I used the configuration below as the job file:

[global]
ioengine=libaio
#unified_rw_reporting=1
randrepeat=1
norandommap=1
group_reporting
direct=1
runtime=60
thread
size=16g

[workload]
bs=4k
rw=randread
iodepth=8
numjobs=1
file_service_type=random
filename=/mnt/glusterfs/perf5/iotest/fio_5
filename=/mnt/glusterfs/perf6/iotest/fio_6
filename=/mnt/glusterfs/perf7/iotest/fio_7
filename=/mnt/glusterfs/perf8/iotest/fio_8
Hi,

I used below volume configuration to test fio for patch https://review.gluster.org/#/c/7676/

Volume Name: dist-repl
Type: Distributed-Replicate
Volume ID: bf0834c6-c315-456d-9ea1-d9bcd8c482a2
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: <host-name1>:/mnt/sdb1/brick1
Brick2: <host-name2>:/mnt/sdb1/brick1
Brick3: <host-name3>:/mnt/sdb1/brick1
Brick4: <host-name4>:/mnt/sdb1/brick2
Brick5: <host-name5>:/mnt/sdb1/brick2
Brick6: <host-name6>:/mnt/sdb1/brick2
Options Reconfigured:
performance.read-ahead-enable-strided: on
performance.strict-o-direct: on
transport.address-family: inet
nfs.disable: on

Regards
Mohit Agrawal
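For reference, a sketch (an assumption, not quoted from the test setup) of how the "Options Reconfigured" entries above would typically be applied with the gluster CLI; the option names are taken from the volume info output above, including the performance.read-ahead-enable-strided option introduced by the patch under review:

# Assumed commands used to reach the reconfigured state shown above.
gluster volume set dist-repl performance.read-ahead-enable-strided on
gluster volume set dist-repl performance.strict-o-direct on
gluster volume set dist-repl nfs.disable on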
I had a discussion with Manoj about the numbers Mohit got. The things puzzling to both of us are:

* fio is run in O_DIRECT mode, which means read-ahead is turned off.
* We have a bug where read-ahead won't work when open-behind is turned on [2]. Since open-behind is on by default, read-ahead should not be functioning at all.

Since we have two reasons to believe read-ahead is switched off, how are we seeing an improvement in the benchmarks?

Another point to note is that the fio jobfile Mohit used doesn't actually generate strided reads; it just generates random reads. Manoj suggested the following way to generate strided reads with fio (see the sketch after this comment):

<mpillai> raghug, there seems to be a straightforward way of generating a strided pattern with fio. you'd use sequential io instead of rand, but give a stride. e.g. the option rw=read means seq read. rw=read:128k means read with a stride of 128k. that's what a quick read of the howto is suggesting.

Also, note that patch [1] doesn't exactly implement an optimization for a strided read pattern (if it did, it would have to compare the deltas of previous reads against the current read). It just keeps the cache if the current read region is already cached by read-ahead. So we need to decide whether to implement a more stringent strided read-ahead, or whether it is beneficial to just keep the implementation of [1].

[1] https://review.gluster.org/#/c/7676
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1084508
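As a follow-up to Manoj's suggestion, here is a minimal fio jobfile sketch for generating strided reads. The stride value, section name, and filename are placeholders chosen for illustration, not taken from Mohit's job file; rw=read:128k is the stride syntax described in the fio HOWTO:

# Sketch only: strided sequential reads, as suggested by mpillai above.
# rw=read:128k issues sequential 4k reads but advances the offset by 128k
# after each block, producing a strided pattern rather than random offsets.
[global]
ioengine=libaio
direct=1
runtime=60
group_reporting
thread
size=16g

[strided-read]
bs=4k
# Hypothetical stride; tune to match the read-ahead page size under test.
rw=read:128k
iodepth=8
numjobs=1
# Placeholder path on the fuse mount.
filename=/mnt/glusterfs/perf5/iotest/fio_strided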
Status?
I'm suggesting to close it as a dupe of BZ 1676479. Asking Raghavendra for ack.
(In reply to Csaba Henk from comment #15)
> I'm suggesting to close it as a dupe of BZ 1676479. Asking Raghavendra for ack.

I agree. Gluster read-ahead is at best redundant; kernel read-ahead is more intelligent [1]. Since this bug is on a fuse mount, disabling read-ahead will introduce no regression.

[1] https://lwn.net/Articles/155510/
*** This bug has been marked as a duplicate of bug 1676479 ***