Bug 1455992 - Poor performance on gluster block device
Summary: Poor performance on gluster block device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tcmu-runner
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Pranith Kumar K
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks: 1417151
 
Reported: 2017-05-26 15:13 UTC by Manoj Pillai
Modified: 2017-09-21 04:19 UTC
CC List: 12 users

Fixed In Version: tcmu-runner-1.2.0-10.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:19:33 UTC
Embargoed:




Links:
Red Hat Product Errata RHEA-2017:2773 (normal, SHIPPED_LIVE): new packages: gluster-block. Last updated 2017-09-21 08:16:22 UTC.

Description Manoj Pillai 2017-05-26 15:13:34 UTC
Description of problem:
On an iozone sequential I/O test, an XFS file system on a gluster-block device performs very poorly compared to gluster-fuse and gluster-nfs.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-25.el7rhgs.x86_64

How reproducible:
Consistently

Steps to Reproduce:
1. Create a single-brick gluster volume on an NVMe SSD.
In the gluster-block case, create a 100GiB block device (ha 1). On the client, create an XFS filesystem on the discovered gluster-block device and mount it.

For the gluster-fuse and gluster-nfs cases, mount the gluster volume on the client.

Client connected via 10GbE.

2. Run an iozone test with the following parameters:
iozone -+m <conf_file> -i 0 -w -+n -c -C -e -s 4g -r 1024k -t 8
<drop caches>
iozone -+m <conf_file> -i 1 -w -+n -c -C -e -s 4g -r 1024k -t 8

[This iozone command is adapted from a multi-client test; it is run on a single client here. See the sketch below for the full sequence.]
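For concreteness, the setup and test sequence looks roughly like the following. This is a sketch: the volume name, server address, device name and mount point are illustrative, and "echo 3 > /proc/sys/vm/drop_caches" is the usual way to drop caches between the two runs.

# gluster-block case: create and export a 100GiB block device with ha 1
gluster-block create perfvol/ha1 ha 1 <server-ip> 100GiB

# on the client: discover and log in to the iSCSI target, then mkfs + mount
iscsiadm -m discovery -t st -p <server-ip>
iscsiadm -m node -l
mkfs.xfs /dev/sdc        # device name as discovered on the client
mount /dev/sdc /mnt/blockdev

# write phase, drop caches, then read phase
iozone -+m <conf_file> -i 0 -w -+n -c -C -e -s 4g -r 1024k -t 8
sync; echo 3 > /proc/sys/vm/drop_caches
iozone -+m <conf_file> -i 1 -w -+n -c -C -e -s 4g -r 1024k -t 8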

Actual results:

gluster-block:
        Children see throughput for  8 initial writers  =   87671.01 kB/sec
        Children see throughput for  8 readers          =  133351.47 kB/sec

gluster-fuse:
        Children see throughput for  8 initial writers  =  939760.11 kB/sec
        Children see throughput for  8 readers          =  791956.57 kB/sec

gluster-nfs:
        Children see throughput for  8 initial writers  =  989822.15 kB/sec
        Children see throughput for  8 readers          = 1203338.91 kB/sec

Expected results:
On this test, gluster-block should be able to perform close to gluster-fuse and gluster-nfs.

Additional info:
Varying the number of iozone threads has no significant impact on gluster-block performance. This could indicate too few "io threads", lock contention, or some throttling.

Increasing read-ahead on the discovered block device on the client from the default of 128KB to 4MB did not improve read performance.
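The read-ahead change referred to here is of the following kind (device name illustrative; the kernel default for read_ahead_kb is 128):

# check the default, then raise read-ahead to 4MB
cat /sys/block/sdc/queue/read_ahead_kb
echo 4096 > /sys/block/sdc/queue/read_ahead_kb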

Comment 6 Manoj Pillai 2017-06-16 11:51:07 UTC
I see poor performance with gluster-block in random I/O tests as well.

For the same setup as in comment #0:

> 1. single brick gluster volume created on an NVMe SSD. 
> In the gluster-block case, create a block device of 100GiB (ha1). On client,
> create XFS filesystem on discovered gluster-block device and mount.
> 
> For gluster-fuse and gluster-nfs case, mount the gluster volume on the
> client.
> 
> Client connected via 10GbE.

I'm skipping the gluster-nfs comparison for random I/O.

The fio job keeps the total size at 80g and the data accessed at 5%. The number of threads (jobs) and other parameters vary.

Here's the job file for 8 threads, 64k access size, random write:

[global]
rw=randwrite
end_fsync=1
bs=64k
startdelay=0
ioengine=sync
direct=1

[lgf-randwrite]
directory=/mnt/glustervol/${HOSTNAME}
filename_format=f.$jobnum.$filenum
numjobs=8
nrfiles=1
openfiles=1
size=10g
io_size=512m

Results for the above case:
Fuse         : write: io=4096.0MB, bw=792275K/s
gluster-block: write: io=4096.0MB, bw=85010K/s

For 8 threads, 8k access size, random write:
Fuse         : write: io=4096.0MB, bw=145499K/s
gluster-block: write: io=4096.0MB, bw=13301K/s

For 1 thread, 8k access size, random write:
Fuse         : write: io=4096.0MB, bw=33514KB/s
gluster-block: write: io=4096.0MB, bw=10261KB/s

Also, latency is much higher for gluster-block. Looking at the reported "clat percentiles (usec)":

For 1 thread, 8k access size, random write:
Fuse         : 50.00th=[  223] 95.00th=[  290]
gluster-block: 50.00th=[  764] 95.00th=[  964]

For 8 threads, 8k access size, random write:
Fuse         : 50.00th=[  434] 95.00th=[  502]
gluster-block: 50.00th=[ 4768] 95.00th=[ 5600]

Latency shoots up for multi-threaded writes with gluster-block. That is consistent with the hypothesis in comment #0 about serialization/contention being the cause of poor block performance.

Comment 8 Pranith Kumar K 2017-06-24 09:06:09 UTC
Manoj,
    gluster-block is backed by a file on the gluster volume, opened with O_DIRECT+O_SYNC. These are the results I get from a quick test on my laptop with different options on a plain distribute volume:

root@dhcp35-190 - /mnt/d1 
14:23:00 :) ⚡ dd if=/dev/zero of=1 bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.141712 s, 740 MB/s

root@dhcp35-190 - /mnt/d1 
14:23:26 :) ⚡ dd if=/dev/zero of=1 bs=1M count=100 oflag=sync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 11.0294 s, 9.5 MB/s

root@dhcp35-190 - /mnt/d1 
14:23:42 :) ⚡ dd if=/dev/zero of=1 bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.71891 s, 38.6 MB/s
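For reference, dd can also combine both flags, which approximates the O_DIRECT+O_SYNC combination gluster-block uses (not run here, shown only to illustrate the case under discussion):

dd if=/dev/zero of=1 bs=1M count=100 oflag=direct,sync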

So I think the first thing to do here would be to figure out what flags would satisfy the requirements for block but not put extra pressure on storage.

Will update with more details once I find something better.

Pranith

Comment 9 Pranith Kumar K 2017-06-24 09:46:37 UTC
Manoj,
   I tried the same dd inside the gluster-block device and I get 1.6 GB/s, which is twice the speed I got from a normal fuse mount. I guess I will need to do the debugging on your machine itself. Is that possible?

Pranith

Comment 10 Manoj Pillai 2017-06-26 18:26:01 UTC
(In reply to Pranith Kumar K from comment #8)
[...]
> So I think the first thing to do here would be to figure out what flags
> would satisfy the requirements for block but not put extra pressure on
> storage.
> 

My main concern in comment #0 and comment #6 is that gluster-block throughput does not increase when I increase the number of threads doing I/O. Repeating some of the results from comment #6 here:

For 8 threads, 8k access size, random write:
Fuse         : write: io=4096.0MB, bw=145499K/s
gluster-block: write: io=4096.0MB, bw=13301K/s

For 1 thread, 8k access size, random write:
Fuse         : write: io=4096.0MB, bw=33514KB/s
gluster-block: write: io=4096.0MB, bw=10261KB/s

Fuse throughput jumps almost 5x going from single-threaded to 8 threads; gluster-block barely budges. Why? Do the open flags explain this behavior? I don't see that they do.

Comment 11 Manoj Pillai 2017-06-26 18:30:01 UTC
(In reply to Pranith Kumar K from comment #9)
> Manoj,
>    I tried to do same dd inside the gluster-block and I get 1.6GBps which is
> twice the speed I got from normal fuse mount. I guess I will need to do
> debugging on your machine itself. Is that possible?
> 
> Pranith

Let's try to get you a separate setup. I'm moving to openshift-based testing to focus on the real workloads we are targeting with this feature. This is not a great time for me to be sharing my setup. I'll send you the link for bagl lab reservations so you can request a small allocation.

Comment 12 Pranith Kumar K 2017-06-27 14:17:56 UTC
Manoj,
    So here are some results that Vijay and I found testing it with and without O_SYNC.

fio comparison

gluster-block with O_SYNC

gf-randwrite: (g=0): rw=randwrite, bs=64K-64K/64K-64K/64K-64K, ioengine=sync, iodepth=1
...
fio-2.2.8
Starting 8 processes
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
Jobs: 8 (f=8): [F(8)] [100.0% done] [0KB/512KB/0KB /s] [0/8/0 iops] [eta 00m:00s]  
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12860: Tue Jun 27 07:27:17 2017
  write: io=524288KB, bw=737545B/s, iops=11, runt=727916msec
    clat (msec): min=23, max=416, avg=88.69, stdev=36.49
     lat (msec): min=23, max=416, avg=88.69, stdev=36.49
    clat percentiles (msec):
     |  1.00th=[   34],  5.00th=[   42], 10.00th=[   43], 20.00th=[   68],
     | 30.00th=[   76], 40.00th=[   84], 50.00th=[   92], 60.00th=[  100],
     | 70.00th=[  100], 80.00th=[  109], 90.00th=[  109], 95.00th=[  117],
     | 99.00th=[  293], 99.50th=[  334], 99.90th=[  375], 99.95th=[  383],
     | 99.99th=[  416]
    bw (KB  /s): min=  172, max= 1109, per=12.62%, avg=726.91, stdev=132.69
    lat (msec) : 50=11.84%, 100=60.82%, 250=25.94%, 500=1.40%
  cpu          : usr=0.01%, sys=0.11%, ctx=8273, majf=0, minf=32
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12861: Tue Jun 27 07:27:17 2017
  write: io=524288KB, bw=737545B/s, iops=11, runt=727916msec
    clat (msec): min=24, max=416, avg=88.69, stdev=36.50
     lat (msec): min=24, max=416, avg=88.69, stdev=36.50
    clat percentiles (msec):
     |  1.00th=[   34],  5.00th=[   42], 10.00th=[   43], 20.00th=[   68],
     | 30.00th=[   76], 40.00th=[   84], 50.00th=[   92], 60.00th=[  100],
     | 70.00th=[  100], 80.00th=[  109], 90.00th=[  109], 95.00th=[  117],
     | 99.00th=[  293], 99.50th=[  334], 99.90th=[  375], 99.95th=[  383],
     | 99.99th=[  416]
    bw (KB  /s): min=  172, max= 1109, per=12.61%, avg=726.86, stdev=133.47
    lat (msec) : 50=11.88%, 100=60.39%, 250=26.34%, 500=1.39%
  cpu          : usr=0.02%, sys=0.11%, ctx=8276, majf=0, minf=31
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12862: Tue Jun 27 07:27:17 2017
  write: io=524288KB, bw=737545B/s, iops=11, runt=727916msec
    clat (msec): min=23, max=416, avg=88.69, stdev=36.49
     lat (msec): min=23, max=416, avg=88.69, stdev=36.49
    clat percentiles (msec):
     |  1.00th=[   34],  5.00th=[   42], 10.00th=[   43], 20.00th=[   68],
     | 30.00th=[   76], 40.00th=[   84], 50.00th=[   92], 60.00th=[  100],
     | 70.00th=[  100], 80.00th=[  109], 90.00th=[  109], 95.00th=[  117],
     | 99.00th=[  293], 99.50th=[  334], 99.90th=[  375], 99.95th=[  383],
     | 99.99th=[  416]
    bw (KB  /s): min=  172, max= 1109, per=12.61%, avg=726.63, stdev=133.45
    lat (msec) : 50=11.84%, 100=60.42%, 250=26.33%, 500=1.40%
  cpu          : usr=0.01%, sys=0.11%, ctx=8272, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12863: Tue Jun 27 07:27:17 2017
  write: io=524288KB, bw=737545B/s, iops=11, runt=727916msec
    clat (msec): min=23, max=416, avg=88.69, stdev=36.49
     lat (msec): min=23, max=416, avg=88.69, stdev=36.49
    clat percentiles (msec):
     |  1.00th=[   34],  5.00th=[   42], 10.00th=[   43], 20.00th=[   68],
     | 30.00th=[   76], 40.00th=[   84], 50.00th=[   92], 60.00th=[  100],
     | 70.00th=[  100], 80.00th=[  109], 90.00th=[  109], 95.00th=[  117],
     | 99.00th=[  293], 99.50th=[  334], 99.90th=[  375], 99.95th=[  383],
     | 99.99th=[  416]
    bw (KB  /s): min=  172, max= 1116, per=12.61%, avg=726.82, stdev=133.62
    lat (msec) : 50=11.84%, 100=60.30%, 250=26.46%, 500=1.39%
  cpu          : usr=0.01%, sys=0.10%, ctx=8275, majf=0, minf=32
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12864: Tue Jun 27 07:27:17 2017
  write: io=524288KB, bw=737545B/s, iops=11, runt=727916msec
    clat (msec): min=23, max=416, avg=88.69, stdev=36.49
     lat (msec): min=23, max=416, avg=88.69, stdev=36.49
    clat percentiles (msec):
     |  1.00th=[   34],  5.00th=[   42], 10.00th=[   43], 20.00th=[   68],
     | 30.00th=[   76], 40.00th=[   84], 50.00th=[   92], 60.00th=[  100],
     | 70.00th=[  100], 80.00th=[  109], 90.00th=[  109], 95.00th=[  117],
     | 99.00th=[  293], 99.50th=[  334], 99.90th=[  375], 99.95th=[  383],
     | 99.99th=[  416]
    bw (KB  /s): min=  172, max= 1080, per=12.61%, avg=726.64, stdev=134.92
    lat (msec) : 50=11.84%, 100=60.67%, 250=26.10%, 500=1.39%
  cpu          : usr=0.01%, sys=0.11%, ctx=8272, majf=0, minf=32
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12865: Tue Jun 27 07:27:17 2017
  write: io=524288KB, bw=737545B/s, iops=11, runt=727916msec
    clat (msec): min=24, max=416, avg=88.69, stdev=36.49
     lat (msec): min=24, max=416, avg=88.69, stdev=36.49
    clat percentiles (msec):
     |  1.00th=[   34],  5.00th=[   42], 10.00th=[   43], 20.00th=[   68],
     | 30.00th=[   76], 40.00th=[   84], 50.00th=[   92], 60.00th=[  100],
     | 70.00th=[  100], 80.00th=[  109], 90.00th=[  109], 95.00th=[  117],
     | 99.00th=[  293], 99.50th=[  334], 99.90th=[  375], 99.95th=[  383],
     | 99.99th=[  416]
    bw (KB  /s): min=  172, max= 1109, per=12.61%, avg=726.72, stdev=132.65
    lat (msec) : 50=11.83%, 100=60.63%, 250=26.15%, 500=1.39%
  cpu          : usr=0.01%, sys=0.10%, ctx=8279, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12866: Tue Jun 27 07:27:17 2017
  write: io=524288KB, bw=737545B/s, iops=11, runt=727916msec
    clat (msec): min=23, max=416, avg=88.69, stdev=36.49
     lat (msec): min=23, max=416, avg=88.69, stdev=36.49
    clat percentiles (msec):
     |  1.00th=[   34],  5.00th=[   42], 10.00th=[   43], 20.00th=[   68],
     | 30.00th=[   76], 40.00th=[   84], 50.00th=[   92], 60.00th=[  100],
     | 70.00th=[  100], 80.00th=[  109], 90.00th=[  109], 95.00th=[  117],
     | 99.00th=[  293], 99.50th=[  334], 99.90th=[  375], 99.95th=[  383],
     | 99.99th=[  416]
    bw (KB  /s): min=  172, max= 1066, per=12.61%, avg=726.62, stdev=132.42
    lat (msec) : 50=11.83%, 100=60.44%, 250=26.34%, 500=1.39%
  cpu          : usr=0.01%, sys=0.11%, ctx=8275, majf=0, minf=31
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12867: Tue Jun 27 07:27:17 2017
  write: io=524288KB, bw=737545B/s, iops=11, runt=727916msec
    clat (msec): min=23, max=416, avg=88.69, stdev=36.49
     lat (msec): min=23, max=416, avg=88.69, stdev=36.49
    clat percentiles (msec):
     |  1.00th=[   34],  5.00th=[   42], 10.00th=[   43], 20.00th=[   68],
     | 30.00th=[   76], 40.00th=[   84], 50.00th=[   92], 60.00th=[  100],
     | 70.00th=[  100], 80.00th=[  109], 90.00th=[  109], 95.00th=[  117],
     | 99.00th=[  293], 99.50th=[  334], 99.90th=[  375], 99.95th=[  383],
     | 99.99th=[  416]
    bw (KB  /s): min=  172, max= 1080, per=12.61%, avg=726.52, stdev=134.76
    lat (msec) : 50=11.85%, 100=60.46%, 250=26.29%, 500=1.39%
  cpu          : usr=0.01%, sys=0.10%, ctx=8272, majf=0, minf=31
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=5762KB/s, minb=720KB/s, maxb=720KB/s, mint=727916msec, maxt=727916msec

Disk stats (read/write):
  sdb: ios=0/66577, merge=0/1, ticks=0/6116759, in_queue=6116766, util=100.00%

==========================================================================

without O_SYNC and just O_DIRECT

fio ./random.cfg 
lgf-randwrite: (g=0): rw=randwrite, bs=64K-64K/64K-64K/64K-64K, ioengine=sync, iodepth=1
...
fio-2.2.8
Starting 8 processes
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
Jobs: 1 (f=1): [_(3),w(1),_(4)] [100.0% done] [0KB/17408KB/0KB /s] [0/272/0 iops] [eta 00m:00s]
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12902: Tue Jun 27 07:32:49 2017
  write: io=524288KB, bw=2905.7KB/s, iops=45, runt=180439msec
    clat (msec): min=1, max=286, avg=22.01, stdev=14.72
     lat (msec): min=1, max=286, avg=22.02, stdev=14.72
    clat percentiles (usec):
     |  1.00th=[ 1208],  5.00th=[ 1528], 10.00th=[ 2608], 20.00th=[12096],
     | 30.00th=[18048], 40.00th=[21376], 50.00th=[23936], 60.00th=[25984],
     | 70.00th=[27776], 80.00th=[29824], 90.00th=[33024], 95.00th=[36096],
     | 99.00th=[43776], 99.50th=[52480], 99.90th=[207872], 99.95th=[264192],
     | 99.99th=[288768]
    bw (KB  /s): min= 1013, max=10645, per=12.58%, avg=2913.65, stdev=1062.92
    lat (msec) : 2=8.07%, 4=4.63%, 10=5.44%, 20=17.05%, 50=64.27%
    lat (msec) : 100=0.22%, 250=0.23%, 500=0.09%
  cpu          : usr=0.07%, sys=0.44%, ctx=8277, majf=0, minf=32
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12903: Tue Jun 27 07:32:49 2017
  write: io=524288KB, bw=2911.6KB/s, iops=45, runt=180074msec
    clat (msec): min=1, max=287, avg=21.96, stdev=14.74
     lat (msec): min=1, max=287, avg=21.96, stdev=14.74
    clat percentiles (usec):
     |  1.00th=[ 1208],  5.00th=[ 1480], 10.00th=[ 2352], 20.00th=[11840],
     | 30.00th=[18048], 40.00th=[21376], 50.00th=[23936], 60.00th=[25984],
     | 70.00th=[27776], 80.00th=[29824], 90.00th=[33024], 95.00th=[35584],
     | 99.00th=[43776], 99.50th=[51456], 99.90th=[207872], 99.95th=[264192],
     | 99.99th=[288768]
    bw (KB  /s): min= 1097, max=10645, per=12.61%, avg=2919.07, stdev=997.21
    lat (msec) : 2=8.56%, 4=4.35%, 10=5.43%, 20=16.98%, 50=64.16%
    lat (msec) : 100=0.21%, 250=0.23%, 500=0.09%
  cpu          : usr=0.07%, sys=0.44%, ctx=8273, majf=0, minf=32
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12904: Tue Jun 27 07:32:49 2017
  write: io=524288KB, bw=3073.6KB/s, iops=48, runt=170581msec
    clat (msec): min=1, max=274, avg=20.78, stdev=15.05
     lat (msec): min=1, max=274, avg=20.79, stdev=15.05
    clat percentiles (usec):
     |  1.00th=[ 1224],  5.00th=[ 1384], 10.00th=[ 1640], 20.00th=[ 4384],
     | 30.00th=[16064], 40.00th=[20352], 50.00th=[23424], 60.00th=[25728],
     | 70.00th=[27520], 80.00th=[29568], 90.00th=[33024], 95.00th=[35584],
     | 99.00th=[43776], 99.50th=[50944], 99.90th=[195584], 99.95th=[264192],
     | 99.99th=[272384]
    bw (KB  /s): min= 1013, max=10645, per=13.33%, avg=3087.23, stdev=985.85
    lat (msec) : 2=13.48%, 4=4.96%, 10=4.92%, 20=15.52%, 50=60.62%
    lat (msec) : 100=0.22%, 250=0.22%, 500=0.07%
  cpu          : usr=0.06%, sys=0.41%, ctx=8269, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12905: Tue Jun 27 07:32:49 2017
  write: io=524288KB, bw=2894.7KB/s, iops=45, runt=181124msec
    clat (msec): min=1, max=287, avg=22.10, stdev=14.76
     lat (msec): min=1, max=287, avg=22.10, stdev=14.76
    clat percentiles (usec):
     |  1.00th=[ 1176],  5.00th=[ 1480], 10.00th=[ 2672], 20.00th=[12096],
     | 30.00th=[18048], 40.00th=[21632], 50.00th=[23936], 60.00th=[25984],
     | 70.00th=[28032], 80.00th=[30080], 90.00th=[33024], 95.00th=[36096],
     | 99.00th=[44288], 99.50th=[50944], 99.90th=[207872], 99.95th=[264192],
     | 99.99th=[288768]
    bw (KB  /s): min= 1043, max=13952, per=12.44%, avg=2881.36, stdev=1222.90
    lat (msec) : 2=7.87%, 4=4.00%, 10=6.10%, 20=17.10%, 50=64.37%
    lat (msec) : 100=0.22%, 250=0.24%, 500=0.09%
  cpu          : usr=0.07%, sys=0.44%, ctx=8276, majf=0, minf=32
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12906: Tue Jun 27 07:32:49 2017
  write: io=524288KB, bw=2970.1KB/s, iops=46, runt=176473msec
    clat (usec): min=989, max=287010, avg=21509.37, stdev=15038.56
     lat (usec): min=990, max=287014, avg=21512.91, stdev=15038.61
    clat percentiles (usec):
     |  1.00th=[ 1208],  5.00th=[ 1448], 10.00th=[ 1880], 20.00th=[ 9152],
     | 30.00th=[17536], 40.00th=[21120], 50.00th=[23936], 60.00th=[25728],
     | 70.00th=[27776], 80.00th=[29824], 90.00th=[33024], 95.00th=[35584],
     | 99.00th=[43264], 99.50th=[50944], 99.90th=[207872], 99.95th=[264192],
     | 99.99th=[288768]
    bw (KB  /s): min= 1013, max=10645, per=12.88%, avg=2982.87, stdev=958.42
    lat (usec) : 1000=0.01%
    lat (msec) : 2=10.77%, 4=5.09%, 10=4.50%, 20=15.88%, 50=63.23%
    lat (msec) : 100=0.20%, 250=0.23%, 500=0.09%
  cpu          : usr=0.06%, sys=0.45%, ctx=8271, majf=0, minf=33
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12907: Tue Jun 27 07:32:49 2017
  write: io=524288KB, bw=2928.6KB/s, iops=45, runt=179028msec
    clat (msec): min=1, max=286, avg=21.84, stdev=14.87
     lat (msec): min=1, max=286, avg=21.84, stdev=14.87
    clat percentiles (usec):
     |  1.00th=[ 1208],  5.00th=[ 1480], 10.00th=[ 2096], 20.00th=[11584],
     | 30.00th=[17792], 40.00th=[21376], 50.00th=[23936], 60.00th=[25984],
     | 70.00th=[27776], 80.00th=[29824], 90.00th=[33024], 95.00th=[36096],
     | 99.00th=[43776], 99.50th=[51456], 99.90th=[205824], 99.95th=[264192],
     | 99.99th=[288768]
    bw (KB  /s): min=  887, max=11025, per=12.68%, avg=2937.45, stdev=1052.21
    lat (msec) : 2=9.69%, 4=4.22%, 10=5.00%, 20=16.69%, 50=63.85%
    lat (msec) : 100=0.22%, 250=0.23%, 500=0.09%
  cpu          : usr=0.07%, sys=0.40%, ctx=8275, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12908: Tue Jun 27 07:32:49 2017
  write: io=524288KB, bw=2928.6KB/s, iops=45, runt=179028msec
    clat (msec): min=1, max=286, avg=21.84, stdev=14.83
     lat (msec): min=1, max=286, avg=21.84, stdev=14.83
    clat percentiles (usec):
     |  1.00th=[ 1208],  5.00th=[ 1480], 10.00th=[ 2160], 20.00th=[11712],
     | 30.00th=[18048], 40.00th=[21376], 50.00th=[23936], 60.00th=[25984],
     | 70.00th=[27776], 80.00th=[29824], 90.00th=[33024], 95.00th=[35584],
     | 99.00th=[43776], 99.50th=[51456], 99.90th=[207872], 99.95th=[264192],
     | 99.99th=[288768]
    bw (KB  /s): min= 1013, max=10518, per=12.68%, avg=2935.97, stdev=943.13
    lat (msec) : 2=9.20%, 4=4.87%, 10=4.70%, 20=16.48%, 50=64.21%
    lat (msec) : 100=0.22%, 250=0.23%, 500=0.09%
  cpu          : usr=0.07%, sys=0.44%, ctx=8276, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12909: Tue Jun 27 07:32:49 2017
  write: io=524288KB, bw=2931.2KB/s, iops=45, runt=178876msec
    clat (msec): min=1, max=286, avg=21.81, stdev=14.86
     lat (msec): min=1, max=286, avg=21.81, stdev=14.86
    clat percentiles (usec):
     |  1.00th=[ 1208],  5.00th=[ 1464], 10.00th=[ 2040], 20.00th=[11584],
     | 30.00th=[18048], 40.00th=[21376], 50.00th=[23936], 60.00th=[25984],
     | 70.00th=[27776], 80.00th=[29824], 90.00th=[33024], 95.00th=[35584],
     | 99.00th=[43776], 99.50th=[52480], 99.90th=[207872], 99.95th=[264192],
     | 99.99th=[288768]
    bw (KB  /s): min= 1013, max=10265, per=12.70%, avg=2940.48, stdev=940.95
    lat (msec) : 2=9.86%, 4=4.42%, 10=4.64%, 20=16.60%, 50=63.94%
    lat (msec) : 100=0.23%, 250=0.22%, 500=0.09%
  cpu          : usr=0.06%, sys=0.44%, ctx=8271, majf=0, minf=32
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=23157KB/s, minb=2894KB/s, maxb=3073KB/s, mint=170581msec, maxt=181124msec

Disk stats (read/write):
  sdb: ios=0/66068, merge=0/552, ticks=0/1437760, in_queue=1437743, util=100.00%

=====================================================================

fuse

lgf-randwrite: (g=0): rw=randwrite, bs=64K-64K/64K-64K/64K-64K, ioengine=sync, iodepth=1
...
fio-2.2.8
Starting 8 processes
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
lgf-randwrite: Laying out IO file(s) (1 file(s) / 10240MB)
Jobs: 8 (f=8): [F(8)] [100.0% done] [0KB/896KB/0KB /s] [0/14/0 iops] [eta 00m:00s]
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12943: Tue Jun 27 07:40:16 2017
  write: io=524288KB, bw=1839.4KB/s, iops=28, runt=285046msec
    clat (usec): min=224, max=587842, avg=34633.95, stdev=55296.35
     lat (usec): min=227, max=587844, avg=34636.73, stdev=55296.35
    clat percentiles (usec):
     |  1.00th=[  494],  5.00th=[ 1608], 10.00th=[ 3120], 20.00th=[ 4704],
     | 30.00th=[ 7264], 40.00th=[ 9536], 50.00th=[13376], 60.00th=[18304],
     | 70.00th=[32128], 80.00th=[48384], 90.00th=[80384], 95.00th=[183296],
     | 99.00th=[232448], 99.50th=[272384], 99.90th=[493568], 99.95th=[552960],
     | 99.99th=[585728]
    bw (KB  /s): min=  584, max= 4434, per=12.68%, avg=1865.61, stdev=317.21
    lat (usec) : 250=0.04%, 500=1.04%, 750=0.89%, 1000=1.07%
    lat (msec) : 2=2.84%, 4=9.27%, 10=26.05%, 20=21.17%, 50=18.41%
    lat (msec) : 100=11.43%, 250=7.23%, 500=0.48%, 750=0.10%
  cpu          : usr=0.09%, sys=0.20%, ctx=16388, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12944: Tue Jun 27 07:40:16 2017
  write: io=524288KB, bw=1839.4KB/s, iops=28, runt=285046msec
    clat (usec): min=208, max=606969, avg=34673.66, stdev=59059.36
     lat (usec): min=211, max=606972, avg=34676.41, stdev=59059.37
    clat percentiles (usec):
     |  1.00th=[  470],  5.00th=[ 1656], 10.00th=[ 2832], 20.00th=[ 4128],
     | 30.00th=[ 5152], 40.00th=[ 7520], 50.00th=[ 9664], 60.00th=[15424],
     | 70.00th=[32384], 80.00th=[48896], 90.00th=[92672], 95.00th=[158720],
     | 99.00th=[264192], 99.50th=[309248], 99.90th=[428032], 99.95th=[468992],
     | 99.99th=[610304]
    bw (KB  /s): min=  358, max= 4447, per=12.63%, avg=1858.40, stdev=342.69
    lat (usec) : 250=0.01%, 500=1.37%, 750=0.98%, 1000=1.25%
    lat (msec) : 2=2.48%, 4=12.93%, 10=31.75%, 20=13.60%, 50=16.02%
    lat (msec) : 100=10.16%, 250=7.59%, 500=1.83%, 750=0.05%
  cpu          : usr=0.13%, sys=0.18%, ctx=16390, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12945: Tue Jun 27 07:40:16 2017
  write: io=524288KB, bw=1839.4KB/s, iops=28, runt=285046msec
    clat (usec): min=190, max=629174, avg=34633.51, stdev=64418.51
     lat (usec): min=192, max=629177, avg=34636.20, stdev=64418.52
    clat percentiles (usec):
     |  1.00th=[  470],  5.00th=[ 1432], 10.00th=[ 2864], 20.00th=[ 4384],
     | 30.00th=[ 6368], 40.00th=[ 8160], 50.00th=[ 9792], 60.00th=[13120],
     | 70.00th=[18304], 80.00th=[27008], 90.00th=[156672], 95.00th=[201728],
     | 99.00th=[240640], 99.50th=[309248], 99.90th=[481280], 99.95th=[544768],
     | 99.99th=[626688]
    bw (KB  /s): min=  125, max= 4722, per=12.66%, avg=1863.43, stdev=345.39
    lat (usec) : 250=0.02%, 500=1.28%, 750=0.84%, 1000=1.49%
    lat (msec) : 2=3.19%, 4=10.14%, 10=33.90%, 20=21.25%, 50=13.78%
    lat (msec) : 100=1.32%, 250=12.01%, 500=0.67%, 750=0.10%
  cpu          : usr=0.09%, sys=0.20%, ctx=16388, majf=0, minf=27
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12946: Tue Jun 27 07:40:16 2017
  write: io=524288KB, bw=1839.4KB/s, iops=28, runt=285046msec
    clat (usec): min=130, max=473626, avg=34632.84, stdev=42395.06
     lat (usec): min=134, max=473629, avg=34635.58, stdev=42395.05
    clat percentiles (usec):
     |  1.00th=[  524],  5.00th=[ 2224], 10.00th=[ 3632], 20.00th=[ 6304],
     | 30.00th=[ 8512], 40.00th=[12096], 50.00th=[17280], 60.00th=[22400],
     | 70.00th=[45824], 80.00th=[59648], 90.00th=[90624], 95.00th=[121344],
     | 99.00th=[187392], 99.50th=[211968], 99.90th=[382976], 99.95th=[395264],
     | 99.99th=[473088]
    bw (KB  /s): min=  240, max= 5322, per=12.64%, avg=1860.39, stdev=387.64
    lat (usec) : 250=0.06%, 500=0.81%, 750=0.61%, 1000=1.12%
    lat (msec) : 2=1.92%, 4=7.32%, 10=23.99%, 20=19.86%, 50=16.99%
    lat (msec) : 100=18.41%, 250=8.58%, 500=0.33%
  cpu          : usr=0.08%, sys=0.24%, ctx=16389, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12947: Tue Jun 27 07:40:16 2017
  write: io=524288KB, bw=1839.4KB/s, iops=28, runt=285046msec
    clat (usec): min=116, max=499696, avg=34634.29, stdev=46594.74
     lat (usec): min=119, max=499697, avg=34637.10, stdev=46594.74
    clat percentiles (usec):
     |  1.00th=[  490],  5.00th=[ 2064], 10.00th=[ 3184], 20.00th=[ 4640],
     | 30.00th=[ 7392], 40.00th=[10304], 50.00th=[15040], 60.00th=[20864],
     | 70.00th=[34048], 80.00th=[59648], 90.00th=[94720], 95.00th=[136192],
     | 99.00th=[191488], 99.50th=[222208], 99.90th=[391168], 99.95th=[432128],
     | 99.99th=[497664]
    bw (KB  /s): min=  245, max= 4481, per=12.64%, avg=1860.13, stdev=379.98
    lat (usec) : 250=0.07%, 500=0.99%, 750=0.61%, 1000=0.79%
    lat (msec) : 2=2.31%, 4=10.21%, 10=24.17%, 20=19.85%, 50=17.02%
    lat (msec) : 100=14.77%, 250=8.80%, 500=0.42%
  cpu          : usr=0.10%, sys=0.22%, ctx=16389, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12948: Tue Jun 27 07:40:16 2017
  write: io=524288KB, bw=1839.4KB/s, iops=28, runt=285046msec
    clat (usec): min=162, max=625086, avg=34632.58, stdev=55390.48
     lat (usec): min=165, max=625088, avg=34635.37, stdev=55390.46
    clat percentiles (usec):
     |  1.00th=[  478],  5.00th=[ 1736], 10.00th=[ 3472], 20.00th=[ 6496],
     | 30.00th=[ 8896], 40.00th=[11456], 50.00th=[14016], 60.00th=[17280],
     | 70.00th=[22912], 80.00th=[37120], 90.00th=[119296], 95.00th=[166912],
     | 99.00th=[214016], 99.50th=[305152], 99.90th=[432128], 99.95th=[528384],
     | 99.99th=[626688]
    bw (KB  /s): min=  120, max= 5417, per=12.67%, avg=1863.67, stdev=401.18
    lat (usec) : 250=0.05%, 500=0.99%, 750=0.54%, 1000=0.98%
    lat (msec) : 2=3.32%, 4=5.96%, 10=22.53%, 20=31.95%, 50=17.72%
    lat (msec) : 100=3.91%, 250=11.44%, 500=0.56%, 750=0.06%
  cpu          : usr=0.12%, sys=0.24%, ctx=16389, majf=0, minf=54
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12949: Tue Jun 27 07:40:16 2017
  write: io=524288KB, bw=1839.4KB/s, iops=28, runt=285046msec
    clat (usec): min=202, max=958358, avg=34672.45, stdev=100428.77
     lat (usec): min=204, max=958360, avg=34675.23, stdev=100428.81
    clat percentiles (usec):
     |  1.00th=[  414],  5.00th=[ 1048], 10.00th=[ 2256], 20.00th=[ 3216],
     | 30.00th=[ 3920], 40.00th=[ 4512], 50.00th=[ 5344], 60.00th=[ 7200],
     | 70.00th=[ 9536], 80.00th=[14400], 90.00th=[46848], 95.00th=[350208],
     | 99.00th=[460800], 99.50th=[518144], 99.90th=[716800], 99.95th=[733184],
     | 99.99th=[954368]
    bw (KB  /s): min=  117, max= 3952, per=12.76%, avg=1877.50, stdev=321.70
    lat (usec) : 250=0.02%, 500=2.33%, 750=1.05%, 1000=1.43%
    lat (msec) : 2=3.59%, 4=22.52%, 10=40.45%, 20=13.21%, 50=5.88%
    lat (msec) : 100=2.92%, 250=0.71%, 500=5.32%, 750=0.51%, 1000=0.05%
  cpu          : usr=0.11%, sys=0.24%, ctx=16388, majf=0, minf=58
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
lgf-randwrite: (groupid=0, jobs=1): err= 0: pid=12950: Tue Jun 27 07:40:16 2017
  write: io=524288KB, bw=1839.4KB/s, iops=28, runt=285046msec
    clat (usec): min=198, max=642362, avg=34632.47, stdev=60796.16
     lat (usec): min=201, max=642365, avg=34635.21, stdev=60796.14
    clat percentiles (usec):
     |  1.00th=[  490],  5.00th=[ 1928], 10.00th=[ 2992], 20.00th=[ 4640],
     | 30.00th=[ 8384], 40.00th=[12608], 50.00th=[16512], 60.00th=[20352],
     | 70.00th=[24704], 80.00th=[33536], 90.00th=[73216], 95.00th=[220160],
     | 99.00th=[248832], 99.50th=[317440], 99.90th=[544768], 99.95th=[585728],
     | 99.99th=[643072]
    bw (KB  /s): min=  344, max= 4740, per=12.69%, avg=1867.42, stdev=329.03
    lat (usec) : 250=0.04%, 500=1.03%, 750=0.77%, 1000=0.73%
    lat (msec) : 2=2.77%, 4=10.46%, 10=18.64%, 20=25.06%, 50=25.68%
    lat (msec) : 100=7.64%, 250=6.27%, 500=0.77%, 750=0.13%
  cpu          : usr=0.12%, sys=0.20%, ctx=16388, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8192/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=14714KB/s, minb=1839KB/s, maxb=1839KB/s, mint=285046msec, maxt=285046msec

==========================================================================


While this proves our theory about O_SYNC causing significant delays, we also want to verify what the bottleneck is. To run the fio job with 1 thread, is it enough to change "numjobs=1" in the job file you gave?

Comment 13 Manoj Pillai 2017-06-27 17:04:23 UTC
(In reply to Pranith Kumar K from comment #12)
[...]
> While this proves our theory about O_SYNC causing significant delays. We
> also want to verify what is the bottleneck. To run the fio job with 1
> thread, is it enough to change "numjobs=1" in the job file you gave?

I change a few other options as well, to keep the data set and accessed sizes the same:
numjobs=1
nrfiles=1
openfiles=1
size=80g
io_size=4096m
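Putting those changes into the job file from comment #6, the full 1-thread variant would read:

[global]
rw=randwrite
end_fsync=1
bs=64k
startdelay=0
ioengine=sync
direct=1

[lgf-randwrite]
directory=/mnt/glustervol/${HOSTNAME}
filename_format=f.$jobnum.$filenum
numjobs=1
nrfiles=1
openfiles=1
size=80g
io_size=4096m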

[A suggestion, admittedly based on personal preference :). Could we paste just the key portion of the output in the bz comments and include the detailed output as attachments? It keeps the bz navigable.]

Comment 14 Pranith Kumar K 2017-06-27 17:32:25 UTC
(In reply to Manoj Pillai from comment #13)
> (In reply to Pranith Kumar K from comment #12)
> [...]
> > While this proves our theory about O_SYNC causing significant delays. We
> > also want to verify what is the bottleneck. To run the fio job with 1
> > thread, is it enough to change "numjobs=1" in the job file you gave?
> 
> I change a few other options as well to keep data and access set size the
> same:
> numjobs=1
> nrfiles=1
> openfiles=1
> size=80g
> io_size=4096m
> 
> [A suggestion, that is admittedly based on a personal preference :). Could
> we paste just the key portion of the output in the bz comments and include
> the detailed output as attachments? Keeps the bz navigable]

Yeah :-), even Prasanna said the same.

Comment 15 Manoj Pillai 2017-06-27 18:33:39 UTC
Hmm... here are some stats I don't understand. Looking at the fio run from comment #6, randwrite, with 8 threads, bs=8k:

Bandwidth reported is about 13MB/s:
gluster-block: write: io=4096.0MB, bw=13301K/s

When I look at iostat on the block device at the client during this interval, I see:

<excerpt>
06/16/2017 03:05:44 AM
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
<snip>
sdc               0.00     0.00    0.00 1643.80     0.00 13149.40    16.00     7.95    4.84    0.00    4.84   0.61 100.00
</excerpt>

This matches expectation: wkB/s corresponds to what the app is reporting; avgrq-sz is 16, which is basically the 8k request size expressed as 512b sectors.

But when I look at iostat on the back-end device, I see:

<excerpt>
06/16/2017 03:05:46 AM
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00     0.00    0.00 5086.00     0.00 445277.33   175.10     0.44    0.09    0.00    0.09   0.05  25.63
<snip>
rhsvg1-rhslv1     0.00     0.00    0.00 6772.33     0.00 445192.00   131.47     0.64    0.09    0.00    0.09   0.06  38.67
</excerpt>

Request sizes are much higher. Writes/s are much higher. And a lot of device bandwidth is being used. I need to try to understand the transformation of the I/O stream between these two points.
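The iostat excerpts above can be reproduced with something like the following (interval and option choices are illustrative):

# on the client, watching the iSCSI-attached device (sdc)
iostat -x -t 2
# on the server, with device-mapper names resolved (nvme0n1, rhsvg1-rhslv1)
iostat -x -t -N 2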

Comment 16 Pranith Kumar K 2017-06-28 01:51:13 UTC
(In reply to Manoj Pillai from comment #15)
> Hmm.. some stats I don't understand this. Looking at the fio run from
> comment #6, randwrite, with 8 threads, bs=8k:
> 
> Bandwidth reported is about 13MB/s:
> gluster-block: write: io=4096.0MB, bw=13301K/s
> 
> When I look at iostat on the block device at the client during this
> interval, I see:
> 
> <excerpt>
> 06/16/2017 03:05:44 AM
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> <snip>
> sdc               0.00     0.00    0.00 1643.80     0.00 13149.40    16.00  
> 7.95    4.84    0.00    4.84   0.61 100.00
> </excerpt>
> 
> This matches expectation: wkB/s corresponds to what the app is reporting;
> avgrq-sz is 16, which is basically the 8k request size expressed as 512b
> sectors.
> 
> But when I look at iostat on the back-end device, I see:
> 
> <excerpt>
> 06/16/2017 03:05:46 AM
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> nvme0n1           0.00     0.00    0.00 5086.00     0.00 445277.33   175.10 
> 0.44    0.09    0.00    0.09   0.05  25.63
> <snip>
> rhsvg1-rhslv1     0.00     0.00    0.00 6772.33     0.00 445192.00   131.47 
> 0.64    0.09    0.00    0.09   0.06  38.67
> </excerpt>
> 
> Request sizes are much higher. Writes/s are much higher. And a lot of device
> bandwidth is being used. I need to try to understand the transformation of
> the I/O stream between these two points.

By default, write-behind still caches writes and merges them into bigger writes. So you may want to run "gluster volume set <volname> performance.strict-o-direct on" to instruct write-behind not to cache/merge the writes.
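A quick way to apply and confirm the suggestion (options that have been set show up under "Options Reconfigured" in volume info):

gluster volume set <volname> performance.strict-o-direct on
gluster volume info <volname>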

Comment 17 Manoj Pillai 2017-06-28 03:21:04 UTC
(In reply to Pranith Kumar K from comment #16)

> By default write-behind still caches writes and merges them into a bigger
> write. So you may want to do "gluster volume set <volname>
> performance.strict-o-direct on" to instruct write-behind not to cache/merge
> the writes.

Yes, it is on. gluster volume info for this case:

Volume Name: perfvol
Type: Distribute
Volume ID: 7e0e12d4-3d2b-4209-91f8-00788dafc424
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: smerf04-10ge:/mnt/rhs_brick1
Options Reconfigured:
network.remote-dio: disable
performance.strict-o-direct: on
cluster.lookup-optimize: on
performance.io-cache: off
performance.write-behind: off
performance.read-ahead: off
transport.address-family: inet
nfs.disable: on

Comment 18 Pranith Kumar K 2017-06-28 03:23:55 UTC
(In reply to Manoj Pillai from comment #17)
> (In reply to Pranith Kumar K from comment #16)
> 
> > By default write-behind still caches writes and merges them into a bigger
> > write. So you may want to do "gluster volume set <volname>
> > performance.strict-o-direct on" to instruct write-behind not to cache/merge
> > the writes.
> 
> Yes it is on. In this case, gluster volume info:
> 
> Volume Name: perfvol
> Type: Distribute
> Volume ID: 7e0e12d4-3d2b-4209-91f8-00788dafc424
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: smerf04-10ge:/mnt/rhs_brick1
> Options Reconfigured:
> network.remote-dio: disable
> performance.strict-o-direct: on
> cluster.lookup-optimize: on
> performance.io-cache: off
> performance.write-behind: off
> performance.read-ahead: off
> transport.address-family: inet
> nfs.disable: on

Hey, write-behind is off. So it is not even related to write-behind then.

Comment 39 Sweta Anandpara 2017-08-03 11:05:00 UTC
I am doing a round of testing on the tcmu-runner and gluster-block bits mentioned above to verify the impact on functionality.

Manoj, please update the perf results once you have them, after which we can move this bug to its final state.

Comment 40 Manoj Pillai 2017-08-03 11:14:23 UTC
Is there a date by which this needs to be done? See previous comment.

Comment 44 Manoj Pillai 2017-09-01 06:00:04 UTC
Re-evaluating with the following version:

glusterfs-client-xlators-3.8.4-43.el7rhgs.x86_64
glusterfs-cli-3.8.4-43.el7rhgs.x86_64
glusterfs-libs-3.8.4-43.el7rhgs.x86_64
glusterfs-3.8.4-43.el7rhgs.x86_64
glusterfs-fuse-3.8.4-43.el7rhgs.x86_64
glusterfs-server-3.8.4-43.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-41.el7rhgs.x86_64
glusterfs-api-3.8.4-43.el7rhgs.x86_64

gluster-block-0.2.1-9.el7rhgs.x86_64

tcmu-runner-1.2.0-12.el7rhgs.x86_64
libtcmu-1.2.0-12.el7rhgs.x86_64

kernel-3.10.0-693.el7.x86_64

Comparing fuse and gluster-block on fio tests with a single client, 3 servers, and a 1x3 volume on NVMe bricks; 8 fio jobs run in parallel.

gluster-block is set up with "gluster v set <vol> group gluster-block" and "gluster-block create ... prealloc full ...".
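Spelled out, that configuration is roughly the following (hostnames, block name and size are placeholders, and the argument order is from memory, so treat this as a sketch):

gluster volume set <vol> group gluster-block
gluster-block create <vol>/block0 ha 3 prealloc full <ip1>,<ip2>,<ip3> 100GiB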


Seq Write, bs=1024k:
    fuse: 398 MB/s
    block: 338 MB/s (85% of fuse)

Seq Read, bs=1024k:
    fuse: 1173 MB/s
    block: 624 MB/s (53% of fuse)

Random Read, bs=8K, direct=1:
    fuse: 102 MB/s
    block: 80 MB/s (78% of fuse)

Random Write, bs=8K, direct=1:
    fuse: 63 MB/s
    block: 26 MB/s (41% of fuse)

Analysis in progress.

Some of these results are affected by bz #1487495, so the picture from the above results may not be accurate.

Comment 46 Manoj Pillai 2017-09-14 09:10:57 UTC
On an fio random I/O test with libaio and multiple files per job, I see gluster-block at about par with glusterfs-fuse on random read (glusterfs-fuse is still affected by bz #1487495, so that inaccuracy still exists).

The job file for rand write is below (the rand read job is similar):
<quote>
[global]
rw=randwrite
end_fsync=1
startdelay=0
ioengine=libaio
direct=1
bs=8k

[randwrite]
directory=/mnt/glustervol/${HOSTNAME}
filename_format=f.$jobnum.$filenum
iodepth=16
numjobs=2
nrfiles=4
openfiles=4
filesize=10g
size=40g
io_size=2048m
</quote>

Random write performance is still poor with gluster-block:
fuse: 8434 iops
gluster-block: 4321 iops (51% of fuse)

Comment 48 Manoj Pillai 2017-09-15 02:25:46 UTC
(In reply to Manoj Pillai from comment #46)
> On an fio random I/O test with libaio, and multiple files per job I see
> gluster-block at about par with glusterfs-fuse on random read.
> (glusterfs-fuse is still affected by bz #1487495, so that inaccuracy still
> exists).

Here's the job file and numbers for random read on this test:

<quote>
[global]
rw=randread
startdelay=0
ioengine=libaio
direct=1
bs=8k

[randread]
directory=/mnt/glustervol/${HOSTNAME}
filename_format=f.$jobnum.$filenum
iodepth=16
numjobs=2
nrfiles=4
openfiles=4
filesize=10g
size=40g
io_size=2048m
</quote>

fuse: 14500 iops
gluster-block: 16200 iops

Comment 51 errata-xmlrpc 2017-09-21 04:19:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2773

