Description of problem:
The option "performance.client-io-threads on" is needed in some scenarios where the fuse thread becomes a bottleneck. However, this option appears to be disabled for replicated volumes: setting it to on via the "gluster v set ..." command has no effect.
It is fine for the option to be off by default for replicated volumes, but we need to be able to turn it on in scenarios where we see that the fuse thread is bottlenecked.
Version-Release number of selected component (if applicable):
gluster v info perfvol
Volume Name: perfvol
Volume ID: 7773ecd2-6a6b-4220-bf79-42bd279d6476
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Output of top command (with -H option) during a random read test on above volume:
  PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
 3737 root  20   0  607540  24080  3792 R  99.2  0.0   7:31.29 glusterfs
 3734 root  20   0  607540  24080  3792 S  42.2  0.0   4:04.87 glusterfs
 3733 root  20   0  607540  24080  3792 S  42.0  0.0   4:06.15 glusterfs
 4166 root  20   0  504132   1016   368 D   1.7  0.0   0:00.28 fio
# pstack 3737 [the thread above with 99+% cpu usage]
Thread 1 (process 3737):
#0 0x00007f02d1597190 in readv () from /lib64/libc.so.6
#1 0x00007f02ca26e75a in fuse_thread_proc () from /usr/lib64/glusterfs/3.8.4/xlator/mount/fuse.so
#2 0x00007f02d1cd3e25 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f02d15a034d in clone () from /lib64/libc.so.6
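The step of picking the hot thread out of the top -H listing can be sketched in a few lines of Python. This is only an illustration: the helper name busiest_thread is hypothetical, and it assumes the listing starts with the header row and that columns are whitespace-separated, as in the output above.

```python
def busiest_thread(top_lines):
    """Return (tid, cpu_percent, command) for the row with the highest %CPU.

    Assumes top_lines[0] is the header row from `top -H` and that every
    following non-empty line is one thread, with whitespace-separated columns.
    """
    header = top_lines[0].split()
    pid_col = header.index("PID")
    cpu_col = header.index("%CPU")
    cmd_col = header.index("COMMAND")

    rows = [line.split() for line in top_lines[1:] if line.strip()]
    best = max(rows, key=lambda row: float(row[cpu_col]))
    return int(best[pid_col]), float(best[cpu_col]), best[cmd_col]


# Sample rows taken from the top output in this report.
sample = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3737 root 20 0 607540 24080 3792 R 99.2 0.0 7:31.29 glusterfs
3734 root 20 0 607540 24080 3792 S 42.2 0.0 4:04.87 glusterfs
3733 root 20 0 607540 24080 3792 S 42.0 0.0 4:06.15 glusterfs
4166 root 20 0 504132 1016 368 D 1.7 0.0 0:00.28 fio
""".splitlines()

tid, cpu, command = busiest_thread(sample)
print(tid, cpu, command)
```

The thread ID returned here is the one passed to pstack above, whose stack shows it sitting in fuse_thread_proc.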
Some data on the potential performance benefit from turning client-io-threads on ...
fio random I/O test with 16 threads, direct=1, block size=8K.
1x3 volume with bricks on nvme. 10GbE network. fuse-mounted on client.
read : io=8192.0MB, bw=113406K/s, iops=14175
write: io=8192.0MB, bw=65170K/s, iops=8146
[The fuse thread is at 99+% CPU utilization in the above tests.]
client-io-threads turned on by manually editing volfiles [Thanks to Krutika for showing how to do that]:
read : io=8192.0MB, bw=211604K/s, iops=26450
write: io=8192.0MB, bw=152025K/s, iops=19003
Read iops improves by 1.87x
Write iops improves by 2.33x
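The improvement factors can be recomputed directly from the fio iops figures quoted above. The short Python sketch below does that; the function name speedup is hypothetical. Dividing bandwidth by iops also gives the approximate request size, which is a quick sanity check on the test parameters (this assumes fio's K/s figure is in KB per second).

```python
def speedup(before_iops, after_iops):
    """Ratio of iops with client-io-threads on to iops with it off."""
    return after_iops / before_iops


# iops figures from the fio runs in this report.
read_x = speedup(14175, 26450)
write_x = speedup(8146, 19003)
print(f"read: {read_x:.2f}x, write: {write_x:.2f}x")

# Bandwidth (KB/s) divided by iops approximates the size of each request in KB.
read_kb_per_io = 113406 / 14175
print(f"approx. request size: {read_kb_per_io:.1f} KB")
```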
I understand that in perf QE testing we have seen performance degradations from turning client-io-threads on. No disputing that, but turning it on UNDER THE RIGHT CIRCUMSTANCES, i.e. when you are hitting a fuse thread bottleneck, can give huge benefits. We should be able to turn it on via the CLI in these scenarios. But based on perf QE results, we can keep it off by default.
Upstream patch: https://review.gluster.org/18430
Ravi, my understanding is that the number of client-io-threads is controlled by the option performance.io-thread-count? And that this option also controls the number of io-threads at the brick.
Can you confirm the above, and that there is currently no way to control the number of client-io-threads independently of the number of io-threads at the brick?
That is correct. https://github.com/gluster/glusterfs/commit/09232fd6855f288c47b5396dcd4d4245a154576f seems to have introduced support for loading io-threads xlator on the client side but has not provided any specific way to modify the tunables exclusively for the brick or client side. So yes, changing any xlator option in iot xlator gets reflected in both the brick and client volfiles.
Thanks! I opened an upstream bz for this along with some data in support of the request: https://bugzilla.redhat.com/show_bug.cgi?id=1499649.
Build Used: glusterfs-3.12.2-8.el7rhgs.x86_64
The scenarios below were verified:
Scenario 1:
1) create a 1 * 3 replicate volume and start it
2) check performance.client-io-threads; it should be off
3) check in the trusted-<volname>.tcp-fuse.vol file whether io-threads is loaded (it should not be loaded)
Scenario 2:
1) create a distribute volume (1 * 1) and start it
2) check performance.client-io-threads; it should be on
3) io-threads should be loaded in trusted-<volname>.tcp-fuse.vol
4) add bricks to convert it to a replicate volume (1 * 2)
5) check performance.client-io-threads; it should be off
6) io-threads should not be loaded in trusted-<volname>.tcp-fuse.vol
7) remove bricks so that the volume type is converted back to distribute
8) check performance.client-io-threads; it should be on again
9) io-threads should be loaded in trusted-<volname>.tcp-fuse.vol
Scenario 3:
1) create a 1 * 3 replicate volume and start it
2) set performance.client-io-threads to on
3) check performance.client-io-threads; it should be on
4) io-threads should be loaded in trusted-<volname>.tcp-fuse.vol
5) add bricks so that the volume converts to 2 * 3
6) check performance.client-io-threads; it should be on
7) io-threads should be loaded in trusted-<volname>.tcp-fuse.vol
8) remove bricks so that 2 * 3 converts back to 1 * 3
9) check performance.client-io-threads; it should be on
10) io-threads should be loaded in trusted-<volname>.tcp-fuse.vol
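The "io-threads should be loaded" checks in the steps above amount to looking for an io-threads translator stanza in the client volfile. A minimal Python sketch of that check follows. The function name io_threads_loaded is hypothetical, and the two sample stanzas are illustrative text written for this example, not copied from a real volfile; the sketch assumes only that a loaded translator appears as a "type performance/io-threads" line.

```python
def io_threads_loaded(volfile_text):
    """Return True if the volfile text declares an io-threads translator.

    Assumption: a loaded translator shows up as a line of the form
    'type performance/io-threads' inside a volume ... end-volume stanza.
    """
    for line in volfile_text.splitlines():
        if line.split()[:2] == ["type", "performance/io-threads"]:
            return True
    return False


# Illustrative stanzas only (hypothetical volume and subvolume names).
with_iot = """\
volume perfvol-io-threads
    type performance/io-threads
    subvolumes perfvol-open-behind
end-volume
"""

without_iot = """\
volume perfvol-replicate-0
    type cluster/replicate
    subvolumes perfvol-client-0 perfvol-client-1 perfvol-client-2
end-volume
"""

print(io_threads_loaded(with_iot))
print(io_threads_loaded(without_iot))
```

In practice the text would be read from the trusted-<volname>.tcp-fuse.vol file on a server node and the result compared against the expected on/off state for each step.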
Changing status to Verified.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.