Red Hat Bugzilla – Bug 858850
fuse: backport scatter-gather direct IO
Last modified: 2014-02-21 02:08:20 EST
This bug tracks a backport of the proposed scatter-gather direct I/O patches currently under review in upstream FUSE:
This is desirable for RHS/RHEV integration (qemu), but it is not yet clear which RHEL 6 release is the most expedient target for getting this work into the targeted RHEV integration release. Targeted to RHEL 6.5 for now.
Additional testing to measure the impact of this patch indicates that the patch is important, BUT we can get a significant performance improvement without it, and this may be enough to get RHS at least into beta testing with RHEV 3.1.
HOWEVER, in our tests the patch gave a 30% performance improvement, which brought Gluster much closer to NFS performance on sequential I/O inside the VM, even with replication overhead. So this really does matter, and we want to see this patch backported to RHEL6 if at all possible.
I can post the data, but since this bug is not marked redhat-internal I hesitate to post it here; it is certainly available upon request.
Posted for review:
I tested the S-G patch in bz 858850 with Brian Foster's RHEL6.4 kernel, using test-pwritev.c to get results quickly. The data shows that the performance gain for a synthetic preadv/pwritev workload matches what we observed in earlier testing with the patch applied to a Linux 3.6 kernel. In addition, the patch shows very significant performance gains for WRITES with O_DIRECT. The results are very consistent, not subtle at all, and can be reproduced quickly with the test program below.
Huge gains are seen with O_DIRECT when the iovec array contains 256 4-KB buffers: reads improve by a factor of 15, writes by a factor of 2.5. With buffered I/O there is a 10-15% decrease in read performance (why?).
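test-pwritev.c itself is not reproduced in this bug. To give a rough idea of the kind of workload described, here is a hypothetical C sketch (not the author's program): it builds an iovec array of 256 page-aligned 4-KB buffers and issues one pwritev() or preadv() call per 1 MiB, with or without O_DIRECT. The function names and the MiB/s metric are assumptions for illustration.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux; must precede all includes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

#define NR_IOV   256   /* iovec entries per system call */
#define BUF_SIZE 4096  /* 4-KB buffers, 4096-byte aligned for O_DIRECT */

static void free_iov(struct iovec *iov, int n)
{
    for (int i = 0; i < n; i++)
        free(iov[i].iov_base);
}

/* Point each iovec entry at its own aligned 4-KB buffer. */
static int setup_iov(struct iovec *iov)
{
    for (int i = 0; i < NR_IOV; i++) {
        void *p = NULL;
        if (posix_memalign(&p, BUF_SIZE, BUF_SIZE) != 0) {
            free_iov(iov, i); /* release the buffers allocated so far */
            return -1;
        }
        memset(p, 'a' + i % 26, BUF_SIZE);
        iov[i].iov_base = p;
        iov[i].iov_len = BUF_SIZE;
    }
    return 0;
}

/* Transfer total_mb MiB starting at offset 0, one 1-MiB pwritev()/preadv()
 * per iteration. extra_flags is O_DIRECT or 0 (buffered I/O).
 * Returns MiB/s, or -1.0 on any error or short transfer. */
static double sg_pass(const char *path, int extra_flags, int do_write, int total_mb)
{
    struct iovec iov[NR_IOV];
    const off_t chunk = (off_t)NR_IOV * BUF_SIZE;
    struct timespec t0, t1;
    int fd, ok = 1;

    assert(total_mb > 0);
    if (setup_iov(iov) != 0)
        return -1.0;
    fd = open(path, (do_write ? O_WRONLY | O_CREAT : O_RDONLY) | extra_flags, 0644);
    if (fd < 0) {
        free_iov(iov, NR_IOV);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; ok && i < total_mb; i++) {
        ssize_t n = do_write ? pwritev(fd, iov, NR_IOV, i * chunk)
                             : preadv(fd, iov, NR_IOV, i * chunk);
        ok = (n == chunk);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    free_iov(iov, NR_IOV);
    if (!ok)
        return -1.0;
    double secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? total_mb / secs : 0.0;
}
```

For example, `sg_pass("/mnt/gluster/testfile", O_DIRECT, 1, 1024)` would write 1 GiB through a FUSE mount with O_DIRECT (the path is hypothetical). On file systems without O_DIRECT support, open() fails with EINVAL and the sketch returns -1.0.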
A graph and raw data for the results are available at
graph - http://perf1.lab.bos.redhat.com/bengland/laptop/matte/virt/s-g-patch-preadv-pwritev.odp
This result shows that the patch does what it is supposed to do. What it does NOT tell you is how much the patch will help KVM. It clearly applies only to KVM mode "io=threads", not "io=native" (which uses the io_submit() system call instead). Since it does not help buffered I/O, it will not help KVM cache=writeback or cache=writethrough. It should help KVM cache=none,io=threads. We will test to validate this hypothesis.
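The hypothesis above can be written down as a small predicate. This is a hypothetical encoding of the reasoning in this comment, not measured behavior; the string values mirror the cache= and io= settings named in the text, and the function name is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical encoding of the hypothesis (not a measured result):
 * the S-G patch speeds up O_DIRECT preadv()/pwritev() through FUSE, so it
 * should matter only when
 *   - io=threads (not io=native, which submits I/O with io_submit()), and
 *   - cache=none (O_DIRECT on the host side; writeback and writethrough
 *     are buffered and therefore not helped). */
static bool sg_patch_expected_to_help(const char *cache, const char *io)
{
    assert(cache != NULL && io != NULL);
    bool direct_io = strcmp(cache, "none") == 0;
    bool thread_pool = strcmp(io, "threads") == 0;
    return direct_io && thread_pool;
}
```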
The test program is:
I found and fixed some measurement problems, and the results in the graph above have been updated. There seems to be no measurable regression in the buffered I/O case when I use multiple samples. The impact of the S-G patch in the O_DIRECT case is the same as before.
Next I will measure the effect of disabling O_DIRECT on the server and add it to this graph. It's pretty clear that we don't need the S-G patch if we use cache=writeback. On the other hand, we may not need cache=writeback if we use O_DIRECT on the KVM host side together with the S-G patch, and don't use O_DIRECT on the server side. This is what NFS does, but not what Gluster does today. If we can use cache=none, then RHEV's default setting will work with Gluster.
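Two separate O_DIRECT decisions are involved here: KVM cache=none makes the host open the image on the FUSE mount with O_DIRECT, and the storage server independently decides whether its own open of the backing file uses O_DIRECT. The hypothetical helper below illustrates only the second decision, assuming the server passes the client's O_DIRECT through when it honors it (the behavior the comment attributes to Gluster today) and strips it otherwise (the NFS-like behavior). It is not GlusterFS code.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux; must precede all includes */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical: flags a storage server uses when it opens the backing file
 * for a client request. If the server honors O_DIRECT, the client's O_DIRECT
 * is passed through; otherwise it is stripped, so server-side I/O goes
 * through the server's page cache. */
static int server_open_flags(int client_flags, bool server_honors_o_direct)
{
    int flags = client_flags & ~O_DIRECT;

    if (server_honors_o_direct && (client_flags & O_DIRECT))
        flags |= O_DIRECT;
    /* The access mode (O_RDONLY/O_WRONLY/O_RDWR) is never changed. */
    assert((flags & O_ACCMODE) == (client_flags & O_ACCMODE));
    return flags;
}
```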
I measured the effect of the S-G patch using a RHEL6.4 kernel that Brian built with it, with the default RHEV setting of KVM cache=none, io=threads, and NO GUEST TUNING. I focused on reads because this was the biggest problem area for RHEV/RHS. The results show a significant impact from the S-G patch, and a smaller but still significant effect from eliminating the O_DIRECT open in the RHS server. The best results were obtained with both optimizations in use.
workload: a single KVM guest running a single iozone thread doing buffered reads with a 4-KB I/O size from an 8-GB file. Caches were dropped on the servers, the KVM host (should be unnecessary), and the VM.
configuration: 2 servers, 1 KVM host, 10-GbE link with jumbo frames; each server has a RAID6 LUN, using the multipath driver and LVM. RHS version rhsvirt1-7 was used on the KVM host and storage servers. All Gluster translators are off except the write-behind translator, and no Gluster volume tuning was used.
All results were sampled 3 times, and the percent deviation is under 5%.
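The bug does not define "percent deviation". Assuming it means the sample standard deviation of the three runs divided by their mean, times 100, a check for the "under 5%" criterion could look like the hypothetical helper below. It compares squared quantities, so it needs no sqrt() from libm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: "percent deviation" = 100 * sample standard deviation / mean.
 * Returns true if that value is strictly below limit_pct.
 * Requires n >= 2 and a positive mean (e.g. throughput samples). */
static bool within_percent_deviation(const double *samples, size_t n, double limit_pct)
{
    double mean = 0.0, ss = 0.0;

    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        mean += samples[i];
    mean /= (double)n;

    for (size_t i = 0; i < n; i++)
        ss += (samples[i] - mean) * (samples[i] - mean);

    double variance = ss / (double)(n - 1);      /* sample variance */
    double limit_sd = limit_pct / 100.0 * mean;  /* allowed standard deviation */
    return variance < limit_sd * limit_sd;       /* sd < limit, both sides squared */
}
```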
An updated v2 set for the next 6.3.z window is posted here:
The v2 backport corresponds with the upstream v3 proposal set:
Patch(es) available on kernel-2.6.32-342.el6
I tested Brian's latest version of this patch. Performance is identical to the previous version and continues to be far superior to vanilla RHEL6.3 KVM/RHS throughput for single-thread sequential reads: 2x throughput without any read tuning in the VM, and 4x throughput with /sys/block/vda/queue/read_ahead_kb=8192.
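The 4x result used /sys/block/vda/queue/read_ahead_kb=8192 inside the guest. Applying that setting amounts to writing the value to the sysfs attribute as root; a minimal C sketch follows, with hypothetical helper names. The setting does not persist across a guest reboot.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a read-ahead size in KiB to a block-queue
 * attribute such as /sys/block/vda/queue/read_ahead_kb (root required).
 * Returns 0 on success, -1 on failure. */
static int set_read_ahead_kb(const char *attr_path, unsigned int kb)
{
    assert(attr_path != NULL);
    FILE *f = fopen(attr_path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%u\n", kb) > 0;
    /* stdio buffers the write, so a value rejected by sysfs usually
     * shows up as an fclose() failure rather than an fprintf() failure. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the current value back; returns -1 on failure. */
static long get_read_ahead_kb(const char *attr_path)
{
    unsigned int kb;
    FILE *f = fopen(attr_path, "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%u", &kb);
    fclose(f);
    return n == 1 ? (long)kb : -1;
}
```

Inside the guest this would be called as `set_read_ahead_kb("/sys/block/vda/queue/read_ahead_kb", 8192)`; the vda device name assumes a virtio disk, as in the comment.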
The test program is still there, but the DNS domain name was changed on us, from lab.bos.redhat.com to perf.lab.eng.bos.redhat.com. Just substitute the new domain into the URLs above and you should be able to get these programs.
I do not know if I can be the QA contact, since I do not know how to get the kernel that you want to test. I have already tested Brian's patch. If you changed the patch further, I can re-test if a kernel RPM is provided. If you didn't change the patch, then a re-test would not be necessary, correct?
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.