Bug 858850

Summary: fuse: backport scatter-gather direct IO
Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.4
Target Release: 6.4
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Keywords: ZStream
Reporter: Brian Foster <bfoster>
Assignee: Brian Foster <bfoster>
QA Contact: Filesystem QE <fs-qe>
CC: bdonahue, bengland, dhoward, eguan, esandeen, fhrbata, jiali, jpallich, jshao, kzhang, msvoboda, perfbz, rwheeler, sforsber, shaines
Doc Type: Bug Fix
Doc Text:
Filesystem in Userspace (FUSE) did not implement scatter-gather direct I/O optimally. Consequently, the kernel had to process an extensive number of FUSE requests, which had a negative impact on system performance. This update applies a set of patches that improve internal request management, which also benefits other features such as readahead. FUSE direct I/O overhead has been significantly reduced, minimizing its negative effect on system performance.
Type: Bug
Last Closed: 2013-02-21 06:38:38 UTC
Bug Blocks: 832572, 865305, 881827    

Description Brian Foster 2012-09-19 19:29:49 UTC
This is a bug to track a backport of the proposed scatter-gather direct I/O patches currently up for review in upstream fuse:

http://sourceforge.net/mailarchive/message.php?msg_id=29858183

This is desirable for RHS/RHEV integration (qemu), but it is not entirely clear which rhel6 target for this work would filter most expediently into the targeted RHEV integration release. Targeted to rhel6.5 for now.

Comment 5 Ben England 2012-09-27 19:47:14 UTC
Additional testing to measure the impact of this patch indicates that the patch is important, BUT we can get a significant improvement in performance without it, and this may be enough to get RHS at least into beta test with RHEV 3.1.

HOWEVER, there was a 30% improvement in performance with the patch that, in our tests, made Gluster more closely match NFS performance on sequential I/O inside a VM, even with replication overhead.  So this really does matter, and we want to see this patch backported to RHEL6 if at all possible.

I can post data, but since this bug is not marked redhat-internal I hesitate to post it here; it is certainly available upon request.

Comment 8 Ben England 2012-10-11 19:41:42 UTC
I tested the S-G patch in bz 858850 with Brian Foster's RHEL6.4 kernel, using test-pwritev.c to get the quickest results.  The data shows that the gain in performance for a synthetic preadv/pwritev workload matches what we observed in previous testing with the patch applied to the Linux 3.6 kernel.  In addition, the patch shows very significant perf gains for WRITES with O_DIRECT.  The results are very consistent and not subtle at all, and can be quickly reproduced with the test program below.

Huge gains are seen with O_DIRECT when the iovec array contains 256 4-KB buffers.  For reads the gain is a factor of 15; for writes the gain is only a factor of 2.5.  There is a 10-15% decrease in read performance (why?) with buffered I/O.

A graph and raw data for the results are available at

graph - http://perf1.lab.bos.redhat.com/bengland/laptop/matte/virt/s-g-patch-preadv-pwritev.odp

This result shows that the patch does what it is supposed to do.  What it does NOT tell you is how much the patch will help KVM.  Clearly it only applies to KVM mode "io=threads", not "io=native" (which uses the io_submit() system call instead).  Since it does not help with buffered I/O, it will not help with KVM cache=writeback/cache=writethrough.  It will help with KVM cache=none,io=threads.  We will test to validate this hypothesis.
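
For reference, these mode names map to the disk driver settings in the VM definition. A hypothetical libvirt disk stanza for the case the patch helps (cache=none, io=threads) might look like the following; the image path and device names are placeholders, not taken from this bug:

<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='threads'/>
  <source file='/path/to/vm-disk.img'/>
  <target dev='vda' bus='virtio'/>
</disk>

Switching io='threads' to io='native' selects the io_submit() path mentioned above.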

The test program is:

http://perf1.lab.bos.redhat.com/bengland/laptop/matte/virt/test-pwritev.c
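
For illustration only, here is a minimal sketch of the kind of scatter-gather I/O the test exercises: a single pwritev() and preadv() over 256 aligned 4-KB buffers on an O_DIRECT file descriptor. This is not the actual test-pwritev.c (which is at the URL above); the file path, fill pattern and lack of timing/looping are my simplifications.

#define _GNU_SOURCE    /* for O_DIRECT and the preadv/pwritev declarations */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>

#define NVECS 256
#define BUFSZ 4096

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/mnt/fuse/testfile";  /* placeholder */
    struct iovec iov[NVECS];
    ssize_t ret;
    int i, fd;

    /* O_DIRECT needs suitably aligned buffers; align each one to 4 KB. */
    for (i = 0; i < NVECS; i++) {
        void *buf;
        if (posix_memalign(&buf, BUFSZ, BUFSZ)) {
            perror("posix_memalign");
            return 1;
        }
        memset(buf, 'a', BUFSZ);
        iov[i].iov_base = buf;
        iov[i].iov_len  = BUFSZ;
    }

    fd = open(path, O_CREAT | O_RDWR | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* One scatter-gather write of NVECS * BUFSZ bytes at offset 0 ... */
    ret = pwritev(fd, iov, NVECS, 0);
    if (ret < 0)
        perror("pwritev");
    else
        printf("pwritev: %zd bytes\n", ret);

    /* ... and one scatter-gather read of the same region. */
    ret = preadv(fd, iov, NVECS, 0);
    if (ret < 0)
        perror("preadv");
    else
        printf("preadv: %zd bytes\n", ret);

    close(fd);
    return 0;
}

Without the S-G patches each such call is presumably split into many small FUSE requests; with them the kernel can pass the whole iovec through in far fewer requests, which is where the gains above come from.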

Comment 9 Ben England 2012-10-11 23:05:58 UTC
I found and fixed some measurement problems; the results in the above graph have been updated.  There seems to be no measurable regression in the buffered I/O case when I use multiple samples.  The impact of the S-G patch in the O_DIRECT case is identical.

I will now go and measure the effect of disabling O_DIRECT on the server on this graph.  It's pretty clear that we don't need the S-G patch if we use cache=writeback.  On the other hand, we may not need cache=writeback if we use O_DIRECT on the KVM host side in conjunction with the S-G patch and don't use O_DIRECT on the server side.  This is what NFS does, but not what Gluster does today.  If we can use cache=none, then RHEV's default setting will work with Gluster.

Comment 10 Ben England 2012-10-15 11:30:34 UTC
I measured the effect of the S-G patch using a RHEL6.4 kernel built with it by Brian, using the default RHEV settings of KVM cache=none, io=threads, and NO GUEST TUNING.  I focused on reads because this was the biggest problem area for RHEV/RHS.  The results show a significant impact from the S-G patch, with a lesser but still significant effect from eliminating the O_DIRECT open in the RHS server.  The best results were obtained with both optimizations in use.

http://perf1.lab.bos.redhat.com/bengland/laptop/matte/virt/rhev-rhs-aug-2012-1-sg.pdf

workload: Single KVM guest, using just a single iozone thread doing buffered reads with a 4-KB I/O size to an 8-GB file.  Caches were dropped in the servers, the KVM host (should be unnecessary) and the VM.

configuration: 2 servers, 1 KVM host, 10-GbE link with jumbo frames; each server has a RAID6 LUN, using the multipath driver and LVM as well.  RHS version rhsvirt1-7 was used on the KVM host and storage servers.  All Gluster translators were off except for the write-behind translator, and no Gluster volume tuning was used.

All results were sampled 3 times and the percent deviation is under 5%.
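
For anyone trying to reproduce this, the measurement steps were roughly along these lines; the mount point, file name and exact iozone flags below are my assumptions, not copied from the original runs:

# drop caches on the servers, the KVM host and inside the guest before each sample
sync; echo 3 > /proc/sys/vm/drop_caches

# inside the guest: a single iozone thread, 4-KB record size, 8-GB file
# (-i 0 writes the file, -i 1 then reads it back sequentially)
iozone -i 0 -i 1 -r 4k -s 8g -f /mnt/test/iozone.tmp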

Comment 11 Brian Foster 2012-11-13 13:16:04 UTC
An updated v2 set for the next 6.3.z window is posted here:

http://post-office.corp.redhat.com/archives/rhkernel-list/2012-November/msg00415.html

The v2 backport corresponds with the upstream v3 proposal set:

http://sourceforge.net/mailarchive/message.php?msg_id=30017972

Comment 12 Jarod Wilson 2012-11-19 15:10:21 UTC
Patch(es) available on kernel-2.6.32-342.el6

Comment 15 Ben England 2012-11-30 15:38:55 UTC
I tested Brian's latest version of this patch; performance is identical to the previous version and continues to be far superior to vanilla RHEL6.3 KVM/RHS throughput for single-thread sequential reads: 2x throughput without any read tuning in the VM, and 4x throughput with /sys/block/vda/queue/read_ahead_kb=8192.

Comment 18 Ben England 2013-01-18 04:10:07 UTC
The test program is still there, but the DNS domain name was changed on us, from lab.bos.redhat.com to perf.lab.eng.bos.redhat.com.  Just substitute that in the above URLs and you should be able to get these programs.

I do not know if I can be the QA contact, since I do not know how to get the kernel that you want to test.  I tested Brian's patch already.  If you changed the patch further, then I can re-test if a kernel RPM is provided.  If you didn't change the patch, then a re-test would not be necessary, correct?

Comment 21 errata-xmlrpc 2013-02-21 06:38:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html