Bug 858850 - fuse: backport scatter-gather direct IO
Summary: fuse: backport scatter-gather direct IO
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Target Release: 6.4
Assignee: Brian Foster
QA Contact: Filesystem QE
Depends On:
Blocks: 832572 865305 881827
Reported: 2012-09-19 19:29 UTC by Brian Foster
Modified: 2014-02-21 07:08 UTC (History)
15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Filesystem in Userspace (FUSE) did not implement scatter-gather direct I/O optimally. Consequently, the kernel had to process an extensive number of FUSE requests, which had a negative impact on system performance. This update applies a set of patches which improve internal request management for other features, such as readahead. FUSE direct I/O overhead has been significantly reduced to minimize negative effects on system performance.
Clone Of:
Last Closed: 2013-02-21 06:38:38 UTC
Target Upstream Version:

Attachments

System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2013:0496 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 6 kernel update 2013-02-20 21:40:54 UTC

Description Brian Foster 2012-09-19 19:29:49 UTC
This is a bug to track a backport of the proposed scatter-gather direct I/O patches currently up for review in upstream fuse:


This is desirable for RHS/RHEV integration (qemu), but it is not totally clear which rhel6 target for this work would most expediently filter into the targeted RHEV integration release. Targeted to rhel6.5 for now.

Comment 5 Ben England 2012-09-27 19:47:14 UTC
Additional testing to measure the impact of this patch indicates that the patch is important, BUT we can get significant improvement in performance without it, and this may be enough to get RHS at least into beta test with RHEV 3.1.

HOWEVER, the patch yielded a 30% improvement in performance that in our tests made Gluster more closely match NFS performance on sequential I/O inside a VM, even with replication overhead.  So this really does matter, and we want to see this patch backported to RHEL6 if at all possible.

I can post data, but since this bug is not marked redhat-internal I hesitate to post it here; it is certainly available upon request.

Comment 8 Ben England 2012-10-11 19:41:42 UTC
I tested the S-G patch in bz 858850 with Brian Foster's RHEL6.4 kernel, using test-pwritev.c to get the quickest results.  The data shows that the gain in performance for a synthetic preadv/pwritev workload matches what we observed in previous testing with the patch applied to the Linux 3.6 kernel.  In addition, the patch shows very significant perf gains for WRITES with O_DIRECT.  The results are very consistent and not subtle at all, and can be quickly reproduced with the test program below.

Huge gains are seen with O_DIRECT when the iovec array contains 256 4-KB buffers.  For reads the gain is a factor of 15; for writes the gain is only a factor of 2.5.  There is a 10-15% decrease in read performance (why?) with buffered I/O.

A graph and raw data for the results are available at

graph - http://perf1.lab.bos.redhat.com/bengland/laptop/matte/virt/s-g-patch-preadv-pwritev.odp

This result shows that the patch does what it is supposed to do.  What it does NOT tell you is how much the patch will help KVM.  Clearly it only applies to KVM mode "io=threads", not "io=native" (which uses the io_submit() system call instead).  Since it does not help with buffered I/O, it will not help with KVM cache=writeback/cache=writethrough.  It will help with KVM cache=none,io=threads.  We will test to validate this hypothesis.

The test program is:


Comment 9 Ben England 2012-10-11 23:05:58 UTC
I found and fixed some measurement problems; the results in the above graph have been updated.  There seems to be no measurable regression in the buffered I/O case when I use multiple samples.  The impact of the S-G patch for the O_DIRECT case is identical.

I will now go and measure the effect of disabling O_DIRECT on the server on this graph.  It's pretty clear that we don't need the S-G patch if we use cache=writeback.  On the other hand, we may not need cache=writeback if we use O_DIRECT on the KVM host side in conjunction with the S-G patch and don't use O_DIRECT on the server side.  This is what NFS does, but not what Gluster does today.  If we can use cache=none then RHEV's default setting will work with Gluster.

Comment 10 Ben England 2012-10-15 11:30:34 UTC
I measured the effect of the S-G patch using a RHEL6.4 kernel built with it by Brian, using the default RHEV settings of KVM cache=none, io=threads, and NO GUEST TUNING.  I focused on reads because this was the biggest problem area for RHEV/RHS.  The results show a significant impact for the S-G patch, with a lesser but still significant effect from eliminating the O_DIRECT open in the RHS server.  Best results were obtained with both optimizations in use.


workload: Single KVM guest, using just a single iozone thread doing buffered reads with a 4-KB I/O size to an 8-GB file.  The cache was dropped in the servers, the KVM host (should be unnecessary) and the VM.

configuration: 2 servers, 1 KVM host, 10-GbE link with jumbo frames; each server has a RAID6 LUN, using the multipath driver and LVM as well.  RHS version rhsvirt1-7 was used on the KVM hosts and storage servers.  All Gluster translators are off except for the write-behind translator, and no Gluster volume tuning was used.

All results were sampled 3 times and the percent deviation is under 5%.

Comment 11 Brian Foster 2012-11-13 13:16:04 UTC
An updated v2 set for the next 6.3.z window is posted here:


The v2 backport corresponds with the upstream v3 proposal set:


Comment 12 Jarod Wilson 2012-11-19 15:10:21 UTC
Patch(es) available on kernel-2.6.32-342.el6

Comment 15 Ben England 2012-11-30 15:38:55 UTC
I tested Brian's latest version of this patch; performance is identical to the previous version and continues to be far superior to vanilla RHEL6.3 KVM/RHS throughput for single-thread sequential reads: 2x throughput without any read tuning in the VM, and 4x throughput with /sys/block/vda/queue/read_ahead_kb=8192.

Comment 18 Ben England 2013-01-18 04:10:07 UTC
The test program is still there, but the DNS domain name was changed on us, from lab.bos.redhat.com to perf.lab.eng.bos.redhat.com.  Just substitute that in the above URLs and you should be able to get these programs.

I do not know if I can be the QA contact, since I do not know how to get the kernel that you want to test.  I tested Brian's patch already.  If you changed the patch further, then I can re-test if a kernel RPM is provided.  If you didn't change the patch, then a re-test would not be necessary, correct?

Comment 21 errata-xmlrpc 2013-02-21 06:38:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

