Bug 510861
| Summary: | Storage performance regression between Redhat 4 up 3 and Redhat 5 up 3 | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Jean Blouin <blouin> |
| Component: | kernel | Assignee: | Jeff Moyer <jmoyer> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.3 | CC: | agk, cluster-maint, dwysocha, edamato, heinzm, jbrassow, jmoyer, mbroz, msnitzer, prockai |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-11-29 19:47:47 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jean Blouin
2009-07-11 19:00:08 UTC
We have found a workaround for the above problem. By changing the I/O scheduler from CFQ to deadline we were able to improve read performance on both RHEL 4 Update 3 and RHEL 5 Update 3. As we understand the problem now, the Completely Fair Queuing I/O scheduler in RHEL 4 Update 3 did a much better job of merging disk requests than the version in RHEL 5 Update 3. The following knowledge base article helped us resolve the issue: http://kbase.redhat.com/faq/docs/DOC-15355

The Completely Fair Queuing (cfq) scheduler in RHEL5 appears to have worse I/O read performance than RHEL4. This bug can be closed.

*** This bug has been marked as a duplicate of bug 448130 ***

(In reply to comment #0)
> I did some tests on an xw8600 configured with 2 loops XR+XE comparing what "dd"
> could read directly from the devices. The goal was to eliminate LVM (and
> mdadm) as well as xfs from the equation.
>
> I believe the results indicate that we are experiencing an issue due to changes
> in LVM in RHEL5u3.
>
> Here's the data:
>
> Under RHEL4u3, running four dd processes each reading a separate underlying LUN
> directly, and then running a single dd reading the LVM raid0 device, we see the
> following:
>
> (FOUR TIMES) dd if=/dev/sdb of=/dev/null iflag=direct,nonblock bs=512K

iflag=direct isn't supported by the dd shipped with RHEL 4. Did you update your coreutils package to a version that does support this flag?

Is your LVM a striped logical volume (I presume)?

Not closing as a dup of 448130 ... DM uses a single queue to dispatch I/O to the N devices in a striped LV. 448130 is concerned with using multiple threads that are interleaving I/O on behalf of a common task. I'm not convinced this issue isn't somehow related to 448130, but I'll separate the bugs regardless. However, this bug seems like a kernel (cfq) bug, not an lvm2 bug. Switching from cfq to deadline apparently resolves the reporter's issue.

Jean,

If it wouldn't be an inconvenience, could you please re-run your single dd over the logical volume and get me blktrace data? You'd run something like:

    blktrace -d /dev/<logical-volume>

in a directory not on said volume. Then run your test. When the test is complete, kill off blktrace and upload the resulting data files somewhere (if they're not too big, you can attach them to this bugzilla). If you can't do this for some reason, I'll try to reproduce here, but that will take some extra time. Thanks!

(In reply to comment #5)
> Jean,
>
> If it wouldn't be an inconvenience, could you please re-run your single dd over
> the logical volume and get me blktrace data? You'd run something like:
>
> blktrace -d /dev/<logical-volume>

Actually, it would be better if you just got the blktrace data for a single one of the underlying LUNs while doing the dd to the whole raid. Thanks!
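(For illustration only, a minimal sketch of the capture being requested above, assuming /dev/sdb is one of the LUNs backing the striped LV and using sdb-trace as an arbitrary output name; these are placeholders, not values from the reporter's system:)

    # capture block-layer events for one underlying LUN while the dd runs
    blktrace -d /dev/sdb -o sdb-trace &
    dd if=/dev/<logical-volume> of=/dev/null iflag=direct,nonblock bs=512K
    # stop the trace and turn the per-CPU binary files into readable output
    kill %1
    blkparse -i sdb-trace > sdb-trace.txt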
(In reply to comment #3)
> (In reply to comment #0)
> > I did some tests on an xw8600 configured with 2 loops XR+XE comparing what "dd"
> > could read directly from the devices. The goal was to eliminate LVM (and
> > mdadm) as well as xfs from the equation.
> >
> > I believe the results indicate that we are experiencing an issue due to changes
> > in LVM in RHEL5u3.
> >
> > Here's the data:
> >
> > Under RHEL4u3, running four dd processes each reading a separate underlying LUN
> > directly, and then running a single dd reading the LVM raid0 device, we see the
> > following:
> >
> > (FOUR TIMES) dd if=/dev/sdb of=/dev/null iflag=direct,nonblock bs=512K
>
> iflag=direct isn't supported by the dd shipped with RHEL 4. Did you update
> your coreutils package to a version that does support this flag?
>
> Is your LVM a striped logical volume (I presume)?

Hi Jeffrey,

Yes, we updated dd to one that supports the direct flag. I will be out of the office for the next 2 weeks, so I have forwarded your questions to the engineer who actually ran the tests; he will be able to answer them more precisely. As I mentioned above, changing the I/O scheduler from CFQ to deadline effectively resolved the issue for us, and you are right to say that the problem is most likely related to a bug in the kernel CFQ scheduler.

Thanks,
Jean

I'm still waiting for blktrace data and details on the LVM volume used for testing. I have one further question: what is the value of /sys/block/sdX/queue/max_sectors_kb on RHEL 4 and RHEL 5?

I'm still waiting for information, here. If you can't provide any data, then I'm not going to be able to help you!

I've posted a test kernel to the following location:

    http://people.redhat.com/jmoyer/cfq-cc

My test system is bandwidth limited, but it did show a speedup for CFQ when striping over two disks. Could you please give this kernel a try with the CFQ I/O scheduler and report your results? I'd appreciate it.

Given that there have been no updates from the reporter in the past several months, I'm closing this bug. In the event that the requested information is provided, I'll reopen the bug and we can take it from there.
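(For readers landing on this bug later, a minimal sketch of the workaround discussed above and of gathering the value requested in the thread; /dev/sdb is an assumed example device, not taken from the reporter's configuration:)

    # show the available schedulers; the active one appears in brackets
    cat /sys/block/sdb/queue/scheduler
    # switch this device from cfq to deadline (the reporter's workaround)
    echo deadline > /sys/block/sdb/queue/scheduler
    # report the maximum request size asked about in the thread
    cat /sys/block/sdb/queue/max_sectors_kb

The sysfs change takes effect immediately but does not persist across reboots; the elevator= kernel boot parameter is the persistent equivalent.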