Bug 139910
Summary: Poor SCSI read performance caused by fragmentation of user requests

Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Hardware: i686
OS: Linux
Reporter: Chris Worley <chrisw>
Assignee: Larry Woodman <lwoodman>
CC: coughlan, danderso, gary.mansell, kanderso, kpreslan, petrides, riel, sct
Doc Type: Bug Fix
Target Milestone: ---
Target Release: ---
Last Closed: 2007-10-19 19:13:55 UTC
Description
Chris Worley
2004-11-18 19:03:08 UTC
Larry Woodman and a few other people have been working this problem more than I have. It is not necessarily a SCSI issue. You're correct that sg_dd will skip anything that could fragment the requests, but when using dd the fragmentation is likely a result of the I/O elevator and VM subsystems more than the SCSI subsystem. Larry, if you disagree, then simply toss this back at me.

Created attachment 107910 [details]
Graphs comparing RHEL to RH7.3 SCSI/FC performance
I'm adding some data I've collected; see the attached PDF.
The left column of three graphs shows RHEL3U2 results, the right column of
three graphs shows RH7.3 (I forget which kernel... I think 2.4.19) results. All
on the same hardware: DDN S2A8000 8-port SANs, 3GHz I/O and Compute nodes, I/O
nodes w/ QLogic FC2 cards, channel bonded GigE, and running GFS; the Compute
Nodes exporting via NFS the GFS partitions in a ratio of 8 Compute Nodes per
I/O Node. A Foundry FastIron 1500 switch is used for GigE/NFS (the Foundry
switch firmware has also been upgraded in this time frame).
The left and right graphs are comparable, even if the captions sound a bit
different.
The top two graphs show I/O (GFS) node perspective (GigE/NFS not involved).
The recent graphs with RHEL (left top) show up to 16 I/O nodes; the RH7.3 graph
(right top) only shows eight. Write performance has doubled in RHEL vs. 7.3,
from a maximum of 550MB/s to 1.1GB/s! This is pushing the limits of the DDN
S2A8000 (sgp_dd can get about 1.3GB/s writes).
The read performance, on the other hand, has decreased significantly. In RH7.3
I had been getting an aggregate performance of ~350MB/s. In RHEL, I'm getting
200MB/s. The sgp_dd read performance aggregate (across 8 FC2 ports) is
~900MB/s for the DDN S2A8000.
The middle two graphs show the Compute Node perspective. The Compute Nodes are
NFS clients for the GFS partitions served by the I/O nodes. The 16 and 32
client nodes on the X-axis (of the left RHEL graph) map Compute Nodes to I/O
nodes 1-to-1 and 2-to-1 (one NFS client per NFS/GFS server, and two NFS Clients
per NFS/GFS server). These two sets of columns are comparable to the X-axis on
the right graph at 8 and 16 Compute Nodes (as the RH7.3 system only had 8 I/O
nodes).
The middle graphs only show that NFS carries through the performance seen on
the I/O nodes.
The bottom two graphs show the scalability from one to eight Compute Nodes that
are all NFS clients of one I/O node.
In the RH7.3 case (right graph) you could expect the same NFS performance for
both reads and writes, comparable to the GFS performance of the I/O node.
In RHEL, the I/O node write performance of 170MB/s slumps to a Compute node
aggregate of 100MB/s, even though the I/O nodes use channel-bonded (802.3ad)
GigE.
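For reference, 802.3ad (LACP) channel bonding on a RHEL 3 era (2.4.21) I/O node was typically configured along these lines. This is a sketch only; the interface names, address, and miimon value are assumptions, not details taken from this report:

```shell
# /etc/modules.conf -- load the bonding driver in 802.3ad (LACP) mode:
#   alias bond0 bonding
#   options bond0 mode=802.3ad miimon=100
#
# /etc/sysconfig/network-scripts/ifcfg-bond0 (hypothetical address):
#   DEVICE=bond0
#   IPADDR=10.0.0.1
#   NETMASK=255.255.255.0
#   ONBOOT=yes

# Enslave both GigE ports to the bond (eth0/eth1 are placeholders)
ifenslave bond0 eth0 eth1
```

Note that 802.3ad mode also requires the attached switch ports (here, the Foundry FastIron) to be configured for LACP aggregation.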
Also in RHEL, the read performance doesn't reach 90% of the I/O node's read
performance until all eight NFS client Compute Nodes are active. This
compounds the already low maximum.
All numbers were gathered using IOzone with 512KB blocks, and file sizes
of twice the memory size.
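An IOzone run matching that description might look like the following sketch. The exact flags used in these tests are not given in the report; the 2GB file size here assumes a node with 1GB of RAM, and the target path is a placeholder:

```shell
# Sequential write/rewrite (-i 0) and read/reread (-i 1) with 512KB records.
# The file size (-s) should be twice physical memory so the page cache
# cannot hold the whole file and real disk I/O is measured.
iozone -i 0 -i 1 -r 512k -s 2g -f /gfs/testfile
```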
I was going back through the data on the attached PDF and found an error: the charts shown for RH7.3 were taken prior to channel-bonding the I/O (GFS) nodes, and prior to using 9K MTUs. Once those features were enabled, the RH7.3 numbers on the center-right chart peaked at 650MB/s for writes and over 400MB/s for reads. So, half the read speed was lost in the move to RHEL.

I have just upgraded a fileserver from RH7.3 to RHEL 3.0 and have discovered a huge read performance hit. I have a Dell PE2650 attached via two QLogic QLA2310F FCAL cards to the storage device, which is a Dell/EMC FC4500. I have installed the latest BIOS 1.42 on the FCAL cards and am using the QLogic 7.0.3 driver instead of the Red Hat supplied one. I am running kernel 2.4.21-9.0.1.ELsmp. I used to get about 50MB/s write and 60MB/s read speed. I am now getting about 50MB/s write and 20MB/s read. Read speed is now only a third of what it was.

Gary: it would help to get sgp_dd performance numbers, as the difference between sgp_dd and dd implicates the SCSI layer rather than the SAN. You'll probably drool when you see what you could be getting! I think the main problem is that the Linux SCSI layer has been tuned for SCSI adapters and left FC considerations in the dust. While FC is not the culprit (which sgp_dd shows), with a SCSI adapter and a JBOD I can get great dd read/write performance (i.e. 300MB/s)... but whatever the kernel is doing to boost SCSI adapters affects FC adversely.

Chris, how do I go about getting sgp_dd performance numbers?
Regards, Gary

See: http://sg.torque.net/sg/u_index.html
sg_dd is like dd, while sgp_dd shows the effect of multiple threads. You'll need "scsi generic" either built into your kernel, or the sg.o module loaded.

Hi, I can report that I have now overcome this problem; see bug report 106771. I came across this before: basically, the max-readahead value is set to 31, which is more suited to random access. For sequential reads, set this to 256.
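For concreteness, the sgp_dd measurement and the readahead change discussed above can be sketched as follows. The device name and transfer counts are placeholders, and the thread count is an illustrative choice; /proc/sys/vm/max-readahead is the 2.4-kernel tunable that applies to RHEL 3:

```shell
# Load the SCSI generic driver if it is not built into the kernel
modprobe sg

# Single-threaded read through the sg layer, analogous to dd:
# bs= is the device logical block size, bpt= blocks per transfer,
# so 512 * 1024 = 512KB per transfer.
sg_dd if=/dev/sg0 of=/dev/null bs=512 bpt=1024 count=1048576

# The same read with 4 worker threads, to show the effect of queued I/O
sgp_dd if=/dev/sg0 of=/dev/null bs=512 bpt=1024 count=1048576 thr=4

# Raise sequential readahead from the default 31 pages to 256
echo 256 > /proc/sys/vm/max-readahead
```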
I have done this and my read performance is back up to circa 60MB/s.

Created attachment 112856 [details]
Problem partially resolved in RHEL3U4 and GFS 6
Same system has been upgraded to RHEL3U4 and GFS 6, and the performance numbers
are compared to GFS 5.2/RHEL3U2 in the attached PDF.
The SCSI performance problems are solved. If you compare the top two charts in
their lower left corner, you'll see the read speed is much better than it was.
This is a single threaded test, so 100MB/s is the max. In multi-threaded tests
I can get 140MB/s reading on an I/O node.
Good Job RedHat!
But "read" scalability is still an issue in GFS. While the peak "read" speed
has improved slightly, it does not reflect the performance available from
the hardware (and now the OS).
So, RH7.3 GFS still beats RHEL in "read" scalability.
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission-critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.