Red Hat Bugzilla – Bug 139910
Poor SCSI read performance caused by fragmentation of user requests
Last modified: 2007-11-30 17:07:05 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041001
Description of problem:
RH 7.3 seemed to max-out at about 60MB/s over FC.
RHEL 3 maxes out at about 20MB/s.
The problem is confined to the SCSI layer. Using sg_dd (SCSI generic
implementation of dd), the FC and SAN max-out at about 160MB/s on
reads (FC-2). This means that, if the SCSI layer would send "read"
commands with larger "read" lengths, the performance would increase.
The SAN shows this problem nicely: the DDN S2A8500 "stats length"
command shows a graph of the "read" and "write" lengths it receives.
When performing I/O operations with large "read" and "write" lengths,
the DDN shows that the "write" requests are large, but the "read"
requests are fragmented:
S2A 8000: stats length
Command Length Statistics

Length     Port 1            Port 2           Port 3          Port 4
Kbytes   Reads    Writes   Reads   Writes   Reads   Writes   Reads
>   0  2359622  1687DD9  1A07EF3   E0351  1B50060    C555   B0AD97
>  16    C6332   44FF0D    226E5    6589    1FFF8    1CAC     6358
>  32    14BB4   26DA31     58CD    2BDA     4664    11CD      16A
>  48    17542   186DF8     8B13    1557     81BA     96D        0
>  64     7FA2   119010     5E52    1A2E     5C20     B94        2
>  80     4768    D408F     3807     C48     33B0     589        0
>  96     B2BE    A4BA6     7F07     B6C     7DBE     59D        0
> 112     2F7D    818FF     3057     E14     2FC0     902        0
> 128    3710D   2E40F9    3467F   84337    35BCB   852B5        0
> 144        0        0        0       0        0       0        0
> 160        0        0        0       0        0       0        0
> 176        0        0        0       0        0       0        0
> 192        0        0        0       0        0       0        0
> 208        0        0        0       0        0       0        0
> 224        0        0        0       0        0       0        0
> 240        0        0        0       0        0       0        0
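The skew is easier to see in decimal. A quick sketch (counter values copied from the Port 1 read column above; the S2A prints them in hex, and I'm reading the "> 0" bucket as requests under 16 Kbytes):

```shell
# Port 1 read counters from the "stats length" table above, one per
# length bucket; the first bucket ("> 0") is requests under 16 KB.
reads="0x2359622 0xC6332 0x14BB4 0x17542 0x7FA2 0x4768 0xB2BE 0x2F7D 0x3710D"
total=0
for c in $reads; do total=$((total + c)); done
echo "total reads: $total"
# Per-mille of reads smaller than 16 KB:
echo $((0x2359622 * 1000 / total))    # -> 965: 96.5% of reads are tiny
```

By contrast, summing the write column the same way shows the writes clustering in the large buckets, which is exactly the asymmetry the table displays.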
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Time "dd" commands with large block sizes (bs), both reading and
writing to SCSI FC devices.
2. Time "sg_dd" commands with the same block sizes against the same
SCSI FC devices.
3. Note the performance difference.
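As a concrete sketch of steps 1-3 (device names here are hypothetical; sg_dd comes from the sg3_utils package, and its bpt= "blocks per transfer" option fixes the size of each SCSI READ it issues):

```shell
DEV=/dev/sdc    # hypothetical: the FC disk as a block device
SGDEV=/dev/sg2  # hypothetical: the same disk via the sg driver
if [ -b "$DEV" ]; then
    time dd if="$DEV" of=/dev/null bs=1M count=2048
    sg_dd if="$SGDEV" of=/dev/null bs=512 bpt=2048 count=4194304 time=1
fi
# Both commands read the same 2 GiB; bpt=2048 makes every sg_dd READ
# 1 MiB, regardless of how the block layer would have split it:
echo $((512 * 2048))                 # -> 1048576 bytes per SCSI READ
echo $((4194304 * 512 / 1048576))    # -> 2048 MiB transferred in total
```

Because sg_dd bypasses the block layer entirely, any throughput gap between the two runs points at request fragmentation rather than at the disk or the fabric.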
Actual Results: FC-2 read/write performance using sg_dd with large
block sizes will get about 160MB/s (given a good SAN like a DDN S2A8500).
Using "dd" with the same large block sizes over the same device gets
much worse performance.
Expected Results: I need 160MB/s "read" and "write". I'd settle for
a 10% performance degradation in "reads".
Larry Woodman and a few other people have been working this problem
more than me. It is not necessarily a SCSI issue. You're correct that
sg_dd will skip anything that could fragment the requests, but when
using dd the fragmentation is likely a result of the I/O elevator and
VM subsystems more than it is the SCSI subsystem. Larry, if you
disagree, then simply toss this back at me.
Created attachment 107910 [details]
Graphs comparing RHEL to RH7.3 SCSI/FC performance
I'm adding some data I've collected; see the attached PDF.
The left column of three graphs shows RHEL3U2 results, the right column of
three graphs shows RH7.3 (I forget which kernel... I think 2.4.19) results. All
on the same hardware: DDN S2A8000 8-port SANs, 3GHz I/O and Compute nodes, I/O
nodes w/ QLogic FC2 cards, channel bonded GigE, and running GFS; the Compute
Nodes exporting via NFS the GFS partitions in a ratio of 8 Compute Nodes per
I/O Node. A Foundry FastIron 1500 switch is used for GigE/NFS (the Foundry
switch firmware has also been upgraded in this time frame).
The left and right graphs are comparable, even if the captions sound a bit
different.
The top two graphs show I/O (GFS) node perspective (GigE/NFS not involved).
The recent graphs with RHEL (left top) show up to 16 I/O nodes, the RH7.3 graph
(right top) only shows eight. Write performance has doubled in RHEL vs. 7.3...
from a maximum at 550MB/s to 1.1GB/s! This is pushing the limits of the DDN
S2A8000 (sgp_dd can get about 1.3GB/s writes).
The read performance, on the other hand, has decreased significantly. In RH7.3
I had been getting an aggregate performance of ~350MB/s. In RHEL, I'm getting
200MB/s. The sgp_dd read performance aggregate (across 8 FC2 ports) is
~900MB/s for the DDN S2A8000.
The middle two graphs show the Compute Node perspective. The Compute Nodes are
NFS clients for the GFS partitions served by the I/O nodes. The 16 and 32
client nodes on the X-axis (of the left RHEL graph) map Compute Nodes to I/O
nodes 1-to-1 and 2-to-1 (one NFS client per NFS/GFS server, and two NFS Clients
per NFS/GFS server). These two sets of columns are comparable to the X-axis on
the right graph at 8 and 16 Compute Nodes (as the RH7.3 system only had 8 I/O
nodes).
The middle graphs only show that NFS carries through the performance seen on
the I/O nodes.
The bottom two graphs show the scalability from one to eight Compute Nodes that
are all NFS clients of one I/O node.
In the RH7.3 case (right graph) you could expect the same NFS performance for
both reads and writes, comparable to the GFS performance of the I/O node.
In RHEL, the I/O node write performance of 170MB/s slumps to a Compute node
aggregate of 100MB/s, even though the I/O nodes use channel-bonded (802.3ad)
GigE.
Also in RHEL, the read performance doesn't reach 90% of the I/O node's read
performance until all eight NFS client Compute Nodes are active. This
compounds the already low maximum.
All numbers were gathered using IOzone with 512K-byte blocks, and file sizes
of twice memory size.
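For reference, a sketch of that measurement setup (the mount point and file name are hypothetical; -i 0 and -i 1 select IOzone's write and read tests, and the file size is doubled against RAM so reads cannot be served from the page cache):

```shell
# Size the test file at twice physical memory, per the methodology
# above, then print the IOzone invocation that would be used.
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
file_kb=$((mem_kb * 2))
echo iozone -r 512k -s "${file_kb}k" -i 0 -i 1 -f /gfs/iozone.tmp
```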
I was going back through the data on the attached PDF and found an
error.
The charts shown for RH7.3 were prior to channel bonding the I/O (GFS)
nodes, and prior to using 9K MTU's. Once those features were enabled,
for example, the RH7.3 numbers on the center right chart peaked at
650MB/s for writes, and over 400MB/s for reads.
So, half the read speed was lost in the move to RHEL.
I have just upgraded a fileserver from RH7.3 to RHEL3.0 and have
discovered a huge read performance hit.
I have a Dell PE2650 attached via two Qlogic QLA2310F FCAL cards to
the storage device which is a Dell/EMC FC4500. I have installed the
latest BIOS 1.42 into the FCAL cards and have used the Qlogic 7.0.3
driver instead of the Redhat supplied one. I am running kernel
I used to get about 50MB/s write and 60MB/s read speeds.
I am now getting about 50MB/s write and 20MB/s read speeds.
Read speed is now only a third of what it was.
It would help to get sgp_dd performance numbers, as a difference
between sgp_dd and dd implicates the SCSI layer and not the SAN.
You'll probably drool when you see what you could be getting!
I think the main problem is that the Linux SCSI layer has been tuned
for SCSI adapters and left FC considerations in the dust. While FC is
not the culprit (which sgp_dd shows), with a SCSI adapter and a JBOD I
can get great dd read/write performance (e.g. 300MB/s)... but
whatever the kernel is doing to boost SCSI adapters affects FC adversely.
How do I go about getting sgp_dd performance numbers?
sg_dd is like dd, while sgp_dd shows the effect of multiple threads.
You'll need "scsi generic" support either built into your kernel, or the sg.o
module loaded.
I can report that I have now overcome this problem - look at bug
I came across this before - basically, the max-readahead value is set
to 31, which is more suited to random access. For sequential
reads, set it to 256.
I have done this and my read performance is back up to circa 60mb/s.
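For the record, a sketch of how that tunable is set on these 2.4-era kernels (the /proc path is the RHEL 3 location, and to the best of my knowledge the unit is pages; the tunable moved elsewhere in later kernels):

```
# One-off, at runtime (lost on reboot):
echo 256 > /proc/sys/vm/max-readahead

# Persistent, via /etc/sysctl.conf:
vm.max-readahead = 256
```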
Created attachment 112856 [details]
Problem partially resolved in RHEL3U4 and GFS 6
Same system has been upgraded to RHEL3U4 and GFS 6, and the performance numbers
are compared to GFS 5.2/RHEL3U2 in the attached PDF.
The SCSI performance problems are solved. If you compare the top two charts in
their lower left corner, you'll see the read speed is much better than it was.
This is a single threaded test, so 100MB/s is the max. In multi-threaded tests
I can get 140MB/s reading on an I/O node.
Good Job RedHat!
But, "read" scalability is still an issue in GFS. While the peak "read" speed
has improved slightly, it's not reflecting the available hardware (and now OS)
capability. So, RH7.3 GFS still beats RHEL in "read" scalability.
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.