Description of problem:
We have noticed a huge performance regression on IA64 with the Fusion MPT driver between versions 3.02.73 (2.6.9-55.0.12) and 3.02.99.00 (2.6.9-67). I know it is related to the MPT Fusion driver because performance is good if I downgrade the MPT driver to 3.02.73 in the 2.6.9-67 kernel.

Version-Release number of selected component (if applicable):
I can reproduce the problem with both 2.6.9-67 and 2.6.9-67.0.1 kernels, but not with 2.6.9-55.0.12.

How reproducible:
I can reproduce the problem on our 3 IA64 nodes. Cannot reproduce the problem on i686 or x86_64.

Steps to Reproduce:
1. boot 2.6.9-67
2. write some data to a block device relying on the MPT driver (SCSI in our case, not FC/SAS)

boot logs:
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=01030100h, Ports=1, MaxQ=255, IRQ=55
Using cfq io scheduler
ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 73 (level, low) -> IRQ 56
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
scsi1 : ioc1: LSI53C1030, FwRev=01030100h, Ports=1, MaxQ=255, IRQ=56
  Vendor: FUJITSU  Model: MAN3735MC  Rev: 0111
  Type: Direct-Access  ANSI SCSI revision: 03
SCSI device sda: 143550456 512-byte hdwr sectors (73498 MB)
SCSI device sda: drive cache: write back, no read (daft)
SCSI device sda: 143550456 512-byte hdwr sectors (73498 MB)
SCSI device sda: drive cache: write back, no read (daft)

Actual results:
With 2.6.9-67:
# time dd if=/dev/zero of=/dev/sda bs=1M count=100
100+0 records in
100+0 records out

real 0m46.203s
user 0m0.001s
sys 0m0.512s

Expected results:
With 2.6.9-55.0.12:
# time dd if=/dev/zero of=/dev/sda bs=1M count=100
100+0 records in
100+0 records out

real 0m2.159s
user 0m0.000s
sys 0m0.160s

Of course, mke2fs also suffers from this problem.
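To put the two dd runs on the same scale, the elapsed times can be turned into average write rates. This is only a rough sketch of that arithmetic: the helper name `throughput_mib_s` is made up for illustration, and it assumes `bs=1M count=100` means 100 MiB were written in the reported `real` time.

```python
# Timings copied from the two dd runs in this report (bs=1M count=100).
MIB_WRITTEN = 100          # assumption: 100 blocks of 1 MiB each
SECONDS_67 = 46.203        # "real" time on 2.6.9-67
SECONDS_55 = 2.159         # "real" time on 2.6.9-55.0.12


def throughput_mib_s(mib: float, seconds: float) -> float:
    """Average write rate in MiB/s (hypothetical helper, not a tool from this report)."""
    return mib / seconds


slow = throughput_mib_s(MIB_WRITTEN, SECONDS_67)
fast = throughput_mib_s(MIB_WRITTEN, SECONDS_55)
slowdown = SECONDS_67 / SECONDS_55

print(f"2.6.9-67:      {slow:.2f} MiB/s")
print(f"2.6.9-55.0.12: {fast:.2f} MiB/s")
print(f"slowdown:      {slowdown:.1f}x")
```

With these numbers the regression works out to roughly 2 MiB/s versus 46 MiB/s, i.e. a slowdown of about 21x for the same 100 MiB write.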
Amazingly, performance is good at the sg level:

# sg_map
/dev/sg0 /dev/sda

# time sg_dd if=/dev/zero of=/dev/sg0 bs=1M count=100
100+0 records in
100+0 records out

real 0m0.125s
user 0m0.013s
sys 0m0.103s

# time sg_dd if=/dev/zero of=/dev/sda bs=1M count=100
100+0 records in
100+0 records out

real 0m46.158s
user 0m0.012s
sys 0m0.271s
I asked Johann to mix & match kernels & drivers to be sure it's the driver:

                 3.02.73    3.02.99
2.6.9-55.0.12    perf OK    perf KO
2.6.9-67         perf OK    perf KO

-Eric
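The reasoning behind the matrix above can be spelled out mechanically: a factor explains the result only if every value of that factor always gives the same outcome. The following is just an illustrative sketch; the dictionary layout and the function names are assumptions made for this example, and only the four OK/KO results come from the matrix.

```python
# (kernel, driver) -> outcome, copied from the mix & match matrix.
results = {
    ("2.6.9-55.0.12", "3.02.73"): "OK",
    ("2.6.9-55.0.12", "3.02.99"): "KO",
    ("2.6.9-67", "3.02.73"): "OK",
    ("2.6.9-67", "3.02.99"): "KO",
}


def outcomes_by(index: int) -> dict:
    """Group outcomes by one factor: index 0 is the kernel, index 1 is the driver."""
    grouped = {}
    for key, outcome in results.items():
        grouped.setdefault(key[index], set()).add(outcome)
    return grouped


def determines_outcome(index: int) -> bool:
    """True if each value of the factor always yields a single outcome."""
    return all(len(seen) == 1 for seen in outcomes_by(index).values())


print("kernel determines outcome:", determines_outcome(0))
print("driver determines outcome:", determines_outcome(1))
```

Each kernel shows both OK and KO depending on the driver, while each driver version gives the same result on both kernels, which is why the matrix points at the driver rather than the rest of the kernel.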
I wonder if this is related to: "-fix regression in MPT/Fusion diskdump (Nobuhiro Tachino) [251153]" which was put in 2.6.9-57
(In reply to comment #2)
> I wonder if this is related to: "-fix regression in MPT/Fusion diskdump
> (Nobuhiro Tachino) [251153]" which was put in 2.6.9-57

Attachment 160815 is a diskdump fix. It is unlikely to be related to this problem, but I will give it a try, just to make sure.
I don't see the problem with the dd workload on my tiger4:

2.6.9-55:
real 0m1.898s
user 0m0.000s
sys 0m0.507s

2.6.9-67:
real 0m1.908s
user 0m0.001s
sys 0m0.444s

My storage controllers are:
06:02.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
06:02.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
I don't know how to make progress on this bug. Based on comments #1 and #4, neither of those two separate tests shows a regression in the driver itself.