429680 – MPT Fusion SCSI: performance regression on IA64

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.8
Hardware:	ia64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Red Hat Kernel Manager
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-01-22 13:00 UTC by Johann Lombardi
Modified:	2008-03-11 02:22 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-03-11 02:22:27 UTC
Target Upstream Version:
Embargoed:

Description Johann Lombardi 2008-01-22 13:00:41 UTC

Description of problem:

We have noticed a huge performance regression on IA64 with the Fusion MPT
driver between versions 3.02.73 (2.6.9-55.0.12) and 3.02.99.00 (2.6.9-67).
I know it is related to the MPT Fusion driver because performance is good
if I downgrade the MPT driver to 3.02.73 in the 2.6.9-67 kernel.

Version-Release number of selected component (if applicable):
I can reproduce the problem with both 2.6.9-67 and 2.6.9-67.0.1 kernels, but
not with 2.6.9-55.0.12. 

How reproducible:
I can reproduce the problem on all 3 of our IA64 nodes. I cannot reproduce it
on i686 or x86_64.

Steps to Reproduce:
1. boot 2.6.9-67
2. write some data to a block device relying on the MPT driver (SCSI in our
   case, not FC/SAS)

boot logs:
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=01030100h, Ports=1, MaxQ=255, IRQ=55
Using cfq io scheduler
ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 73 (level, low) -> IRQ 56
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
scsi1 : ioc1: LSI53C1030, FwRev=01030100h, Ports=1, MaxQ=255, IRQ=56
  Vendor: FUJITSU   Model: MAN3735MC         Rev: 0111
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sda: 143550456 512-byte hdwr sectors (73498 MB)
SCSI device sda: drive cache: write back, no read (daft)
SCSI device sda: 143550456 512-byte hdwr sectors (73498 MB)
SCSI device sda: drive cache: write back, no read (daft)

  
Actual results:

With 2.6.9-67:
# time dd if=/dev/zero of=/dev/sda bs=1M count=100
100+0 records in
100+0 records out

real    0m46.203s
user    0m0.001s
sys     0m0.512s

Expected results:

With 2.6.9-55.0.12:

# time dd if=/dev/zero of=/dev/sda bs=1M count=100
100+0 records in
100+0 records out

real    0m2.159s
user    0m0.000s
sys     0m0.160s
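
For scale, the two timings above can be turned into effective write rates. This is only a rough sketch: it assumes each run wrote exactly 100 MiB (bs=1M count=100) and uses the wall-clock "real" time alone, so it ignores any caching effects. The function and variable names are made up for illustration.

```python
# Effective write rate from the two dd runs quoted above.
# Assumption: each run wrote 100 MiB (bs=1M count=100).
MIB_WRITTEN = 100


def throughput_mib_s(real_seconds, mib=MIB_WRITTEN):
    """Return MiB/s given the wall-clock ("real") time of a run."""
    return mib / real_seconds


# "real" times copied from the dd output in this report
runs = {
    "2.6.9-67": 46.203,
    "2.6.9-55.0.12": 2.159,
}

for kernel, seconds in runs.items():
    print(f"{kernel}: {throughput_mib_s(seconds):.1f} MiB/s")

slowdown = runs["2.6.9-67"] / runs["2.6.9-55.0.12"]
print(f"slowdown: {slowdown:.1f}x")
```

That works out to roughly 2.2 MiB/s on 2.6.9-67 against roughly 46.3 MiB/s on 2.6.9-55.0.12, a slowdown of about 21x.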

Of course, mke2fs also suffers from this problem.
Amazingly, performance is good at the sg level:

# sg_map
/dev/sg0  /dev/sda
# time sg_dd if=/dev/zero of=/dev/sg0 bs=1M count=100
100+0 records in
100+0 records out

real    0m0.125s
user    0m0.013s
sys     0m0.103s

# time sg_dd if=/dev/zero of=/dev/sda bs=1M count=100
100+0 records in
100+0 records out

real    0m46.158s
user    0m0.012s
sys     0m0.271s

Comment 1 Eric Sandeen 2008-01-23 14:01:48 UTC

I asked Johann to mix & match kernels & drivers to be sure it's the driver:

                 3.02.73    3.02.99
2.6.9-55.0.12    perf OK    perf KO
2.6.9-67         perf OK    perf KO

-Eric

Comment 2 Konrad Rzeszutek 2008-01-23 16:18:16 UTC

I wonder if this is related to: "-fix regression in MPT/Fusion diskdump
(Nobuhiro Tachino) [251153]" which was put in  2.6.9-57

Comment 3 Johann Lombardi 2008-01-23 17:01:30 UTC

(In reply to comment #2)
> I wonder if this is related to: "-fix regression in MPT/Fusion diskdump
> (Nobuhiro Tachino) [251153]" which was put in  2.6.9-57

Attachment 160815 is a diskdump fix. It is unlikely to be related to this problem.
But I will give it a try, just to make sure.

Comment 4 Luming Yu 2008-01-28 06:38:28 UTC

I don't see the problem with the dd workload on my tiger4:

2.6.9-55:
real 0m1.898s
user 0m0.000s
sys  0m0.507s

2.6.9-67:
real 0m1.908s
user 0m0.001s
sys  0m0.444s

My storage controllers are:

06:02.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 07)
06:02.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 07)

Comment 5 Luming Yu 2008-03-11 02:22:27 UTC

I don't know how to make progress on this bug.
Based on comments #1 and #4, the two different tests do not show a regression
in the driver itself.