Bug 194462

Summary: The Disk I/O is very slow. (Fusion MPT base driver)
Product: Red Hat Enterprise Linux 4
Reporter: Masaji Takeyama <ccs-se>
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.0
CC: jbaron
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-06-20 13:25:23 UTC Type: ---

Description Masaji Takeyama 2006-06-08 11:25:17 UTC
Description of problem:
Disk I/O is very slow (Fusion MPT base driver).
# It happens on the RAID 1 server.
# It doesn't happen on a single-disk server.

Version-Release number of selected component (if applicable):
Kernel: 2.6.9-34.ELsmp, 2.6.9-34.0.1.ELsmp
Fusion MPT base driver: 3.02.62.01rh

How reproducible:
Sometimes

Steps to Reproduce:
1. Install RHEL4.
2. Edit /etc/modprobe.conf and rebuild the initrd.
3. Reboot the system.
  
Actual results:
Slow case:
hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads:    6 MB in  3.28 seconds =   1.83 MB/sec

Normal case, for comparison:
hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads:  250 MB in  3.02 seconds =  82.82 MB/sec


Expected results:


Additional info:
SERVER: PRIMERGY RX 200 S2
(CPU:   Xeon 3.4GHz)

[log of dmesg]
Fusion MPT base driver 3.02.62.01rh
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT FC Host driver 3.02.62.01rh
Fusion MPT SPI Host driver 3.02.62.01rh
ACPI: PCI interrupt 0000:03:05.0[A] -> GSI 18 (level, low) -> IRQ 193
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=01032911h, Ports=1, MaxQ=222, IRQ=193
  Vendor: LSILOGIC  Model: 1030          IM  Rev: 1000
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 585727872 512-byte hdwr sectors (299893 MB)
SCSI device sda: drive cache: write through
SCSI device sda: 585727872 512-byte hdwr sectors (299893 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3 sda4 <<6>mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
mptbase: ioc0:   volume is now optimal, enabled, quiesced
mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 0
mptbase: ioc0:   PhysDisk is now online, quiesced
mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1
mptbase: ioc0:   PhysDisk is now online, quiesced
mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
mptbase: ioc0:   volume is now optimal, enabled
mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 0
mptbase: ioc0:   PhysDisk is now online
mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1
mptbase: ioc0:   PhysDisk is now online
 sda5 sda6 sda7 sda8 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
mptbase: ioc0:   volume is now optimal, enabled, quiesced
mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 0
mptbase: ioc0:   PhysDisk is now online, quiesced
mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1
mptbase: ioc0:   PhysDisk is now online, quiesced
mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
mptbase: ioc0:   volume is now optimal, enabled
mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 0
mptbase: ioc0:   PhysDisk is now online
mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1
mptbase: ioc0:   PhysDisk is now online
  Vendor: SDR       Model: GEM318            Rev: 0
  Type:   Processor                          ANSI SCSI revision: 02
mptbase: ioc0: IOCStatus(0x0043): SCSI Device Not There
Fusion MPT SAS Host driver 3.02.62.01rh
Fusion MPT misc device (ioctl) driver 3.02.62.01rh
mptctl: Registered with Fusion MPT base driver
mptctl: /dev/mptctl @ (major,minor=10,220)
Fusion MPT LAN driver 3.02.62.01rh

Comment 2 Jussi Silvennoinen 2006-06-14 19:54:20 UTC
A me-too report. It only affects certain Fujitsu hard drives. The -36 kernel has a
fix for this.

Comment 3 Masaji Takeyama 2006-06-15 01:23:55 UTC
(In reply to comment #2)
> A me-too report. Only affects certain Fujitsu harddrives. -36 kernel has a fix
> for this.
Does "-36 kernel" mean kernel 2.6.9-36.x.x?
Where can I get the RPM and SRPM (kernel-2.6.9-36.xx.xx.EL.src.rpm)?

Separately, Fujitsu's answer to our inquiry was that their kernel update kit is
required in order to use 2.6.9-34.ELsmp (2.6.9-34.0.1.ELsmp) on Fujitsu hardware.
It seems Fujitsu will release it in a few days.
I will test it.

I checked the document "Controller and Driver for PRIMERGY Server"
(released-drivers.xls) from Fujitsu Siemens Computer.
For kernel 2.6.9-34.EL (Update 3), it lists "3.02.62.01rh native" for
"LSI 53C1020/53C1030 Dual IME (mptbase/mptscsih)".
#(lspci -v)
# 03:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
#        Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1041
#        Flags: bus master, 66Mhz, medium devsel, latency 128, IRQ 193
#        I/O ports at 2000 [size=256]
#        Memory at dd210000 (64-bit, non-prefetchable) [size=64K]
#        Memory at dd200000 (64-bit, non-prefetchable) [size=64K]
#        Capabilities: [50] Power Management version 2
#        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
#        Capabilities: [68] PCI-X non-bridge device.
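The lspci entry above is how the controller shows up on the affected PRIMERGY server. As a minimal sketch, assuming lspci prints the device name as "53c1030" (as it does here; other pciutils versions may word it differently), a boot script could check for this controller before applying any 53C1030-specific workaround. The function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers: detect an LSI 53C1030 in lspci output.
# They read lspci text on stdin, so they can be tried without the hardware.

# Succeeds (exit status 0) if any input line mentions the 53C1030.
# -i because lspci prints "53c1030" while dmesg prints "53C1030".
has_53c1030() {
    grep -qi '53c1030'
}

# Prints the PCI bus address (first field) of each matching line.
slots_53c1030() {
    grep -i '53c1030' | awk '{ print $1 }'
}

# Usage on a live system (requires pciutils):
#   if lspci | has_53c1030; then
#       echo "53C1030 at: $(lspci | slots_53c1030)"
#   fi
```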

Comment 4 Masaji Takeyama 2006-06-15 07:51:05 UTC
(In reply to comment #2)
> A me-too report. Only affects certain Fujitsu harddrives. -36 kernel has a fix
> for this.
I found and tested a newer kernel (kernel-2.6.9-37.EL):
# http://people.redhat.com/~jbaron/rhel4/RPMS.kernel/

However, the problem was not solved. 


I think the cause of the problem lies in the MPT driver.
# Kernels 2.6.9-34.0.1 and 2.6.9-37 appear to ship the same MPT driver.

# strings /lib/modules/2.6.9-37.ELsmp/kernel/drivers/message/fusion/mptbase.ko | grep "version="
version=3.02.62.01rh C7256E37EDB7163450F426D

# strings /lib/modules/2.6.9-37.EL/kernel/drivers/message/fusion/mptbase.ko | grep "version="
version=3.02.62.01rh C7256E37EDB7163450F426D

# strings /lib/modules/2.6.9-34.0.1.EL/kernel/drivers/message/fusion/mptbase.ko | grep "version="
version=3.02.62.01rh C7256E37EDB7163450F426D

# strings /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/message/fusion/mptbase.ko | grep "version="
version=3.02.62.01rh C7256E37EDB7163450F426D
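The four checks above can be repeated for every installed kernel in one loop. This is a minimal sketch, assuming the /lib/modules/<version>/kernel/drivers/message/fusion/ layout shown above. Instead of strings(1) it splits the module's NUL-separated strings with tr, a substitution for the command used in the report; mpt_version is a hypothetical name.

```sh
#!/bin/sh
# Sketch: show the mptbase version string for every installed kernel.
# Assumes the module layout used above:
#   /lib/modules/<version>/kernel/drivers/message/fusion/mptbase.ko

# Print the "version=" entry of one module file. Module info strings are
# NUL-terminated, so tr turns them into lines; grep -a keeps grep from
# treating the rest of the ELF file as binary and suppressing the match.
mpt_version() {
    tr '\0' '\n' < "$1" | LC_ALL=C grep -a '^version='
}

for ko in /lib/modules/*/kernel/drivers/message/fusion/mptbase.ko; do
    [ -e "$ko" ] || continue    # glob left unexpanded: no such modules
    kver=${ko#/lib/modules/}    # strip the leading directory...
    kver=${kver%%/*}            # ...and everything after the version
    printf '%-24s %s\n' "$kver" "$(mpt_version "$ko")"
done
```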


Comment 5 Jussi Silvennoinen 2006-06-15 12:52:58 UTC
It seems you're right: I retested with -36 and -37 and saw no improvement. Strange,
I could swear I saw normal speed with -36 some weeks ago. I never put any box into
production with it, however.

I've seen this only with Fujitsu MAW-series drives. Older MAP-series drives are
unaffected, as are Hitachi and Maxtor drives.

Comment 6 Masaji Takeyama 2006-06-16 01:08:32 UTC
(In reply to comment #5)
> Seems you're right, I retested with -36 and -37 saw no improvement. Strange, I
> could swear I saw normal speed with -36 some weeks ago. Never put any box to
> production with it however.
I think the problem is hard to track down, for two reasons:
 1. The problem doesn't always happen.
 2. It happens on servers with hardware RAID 1.
    (I have not tried other RAID levels, such as RAID 0.)

Please look at the ChangeLog of kernel 2.6.17-rc4.
It includes several Fusion MPT driver improvements.
# http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.17-rc4

I think the problem might be solved if the Fusion MPT driver is updated to the
latest version.



Comment 7 Masaji Takeyama 2006-06-29 03:43:00 UTC
(In reply to comment #6)

> I think that the problem might be solved if Fusion MPT driver can be 
> made the latest. 
I tested again with several kernels.
# I varied the GRUB parameters
# (hiddenmenu and timeout in /boot/grub/menu.lst).

I made a new discovery. The problem did not occur in these cases:
* A kernel is selected in the GRUB menu and the "Enter" key is pressed.
* The "Enter" key is pressed at the GRUB menu
  (leaving the default entry selected).

The problem occurs when the boot is fully automatic
(no key is pressed at all).


kernel version        Action                   Result
---------------------------------------------------------
2.6.9-34.0.1.ELsmp    FullAuto(100%)           NG
2.6.9-34.0.1.ELsmp    "Enter" key pressed      OK

2.6.9-37.ELsmp        FullAuto(100%)           NG
2.6.9-37.ELsmp        "Enter" key pressed      OK

2.6.9-39.ELsmp        FullAuto(100%)           NG
2.6.9-39.ELsmp        "Enter" key pressed      OK


Is there a way to make the GRUB menu send the "Enter" key automatically?


Comment 8 Jussi Silvennoinen 2006-07-03 05:47:10 UTC
I just tried the -40 kernel and got good results.

# time sh -c "dd if=/dev/zero of=foo bs=1M count=500 ; sync"
500+0 records in
500+0 records out

real    0m8.454s
user    0m0.000s
sys     0m1.959s

Booting back into -34.0.1, I got the very nice ~4 mins.
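To put the two dd timings on the same scale, throughput is simply size divided by elapsed time. A minimal sketch using awk for the division; it counts dd's 500 one-megabyte blocks as 500 MB and reads "~4mins" as roughly 240 seconds, both approximations. mbps is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: throughput = size / elapsed time, printed with one decimal place.
# mbps SIZE_MB SECONDS
mbps() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

mbps 500 8.454   # -40 kernel run (real 0m8.454s): prints 59.1
mbps 500 240     # -34.0.1 run, "~4mins" taken as 240 s: prints 2.1
```

Roughly 59 MB/sec versus 2 MB/sec is the same order of slowdown as the hdparm figures in the original description.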
 

Comment 9 Jussi Silvennoinen 2006-07-03 06:24:03 UTC
Something rather weird is going on: now it's dead slow again with the -40 kernel.


Comment 10 Masaji Takeyama 2006-07-04 01:46:05 UTC
(In reply to comment #9)
> There's something rather weird going on, now it's dead slow again with -40 kernel.
The problem might be avoided with the following procedure:
1) Update the kernel (-40 kernel).
2) Reboot the system.
3) Rebuild the initrd (initrd-2.6.9-40.ELsmp.img):
  3)-1  depmod -a;
  3)-2  uname -r;
  3)-3  mkinitrd -f -v /boot/initrd-2.6.9-40.ELsmp.img 2.6.9-40.ELsmp;
4) Reboot the system again.
# Press the "Enter" key at the GRUB menu.

#####-----
# uname -r
2.6.9-40.ELsmp

# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads:  250 MB in  3.00 seconds =  83.21 MB/sec
#####-----
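Steps 3)-1 to 3)-3 above can be wrapped in a small script. A minimal sketch, assuming the RHEL 4 initrd naming shown above (/boot/initrd-<version>.img); the DRY_RUN variable and the run/rebuild_initrd names are hypothetical, added so the commands can be printed and reviewed before anything is rebuilt.

```sh
#!/bin/sh
# Sketch: rebuild module dependencies and the initrd for one kernel.
# Set DRY_RUN=1 to print the commands instead of running them.

# Run a command, or just echo it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rebuild_initrd [VERSION]  (defaults to the running kernel, as in 3)-2)
rebuild_initrd() {
    kver=${1:-$(uname -r)}
    run depmod -a "$kver" || return 1
    run mkinitrd -f -v "/boot/initrd-$kver.img" "$kver"
}

# Example: DRY_RUN=1 rebuild_initrd 2.6.9-40.ELsmp
```

Running it without DRY_RUN needs root, as with the manual steps.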




Comment 11 Jussi Silvennoinen 2006-07-24 09:18:35 UTC
I got 2.6.17-1.2366.fc6 from Rawhide (Fusion MPT base driver 3.04.00); performance
seems fine with it after about a week of a continuous dd loop.


Comment 12 Masaji Takeyama 2006-07-25 09:14:22 UTC
(In reply to comment #3)

> The kernel update kit(Fujitsu's) was separately an answer of necessity to use 
> the inquiry 2.6.9-34.ELsmp(2.6.9-34.0.1.ELsmp) for Fujitsu. 
> It seems to be released from Fujitsu in a few days. 
> I will test it. 
I tested with the update kit from Fujitsu, but the problem was not fixed.
(kernel-2.6.9-34.0.1 + update kit (2.6.9-34.0.1.EL))

# rpm -V kernel-smp-2.6.9-34.0.1.EL
missing     /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/aic7xxx/aic7xxx.ko
missing     /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/lpfc/lpfc.ko
SM5....T    /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/qla2xxx/qla2300.ko
SM5....T    /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/qla2xxx/qla2322.ko
SM5....T    /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/qla2xxx/qla2xxx.ko

# The page is in Japanese. (Published: July 6, 2006 (v1.0l20))
http://www.fmworld.net/cgi-bin/drviasearch/drviadownload.cgi?DRIVER_NUM=F1004315



Comment 13 Masaji Takeyama 2006-07-25 09:46:24 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > There's something rather weird going on, now it's dead slow again with -40 kernel.
I tried again.
The results of the test with -40 (kernel-hugemem-2.6.9-40.EL) were as follows.
# Even -42 (kernel-smp-2.6.9-42.EL) was not good.

Judging from the results, pressing the Return key seemed to make things better.
# I don't understand why.

kernel version               Action                        Result
-----------------------------------------------------------------------
(Pattern 1)
kernel-hugemem-2.6.9-40.EL   reboot, no key (timeout=30)   NG - 3 / OK - 2

(Pattern 2)
kernel-hugemem-2.6.9-40.EL   "Enter" key pressed           NG - 2 / OK - 3
# It waits at the GRUB menu for 28 seconds.
# (Then, with 2 seconds remaining, kernel-hugemem-2.6.9-40.EL is selected and booted.)


##(Pattern 1)
#(1)
# uname -r
2.6.9-40.ELhugemem
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:    6 MB in  3.27 seconds =   1.83 MB/sec
#(2)
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:    6 MB in  3.31 seconds =   1.81 MB/sec
#(3)
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:    6 MB in  3.28 seconds =   1.83 MB/sec
#(4)
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:  248 MB in  3.01 seconds =  82.46 MB/sec
#(5)
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:  242 MB in  3.02 seconds =  80.25 MB/sec

##(Pattern 2)
#(1)
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:  246 MB in  3.01 seconds =  81.66 MB/sec
#(2)
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:  246 MB in  3.01 seconds =  81.63 MB/sec
#(3)
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:    6 MB in  3.48 seconds =   1.73 MB/sec
#(4)
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:    6 MB in  3.26 seconds =   1.84 MB/sec
#(5)
# hdparm -t /dev/sda
/dev/sda: Timing buffered disk reads:  244 MB in  3.00 seconds =  81.21 MB/sec
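Tallying runs by hand, as in the two patterns above, is easy to get wrong. A minimal sketch that reads saved hdparm -t output and labels each run NG or OK; the 10 MB/sec cut-off is an assumption chosen to separate the ~1.8 MB/sec and ~80 MB/sec clusters in this report, and classify_hdparm and LIMIT_MBPS are hypothetical names.

```sh
#!/bin/sh
# Sketch: label each "hdparm -t" result NG (slow) or OK, then summarize.
# LIMIT_MBPS (default 10) is an assumed threshold, not an hdparm setting.
classify_hdparm() {
    awk -v limit="${LIMIT_MBPS:-10}" '
        / MB\/sec/ {
            rate = $(NF - 1)            # the number just before "MB/sec"
            if (rate + 0 < limit + 0) { print "NG", rate; ng++ }
            else                      { print "OK", rate; ok++ }
        }
        END { printf "NG - %d / OK - %d\n", ng, ok }
    '
}

# Usage (as root):
#   for i in 1 2 3 4 5; do hdparm -t /dev/sda; done | classify_hdparm
```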


Comment 14 Masaji Takeyama 2006-07-25 10:07:50 UTC
(In reply to comment #11)
> Got 2.6.17-1.2366.fc6 from rawhide (Fusion MPT base driver 3.04.00), performance
> seems fine with it after about a week of continous dd-loop.

I tried it.
# Development version of Fedora (2.6.17-1.2437.fc6)
# New driver 3.04.xx (Fusion MPT base driver 3.04.01)

It seemed to be good.


I think there are two points to note:
 + Does GAM (Global Array Manager) work correctly with it?
 + The driver's structure has changed.

# ls -l /lib/modules/2.6.9-34.0.2.ELsmp/kernel/drivers/message/fusion
total 264
-rwxr--r--  1 root root 71364 Jun 30 23:53 mptbase.ko
-rwxr--r--  1 root root 41792 Jun 30 23:53 mptctl.ko
-rwxr--r--  1 root root 15756 Jun 30 23:53 mptfc.ko
-rwxr--r--  1 root root 21476 Jun 30 23:53 mptlan.ko
-rwxr--r--  1 root root 16828 Jun 30 23:53 mptsas.ko
-rwxr--r--  1 root root 49392 Jun 30 23:53 mptscsi.ko
-rwxr--r--  1 root root  6760 Jun 30 23:53 mptscsih.ko
-rwxr--r--  1 root root 16600 Jun 30 23:53 mptspi.ko

# ls -l /lib/modules/2.6.17-1.2437.fc6/kernel/drivers/message/fusion
total 252
-rwxr--r--  1 root root 69248 Jul 23 05:47 mptbase.ko
-rwxr--r--  1 root root 33020 Jul 23 05:47 mptctl.ko
-rwxr--r--  1 root root 24104 Jul 23 05:47 mptfc.ko
-rwxr--r--  1 root root 23284 Jul 23 05:47 mptlan.ko
-rwxr--r--  1 root root 35548 Jul 23 05:47 mptsas.ko
-rwxr--r--  1 root root 31416 Jul 23 05:47 mptscsih.ko
-rwxr--r--  1 root root 25332 Jul 23 05:47 mptspi.ko
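The two listings above differ in more than file sizes: the 2.6.9 tree has an extra mptscsi.ko. A minimal sketch for finding such differences, assuming both directories are readable; fusion_diff is a hypothetical name. In the output, names with no indent exist only in the first directory and tab-indented names only in the second (the standard comm -3 layout).

```sh
#!/bin/sh
# Sketch: show module files present in only one of two directories.
# fusion_diff DIR1 DIR2
fusion_diff() {
    a=$(mktemp) b=$(mktemp)
    # comm needs both inputs sorted with the same collation.
    ls "$1" | LC_ALL=C sort > "$a"
    ls "$2" | LC_ALL=C sort > "$b"
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage:
#   fusion_diff /lib/modules/2.6.9-34.0.2.ELsmp/kernel/drivers/message/fusion \
#               /lib/modules/2.6.17-1.2437.fc6/kernel/drivers/message/fusion
```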




Comment 15 Jussi Silvennoinen 2006-08-03 12:07:11 UTC
It's LSI's restructuring of the driver.

RH made some changes to the driver shipped in RHEL kernels to avoid a support
nightmare. Without them, anybody updating from the -22 kernel to -34 would have to
change their modprobe.conf:

alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptscsih

-->

alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptspi

Otherwise they would end up with a kernel panic.
Fedora kernels don't have this helpful patch, nor should they.
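For a kernel without Red Hat's compatibility patch, the change described above amounts to replacing mptscsih with mptspi on the scsi_hostadapter alias lines. A minimal sketch, assuming the plain "alias NAME MODULE" form shown above; it prints the result instead of editing in place, and update_mpt_alias is a hypothetical name. The RHEL kernels discussed here do not need this.

```sh
#!/bin/sh
# Sketch: print modprobe.conf with "scsi_hostadapterN mptscsih" changed to
# "scsi_hostadapterN mptspi". Other lines pass through unchanged.
# update_mpt_alias [FILE]   (defaults to /etc/modprobe.conf)
update_mpt_alias() {
    sed 's/^\(alias[[:space:]][[:space:]]*scsi_hostadapter[0-9]*[[:space:]][[:space:]]*\)mptscsih[[:space:]]*$/\1mptspi/' \
        "${1:-/etc/modprobe.conf}"
}

# Usage: update_mpt_alias > /tmp/modprobe.conf.new
# then review the result before installing it and rebuilding the initrd.
```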


Comment 16 Jussi Silvennoinen 2006-08-17 12:37:06 UTC
Run "sg_reset -h /dev/sda" and everything is almost as fast as with
-22.0.2.ELsmp. Tested with -42 (the official U4 kernel).



Comment 17 Masaji Takeyama 2006-08-18 00:31:05 UTC
(In reply to comment #16)
> Do a "sg_reset -h /dev/sda" and see everything almost as fast as with
> -22.0.2.ELsmp. Tested with -42 (official u4-kernel).
Thanks.
Some servers have been updated to -42.
# The server with the problem has not been updated to -42 (U4) yet.
# At least one is scheduled to be updated at the end of this week.
# (I expect it to be done by the end of next week at the latest.)

If sg_reset avoids the problem, I will adopt it.
# I will add a script that runs it after the system starts.

I will report the test results.


Comment 18 Masaji Takeyama 2006-08-21 10:21:11 UTC
(In reply to comment #16)
> Do a "sg_reset -h /dev/sda" and see everything almost as fast as with
> -22.0.2.ELsmp. Tested with -42 (official u4-kernel).
I tested with -34.0.2.ELsmp and confirmed that it is effective.

At what point during boot should "sg_reset -b /dev/sda" be executed?

I think it should run around the point where writes to the disk start.
Concretely, I am thinking of the point right around "Remount the root filesystem
read-write" in rc.sysinit (/etc/rc.d/rc.sysinit).



Comment 19 Masaji Takeyama 2006-08-25 04:27:00 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > Do a "sg_reset -h /dev/sda" and see everything almost as fast as with
> > -22.0.2.ELsmp. Tested with -42 (official u4-kernel).
> I tested with -34.0.2.ELsmp. 
> And, it was confirmed that it was effective. 
I tested with -42.0.2.ELsmp. It was fine.

I changed rc.sysinit (/etc/rc.d/rc.sysinit) so that "sg_reset" runs before the
"Remount the root filesystem read-write" step.

The change point is as follows. 
# diff rc.sysinit.org  rc.sysinit
443a444,449
> # SCSI host adapter reset (for 53C1030)
> if [ -x /usr/bin/sg_reset ]; then
> ##    /usr/bin/sg_reset -h /dev/sda >/dev/null 2>&1
>     /usr/bin/sg_reset -h /dev/sda
> fi
>
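The rc.sysinit change above can also be kept as a reusable function, which makes it easier to log whether the reset actually ran. A minimal sketch, assuming sg3_utils' sg_reset at /usr/bin/sg_reset as in the diff; the SG_RESET override, the reset_scsi_host name and the logger tag are hypothetical. It is a workaround only, not a fix for the underlying driver behavior.

```sh
#!/bin/sh
# Sketch: issue a SCSI host (adapter) reset with "sg_reset -h DEVICE",
# as in the rc.sysinit diff, and record the outcome in syslog.
SG_RESET=${SG_RESET:-/usr/bin/sg_reset}

# reset_scsi_host [DEVICE]   (defaults to /dev/sda)
reset_scsi_host() {
    dev=${1:-/dev/sda}
    if [ ! -x "$SG_RESET" ]; then
        echo "reset_scsi_host: $SG_RESET not found, skipping" >&2
        return 1
    fi
    if "$SG_RESET" -h "$dev"; then
        # logger may be missing early in boot; ignore that case.
        logger -t reset_scsi_host "host reset issued for $dev" 2>/dev/null || :
        return 0
    fi
    echo "reset_scsi_host: host reset failed for $dev" >&2
    return 1
}
```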


I have found a way to work around the problem.
What is the plan for a fundamental fix in ES4?
# I think the MPT driver itself should properly reset the host adapter.


Comment 20 Jiri Pallich 2012-06-20 13:25:23 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.