Bug 194462
Summary: | The Disk I/O is very slow. (Fusion MPT base driver) | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Masaji Takeyama <ccs-se> |
Component: | kernel | Assignee: | Tom Coughlan <coughlan> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4.0 | CC: | jbaron |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2012-06-20 13:25:23 UTC | Type: | --- |
Description
Masaji Takeyama
2006-06-08 11:25:17 UTC
A me-too report. This only affects certain Fujitsu hard drives. The -36 kernel has a fix for this.

(In reply to comment #2)
> A me-too report. Only affects certain Fujitsu harddrives. -36 kernel has a fix
> for this.

Is the "-36 kernel" kernel 2.6.9-36.x.x?
Where are the rpm and srpm (kernel-2.6.9-36.xx.xx.EL.src.rpm)?

Separately, Fujitsu answered my inquiry: their kernel update kit is needed for 2.6.9-34.ELsmp (2.6.9-34.0.1.ELsmp).
It should be released by Fujitsu in a few days.
I will test it.

I checked the document "Controller and Driver for PRIMERGY Server" (released-drivers.xls) from Fujitsu Siemens Computer.
For kernel 2.6.9-34.EL (Update 3), the entry for "LSI 53C1020/53C1030 Dual IME(mptbase/mptscsih)" is "3.02.62.01rh native".

#(lspci -v)
# 03:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
#     Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1041
#     Flags: bus master, 66Mhz, medium devsel, latency 128, IRQ 193
#     I/O ports at 2000 [size=256]
#     Memory at dd210000 (64-bit, non-prefetchable) [size=64K]
#     Memory at dd200000 (64-bit, non-prefetchable) [size=64K]
#     Capabilities: [50] Power Management version 2
#     Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
#     Capabilities: [68] PCI-X non-bridge device.

(In reply to comment #2)
> A me-too report. Only affects certain Fujitsu harddrives. -36 kernel has a fix
> for this.

I found and tested a new kernel (kernel-2.6.9-37.EL).
# http://people.redhat.com/~jbaron/rhel4/RPMS.kernel/
However, the problem was not solved.
I think the MPT driver is the cause of the problem.
# Kernel 2.6.9-34.0.1 and kernel 2.6.9-37 seem to have the same MPT driver.
# strings /lib/modules/2.6.9-37.ELsmp/kernel/drivers/message/fusion/mptbase.ko| grep "version="
version=3.02.62.01rh C7256E37EDB7163450F426D
# strings /lib/modules/2.6.9-37.EL/kernel/drivers/message/fusion/mptbase.ko| grep "version="
version=3.02.62.01rh C7256E37EDB7163450F426D
# strings /lib/modules/2.6.9-34.0.1.EL/kernel/drivers/message/fusion/mptbase.ko| grep "version="
version=3.02.62.01rh C7256E37EDB7163450F426D
# strings /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/message/fusion/mptbase.ko| grep "version="
version=3.02.62.01rh C7256E37EDB7163450F426D

Seems you're right. I retested with -36 and -37 and saw no improvement. Strange, I could swear I saw normal speed with -36 some weeks ago. I never put any box into production with it, however.

I've seen this only with Fujitsu MAW-series drives. Older MAPs are unaffected, as are Hitachi and Maxtor drives.

(In reply to comment #5)
> Seems you're right, I retested with -36 and -37 saw no improvement. Strange, I
> could swear I saw normal speed with -36 some weeks ago. Never put any box to
> production with it however.

I think the problem is hard to pin down for the following two reasons.
1. The problem doesn't always happen.
2. It happens on a server with hardware RAID1.
   (I have not tried another RAID configuration such as RAID0.)

Please look at the ChangeLog of kernel 2.6.17-rc4.
The Fusion MPT driver has received some improvements there.
# http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.17-rc4

I think the problem might be solved if the Fusion MPT driver can be updated to the latest version.

(In reply to comment #6)
> I think that the problem might be solved if Fusion MPT driver can be
> made the latest.

I tested again with several kernels.
# I varied the grub parameters.
#(/boot/grub/menu.lst(hiddenmenu, timeout))

I made a new discovery.
* A kernel is selected in the grub menu and the "Enter" key is pressed.
* The "Enter" key is pressed in the grub menu (with the default entry kept).
The problem did not occur in these cases.
The problem occurs when the boot is fully automatic (when no key is pressed at all).

kernel version        Action                    Result
---------------------------------------------------------
2.6.9-34.0.1.ELsmp    FullAuto(100%)            NG
2.6.9-34.0.1.ELsmp    "Enter" key is pushed.    OK
2.6.9-37.ELsmp        FullAuto(100%)            NG
2.6.9-37.ELsmp        "Enter" key is pushed.    OK
2.6.9-39.ELsmp        FullAuto(100%)            NG
2.6.9-39.ELsmp        "Enter" key is pushed.    OK

Please tell me if there is a way to send the "Enter" key code to the grub menu automatically.

Just tried with the -40 kernel and got good results.

# time sh -c "dd if=/dev/zero of=foo bs=1M count=500 ; sync"
500+0 records in
500+0 records out

real 0m8.454s
user 0m0.000s
sys 0m1.959s

Booting back to -34.0.1 gave the very nice ~4 mins.

There's something rather weird going on, now it's dead slow again with the -40 kernel.

(In reply to comment #9)
> There's something rather weird going on, now it's dead slow again with -40 kernel.

The problem might be avoided with the following procedure.
1) Kernel update (-40 kernel)
2) System reboot
3) Rebuild the initrd (initrd-2.6.9-40.ELsmp.img).
 3)-1 depmod -a;
 3)-2 uname -r;
 3)-3 mkinitrd -f -v /boot/initrd-2.6.9-40.ELsmp.img 2.6.9-40.ELsmp;
4) System reboot again
# The "Enter" key is pressed in the grub menu.

#####-----
# uname -r
2.6.9-40.ELsmp
# hdparm -t /dev/sda

/dev/sda:
 Timing buffered disk reads: 250 MB in 3.00 seconds = 83.21 MB/sec
#####-----

Got 2.6.17-1.2366.fc6 from rawhide (Fusion MPT base driver 3.04.00), performance seems fine with it after about a week of continuous dd-loop.

(In reply to comment #3)
> The kernel update kit(Fujitsu's) was separately an answer of necessity to use
> the inquiry 2.6.9-34.ELsmp(2.6.9-34.0.1.ELsmp) for Fujitsu.
> It seems to be released from Fujitsu in a few days.
> I will test it.

I tested with the update kit from Fujitsu, but the problem has not improved.
(kernel-2.6.9-34.0.1 + update kit(2.6.9-34.0.1.EL))

# rpm -V kernel-smp-2.6.9-34.0.1.EL
missing  /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/aic7xxx/aic7xxx.ko
missing  /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/lpfc/lpfc.ko
SM5....T /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/qla2xxx/qla2300.ko
SM5....T /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/qla2xxx/qla2322.ko
SM5....T /lib/modules/2.6.9-34.0.1.ELsmp/kernel/drivers/scsi/qla2xxx/qla2xxx.ko

# The page is in Japanese. (Published: July 6, 2006 (v1.0l20))
http://www.fmworld.net/cgi-bin/drviasearch/drviadownload.cgi?DRIVER_NUM=F1004315

(In reply to comment #10)
> (In reply to comment #9)
> > There's something rather weird going on, now it's dead slow again with -40 kernel.

I tried again.
The results of the test with -40 (kernel-hugemem-2.6.9-40.EL) were as follows.
# Even -42 (kernel-smp-2.6.9-42.EL) was not good.
Pressing the Enter key seemed to give somewhat better results.
# I do not understand the reason.

            kernel version               Action                         Result
-----------------------------------------------------------------------
(Pattern 1) kernel-hugemem-2.6.9-40.EL   reboot(nothing(timeout=30))    NG - 3/OK -2
(Pattern 2) kernel-hugemem-2.6.9-40.EL   "Enter" key is pushed.         NG - 2/OK -3
# I waited at the grub menu for 28 seconds.
# (With two seconds remaining, kernel-hugemem-2.6.9-40.EL was selected and booted.)

##(Pattern 1)
#(1)
# uname -r
2.6.9-40.ELhugemem
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 6 MB in 3.27 seconds = 1.83 MB/sec
#(2)
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 6 MB in 3.31 seconds = 1.81 MB/sec
#(3)
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 6 MB in 3.28 seconds = 1.83 MB/sec
#(4)
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 248 MB in 3.01 seconds = 82.46 MB/sec
#(5)
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 242 MB in 3.02 seconds = 80.25 MB/sec

##(Pattern 2)
#(1)
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 246 MB in 3.01 seconds = 81.66 MB/sec
#(2)
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 246 MB in 3.01 seconds = 81.63 MB/sec
#(3)
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 6 MB in 3.48 seconds = 1.73 MB/sec
#(4)
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 6 MB in 3.26 seconds = 1.84 MB/sec
#(5)
# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 244 MB in 3.00 seconds = 81.21 MB/sec

(In reply to comment #11)
> Got 2.6.17-1.2366.fc6 from rawhide (Fusion MPT base driver 3.04.00), performance
> seems fine with it after about a week of continous dd-loop.

I tried it.
# Development version of Fedora (2.6.17-1.2437.fc6)
# New driver 3.04.xx (Fusion MPT base driver 3.04.01)
It seemed to work well.

I think there are two points to note:
+ Does GAM (Global Array Manager) operate correctly?
+ The driver's structure has changed.
# ls -l /lib/modules/2.6.9-34.0.2.ELsmp/kernel/drivers/message/fusion
total 264
-rwxr--r-- 1 root root 71364 Jun 30 23:53 mptbase.ko
-rwxr--r-- 1 root root 41792 Jun 30 23:53 mptctl.ko
-rwxr--r-- 1 root root 15756 Jun 30 23:53 mptfc.ko
-rwxr--r-- 1 root root 21476 Jun 30 23:53 mptlan.ko
-rwxr--r-- 1 root root 16828 Jun 30 23:53 mptsas.ko
-rwxr--r-- 1 root root 49392 Jun 30 23:53 mptscsi.ko
-rwxr--r-- 1 root root  6760 Jun 30 23:53 mptscsih.ko
-rwxr--r-- 1 root root 16600 Jun 30 23:53 mptspi.ko

# ls -l /lib/modules/2.6.17-1.2437.fc6/kernel/drivers/message/fusion
total 252
-rwxr--r-- 1 root root 69248 Jul 23 05:47 mptbase.ko
-rwxr--r-- 1 root root 33020 Jul 23 05:47 mptctl.ko
-rwxr--r-- 1 root root 24104 Jul 23 05:47 mptfc.ko
-rwxr--r-- 1 root root 23284 Jul 23 05:47 mptlan.ko
-rwxr--r-- 1 root root 35548 Jul 23 05:47 mptsas.ko
-rwxr--r-- 1 root root 31416 Jul 23 05:47 mptscsih.ko
-rwxr--r-- 1 root root 25332 Jul 23 05:47 mptspi.ko

It's a restructuring of the driver by LSI. RH made some changes to the driver they put into RHEL kernels to avoid a support nightmare. Without them, anybody updating from the -22 kernel to -34 would have to change their modprobe.conf:

alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptscsih
-->
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptspi

Otherwise they'll end up with a kernel panic. Fedora kernels don't have this helpful patch. Nor should they.

Do a "sg_reset -h /dev/sda" and see everything almost as fast as with -22.0.2.ELsmp. Tested with -42 (official u4-kernel).

(In reply to comment #16)
> Do a "sg_reset -h /dev/sda" and see everything almost as fast as with
> -22.0.2.ELsmp. Tested with -42 (official u4-kernel).

Thanks.
Some servers have been updated to -42.
# The server with the problem has not been updated to -42 (U4) yet.
# At least one is scheduled for the end of this week.
# (I think it will be done by the end of next week at the latest.)

If sg_reset avoids the problem, I will adopt it.
# I will add a script that runs it after the system starts.
I will report the test results.

(In reply to comment #16)
> Do a "sg_reset -h /dev/sda" and see everything almost as fast as with
> -22.0.2.ELsmp. Tested with -42 (official u4-kernel).

I tested with -34.0.2.ELsmp and confirmed that it is effective.

At what point during boot should "sg_reset -b /dev/sda" be executed?
I think it should be around the point where writes end or start.
Concretely, I am thinking of just after "Remount the root filesystem read-write" in rc.sysinit (/etc/rc.d/rc.sysinit).

(In reply to comment #18)
> (In reply to comment #16)
> > Do a "sg_reset -h /dev/sda" and see everything almost as fast as with
> > -22.0.2.ELsmp. Tested with -42 (official u4-kernel).
> I tested with -34.0.2.ELsmp.
> And, it was confirmed that it was effective.

I tested with -42.0.2.ELsmp. It was fine.

I changed rc.sysinit (/etc/rc.d/rc.sysinit).
"sg_reset" is now executed before "Remount the root filesystem read-write".
The change is as follows.

# diff rc.sysinit.org rc.sysinit
443a444,449
> # SCSI host adapter reset (for 53C1030)
> if [ -x /usr/bin/sg_reset ]; then
> ##    /usr/bin/sg_reset -h /dev/sda >/dev/null 2>&1
>     /usr/bin/sg_reset -h /dev/sda
> fi
>

I found a way to work around the problem.
How will the underlying problem be fixed in ES4?
# I think the MPT driver needs to reset the host adapter properly.

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.
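
The workaround discussed in this thread (measure with "hdparm -t", then run "sg_reset -h" when the disk is in the slow state) could be sketched as a small shell helper. This is only an illustration and was not posted in the bug: the function names parse_rate, is_slow and check_and_reset are hypothetical, and the 10 MB/sec default threshold is an assumption picked to separate the ~1.8 MB/sec and ~80 MB/sec readings reported above. The "-h" flag is the one used in the thread; other sg3_utils versions may spell the host reset option differently.

```shell
#!/bin/sh
# Sketch only: detect the slow state reported in this bug and reset the host.
# Function names and the threshold are assumptions, not part of the report.

# Read `hdparm -t` output on stdin and print the MB/sec figure.
parse_rate() {
    sed -n 's/.*= *\([0-9.]*\) MB\/sec.*/\1/p' | head -n 1
}

# Succeed (exit 0) when rate $1 is below threshold $2, both in MB/sec.
is_slow() {
    awk -v r="$1" -v t="$2" 'BEGIN { exit !(r + 0 < t + 0) }'
}

# Measure device $1 and issue the host reset from the thread if it is slow.
# $2 is an optional threshold in MB/sec (assumed default: 10).
check_and_reset() {
    dev=$1
    threshold=${2:-10}
    rate=$(hdparm -t "$dev" 2>/dev/null | parse_rate)
    if [ -z "$rate" ]; then
        echo "could not measure $dev" >&2
        return 2
    fi
    if is_slow "$rate" "$threshold"; then
        echo "$dev: $rate MB/sec, resetting host adapter"
        sg_reset -h "$dev"
    else
        echo "$dev: $rate MB/sec, ok"
    fi
}
```

For example, "check_and_reset /dev/sda" could be called from a boot script instead of resetting unconditionally. The rc.sysinit change above resets on every boot, which is simpler; the measurement step only avoids a reset when the disk already runs at normal speed.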