Bug 653689 - LSI SAS1068E PCI-Express Fusion-MPT controller not working properly with the standard kernel
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-11-15 22:46 UTC by Andrew J. Schorr
Modified: 2011-06-28 10:34 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-28 10:34:31 UTC


Description Andrew J. Schorr 2010-11-15 22:46:13 UTC
Description of problem:
In /var/log/messages, we see this message many times:
Nov 15 08:04:01 ti23 kernel: [ID kern.info] [28041.220666] mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
Eventually, the system becomes so sluggish that it is unusable.
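
For reference, the LogInfo value in the message above can be split into its fields by hand. The following is a minimal sketch, assuming the SAS LogInfo layout that the Fusion-MPT driver appears to use (top 4 bits bus type, next 4 bits originator, next 8 bits code, low 16 bits subcode). That layout is an assumption, not something stated in this report. The function name and the lookup tables are illustrative, and the tables contain only the names that appear in the kernel messages quoted in this bug.

```python
def decode_sas_loginfo(loginfo):
    """Split a 32-bit Fusion-MPT SAS LogInfo word into its fields.

    Assumed layout (not stated in this report): bits 31-28 bus type,
    bits 27-24 originator, bits 23-16 code, bits 15-0 subcode.
    """
    return {
        "bus_type": (loginfo >> 28) & 0xF,
        "originator": (loginfo >> 24) & 0xF,
        "code": (loginfo >> 16) & 0xFF,
        "subcode": loginfo & 0xFFFF,
    }


# Names taken only from the kernel messages quoted in this report.
ORIGINATOR_NAMES = {0x1: "PL"}
PL_CODE_NAMES = {0x12: "Abort", 0x13: "IO Not Yet Executed"}

fields = decode_sas_loginfo(0x31123000)
print(
    ORIGINATOR_NAMES.get(fields["originator"], "unknown"),
    PL_CODE_NAMES.get(fields["code"], "unknown"),
    hex(fields["subcode"]),
)  # prints: PL Abort 0x3000
```

Under this layout, 0x31123000 decodes to originator 1, code 0x12, subcode 0x3000, which matches the Originator={PL}, Code={Abort}, SubCode(0x3000) text the driver prints.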

Version-Release number of selected component (if applicable):
kernel-2.6.34.7-61.fc13.x86_64
Fusion MPT base driver 3.04.14
Fusion MPT SAS Host driver 3.04.14

How reproducible:
Do some I/O and wait for the system to hang

Steps to Reproduce:
1. Boot current Fedora 13 kernel on a system using the LSI SAS1068E controller to speak to hard disks.
2. Do some I/O.
3. After a few hours (sometimes sooner), the system becomes unresponsive,
although Alt-SysRq still works for rebooting it.
  
Actual results:
Disk I/O throughput drops so far that the system stops responding.


Expected results:
A working system.

Additional info:
I have downloaded the newer 4.24.00.00 driver from LSI's web site and patched that into the kernel.  I hope this may solve the problem.
http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html

Comment 1 Andrew J. Schorr 2010-11-19 00:40:29 UTC
FYI, the 4.24.00.00 driver has solved the problem for me.  The system is now stable. Is there a reason it's desirable to keep such an old version in the kernel source tree?

Comment 2 Andrew J. Schorr 2010-11-22 23:22:00 UTC
After 5.5 days, the new kernel started exhibiting similar problems, so I'm afraid the new driver did not fix this completely.  I suppose it could be a hardware problem, although the system ran stably under Fedora Core 6 for years.  Here are the messages we are seeing:

Nov 21 05:52:23 ti23 kernel: [ID kern.info] mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
Nov 21 05:52:23 ti23 kernel: [ID kern.info] mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
Nov 21 05:52:23 ti23 kernel: [ID kern.info] mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
Nov 21 05:52:23 ti23 kernel: [ID kern.info] mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
Nov 21 05:52:54 ti23 kernel: [ID kern.info] mptscsih: ioc0: attempting task abort! (sc=ffff88000ab48800)
Nov 21 05:52:54 ti23 kernel: [ID kern.info] sd 4:0:9:0: [sdj] CDB: Read(10): 28 00 41 2e 4b 3f 00 01 00 00
Nov 21 05:52:55 ti23 kernel: [ID kern.info] mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
Nov 21 05:52:55 ti23 kernel: [ID kern.info] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88000ab48800)
Nov 21 05:52:55 ti23 kernel: [ID kern.info] mptscsih: ioc0: attempting task abort! (sc=ffff880025460200)
Nov 21 05:52:55 ti23 kernel: [ID kern.info] sd 4:0:9:0: [sdj] CDB: Read(10): 28 00 41 2e 45 3f 00 01 00 00
Nov 21 05:52:55 ti23 kernel: [ID kern.info] mptscsih: ioc0: task abort: SUCCESS (sc=ffff880025460200)
...
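
The CDB bytes in the messages above show which blocks the aborted reads were targeting. The following is a minimal sketch that decodes them, assuming the standard SCSI READ(10) layout (opcode 0x28, logical block address in bytes 2-5, transfer length in bytes 7-8, both big-endian). The function name is illustrative.

```python
import struct


def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (logical_block_address, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    # Bytes 2-5 hold the LBA, bytes 7-8 the transfer length (big-endian).
    (lba,) = struct.unpack(">I", cdb[2:6])
    (blocks,) = struct.unpack(">H", cdb[7:9])
    return lba, blocks


for cdb in ("28 00 41 2e 4b 3f 00 01 00 00", "28 00 41 2e 45 3f 00 01 00 00"):
    lba, blocks = decode_read10(cdb)
    print(hex(lba), blocks)
```

Decoded this way, both aborted reads on sdj are 256 blocks long and start 0x600 blocks apart, so they fall in the same region of the disk.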

Comment 3 Andrew J. Schorr 2010-11-22 23:23:39 UTC
The kernel is also reporting a hung task in the md RAID resync thread (md127_resync), as follows:

Nov 21 06:04:36 ti23 kernel: [ID kern.info] mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
Nov 21 06:04:36 ti23 kernel: [ID kern.info] mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
Nov 21 06:04:53 ti23 kernel: [ID kern.err] INFO: task md127_resync:29843 blocked for more than 120 seconds.
Nov 21 06:04:53 ti23 kernel: [ID kern.err] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 21 06:04:53 ti23 kernel: [ID kern.info] md127_resync  D 0000000000000002     0 29843      2 0x00000080
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] ffff880049505b80 0000000000000046 ffff880049505b30 ffffffffa018784f
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] ffff880049505fd8 ffff88007aa49770 00000000000153c0 ffff880049505fd8
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] 00000000000153c0 00000000000153c0 00000000000153c0 00000000000153c0
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] Call Trace:
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffffa018784f>] ? unplug_slaves+0x7f/0xb9 [raid456]
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffffa0187c3e>] get_active_stripe+0x2bc/0x63e [raid456]
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff8104818d>] ? default_wake_function+0x0/0x14
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffffa018b4e6>] sync_request+0x257/0x2e3 [raid456]
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff8135d5e8>] ? is_mddev_idle+0xae/0x102
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff8135dd75>] md_do_sync+0x739/0xb47
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff81066153>] ? autoremove_wake_function+0x0/0x39
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff8135ea80>] md_thread+0xf6/0x114
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff8135e98a>] ? md_thread+0x0/0x114
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff81065cd9>] kthread+0x7f/0x87
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff8100aa64>] kernel_thread_helper+0x4/0x10
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff81065c5a>] ? kthread+0x0/0x87
Nov 21 06:04:53 ti23 kernel: [ID kern.warn] [<ffffffff8100aa60>] ? kernel_thread_helper+0x0/0x10
Nov 21 06:05:07 ti23 kernel: [ID kern.info] mptscsih: ioc0: attempting task abort! (sc=ffff880077583300)
Nov 21 06:05:07 ti23 kernel: [ID kern.info] sd 4:0:7:0: [sdh] CDB: Read(10): 28 00 41 2f 96 c7 00 00 78 00
Nov 21 06:05:08 ti23 kernel: [ID kern.info] mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
Nov 21 06:05:08 ti23 kernel: [ID kern.info] mptscsih: ioc0: task abort: SUCCESS (sc=ffff880077583300)
Nov 21 06:05:08 ti23 kernel: [ID kern.info] mptscsih: ioc0: attempting task abort! (sc=ffff8800054c0200)
Nov 21 06:05:08 ti23 kernel: [ID kern.info] sd 4:0:7:0: [sdh] CDB: Read(10): 28 00 41 2f 96 af 00 00 18 00
Nov 21 06:05:08 ti23 kernel: [ID kern.info] mptscsih: ioc0: task abort: SUCCESS (sc=ffff8800054c0200)
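
To see whether the aborts cluster on particular disks, the log can be tallied per device. The following is a minimal sketch, assuming (as in the excerpts in this report) that each "attempting task abort!" line is followed directly by the "CDB:" line naming the affected disk. The function name is illustrative, and the sample lines are the messages above with the syslog prefix omitted.

```python
import re
from collections import Counter

# Matches the "sd H:C:T:L: [sdX] CDB:" line that follows each abort attempt.
CDB_LINE = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(\w+)\] CDB:")


def count_aborted_commands(lines):
    """Count aborted commands per disk name (for example "sdh")."""
    counts = Counter()
    expecting_cdb = False
    for line in lines:
        if "attempting task abort!" in line:
            expecting_cdb = True
            continue
        match = CDB_LINE.search(line) if expecting_cdb else None
        if match:
            counts[match.group(2)] += 1
        expecting_cdb = False
    return counts


# Messages from this comment, syslog prefix omitted.
sample = [
    "mptscsih: ioc0: attempting task abort! (sc=ffff880077583300)",
    "sd 4:0:7:0: [sdh] CDB: Read(10): 28 00 41 2f 96 c7 00 00 78 00",
    "mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)",
    "mptscsih: ioc0: task abort: SUCCESS (sc=ffff880077583300)",
    "mptscsih: ioc0: attempting task abort! (sc=ffff8800054c0200)",
    "sd 4:0:7:0: [sdh] CDB: Read(10): 28 00 41 2f 96 af 00 00 18 00",
    "mptscsih: ioc0: task abort: SUCCESS (sc=ffff8800054c0200)",
]
print(dict(count_aborted_commands(sample)))  # prints: {'sdh': 2}
```

Run over the full /var/log/messages, a tally like this would show whether the aborts are confined to one or two disks or spread across every disk on the controller.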

Comment 4 Bug Zapper 2011-05-30 13:43:30 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 5 Bug Zapper 2011-06-28 10:34:31 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

