Bug 493093

Summary: Old mptsas driver in RHEL 5.4 / Fedora 12 / RHEL 6 Beta
Product: [Fedora] Fedora
Reporter: Max E <max>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 13
CC: admin, akrherz, bernhard.kohl, bruno, codehotter, d.bz-redhat, ijones, jimk, jwboyer, kernel-maint, pasik, pmarciniak, pvogel, shigorin
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-06-27 14:08:45 UTC
Attachments:
  Source Code to replace aging mptsas driver - if you can get it to compile! (flags: none)
  Port LSI driver to 2.6.27 (flags: none)
  Port LSI driver to 2.6.29 (flags: none)
  Version 4.33 of the LSI MegaRAID SAS drivers (flags: none)

Description Max E 2009-03-31 15:52:15 UTC
Description of problem:
I hope this will not be a problem, but this is more for information than anything else.  I have noticed that the kernel driver for the mptsas module is rather old.  modinfo mptsas reports that the driver version is 3.04.07 (the same as in Red Hat Enterprise Linux 5.2).  LSI has been shipping version 4 drivers for quite a while now, and whilst this may not seem to be a problem, there are a couple of race conditions, reported when the SCSI card becomes stressed, that are fixed in newer versions of the driver.  I have a couple of LSI SAS1068E cards at work.

Version-Release number of selected component (if applicable):
Kernel Version: 2.6.27.19-78.2.30.fc9.x86_64

Expected results:

The driver should be at around version 4.00.43 (LSI versioning).

Additional info:

Not keen to taint the current kernel; hopefully either Red Hat is running a backported version of the code from LSI, or Red Hat's version numbering differs from LSI's.
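The version check described above can be scripted. This is a minimal sketch, assuming `modinfo` is available and that the driver reports a dotted numeric version; the helper name `version_lt` is hypothetical, and the target version 4.00.43 is simply the one named in this report.

```shell
# Return success (0) when dotted version $1 is strictly older than $2.
# Fields are compared numerically, so "04" and "4" are treated as equal.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1                               # versions are equal
    }'
}

wanted="4.00.43"    # version named in this report; adjust as needed
installed=""
if command -v modinfo >/dev/null 2>&1; then
    # -F prints a single modinfo field; stays empty if the module is absent.
    installed=$(modinfo -F version mptsas 2>/dev/null) || installed=""
fi

if [ -n "$installed" ] && version_lt "$installed" "$wanted"; then
    echo "mptsas $installed is older than $wanted"
fi
```

Note that this only compares numbers; it cannot tell whether a distribution has backported fixes into a driver that still carries an older version string.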

Comment 1 Chuck Ebbert 2009-04-01 15:24:02 UTC
The driver looks to be basically unmaintained in the upstream kernel: they fix only severe bugs AFAICT. If you have access to LSI support you might want to ask them why they don't keep the driver up-to-date.

Comment 2 Max E 2009-04-03 10:38:43 UTC
I have emailed LSI and got some responses back.

> We have a relationship with kernel.org and do work to resolve issues 
> when they arise and are reported.
> 
> Please note that driver certification is not instantaneous, and we do 
> not support all versions of Linux.  We have limited kernel support and 
> provide direct fixes for customers who work in supported 
> configurations.  Please accept my apologies, but we are unable to 
> provide source code for our proprietary drivers.

When I pushed them to provide the latest code to kernel.org, I got the following back.

> I am unable to guarantee any timeframe for newer drivers to be integrated into later kernels.

Can you guys try to find out whether anybody at kernel.org actually has contact with LSI?  I don't think I am going to get very far from the user perspective.

Comment 3 Bernhard Kohl 2009-04-07 12:29:21 UTC
Actually I have a problem with this old mptsas driver. On some of our systems, running Fedora 8 (2.6.26.8) or Fedora 10 (2.6.27.21), the hard disk goes offline every few days or even hours.

At LSI they have MPT driver packages for RHEL 4 (mptlinux-3.13.04.00-2) and RHEL 5 (mptlinux-4.00.43.00-1).

In the release notes of the mptlinux-3.13.04.00-2 there is already the following defect fix, which should solve my problem:

SCGCQ00019660: When the diag reset is issued through lsiutil, the driver is not clearing the pending I/O requests and hence the requests got timed out and leads to error recovery. The error recovery actions may lead to offlining of the device. Also the message frame allocated for event notification is not released when the diag reset is issued which leads to failure of message frame allocation after N number of diag resets. Both the issues are fixed in this version of the driver.

I tried to compile both drivers on Fedora 10, but both failed with compiler errors. Some of the SCSI structures have changed.

It would be good to get this driver to the upstream kernel.

My systems:

lspci -nn
...
05:00.0 SCSI storage controller [0100]: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS [1000:0056] (rev 02)
...

dmesg
...
mptscsih: ioc0: attempting task abort! (sc=f5310500)
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 07 65 e3 99 00 00 08 00
mptscsih: ioc0: WARNING - TM Handler for type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING -  Issuing HardReset!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
sd 0:0:0:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 4, sc=f5310500, mf = f59e8500, idx=ba
sd 0:0:0:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 4, sc=f5310700, mf = f59e9d00, idx=ea
sd 0:0:0:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 4, sc=f5310800, mf = f59e9d80, idx=eb
mptbase: ioc0: WARNING - ResetHistory bit failed to clear!
mptbase: ioc0: ERROR - Diagnostic reset FAILED! (ffffffffh)
mptbase: ioc0: WARNING - NOT READY!
mptbase: ioc0: WARNING - Cannot recover rc = -1!
mptscsih: ioc0: WARNING - TMHandler: HardReset FAILED!!
mptscsih: ioc0: task abort: FAILED (sc=f5310500)
mptscsih: ioc0: attempting task abort! (sc=f5310700)
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 07 65 e4 39 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f5310700)
mptscsih: ioc0: attempting task abort! (sc=f5310700)
sd 0:0:0:0: [sda] CDB: Test Unit Ready: 00 00 00 00 00 00
mptscsih: ioc0: task abort: SUCCESS (sc=f5310700)
mptscsih: ioc0: attempting task abort! (sc=f5310800)
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 07 65 e4 49 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f5310800)
mptscsih: ioc0: attempting task abort! (sc=f5310800)
sd 0:0:0:0: [sda] CDB: Test Unit Ready: 00 00 00 00 00 00
mptscsih: ioc0: task abort: SUCCESS (sc=f5310800)
mptscsih: ioc0: attempting target reset! (sc=f5310500)
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 07 65 e3 99 00 00 08 00
mptscsih: ioc0: target reset: FAILED (sc=f5310500)
mptscsih: ioc0: attempting bus reset! (sc=f5310500)
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 07 65 e3 99 00 00 08 00
mptscsih: ioc0: bus reset: FAILED (sc=f5310500)
mptscsih: ioc0: attempting host reset! (sc=f5310500)
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - ResetHistory bit failed to clear!
mptbase: ioc0: ERROR - Diagnostic reset FAILED! (ffffffffh)
mptbase: ioc0: WARNING - NOT READY!
mptbase: ioc0: WARNING - Cannot recover rc = -1!
mptscsih: ioc0: host reset: FAILED (sc=f5310500)
sd 0:0:0:0: Device offlined - not ready after error recovery
sd 0:0:0:0: Device offlined - not ready after error recovery
sd 0:0:0:0: Device offlined - not ready after error recovery
sd 0:0:0:0: rejecting I/O to offline device
Buffer I/O error on device dm-0, logical block 15464488
lost page write due to I/O error on dm-0
...
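The escalation visible in the log above (task abort, target reset, bus reset, host reset, then the device going offline) can be counted with a small filter. This is a rough sketch, assuming the kernel messages have been saved to a file first; the function name `summarise_mpt_log` is hypothetical, and the patterns are taken only from the messages shown in this comment.

```shell
# Count failed error-recovery steps and offlined-device messages in a saved
# kernel log. Usage: summarise_mpt_log FILE
summarise_mpt_log() {
    awk '
        /(task abort|target reset|bus reset|host reset): FAILED/ { failed++ }
        /Device offlined/                                         { offlined++ }
        END {
            printf "failed recovery steps: %d\n", failed
            printf "offline events: %d\n", offlined
        }
    ' "$1"
}
```

For example, `dmesg > /tmp/dmesg.txt` followed by `summarise_mpt_log /tmp/dmesg.txt` gives a quick indication of whether a machine has hit this failure pattern.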

Comment 4 Max E 2009-04-18 16:29:17 UTC
Created attachment 340163 [details]
Source Code to replace aging mptsas driver - if you can get it to compile!

Comment 5 Max E 2009-04-18 16:34:35 UTC
It seems that LSI ~do~ in fact give out the source code for this module from their website, so I had a quick stab at it on Fedora 10 x86_64.  You will need to get the kernel sources for your kernel version, and change the kernel BUILD directory to point to where your 'build' directory is located.  This is not point and shoot: you need to run the bash scripts (although I suspect these will need to be modified to get them to work).

The system complained that some of the kernel configs were invalid when I ran the ./compile bash script, and my knowledge of bodging kernels rather drifted away after kernel version 2.2!  Would it be possible for somebody far more clever than myself to try to get this code to work for FC10?

Comment 6 Bug Zapper 2009-11-18 12:48:35 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Max E 2009-11-18 12:59:33 UTC
Chaps - still no movement with this - still running 3.04.07 on Fedora 11

Comment 8 Bug Zapper 2010-04-27 13:23:05 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Comment 9 Max E 2010-04-28 11:45:42 UTC
Fedora 12 now has version 3.04.12 of the driver.  Still very old code.  I'm going to keep this open to keep poking RH/Fedora people to try to merge the latest code from LSI.

Comment 10 Emanuel Rietveld 2010-06-17 12:21:46 UTC
Created attachment 424786 [details]
Port LSI driver to 2.6.27

Found this patch here: https://bugzilla.kernel.org/show_bug.cgi?id=12163

Comment 11 Emanuel Rietveld 2010-06-17 12:36:16 UTC
Created attachment 424791 [details]
Port LSI driver to 2.6.29

And I did this one myself. Please review this patch very carefully before attempting to build with it, as I don't really know what I'm doing.

I have managed to build the LSI driver 4.00.43.00 with latest kernel using these two patches

yum install kernel-devel gcc
wget http://www.lsi.com/DistributionSystem/AssetDocument/support/downloads/hbas/sas/software_drivers/linux/MPTLINUX_RHEL5_SLES10_PH14-4.00.43.00-1.zip
unzip MPTLINUX_RHEL5_SLES10_PH14-4.00.43.00-1.zip
cd mptlinux_RHEL5_SLES10_rel
tar xf RHEL5_SLES10.tar.gz
cd srpms-1
rpm2cpio mptlinux-4.00.43.00-1.src.rpm | cpio -idv
tar xf mptlinux-4.00.43.00.tar.gz
patch -p1 < to2.6.27.patch
patch -p1 < to2.6.29.patch
# edit the Makefile in place; redirecting sed's output onto its own input file can truncate it
sed -i -e "s/\(KERNEL=\)/\1$(uname -r)/" Makefile
make
ls "$(uname -r)"

I hope it's useful to someone.
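Before running the build steps above, it can help to confirm that the kernel-devel files for the target kernel are actually installed. This is a small sketch under the assumption that the build tree is reachable as `/lib/modules/<release>/build`, which is where kernel-devel packages normally link it; the function name `kernel_build_dir` and its second parameter are hypothetical.

```shell
# Print the kernel build directory for a kernel release, or fail if the
# matching kernel-devel files are not installed.
# $1: kernel release (defaults to the running kernel)
# $2: modules root (defaults to /lib/modules; mainly useful for testing)
kernel_build_dir() {
    release=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    dir="$root/$release/build"
    if [ -d "$dir" ]; then
        printf '%s\n' "$dir"
    else
        echo "no build directory for $release; install the matching kernel-devel package" >&2
        return 1
    fi
}

# Usage: report the build tree for the running kernel, if there is one.
if dir=$(kernel_build_dir); then
    echo "build against: $dir"
fi
```

Passing an explicit release as the first argument covers the case of compiling for a kernel other than the one currently running.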

Comment 12 Max E 2010-06-18 14:40:57 UTC
Thanks to Emanuel. You ~have~ to download both patches (2.6.27 and 2.6.29) and apply them to the source.

Emanuel sent this very kind reply when I asked him about the compilation process....

You do not need to change directories. You can download to2.6.27.patch and to2.6.29.patch from the bug report. They update the driver code to be compatible with the api changes in 2.6.27 and 2.6.29 respectively.
After applying those two patches, you can use the sed command in the instructions to update the makefile with the kernel version you are running. If you are compiling for a different kernel, you should install the kernel-devel rpm for that kernel and edit the makefile manually instead. You can compile the driver by running "make." When the command completes, you can find the driver modules in the directory with the same name as the kernel you are compiling for.
 
Let me warn you to back up all data before trying this driver, as it is not well tested, especially not with recent kernels.
 
Good luck!

Emanuel


I can confirm that the patches and the resulting code compile on 2.6.32.14-127.fc12.x86_64 - Fedora 12.

I'll give it a go for Fedora 13 and report back!

Comment 13 Pieter Vogel 2010-10-08 13:15:25 UTC
This is still the case in RHEL 5.5.
On a Dell 2950 with an LSI SAS1068E RAID controller running the latest firmware, I get complaints from the OMSA tools that the driver is too old.
On a Dell T105 I cannot update the firmware because the driver is too old.

Of course I can use the official Dell drivers for it, but these drivers should be in the kernel already.

The official LSI driver 4.22.00.00 compiles nicely on RHEL 5.5.

Comment 14 Pieter Vogel 2010-10-08 13:17:47 UTC
RHEL 6 beta 2 also still has the old version, 3.04.13.

Comment 15 Max E 2010-10-28 14:25:00 UTC
Could somebody from Kernel Maintenance give us an idea of which driver version we can expect in RHEL 6, please?  The current LSI version of the code stands at 4.31.  Could somebody please push a 4.x version of the code into the kernel for us?

Comment 16 Max E 2010-10-28 14:48:24 UTC
Created attachment 456262 [details]
Version 4.33 of the LSI MegaRAID SAS drivers

Comment 17 Pasi Karkkainen 2010-11-07 21:12:10 UTC
Has anyone tried that 4.33 version of megaraid_sas with a 2.6.32 kernel?

The default megaraid_sas in 2.6.32.25 seems to be broken (at least on a Dell R510 with a PERC H700 RAID controller). Boot fails because the disks are not detected/enabled, and the driver prints some errors/failures after a while, so I'm wondering which driver version I should upgrade to.

Comment 18 Pasi Karkkainen 2010-11-07 22:19:37 UTC
I just tried that megaraid_sas v4.33 and it worked OK with Linux 2.6.32.25 kernel!

(the default megaraid_sas driver v4.01 included in 2.6.32.25 is broken, it fails to bring up the disks and the boot fails.)

Comment 19 Max E 2010-11-08 10:59:41 UTC
Pasi,

You might want to raise the issue with the broken driver as a separate bug, as this bug is more concerned with the version of the drivers.  If the shipped driver is broken, then RH need to have this flagged as a high priority and a show-stopper.

Thanks


Max

Comment 20 Bruno Wolff III 2010-11-08 15:10:31 UTC
The problems I see are related to using SMART features. The commands or responses don't seem to be passed through properly. But I don't know if the updated driver would actually fix this issue.

Comment 21 Ian Jones 2010-11-12 18:52:03 UTC
I'm still seeing this problem with the H700 controller.  If your root partition is an array managed by the H700 controller, the system will not boot because it fails to mount the root partition.  I also had this problem with 2.6.32.19 a while ago, and I'm not sure what version it was broken at but I do know that 2.6.27.10 works.  I have tried installing the 4.33 driver version included here in the following way:

make -C /lib/modules/2.6.32.25-grsec/build M=$PWD modules

cp -dp megaraid_sas.ko /lib/modules/2.6.32.25-grsec/kernel/drivers/scsi/megaraid/megaraid_sas.ko

depmod -a 2.6.32.25-grsec

mkinitrd /boot/initrd-2.6.32.25-grsec.img 2.6.32.25-grsec


Still the same problem, can't mount the root partition.
It seems like this should have been an urgent issue for a while now - the Dell H700/LSI MegaSAS 9260 is really common, and lots of people are looking to update their kernels with all of the recent security problems.

Comment 22 Pasi Karkkainen 2010-11-12 18:58:52 UTC
v4.33 works for me with H700, with Linux 2.6.32.25.

Ian: You should set up a serial console and log the full boot messages and paste them here.

Comment 23 Jim Knowler 2011-02-05 02:31:08 UTC
This is off a new Dell R410 with a fresh F14 load. It would be nice if Dell OMSA would not complain about this old driver. 

[root@newnte ~]# omreport about

Product name : Server Administrator
Version      : 6.4.0
Copyright    : Copyright (C) Dell Inc. 1995-2010 All rights reserved.
Company      : Dell Inc.

[root@newnte ~]# omreport storage controller
 Controller  SAS 6/iR Integrated (Embedded)
Controllers
ID                                            : 0
Status                                        : Non-Critical
Name                                          : SAS 6/iR Integrated
Slot ID                                       : Embedded
State                                         : Degraded
Firmware Version                              : 00.25.47.00.06.22.03.00
Minimum Required Firmware Version             : Not Applicable
Driver Version                                : 3.04.15
Minimum Required Driver Version               : 3.12.29.00

[root@newnte ~]# modinfo mptsas
filename:       /lib/modules/2.6.35.10-74.fc14.x86_64/kernel/drivers/message/fusion/mptsas.ko
version:        3.04.15
license:        GPL
description:    Fusion MPT SAS Host driver
author:         LSI Corporation
srcversion:     F404DF8025454601A4567FB
alias:          pci:v00001000d00000062sv*sd*bc*sc*i*
alias:          pci:v00001000d00000058sv*sd*bc*sc*i*
alias:          pci:v00001000d00000056sv*sd*bc*sc*i*
alias:          pci:v00001000d00000054sv*sd*bc*sc*i*
alias:          pci:v00001000d00000050sv*sd*bc*sc*i*
depends:        scsi_transport_sas,mptscsih,mptbase
vermagic:       2.6.35.10-74.fc14.x86_64 SMP mod_unload
parm:           mpt_pt_clear: Clear persistency table: enable=1  (default=MPTSCSIH_PT_CLEAR=0) (int)
parm:           max_lun: max lun, default=16895  (int)
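The two version lines in the `omreport storage controller` output above can be pulled out with a short filter, which is handy for checking several machines. This is a sketch that assumes the "Label : value" layout shown in this comment; the function name `omsa_field` is hypothetical and is not part of OMSA.

```shell
# Extract the value of a "Label : value" line from omreport output.
# $1: exact label text; the report is read from stdin.
omsa_field() {
    # Split on the colon and any spaces around it, then match the label exactly
    # so that "Driver Version" does not also match "Minimum Required Driver Version".
    awk -F' *: *' -v label="$1" '$1 == label { print $2; exit }'
}

# Usage: only runs where the OMSA command-line tools are installed.
if command -v omreport >/dev/null 2>&1; then
    report=$(omreport storage controller)
    have=$(printf '%s\n' "$report" | omsa_field "Driver Version")
    need=$(printf '%s\n' "$report" | omsa_field "Minimum Required Driver Version")
    echo "installed driver: $have, OMSA minimum: $need"
fi
```

On a system with more than one controller this prints only the first match for each label, so the report would need to be split per controller first.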

Comment 24 Bug Zapper 2011-06-02 18:11:18 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Comment 25 Bug Zapper 2011-06-27 14:08:45 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 26 csb sysadmin 2011-08-01 22:23:10 UTC
RHEL6 final still has this very old version of the driver (3.04.16). Dell support balks when we try to get a drive replaced or raise any support issue with the RAID controller while we are not running a driver that meets the minimum required version expected by OMSA (currently 3.12.29.00 for the LSI SAS1068E, or SAS 6/iR as Dell refers to it). LSI.com only has up to v4.26, and that is only listed for RHEL5. The 4.33 driver and now the 4.38 driver are for MegaRAID SAS models, and their readme does not specifically list the SAS1068E as a supported product. Any idea when you guys can backport 4.26, or when LSI might release an official driver for the SAS1068E (SAS 6/iR) for RHEL6?

Comment 27 Josh Boyer 2011-08-02 00:48:32 UTC
(In reply to comment #26)
> RHEL6 final still has this very old version of the driver (3.04.16). Dell

You should probably file a bug (or find one already opened) against RHEL itself.  This is a closed bug against an EOL version of Fedora, and it's quite likely that nobody in RHEL is paying any attention to it.

Comment 28 daryl herzmann 2011-10-25 20:40:50 UTC
Sorry for the noise, but is there a RHEL6 bug regarding this issue?