Bug 229880 - Hang in RHEL3U8 with serveraid 4H and ips driver ver 7.10
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.8
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Red Hat Kernel Manager
QA Contact: Brian Brock
Reported: 2007-02-23 17:53 EST by Gary M. Gaydos
Modified: 2007-11-16 20:14 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-10-19 14:38:24 EDT
Attachments
spec file for custom patched kernel backporting ips 7.12.02 (56.61 KB, application/x-gzip)
2007-02-23 17:56 EST, Gary M. Gaydos
patch file referenced in spec file (4.73 KB, application/x-gzip)
2007-02-23 17:58 EST, Gary M. Gaydos

Description Gary M. Gaydos 2007-02-23 17:53:23 EST
Description of problem:
SCSI reset on the console followed by a dead hang.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux ES release 3 (Taroon Update 8)

Linux ltc-eth1000.torolab.ibm.com 2.4.21-47.0.1.ELsmp #1 SMP Fri Oct 13 17:56:20 EDT 2006 i686 i686 i386 GNU/Linux

[mrx@ltc-eth1000 mrx]$ cat /proc/scsi/ips/0

IBM ServeRAID General Information:

        Controller Type                   : ServeRAID 4H
        IO region                         : 0x2300 (256 bytes)
        Memory region                     : 0xedf00000 (1048576 bytes)
        Shared memory address             : 0xf883f000
        IRQ number                        : 24
        BIOS Version                      : 7.10.23
        Firmware Version                  : 7.10.24
        Boot Block Version                : 4.00.26
        Driver Version                    : 7.10.18
        Driver Build                      : 731
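
For anyone checking several machines for the affected driver, the output above can be parsed mechanically. The following is a minimal sketch, not part of the original report: the function names are hypothetical, it assumes the /proc/scsi/ips/<n> output uses the "Key : Value" layout shown above, and the 7.12.02 threshold is only the version that was backported here, not a confirmed minimum fixed version.

```python
# Sample text copied from the report; on a live system this would be read
# from /proc/scsi/ips/0 instead.
SAMPLE = """
IBM ServeRAID General Information:

        Controller Type                   : ServeRAID 4H
        BIOS Version                      : 7.10.23
        Firmware Version                  : 7.10.24
        Boot Block Version                : 4.00.26
        Driver Version                    : 7.10.18
        Driver Build                      : 731
"""


def parse_ips_info(text):
    """Collect 'Key : Value' lines into a dict; other lines are ignored."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version):
    """Turn a dotted version such as '7.12.02' into (7, 12, 2)."""
    return tuple(int(part) for part in version.split("."))


def driver_older_than(info, minimum="7.12.02"):
    """True if the reported ips driver version is below `minimum`.

    The default threshold is an assumption taken from the backport
    described in this report.
    """
    return version_tuple(info["Driver Version"]) < version_tuple(minimum)


if __name__ == "__main__":
    info = parse_ips_info(SAMPLE)
    print(info["Controller Type"], info["Driver Version"],
          driver_older_than(info))
```

Comparing versions as integer tuples rather than strings avoids misordering components such as "9" and "10".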
        


How reproducible:
Less than once per month.


Steps to Reproduce:
1. Fails during normal server operation under heavy I/O load, typically while
cron.daily or cron.weekly is running. The server runs innd, postfix, antivirus,
and web services.
  
Actual results:
SCSI reset followed by a dead hang. A power-off and restart is required to recover.

Expected results:
no hang on scsi reset

Additional info:
1)
The errata kernels on our RHEL4 U4 systems ship with version 7.12.05 of the ips
driver. We have not encountered hangs due to SCSI resets on those systems.

2)
Version 7.12 ips firmware and ips driver
http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-60624&brandind=5000008

Reference in the README:
2.1  ServeRAID Family 7.10 to 7.12
<snip>
Fixed problem with ServeRAID-4H firmware

3)
Discussion thread about scsi resets on ips
http://www-128.ibm.com/developerworks/forums/dw_thread.jsp?message=13871824&cat=53&thread=136633&treeDisplayType=threadmode1&forum=740#13871824

4) We've patched a RHEL3 U8 2.4.21-47.0.1.ELsmp kernel with the version 7.12.02
ips driver. It appears stable on two test systems running stress under heavy
I/O load; one of them periodically has SCSI resets and does not hang. Obviously
this isn't tested as thoroughly as you QC your kernels. We plan to deploy our
patched kernel on our crashing production server, along with ServeRAID
firmware and hard drive firmware updates, on Feb 28. I'll attach the patch and
.spec file for your examination.

5)
Please consider updating RHEL3 errata kernels with newer ips drivers if they
contain bug fixes.

6)
Let me know if you're going to pass this back to IBM; I have a similar bug
open there.

7) The hanging server is System ID 1005928737 in rhn.redhat.com
Comment 1 Gary M. Gaydos 2007-02-23 17:56:52 EST
Created attachment 148725
spec file for custom patched kernel backporting ips 7.12.02
Comment 2 Gary M. Gaydos 2007-02-23 17:58:14 EST
Created attachment 148726
patch file referenced in spec file
Comment 3 RHEL Product and Program Management 2007-10-19 14:38:24 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
