Bug 229880
| Summary: | Hang in RHEL3U8 with serveraid 4H and ips driver ver 7.10 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Gary M. Gaydos <mrx> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Priority: | medium |
| Version: | 3.8 | Hardware: | i386 |
| OS: | Linux | Doc Type: | Bug Fix |
| Last Closed: | 2007-10-19 18:38:24 UTC | | |
Created attachment 148725: spec file for custom patched kernel backporting ips 7.12.02
Created attachment 148726: patch file referenced in spec file
This bug is filed against RHEL 3, which is in the maintenance phase. During the maintenance phase, only security errata and select mission-critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.
Description of problem:
SCSI reset on the console followed by a dead hang.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux ES release 3 (Taroon Update 8)
Linux ltc-eth1000.torolab.ibm.com 2.4.21-47.0.1.ELsmp #1 SMP Fri Oct 13 17:56:20 EDT 2006 i686 i686 i386 GNU/Linux

[mrx@ltc-eth1000 mrx]$ cat /proc/scsi/ips/0
IBM ServeRAID General Information:
    Controller Type        : ServeRAID 4H
    IO region              : 0x2300 (256 bytes)
    Memory region          : 0xedf00000 (1048576 bytes)
    Shared memory address  : 0xf883f000
    IRQ number             : 24
    BIOS Version           : 7.10.23
    Firmware Version       : 7.10.24
    Boot Block Version     : 4.00.26
    Driver Version         : 7.10.18
    Driver Build           : 731

How reproducible:
Less than once per month.

Steps to Reproduce:
1. Fails in normal server operation under heavy I/O load, typically during cron.daily or cron.weekly. The server runs innd, postfix, antivirus, and a web server.

Actual results:
SCSI reset followed by a dead hang. A power-off and restart is required to recover.

Expected results:
No hang on SCSI reset.

Additional info:
1) The errata kernels on our RHEL4 U4 systems ship with version 7.12.05 of the ips driver. We have not encountered hangs due to SCSI resets on those systems.
2) Version 7.12 ips firmware and ips driver: http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-60624&brandind=5000008 The README lists under "2.1 ServeRAID Family 7.10 to 7.12": <snip> Fixed problem with ServeRAID-4H firmware
3) Discussion thread about SCSI resets on ips: http://www-128.ibm.com/developerworks/forums/dw_thread.jsp?message=13871824&cat=53&thread=136633&treeDisplayType=threadmode1&forum=740#13871824
4) We've patched a RHEL3 U8 2.4.21-47.0.1.ELsmp kernel with version 7.12.02 of the ips driver. It appears stable on two test systems running stress under heavy I/O load; one of them periodically has SCSI resets and does not hang. Obviously this hasn't been tested as thoroughly as you QC your kernels. We plan to deploy our patched kernel on the crashing production server on Feb 28, along with ServeRAID firmware and hard drive firmware updates. I'll attach the patch and .spec file for your examination.
5) Please consider updating the RHEL3 errata kernels with newer ips drivers if they contain bug fixes.
6) Let me know if you're going to pass this back to IBM; I have a similar bug open there.
7) The hanging server is System ID 1005928737 in rhn.redhat.com.