Bug 511911 - System hangs with "rejecting I/O to offline device"
Summary: System hangs with "rejecting I/O to offline device"
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-07-15 15:46 UTC by Scott Marlowe
Modified: 2017-07-31 14:47 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-02 13:01:55 UTC
Target Upstream Version:
Embargoed:




Links
System: Linux Kernel, ID: 14961, Private: 0, Priority: None, Status: None, Summary: None, Last Updated: 2019-08-01 22:50:57 UTC

Description Scott Marlowe 2009-07-15 15:46:22 UTC
Description of problem:

System randomly enters a state where the RAID arrays on the Areca 1680 controller go offline.  This occurs about once a month.  These are hard-working servers (typical load averages of 4 to 12), but the failure can happen at any time under any load.

Version-Release number of selected component (if applicable):
Was running CentOS 5.1, kernel 2.6.18-92.el5 (SMP), with no problems.
Updated to 5.3 online, kernel 2.6.18-128.1.14.el5, in May, and the problems started.  Have had two identical hangs.  The other server I have (the primary) is still on the older kernel and has had no failures.

How reproducible:
Very, just takes a long time to reproduce.

Steps to Reproduce:
1. Install an Areca 1680 RAID card.  
2. Set up RAID, install to it, use the above-mentioned kernel.
3. Wait about a month.
  
Actual results:
Error as seen from console:
Ext3-fs error (device SDA1) ext3_get_inode_loc: unable to read inode block inode=3909144 block=3932162
SD 0:0:0:0: rejecting I/O to offline device
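
For anyone trying to confirm the same failure, here is a minimal sketch (not part of the original report) that pulls the affected SCSI address out of kernel log text and reads the device's sysfs `state` attribute. The function names `find_offline_devices` and `scsi_device_state` are made up for illustration, and the sketch assumes the log lines have the same shape as the console message above and that sysfs is mounted at /sys.

```python
import re
from pathlib import Path

# Matches kernel lines such as "sd 0:0:0:0: rejecting I/O to offline device".
# The prefix is matched case-insensitively because the console text above
# was transcribed with an upper-case "SD".
OFFLINE_RE = re.compile(
    r"sd\s+(\d+:\d+:\d+:\d+):\s+rejecting I/O to offline device",
    re.IGNORECASE,
)


def find_offline_devices(log_text):
    """Return sorted, de-duplicated SCSI addresses (host:channel:target:lun)."""
    return sorted(set(OFFLINE_RE.findall(log_text)))


def scsi_device_state(address, sysfs_root="/sys/bus/scsi/devices"):
    """Read the sysfs 'state' attribute of one SCSI device, or None if unreadable."""
    state_file = Path(sysfs_root) / address / "state"
    try:
        return state_file.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Hypothetical input: in practice this would be dmesg or syslog output.
    sample = "SD 0:0:0:0: rejecting I/O to offline device\n"
    for addr in find_offline_devices(sample):
        print(addr, scsi_device_state(addr))
```

A state of "offline" for the array's device would match the console message. Note that once the root filesystem itself is on the offline device, as in this report, the script may not be runnable at all, so it is mainly useful for the data volume or for scanning logs after a reboot.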

Note that these two machines, running the old kernel, have been 100% rock solid stable for about 9 months while being worked pretty hard.  

I checked the bug db, and the closest bug I found was 460789, but that one seemed to produce the problem almost immediately.  I have about one more month in which I can run my backup server on a different kernel to see if this is fixed; after that I'll go back to running the old kernel, because it works and I can't afford downtime during a school year.

Comment 1 Scott Marlowe 2009-07-15 15:55:54 UTC
P.S. More hardware info, snagged from lshw:

TYAN Computer Corporation S2932 mobo
Dual Quad-Core AMD Opteron(tm) Processor 2352
32G 667MHz DDR2 memory
PCI bridge NVidia MCP55 PCI Express bridge
ARC-1680 8 port PCIe/PCI-X to SAS/SATA II RAID Controller (actually shows up as two of these, as I have 16 drives)
Two logical volumes: one for OS / logs, one for the pgsql database

Comment 2 mikki 2009-12-30 18:41:03 UTC
Please update ARCMSR to a newer driver. The 15RH1 driver is unstable and has long since been replaced by Areca.

I updated to ftp://ftp.areca.com.tw/RaidCards/AP_Drivers/Linux/DRIVER/SourceCode/arcmsr.1.20.0X.15-81103.zip and I have had no hangs.

Mine occurred once or twice per week, but I have had no hangs since the update.

The newer driver should be committed to the kernel tree. I have also posted this on kernel.org and on centos.org.
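
Before and after swapping drivers, it can help to confirm which arcmsr version is actually loaded. The following is a minimal sketch, not something from the comment above; it assumes the module exports a `version` attribute under /sys/module, and the helper name `loaded_module_version` is hypothetical.

```python
from pathlib import Path


def loaded_module_version(module="arcmsr", sysfs_root="/sys/module"):
    """Return the version string a loaded kernel module exports, or None.

    None means the module is not loaded, or it exports no 'version'
    attribute (an assumption this sketch cannot verify on its own).
    """
    version_file = Path(sysfs_root) / module / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    version = loaded_module_version()
    if version is None:
        print("arcmsr is not loaded or exports no version attribute")
    else:
        print("arcmsr version:", version)
```

Running it once on the stock kernel and once after installing the newer driver shows whether the replacement module is the one in use.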

Comment 3 Scott Marlowe 2010-01-08 03:47:13 UTC
So, when will the next kernel with this driver in it be out?  Or is it already out?  Or is this one of those things that old versions of RHEL will never have backported to it?

Should I just dl and compile my own driver or what?

Comment 4 RHEL Program Management 2014-03-07 12:13:24 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the last planned RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX. To request that Red Hat re-consider this request, please re-open the bugzilla via appropriate support channels and provide additional business and/or technical details about its importance to you.

Comment 5 RHEL Program Management 2014-06-02 13:01:55 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

