Description of problem:
The system randomly enters a state where the RAID arrays on the Areca 1680 controller go offline. This occurs about once a month. These are hard-working servers (typical load factors of 4 to 12), but the failure can happen at any time under any load.

Version-Release number of selected component (if applicable):
Was running CentOS 5.1, kernel 2.6.18-92.el5 (SMP), with no problems. Updated online to 5.3, kernel 2.6.18-128.1.14.el5, in May, and the problems started. I have had two identical hangs since then. The other server I have (the primary) is still on the older kernel and has had no failures.

How reproducible:
Very, it just takes a long time to reproduce.

Steps to Reproduce:
1. Install an Areca 1680 RAID card.
2. Set up RAID, install to it, and use the above-mentioned kernel.
3. Wait about a month.

Actual results:
Error as seen from console:
Ext3-fs error (device SDA1) ext3_get_inode_bc: unable to read inode block inode=3909144 block=3932162
SD 0:0:0:0: rejecting I/O to offline device

Note that these two machines, running the old kernel, were 100% rock-solid stable for about 9 months while being worked pretty hard. I checked the bug db, and the closest bug I found was 460789, but that one seemed to produce the problem almost immediately. I have about one more month in which I can run my backup server on a different kernel to see if this is fixed. After that I'll go back to running the old kernel, because it works and I can't afford downtime during a school year.
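For anyone who wants to spot this state in a saved console capture or in dmesg output, here is a minimal Python 3 sketch that looks for the "rejecting I/O to offline device" message quoted above. The function name and the regular expression are illustrative assumptions, not part of any tool. The match is case-insensitive because the prefix case in the transcription above may not match what the kernel actually prints.

```python
import re
import sys

# Pattern based on the console line quoted in this report:
#   "SD 0:0:0:0: rejecting I/O to offline device"
# The exact spacing and case are assumptions, so the match is kept loose.
OFFLINE_RE = re.compile(
    r"sd\s+(\d+:\d+:\d+:\d+):\s+rejecting I/O to offline device",
    re.IGNORECASE,
)


def find_offline_devices(lines):
    """Return SCSI addresses (host:channel:id:lun) reported offline, in first-seen order."""
    seen = []
    for line in lines:
        match = OFFLINE_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    # Reads log text from standard input and prints one line per offline device.
    for address in find_offline_devices(sys.stdin):
        print("offline:", address)
```

It could be fed log text on standard input, for example from a serial console capture. Note that if the array holding the OS is the one that went offline, nothing on that machine will be able to run this, so it is only useful against logs collected elsewhere.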
P.S. More hardware info, snagged from lshw:
TYAN Computer Corporation S2932 mobo
Dual Quad-Core AMD Opteron(tm) Processor 2352
32G 667MHz DDR2 memory
PCI bridge NVidia MCP55 PCI Express bridge
ARC-1680 8 port PCIe/PCI-X to SAS/SATA II RAID Controller (actually shows up as two of these, as I have 16 drives)
Two logical volumes: one for OS / logs, one for the pgsql database
Please update ARCMSR to a newer driver. The 15RH1 driver is unstable and has long since been replaced by Areca. I updated to ftp://ftp.areca.com.tw/RaidCards/AP_Drivers/Linux/DRIVER/SourceCode/arcmsr.1.20.0X.15-81103.zip and have had no hangs since. Before the update, mine occurred once or twice per week. The newer driver should be committed to the kernel tree; I have also posted this on kernel.org and on centos.org.
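To confirm which arcmsr build is actually loaded after swapping drivers, something like the following Python 3 sketch may help. It assumes the loaded module exposes its version string at /sys/module/arcmsr/version, which is only the case if the driver declares a module version. The function name is illustrative, and the format of the returned string is not guaranteed.

```python
from pathlib import Path

# Assumption: sysfs exposes the loaded module's version string at this path.
# If the module is not loaded, or declares no version, the file will not exist.
ARCMSR_VERSION_PATH = Path("/sys/module/arcmsr/version")


def loaded_driver_version(path=ARCMSR_VERSION_PATH):
    """Return the version string of the loaded driver, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    version = loaded_driver_version()
    if version is None:
        print("arcmsr version not available (module not loaded?)")
    else:
        print("arcmsr:", version)
```

Comparing the printed string before and after installing the newer driver should show whether the running kernel picked up the replacement module or is still using the one shipped with the kernel package.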
So, when will the next kernel with this driver in it be out? Or is it already out? Or is this one of those things that will never be backported to old versions of RHEL? Should I just download and compile my own driver, or what?
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the last planned RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX. To request that Red Hat re-consider this request, please re-open the bugzilla via appropriate support channels and provide additional business and/or technical details about its importance to you.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).