157638 – Lost RAID status info in /proc/ when upgrading from RHEL 3 to 4 (megaraid_mbox)

Summary: Lost RAID status info in /proc/ when upgrading from RHEL 3 to 4 (megaraid_mbox)

Keywords:
Status:	CLOSED CANTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Tom Coughlan
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-05-13 11:57 UTC by Petter Reinholdtsen
Modified:	2007-11-30 22:07 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-09-23 20:27:46 UTC
Target Upstream Version:
Embargoed:

Description Petter Reinholdtsen 2005-05-13 11:57:33 UTC

The megaraid2 RAID driver in RHEL 3 provided RAID status for our
Dell PowerEdge 2850 machines in /proc/megaraid/hba0/raiddrives-0-9.  This
normally looks like this:

  diger.uio.no# cat /proc/megaraid/hba0/raiddrives-0-9
  Logical drive: 0:, state: optimal
  Span depth:  1, RAID level:  1, Stripe size: 64, Row size:  2
  Read Policy: Adaptive, Write Policy: Write back, Cache Policy: Direct IO

  diger.uio.no#

When we started using RHEL 4 on these machines, the kernel driver changed
to megaraid_mbox, and the status information is no longer available.

We depend on the information in /proc/ to monitor the RAID status, and to
pass information about failed RAIDs to our error reporting system (Palantir).
Without easily available status info, the risk of using the hardware RAID
system increases significantly, as we no longer automatically detect failed
disks.  We use homemade scripts to check the RAID status every 5 minutes.
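
A check of this kind could look roughly like the following. This is a
minimal sketch, not the reporter's actual script: it assumes the file format
shown in the sample above, treats any state other than "optimal" as a
problem, and the function names are made up for illustration.

```python
import re

# Matches lines such as "Logical drive: 0:, state: optimal" from the sample above.
DRIVE_RE = re.compile(r"Logical drive:\s*(\d+):?,\s*state:\s*(\w+)", re.IGNORECASE)


def parse_raiddrives(text):
    """Map each logical drive number to its reported state (lower-cased)."""
    return {int(num): state.lower() for num, state in DRIVE_RE.findall(text)}


def non_optimal_drives(text):
    """Return the sorted drive numbers whose state is not 'optimal'."""
    return sorted(n for n, s in parse_raiddrives(text).items() if s != "optimal")


def check_file(path="/proc/megaraid/hba0/raiddrives-0-9"):
    """Return alert messages for one status file; an empty list means healthy."""
    with open(path) as fh:
        text = fh.read()
    if not parse_raiddrives(text):
        # An empty or unparseable file is reported rather than silently passed.
        return ["no logical drives found in %s" % path]
    return ["logical drive %d is not optimal" % n for n in non_optimal_drives(text)]
```

Calling check_file() from a job that runs every 5 minutes and forwarding any
returned messages to the error reporting system would match the setup
described here.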

This is the PCI ID of the hardware controller used in our Dell PE2850:

  02:0e.0 Class 0104: 1028:0013 (rev 06)
  02:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID controller 4 (rev 06)

Please update the kernel driver used by this hardware to provide
textual information about the RAID status in /proc/ or /sys/.
If possible, try to make sure all RAID drivers provide status information
using the same format and in files located in a similar path structure in
/proc/ or /sys/.  A consistent format and layout would make it easier for us
to write the scripts that detect RAID failure.

Comment 1 john.l.villalovos 2005-08-25 23:57:06 UTC

I notice this also in Fedora Core 3.  Where can people get the megaraid card
status from now?

Comment 2 Petter Reinholdtsen 2005-09-23 10:18:15 UTC

The lack of RAID status info makes the machine dangerous to use.  We get no
warning when disks fail, and might end up with a complete disk crash if
enough disks crash in the RAID set.

Is anyone working on addressing this issue?

Comment 3 Tom Coughlan 2005-09-23 20:27:46 UTC

As you may know, there is a movement away from using /proc for this sort of
thing in the 2.6 kernel. It would be ideal if there were a commonly agreed upon
set of values that RAID devices report in sysfs. So far, unfortunately, I have
not seen any such proposals made upstream. 

Currently it looks like your best bet is to get a monitoring utility from the
LSI Logic web page. The MegaRAID Configuration Utility (MEGARC), for example,
looks like it does what you want:

# ./megarc.bin -dispCfg -a0


        **********************************************************************
              MEGARC MegaRAID Configuration Utility(LINUX)-1.11(12-07-2004)
              By LSI Logic Corp.,USA
        **********************************************************************
          [Note: For SATA-2, 4 and 6 channel controllers, please specify
          Ch=0 Id=0..15 for specifying physical drive(Ch=channel, Id=Target)]

        Type ? as command line arg for help


        Finding Devices On Each MegaRAID Adapter...
        Scanning Ha 0, Chnl 3 Target 15


        **********************************************************************
              Existing Logical Drive Information
              By LSI Logic Corp.,USA
        **********************************************************************
          [Note: For SATA-2, 4 and 6 channel controllers, please specify
          Ch=0 Id=0..15 for specifying physical drive(Ch=channel, Id=Target)]


          Logical Drive : 0( Adapter: 0 ):  Status: OPTIMAL
        ---------------------------------------------------
        SpanDepth :01     RaidLevel: 5  RdAhead : No  Cache: DirectIo
        StripSz   :064KB   Stripes  : 4  WrPolicy: WriteThru

        Logical Drive 0 : SpanLevel_0 Disks
        Chnl  Target  StartBlock   Blocks      Physical Target Status
        ----  ------  ----------   ------      ----------------------
        3      00    0x00000000   0x010f1800   ONLINE
        3      01    0x00000000   0x010f1800   ONLINE
        3      03    0x00000000   0x010f1800   ONLINE
        3      04    0x00000000   0x010f1800   ONLINE
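
Because this output is plain text, it can be checked from a script. The
following is a minimal sketch that assumes the output layout shown above. Only
the OPTIMAL and ONLINE status strings are known from this listing, so anything
else is reported as a problem. The function names are illustrative, and
run_megarc() simply invokes the same command line used above.

```python
import re
import subprocess

# Example: "Logical Drive : 0( Adapter: 0 ):  Status: OPTIMAL"
LOGICAL_RE = re.compile(
    r"Logical Drive\s*:\s*(\d+)\(\s*Adapter:\s*(\d+)\s*\):\s*Status:\s*(\S+)")
# Example: "3      00    0x00000000   0x010f1800   ONLINE"
PHYSICAL_RE = re.compile(
    r"\s*(\d+)\s+(\d+)\s+0x[0-9a-fA-F]+\s+0x[0-9a-fA-F]+\s+(\S+)\s*$")


def find_problems(text):
    """Return alert messages for drives not reported as OPTIMAL/ONLINE."""
    problems, seen_logical = [], False
    for line in text.splitlines():
        m = LOGICAL_RE.search(line)
        if m:
            seen_logical = True
            if m.group(3) != "OPTIMAL":
                problems.append("adapter %s logical drive %s: %s"
                                % (m.group(2), m.group(1), m.group(3)))
            continue
        m = PHYSICAL_RE.match(line)
        if m and m.group(3) != "ONLINE":
            problems.append("channel %s target %s: %s" % m.groups())
    if not seen_logical:
        # Treat missing data as a failure so a broken tool is noticed.
        problems.append("no logical drives found in output")
    return problems


def run_megarc(binary="./megarc.bin", adapter=0):
    """Run the command shown above and return its text output."""
    result = subprocess.run([binary, "-dispCfg", "-a%d" % adapter],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout
```

A monitoring job would call find_problems(run_megarc()) and raise an alert
whenever the returned list is non-empty.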

Comment 4 Art Hays 2005-10-28 19:19:46 UTC

Why couldn't you put it back in /proc until a better solution is ready, instead
of removing an essential feature (to me) in the absence of a better solution?

Comment 5 Tom Coughlan 2005-10-28 20:00:39 UTC

The MegaRAID Configuration Utility solution (or something like it from LSI
Logic) does not work for you?

Comment 6 Art Hays 2005-10-28 23:47:52 UTC

I run scripts to check RAID status.  If it's possible to use the Configuration
Utility easily from a script it would be fine.

The /proc interface was very convenient (with the eXtremeRAID driver, I could
also initiate commands such as 'rebuild').  As Red Hat AS 4 ships now, it
includes no way to monitor or manage megaraid.  Perhaps this is normal, and one
should expect to add on something like MegaRC or Dell's OpenManage?

Comment 7 Petter Reinholdtsen 2005-11-04 13:51:02 UTC

The vendor programs I've seen to extract RAID status have had issues
when used from scripts, and I have never found a satisfying solution
using them.  This is why I prefer to have a text file in /proc/ or /sys/
instead.

When such information isn't available in /proc/ or /sys/, I recommend not
using the hardware RAID in question.  It is sad to have to recommend
against using Dell PowerEdge 2850 with RHEL 4, when it worked just fine with
RHEL 3. :/

People following this bug might find the information available from
<URL: http://developer.skolelinux.no/info/prosjektet/delprosjekt/hw-raid-info.html >
interesting.  It is a summary of some of the features in hardware RAIDs on
Linux.
