Bug 483282

Summary: dmraid -r will falsely toggle the OROM back to 'normal' status after a failed rebuild
Product: Red Hat Enterprise Linux 5
Reporter: Shane Bradley <sbradley>
Component: dmraid
Assignee: Heinz Mauelshagen <heinzm>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 5.4
CC: agk, bdonahue, coughlan, ctatman, cward, dwysocha, heinzm, krzysztof.wojcik, marcin.labun, mbroz, prockai, rpacheco, syeghiay, tao
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-09-02 11:16:03 UTC
Attachments:
- Series of patches including patch for incorrect status under rebuild. (Flags: none)
- Patch- fix static linking issue (Flags: none)

Description Shane Bradley 2009-01-30 17:28:46 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008120908 Red Hat/3.0.5-1.el5_2 Firefox/3.0.5

On workstation and laptop systems with built-in RAID capability, rebuilding a degraded mirror set with 'dmraid -R' will falsely toggle the OROM back to 'normal' status after a failed rebuild.  This effectively tricks the system into "thinking" that the mirrored volume is good and trying to boot from it.  A grub hang is the result.

Reproducible: Always

Steps to Reproduce:
1.  On a Dell T5400 system with Intel OROM, set up two drives (in OROM) as a RAID1 mirror.
2.  Install RHEL5.3 on the mirrored volume.
3.  Boot completely up into the OS one time and confirm the mirror status with mount and dmraid commands.
4.  Reboot the system into rescue mode and zero out a drive.  
I use the following command:
dd if=/dev/zero of=/dev/sdb &

5.  After zeroing out the drive, boot back into the OROM and ensure that the blanked drive is detected as a new drive and is flagged as "rebuild."

6.  Exit the OROM, and boot into the OS.
7.  Run the 'dmraid -R' command against the volume to rebuild it.  This should fail due to another bug currently being worked on by Heinz Mauelshagen.
8.  After the 'dmraid -R' command fails to rebuild the mirror, reboot the box and look at the OROM again.  You should see that the OROM is now toggled back to "normal" status for the mirror.
9.  Exit the OROM and continue the boot.  
Actual Results:  
The system hangs at grub loading stage 2, because the rebuild actually failed.

Expected Results:  
The drive should not be toggled to "normal" if the rebuild failed.
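
The check in step 8 and the expected result can also be made from the running OS before rebooting. The following is a minimal sketch, not part of the original report: it assumes that the set listing printed by 'dmraid -s' contains a line of the form "status : ok" for a healthy set, and the helper names set_status and mirror_is_ok are hypothetical. The functions read from stdin so they can be tried on saved output without RAID hardware.

```shell
#!/bin/sh
# Hypothetical helpers (not part of dmraid).
# Assumption: the listing contains a "status : <value>" line per set,
# as printed by 'dmraid -s' on the reporter's kind of setup.

# Print the value of the first "status" field found on stdin.
set_status() {
    awk -F':' '$1 ~ /^status[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Succeed (exit 0) only if the first status field on stdin is "ok".
mirror_is_ok() {
    [ "$(set_status)" = "ok" ]
}
```

On the affected system this could be run as root, for example: dmraid -s | mirror_is_ok || echo "mirror is not in sync". Only the first set in the listing is examined, so a system with several RAID sets would need the output split per set first.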

Link to another bz and comment that talks about issue:
https://bugzilla.redhat.com/show_bug.cgi?id=479419#c4

Comment 1 Heinz Mauelshagen 2009-03-23 12:14:23 UTC
Intel,

any ETA on a patch to allow me to ack this bz ?

Comment 2 Marcin Labun 2009-04-07 13:27:13 UTC
(In reply to comment #0)

> Link to another bz and comment that talks about issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=479419#c4  

Root cause of the problem is lack of DMRAID DSO registration (kernel events are passed to the DSO which properly updates metadata with dmraid). In RH 5.3, manual registration is possible which partly covers the problem. The full solution shall contain:
- auto-registration of the ISW DSO when array is activated
- update of dmraid status to reflect the metadata state.

We are currently working on this problem for RHEL5.4 and will contact Heinz with a draft proposal.
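
The rule that the second item implies can be written down as a small sketch. It is only an illustration of the intended behaviour, not code from dmraid or from the ISW DSO: the function name and the outcome values "completed" and "failed" are hypothetical, and the status names "rebuild" and "normal" are the ones the OROM shows in the original report. How the outcome is learned (for example from kernel events delivered to the DSO) is outside the sketch.

```shell
#!/bin/sh
# Hypothetical illustration of the intended metadata rule.
# $1 = status currently recorded in the metadata ("rebuild" or "normal")
# $2 = outcome of the rebuild ("completed" or "failed")
# Prints the status that should be recorded afterwards.
status_after_rebuild() {
    if [ "$1" = "rebuild" ] && [ "$2" = "completed" ]; then
        # Only a rebuild that really finished may clear the rebuild flag.
        echo "normal"
    else
        # A failed rebuild must leave the recorded status unchanged.
        echo "$1"
    fi
}
```

For example, status_after_rebuild rebuild failed prints "rebuild", which is the behaviour the reporter expected to see in the OROM after step 8.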

Comment 3 Tom Coughlan 2009-04-29 18:41:51 UTC
(In reply to comment #2)

> We are currently working on this problem for RHEL5.4 and will contact Heinz
> with a draft proposal.  

We are running out of time for 5.4. Heinz will need something within a week to make code freeze.

Comment 4 Heinz Mauelshagen 2009-04-29 19:54:56 UTC
Marcin,

is this status update handled correctly by the patch attached to bz481749, which addresses the auto (de)registration or not ?

Comment 5 Marcin Labun 2009-05-05 13:18:38 UTC
(In reply to comment #4)
> Marcin,
> 
> is this status update handled correctly by the patch attached to bz481749,
> which addresses the auto (de)registration or not ?  
The patch is being tested and will be available on Thu/Fri.

Comment 11 Krzysztof Wojcik 2009-05-07 18:00:02 UTC
Created attachment 342903 [details]
Series of patches including patch for incorrect status under rebuild.

Series of patches based on DMRAID from the RHEL5.3 GA distribution.
File "RebuildStatus.patch" fixes the problem with incorrect status during rebuild.

Comment 13 Heinz Mauelshagen 2009-05-11 12:30:22 UTC
Krzysztof,

static version build (used in initrd) fails. Have you been able to build it successfully ?

Comment 15 Krzysztof Wojcik 2009-05-14 19:37:39 UTC
Created attachment 344036 [details]
Patch- fix static linking issue

Comment 22 Chris Ward 2009-07-03 18:22:38 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 23 Chris Ward 2009-07-10 19:10:17 UTC
~~ Attention Partners - RHEL 5.4 Snapshot 1 Released! ~~

RHEL 5.4 Snapshot 1 has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular request. Please test and report back your results here, at your earliest convenience. The RHEL 5.4 exception freeze is quickly approaching.

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. 

Further questions can be directed to your Red Hat Partner Manager or other appropriate customer representative.

Comment 25 Krzysztof Wojcik 2009-07-30 12:10:19 UTC
Bug is verified in RHEL5.4 Beta so Red Hat can change status to VERIFIED.

Comment 27 errata-xmlrpc 2009-09-02 11:16:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1347.html