Bug 517761

Summary: No way to use the -x or the -r -E options to clean up stale raid metadata
Product: [Fedora] Fedora
Reporter: Sam Varshavchik <mrsam>
Component: dmraid
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 12
CC: agk, bmr, dwysocha, hdegoede, heinzm, henrik, jeff, lvm-team, mbroz, misek, Per.t.Sjoholm, prockai
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-07-01 06:33:25 EDT Type: ---

Description Sam Varshavchik 2009-08-16 16:57:33 EDT
Description of problem:

Split off from 49973#c24. This should apply to F11 as well, which has the same version of dmraid.

Version-Release number of selected component (if applicable):

1.0.0.rc15

How reproducible:

Always.

Steps to Reproduce:
1. Attach two drives to an Adaptec SCSI controller that supports hardware RAID
2. Format both drives as hardware RAID devices, using the tools in Adaptec's SCSI BIOS.
3. Change your mind after discovering that the old version of Fedora you were about to install (many years ago, before dmraid existed) did not support hardware RAID.
4. Turn off RAID in the Adaptec SCSI BIOS.
5. Format both drives using Linux software RAID, and install Fedora.
6. Years later, you are running F10 and want to upgrade to F11.
7. Bug 499733 prevents you from upgrading to F11; the suggested workaround is to run dmraid -x or dmraid -r -E first.
  
Actual results:

[root@commodore ~]# dmraid -r
ERROR: asr: Invalid magic number in RAID table; saw 0x0, expected 0x900765C4 on /dev/sdb
ERROR: asr: Invalid magic number in RAID table; saw 0x0, expected 0x900765C4 on /dev/sda
no raid disks
[root@commodore ~]# dmraid -x /dev/sdb
ERROR: asr: Invalid magic number in RAID table; saw 0x0, expected 0x900765C4 on /dev/sdb
ERROR: asr: Invalid magic number in RAID table; saw 0x0, expected 0x900765C4 on /dev/sda
no raid disks and with names: "/dev/sdb"
[root@commodore ~]# dmraid -r -E /dev/sdb
ERROR: asr: Invalid magic number in RAID table; saw 0x0, expected 0x900765C4 on /dev/sdb
no raid disks and with names: "/dev/sdb"

Expected results:

There should be a way for me to remove all traces of the aborted hardware RAID signatures that have been lurking in the background all these years.

Additional info:
Comment 1 Henrik Nordström 2009-11-08 16:07:29 EST
This problem also very much affects users of various BIOS fakeraid systems. If they once enabled BIOS fakeraid and later turned it off, stale signatures are often left behind.

Thankfully, F12 again supports the nodmraid boot option to work around this, but it would be nice to be able to fix the drives in a non-destructive manner without resorting to hexedit to manually clear the RAID signature.

dmraid-1.0.0.rc16-4.fc12

Related note: that package version is not right. It should have been 1.0.0-0.4.rc16.fc12, as per https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Package_Version
Comment 2 Henrik Nordström 2009-11-08 16:51:34 EST
Problem confirmed. dmraid -rE only deletes valid RAID signatures; it refuses to touch drives where there appears to be a RAID signature but something is wrong with it.

The bad part is that the F12 anaconda refuses to access drives with an invalid RAID signature, which means one MUST either use the nodmraid boot option or manually clear the RAID signature using dd or hexedit.
Comment 3 Hans de Goede 2009-11-09 03:48:07 EST
Heinz,

This looks like it is a real issue for several people, also see:
https://bugzilla.redhat.com/show_bug.cgi?id=499733#c35

It would be great if you could take a look at this when you have some time.

Regards,

Hans
Comment 4 Bug Zapper 2009-11-18 07:11:57 EST
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 5 Henrik Nordström 2009-11-18 10:15:10 EST
Confirmed in F12. Retargeting.
Comment 6 Heinz Mauelshagen 2009-11-25 09:25:30 EST
Hans,

the above error indicates that the asr metadata is either bogus or in an unsupported format.

In general, I don't allow erasing metadata areas that have not been correctly discovered or are not supported.

If there's no ATARAID BIOS at hand to erase such metadata, or the BIOS leaves stale metadata behind anyway, erasing the last MB of the device will erase any supported ATARAID/DDF1/... metadata signatures.
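A minimal sketch of the "erase the last MB" approach, written in Python because the report contains no code of its own. Assumptions: "the last MB" means 1 MiB (1024 * 1024 bytes, the same figure Sam used with lseek64() in comment 9), and no partition reaches into that final MiB. `wipe_last_mib` is a hypothetical helper, not part of dmraid; it defaults to a read-only dry run, and it should be tried on a disk image before it is ever pointed at a real device, because a real run destroys whatever is stored there.

```python
import os
from typing import Tuple

# Assumption: "the last MB" means 1 MiB, the same 1024 * 1024 figure
# used with lseek64() in comment 9.
MIB = 1024 * 1024


def wipe_last_mib(path: str, dry_run: bool = True) -> Tuple[int, int]:
    """Zero the final MiB of `path` (a disk image or block device).

    Hypothetical helper, not part of dmraid. Returns (start, size): the byte
    offset where the zeroed region begins and the total size in bytes.
    With dry_run=True the target is opened read-only and nothing is written.
    """
    fd = os.open(path, os.O_RDONLY if dry_run else os.O_RDWR)
    try:
        # Seeking to the end reports the size in bytes for regular files
        # and for Linux block devices alike.
        size = os.lseek(fd, 0, os.SEEK_END)
        if size < MIB:
            raise ValueError(f"{path} is smaller than 1 MiB ({size} bytes)")
        start = size - MIB
        if not dry_run:
            os.lseek(fd, start, os.SEEK_SET)
            zeros = bytes(64 * 1024)
            remaining = MIB
            while remaining > 0:
                # os.write may write fewer bytes than requested; loop until
                # the whole MiB has been zeroed.
                remaining -= os.write(fd, zeros[:remaining])
            os.fsync(fd)  # make sure the zeros actually reach the disk
        return start, size
    finally:
        os.close(fd)
```

For example, `wipe_last_mib("/dev/sdb")` only reports where the wipe would start; passing `dry_run=False` performs it. Keeping the dry run as the default means the destructive path has to be requested explicitly.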

Because this looks broader than just the formats dmraid doesn't support, can we have a "WARNING: get rid of any RAID metadata" function in Anaconda/pyblock?
Comment 7 Henrik Nordström 2009-11-25 09:34:14 EST
In my case it was an NVidia BIOS RAID signature that had been ignored for years; I installed using the nodmraid boot option. I'm not sure why the signature became invalid.

IMHO, allowing dmraid to erase these unknown signatures is pretty safe, and certainly much safer than giving users instructions to do the same thing with dd. dmraid can easily verify that the signature lies outside the current partitions.

But I fully agree on the second part: anaconda should be much better at informing the user when it sees disks that appear to be part of an unknown RAID set.
Comment 8 Hans de Goede 2009-11-26 06:15:22 EST
Heinz, Henrik, I agree with you that anaconda should warn when it ignores a disk because of dmraid metadata. Fixing this is on my todo list; it is tracked in bug 506861.
Comment 9 Sam Varshavchik 2009-11-29 14:09:50 EST
Well, I gave up and wrote a quick program to wipe the last megabyte's worth of blocks on my disks. That seems to have worked, and dmraid isn't complaining anymore.

My normal level of paranoia required me to sanity-check the byte offset returned by lseek64(fd, -1024 * 1024, SEEK_END) by comparing it against the ending position of the last partition, as reported by fdisk. Only after satisfying myself that the seek landed past the end of the last partition did I let it rip, and it seems to have worked.
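The sanity check from comment 9 can be sketched in Python (the report has no code of its own). Assumptions: 512-byte logical sectors, and the last partition's end is given as an inclusive sector number, as fdisk prints it when listing in sector units (older fdisk versions may list cylinders by default, so check the units first). `wipe_offset_is_safe` is a hypothetical name; it only seeks and compares, and never writes.

```python
import os
from typing import Tuple

MIB = 1024 * 1024  # the 1024 * 1024 bytes passed to lseek64() in comment 9


def wipe_offset_is_safe(fd: int, last_partition_end_sector: int,
                        sector_size: int = 512) -> Tuple[int, bool]:
    """Hypothetical sanity check modelled on comment 9.

    Seeks to 1 MiB before the end of the device and reports whether that
    offset lies past the last partition. `last_partition_end_sector` is the
    inclusive end sector that fdisk lists for the last partition.
    Returns (offset, safe).
    """
    # Same seek as lseek64(fd, -1024 * 1024, SEEK_END); raises OSError if
    # the device is smaller than 1 MiB.
    offset = os.lseek(fd, -MIB, os.SEEK_END)
    # Sector N covers bytes N * sector_size .. (N + 1) * sector_size - 1,
    # so the first byte after the partition is (N + 1) * sector_size.
    first_free_byte = (last_partition_end_sector + 1) * sector_size
    return offset, offset >= first_free_byte
```

Usage would be along the lines of `fd = os.open("/dev/sdb", os.O_RDONLY)` followed by `wipe_offset_is_safe(fd, end_sector)`, proceeding with any wipe only when the second value is `True`. The comparison uses `>=` because a wipe starting exactly at the first byte after the partition does not touch it.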

Still, that's not something a non-technical user in the same situation should be expected to do.