652961 – handling of ex-dmraid drives is getting worse

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	anaconda
Sub Component:
Version:	14
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	David Lehman
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-11-13 19:51 UTC by Ray Todd Stevens
Modified:	2011-05-18 16:40 UTC
Fixed In Version:	anaconda-15.0-1
Clone Of:
Environment:
Last Closed:	2011-05-18 16:40:36 UTC
Type:	---
Embargoed:
Dependent Products:

Description Ray Todd Stevens 2010-11-13 19:51:35 UTC

This has been a continuing problem. I can't imagine that I am the only one who recycles drives. Basically I use them for a year or two in high production machines, then use them in low production machines until I note problems. Then I use them in test machines till they die. Then we take a 357 to them before tossing them. Once they are in use they don't leave the facility alive. Am I the only one who works it this way?

Anyway, one of the things that results from this is that our high production machines are generally hardware raid of one form or other with spare drives. Our low production machines are generally still mirrored with software raid of one form or other, frequently dmraid. Our test machines are, well, test machines, and who knows how they are configured. The problem is when we switch configurations.

I have filed several bug reports over the years on this issue. Basically, the machines with no raid or only dmraid detect the hardware raid status of the drive and in many cases refuse to install on it. If we use linux internal software raid (which I prefer to dmraid) on a drive, then again the installer will detect the previous status of the drive. In the past, if I added the nodmraid parameter to the install boot, I could at least get it to install or upgrade.

Now I had real "fun". I had a set of drives in a standby server. Originally they were hardware raid, and then dmraid. The motherboard doesn't support either of these, and I prefer direct software raid for software raid. So it was installed as a software raid using the nodmraid parameter. It has been upgraded once this way successfully. However, going from 13 to 14 has hit a snag: the nodmraid parameter allows me to do a fresh install, but will not allow me to do an upgrade.

If you try to run without the nodmraid parameter, it detects the drives and acts as if all is well until you try to install, at which point it dies. With nodmraid it still will not fully work.

Can't the system detect that the motherboard doesn't support dmraid and not try to use it?

Better yet, can't we have a utility to remove all of these hardware raid and dmraid flags from a drive so it can be reused? I have tried the write to sector 0 thing and all kinds of other suggestions without any luck.

Comment 1 David Lehman 2010-11-18 22:14:07 UTC

Ignoring the metadata by passing nodmraid is really just avoiding the issue of having stale/invalid metadata on your disks. The correct approach for you, since you are in the habit of re-purposing firmware-raid disks, is to take matters into your own hands and make removal of stale metadata part of your re-purposing procedure. If you want to make sure there is no such metadata, here is how:

1. To remove all old firmware raid signatures, run this:

    dmraid -E <device>

2. To remove most other signatures of any kind, run this:

    dd if=/dev/zero of=<device> bs=1M count=32

   Note: In most cases, a count of 1 would be more than
         sufficient to clear all metadata, including
         firmware raid signatures.

   Note: This may leave behind some older md metadata, which
         is located at the end of the device. If you want to
         be sure it is gone as well, you can use awk/cut, dd,
         and /proc/partitions to zero out the last MB or so
         of each device.

Here's an untested snippet of python (based on some fairly heavily tested code in anaconda) that writes zeros to the first and last MB of a device:

#!/usr/bin/python
import errno
import os
import sys

MB = 1024 * 1024

try:
    device = sys.argv[1]
except IndexError:
    sys.stderr.write("Usage: %s <device>\n" % sys.argv[0])
    sys.exit(1)

# Initialize fd so the finally clause is safe if os.open() fails.
fd = None
try:
    fd = os.open(device, os.O_RDWR)
    buf = b'\0' * MB
    # Zero the first MB of the device.
    os.write(fd, buf)
    # Seek to one MB before the end and zero the last MB.
    os.lseek(fd, -MB, os.SEEK_END)
    os.write(fd, buf)
except Exception as e:
    if getattr(e, "errno", None) == errno.ENOSPC:  # No space left on device
        pass
    else:
        sys.stderr.write("error zeroing out %s: %s\n" % (device, e))
finally:
    if fd is not None:
        os.close(fd)
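
For the awk/cut, dd, and /proc/partitions route mentioned in the note above, the arithmetic can be sketched in Python 3 roughly as follows. This is an illustration under assumptions, not anaconda code: `parse_partitions` and `dd_tail_command` are hypothetical helper names, sizes are taken from the `#blocks` column of /proc/partitions (1 KiB units), and the second function only builds the dd command line as a string so it can be reviewed before anything is run.

```python
def parse_partitions(text):
    """Map device name -> size in 1 KiB blocks from /proc/partitions-style text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        # The header row and blank lines are skipped by this check.
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        sizes[fields[3]] = int(fields[2])
    return sizes


def dd_tail_command(name, blocks, tail_kib=1024):
    """Build (but do not run) a dd command zeroing the last tail_kib KiB.

    `name` is a device name as listed in /proc/partitions (hypothetical
    placeholder in the examples); `blocks` is its size in 1 KiB blocks.
    """
    seek = max(blocks - tail_kib, 0)   # first KiB block to overwrite
    count = min(tail_kib, blocks)      # never ask for more than the device holds
    return "dd if=/dev/zero of=/dev/%s bs=1K seek=%d count=%d" % (name, seek, count)
```

For example, feeding the contents of /proc/partitions to `parse_partitions` and then calling `dd_tail_command("sdb", sizes["sdb"])` gives a command to inspect and run by hand; `sdb` is only a placeholder device name.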

Comment 2 Ray Todd Stevens 2010-11-19 00:34:31 UTC

Then there IS a bug in anaconda.

Reference to:

1. To remove all old firmware raid signatures, run this:

    dmraid -E <device>

This only works if you are using the same board and have the complete set on the system. Or at least dmraid detects that you don't have a dmraid motherboard and doesn't even try. It also detects if the drive set is not complete and exits.

Referencing the following, which had already been recommended to me:

dd if=/dev/zero of=<device> bs=1M count=32

   Note: In most cases, a count of 1 would be more than
         sufficient to clear all metadata, including
         firmware raid signatures.

   Note: This may leave behind some older md metadata, which
         is located at the end of the device. If you want to
         be sure it is gone as well, you can use awk/cut, dd,
         and /proc/partitions to zero out the last MB or so
         of each device.

The command does report success, but anaconda still finds that it is a dmraid drive.

Comment 3 Ray Todd Stevens 2010-11-19 00:52:21 UTC

I think a big clue may be that we also have a lot of identical drives. When I try to use an ex-dmraid drive with the nodmraid parameter on the kernel, it still gives some kind of message about detecting dmraid on the drive during bootup. A dmraid drive I have run the above on is also of a slightly (very slightly, but slightly) smaller size than an identical drive with identical firmware which has never been used as a dmraid drive.

I suspect that somehow the kernel is setting aside the metadata area and making it inaccessible even if the nodmraid parameter is used.
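
One way to put a number on the size difference described above is to ask the kernel how many bytes each device exposes. The following is a minimal Python 3 sketch, assuming read access to the devices; `device_size_bytes` and `size_difference` are hypothetical helper names, and it only reports what the kernel presents, not why the sizes differ.

```python
import os


def device_size_bytes(path):
    """Return the size in bytes of a block device (or regular file)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end returns the end offset, which is the size.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def size_difference(path_a, path_b):
    """Return how many bytes larger path_a is than path_b (may be negative)."""
    return device_size_bytes(path_a) - device_size_bytes(path_b)
```

Calling `size_difference("/dev/sdX", "/dev/sdY")` on a never-dmraid drive and an ex-dmraid drive of the same model would show the gap in bytes; both device paths are placeholders.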

Comment 4 David Lehman 2010-12-07 18:58:36 UTC

I just learned of a utility called wipefs that might help you. It is part of the util-linux-ng package, which is a good thing since that implies that it uses the same methods to find the signatures as blkid, which can identify dmraid member devices.

    wipefs -a <device>

Let me know how it works.
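
Since `wipefs -a` is destructive, it may be worth previewing what it would erase first. Below is a minimal Python 3 sketch, assuming the installed wipefs supports the `--no-act` option alongside `--all`; `wipefs_command` and `run_wipefs` are hypothetical wrapper names, and the dry run is the default so nothing is written unless explicitly requested.

```python
import subprocess


def wipefs_command(device, dry_run=True):
    """Build the wipefs argument list; with dry_run nothing is written."""
    cmd = ["wipefs", "--all"]
    if dry_run:
        # Assumption: this wipefs build accepts --no-act (short form -n).
        cmd.append("--no-act")
    cmd.append(device)
    return cmd


def run_wipefs(device, dry_run=True):
    """Run wipefs and return the CompletedProcess for inspection."""
    return subprocess.run(
        wipefs_command(device, dry_run),
        capture_output=True,
        text=True,
        check=False,
    )
```

Running `run_wipefs("/dev/sdX")` and reading its `stdout` and `stderr` shows what wipefs reports for the device; `/dev/sdX` is a placeholder, and passing `dry_run=False` performs the actual wipe.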

Comment 5 David Lehman 2011-05-18 16:40:36 UTC

The nodmraid handling should be fixed as of anaconda-15.0-1, or F15.
