Bug 496440 - RAID1 volume create failed
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 11
Hardware: x86_64 Linux
Priority: low, Severity: medium
Assigned To: David Lehman
QA Contact: Fedora Extras Quality Assurance
Whiteboard: anaconda_trace_hash:7178954bb666b7128...
Duplicates: 502663
Reported: 2009-04-19 04:40 EDT by ENDOH takanao
Modified: 2010-01-18 11:10 EST
Doc Type: Bug Fix
Last Closed: 2010-01-18 11:10:03 EST

Attachments
Attached traceback automatically from anaconda. (279.02 KB, text/plain)
2009-04-19 04:41 EDT, ENDOH takanao
Attached traceback automatically from anaconda. (498.65 KB, text/plain)
2009-05-07 10:32 EDT, Gerry Reno
fdisk -l (2.75 KB, text/plain)
2009-05-07 12:24 EDT, Gerry Reno
mdadm -E /dev/sdb1 (849 bytes, text/plain)
2009-05-07 12:27 EDT, Gerry Reno
updates with proposed fix (105.46 KB, application/octet-stream)
2009-05-07 13:27 EDT, David Lehman
mdadm -E -s (376 bytes, text/plain)
2009-05-07 14:21 EDT, Gerry Reno
/etc/mdadm.conf (492 bytes, text/plain)
2009-05-07 14:23 EDT, Gerry Reno
Attached traceback automatically from anaconda. (658.49 KB, text/plain)
2009-05-22 11:58 EDT, James Laska
Attached traceback automatically from anaconda. (1.59 MB, text/plain)
2009-06-11 16:01 EDT, Brian

Description ENDOH takanao 2009-04-19 04:40:58 EDT
The following was filed automatically by anaconda:
anaconda 11.5.0.45 exception report
Traceback (most recent call first):
  File "/usr/lib/anaconda/storage/devicelibs/mdraid.py", line 121, in mdcreate
    raise MDRaidError("mdcreate failed for %s" % device)
  File "/usr/lib/anaconda/storage/devices.py", line 2375, in create
    spares)
  File "/usr/lib/anaconda/storage/deviceaction.py", line 203, in execute
    self.device.create(intf=intf)
  File "/usr/lib/anaconda/storage/devicetree.py", line 658, in processActions
    action.execute(intf=self.intf)
  File "/usr/lib/anaconda/storage/__init__.py", line 234, in doIt
    self.devicetree.processActions()
  File "/usr/lib/anaconda/packages.py", line 116, in turnOnFilesystems
    anaconda.id.storage.doIt()
MDRaidError: mdcreate failed for /dev/md0
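For readers following the traceback: the failing frame is where a non-zero exit from the array-creation command gets turned into an MDRaidError. A minimal sketch of that pattern follows. It is not anaconda's actual code; build_create_args, the run parameter, and the member device names are assumptions made for illustration. Only the exception name and the message format come from the traceback above.

```python
import subprocess


class MDRaidError(Exception):
    """Stand-in for the exception named in the traceback."""


def build_create_args(device, level, members, spares=0):
    """Build an mdadm --create command line (illustrative helper)."""
    active = len(members) - spares
    args = ["mdadm", "--create", device, "--run",
            "--level=%s" % level,
            "--raid-devices=%d" % active]
    if spares:
        args.append("--spare-devices=%d" % spares)
    return args + list(members)


def mdcreate(device, level, members, spares=0, run=subprocess.call):
    """Create an array; raise MDRaidError if the command exits non-zero."""
    rc = run(build_create_args(device, level, members, spares))
    if rc:
        raise MDRaidError("mdcreate failed for %s" % device)
```

Passing a fake run callable makes it possible to exercise the error path without touching any disks.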
Comment 1 ENDOH takanao 2009-04-19 04:41:04 EDT
Created attachment 340215 [details]
Attached traceback automatically from anaconda.
Comment 2 Gerry Reno 2009-05-07 10:32:26 EDT
Created attachment 342852 [details]
Attached traceback automatically from anaconda.
Comment 3 Gerry Reno 2009-05-07 11:01:59 EDT
I've tried several times to use F11 anaconda to create a setup with RAID arrays and every time it fails.
Comment 4 Gerry Reno 2009-05-07 11:28:53 EDT
One other note:
I've noticed that on machines with an existing setup that we wish to reuse (same partitions, same raid arrays, same volumes), when you select Custom Setup to review/edit the setup, druid has renumbered the raid arrays.  So what was /dev/md0 is now /dev/md1, /dev/md3 is now /dev/md0, etc.  And of course these existing raid arrays are the physical volumes for the volume groups, which is probably why the volume groups are disappearing: their PVs are all wrong.
Comment 5 David Lehman 2009-05-07 11:57:10 EDT
(In reply to comment #3)
> I've tried several times to use F11 anaconda to create a setup with RAID arrays
> and every time it fails.  

You might want to try reformatting the member partitions next time. That will
eliminate the possibility of mdadm finding something that confuses it and
aborting the array creation process.

Also, FYI, I have been informed that spare disks with a RAID1 array "makes no sense for raid1.  Only makes sense for 4/5/6." Just passing it along...
Comment 6 David Lehman 2009-05-07 12:11:42 EDT
It would be very useful for one or both of you to provide the output of the following commands:

  fdisk -l

then run:

  mdadm -E /dev/sda1 (Takanao)

or:

  mdadm -E /dev/sdb1 (Gerry)


Thanks.
Comment 7 Gerry Reno 2009-05-07 12:22:52 EDT
(In reply to comment #5)
> Also, FYI, I have been informed that spare disks with a RAID1 array "makes no
> sense for raid1.  Only makes sense for 4/5/6." Just passing it along...  

This is bad advice.  I've had many RAID1 disks fail, and the spare can then immediately be brought into the array, minimizing your risk exposure.  You can have more than 2 active disks in a RAID1 array but that slows performance.  So we just use 2 active disks plus 1 spare disk.  The only modes where spare disks do not make sense are those where there is no redundancy, such as linear or RAID0.  Otherwise it is always best practice to use spare disks with the arrays, including RAID1 arrays.


As far as formatting member partitions, even though we were keeping the same structure, I did tell druid to format the arrays.  The format check mark was on in the format column next to each array.  So mdadm should not have any reason to be confused.


Per your request attachments coming...
Comment 8 Gerry Reno 2009-05-07 12:24:50 EDT
Created attachment 342874 [details]
fdisk -l
Comment 9 Gerry Reno 2009-05-07 12:27:58 EDT
Created attachment 342875 [details]
mdadm -E /dev/sdb1
Comment 10 David Lehman 2009-05-07 12:38:46 EDT
(In reply to comment #7)
> (In reply to comment #5)
> 
> As far as formatting member partitions, even though we were keeping the same
> structure, I did tell druid to format the arrays.  The format check mark was on
> in the format column next to each array.  So mdadm should not have any reason
> to be confused.

That is why I said "try reformatting the _member_ _partitions_". Explicitly, I mean. Edit the partitions, check the "format" box, then proceed.

> 
> 
> Per your request attachments coming...  

Thanks for the quick turnaround.
Comment 11 Gerry Reno 2009-05-07 12:48:50 EDT
Additional comment about mdadm:
When we go into Custom Setup where there is an existing setup already in place on the machine there is no reason for mdadm to be confused about what the existing setup should look like.  A simple 'mdadm -A -s' should assemble the existing arrays properly. Same for volumes, a simple 'vgchange -ay' should activate all the  existing volumes correctly.  So I don't understand why druid has such a problem with this.


As far as formatting each individual __member__ partition goes, you are talking about a lot of work here :-)  Druid should just __do_this__.  But I'll go try this and see what happens.
Comment 12 David Lehman 2009-05-07 13:01:45 EDT
(In reply to comment #11)
> Additional comment about mdadm:
> When we go into Custom Setup where there is an existing setup already in place
> on the machine there is no reason for mdadm to be confused about what the
> existing setup should look like.  A simple 'mdadm -A -s' should assemble the
> existing arrays properly. Same for volumes, a simple 'vgchange -ay' should
> activate all the  existing volumes correctly.  So I don't understand why druid
> has such a problem with this.

Basically because it isn't that simple. Consider raid-on-lvm, lvm-on-raid, encrypted devices. Everything has to be assembled incrementally.

If you notice in your mdadm -E output you are taking a raid member with a preferred minor of 1, meaning it was previously a member of md1, and using it as a member of md0 (minor of 0). Are you sure it's us that's mixing up the arrays?
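To make the preferred-minor check concrete, here is a rough sketch of pulling that field out of mdadm -E text. The sample text is made up for illustration (it is not the attached output), the UUID is a placeholder, and parse_examine is a hypothetical helper. The "Preferred Minor" line is what mdadm prints for 0.90-format superblocks.

```python
import re

# Illustrative text only; not the contents of the attachment.
SAMPLE = """\
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 00000000:11111111:22222222:33333333
     Raid Level : raid1
Preferred Minor : 1
"""


def parse_examine(text):
    """Return a dict of 'Key : value' fields from mdadm -E output."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z ]+?)\s+:\s+(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


info = parse_examine(SAMPLE)
preferred = int(info["Preferred Minor"])
print("preferred minor:", preferred)
```

A member whose preferred minor is 1 but which is being added to md0 is the situation described in this comment.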

> 
> 
> As far as formatting each individual __member__ partition.  You are talking a
> lot of work here :-)  Druid should just __do_this__.  But I'll go try this and
> see what happens.  

Perhaps we should.
Comment 13 Gerry Reno 2009-05-07 13:24:01 EDT
(In reply to comment #12)
> (In reply to comment #11)
> ...
> Basically because it isn't that simple. Consider raid-on-lvm, lvm-on-raid,
> encrypted devices. Everything has to be assembled incrementally.
> 
So, you do 'mdadm -A -s'.  Oh, it failed, well then let's do 'vgchange -ay'. Ah, that succeeded.  Now let's go do 'mdadm -A -s' again.  Ah-ha, this time it assembles.  I think you are overcomplicating the picture.
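The retry loop described here can be written down as a small fixed-point iteration: keep running the assembly commands until a full pass activates nothing new. The sketch below assumes hypothetical run and count_active callables supplied by the caller (for example, a wrapper around subprocess.call and a count of active arrays and volumes). It does not reflect how anaconda does discovery.

```python
def assemble_until_stable(run, count_active, max_passes=5):
    """Repeat mdadm/LVM activation until no new devices appear.

    run(args) executes a command; count_active() returns how many
    md arrays and logical volumes are currently active. Both are
    hypothetical hooks provided by the caller.
    """
    commands = (["mdadm", "-A", "-s"], ["vgchange", "-ay"])
    passes = 0
    before = count_active()
    while passes < max_passes:
        for args in commands:
            run(args)  # failures are expected on early passes
        passes += 1
        after = count_active()
        if after == before:
            break  # nothing new came up; assume more passes won't help
        before = after
    return passes
```

The max_passes cap is there so that a stack which never settles cannot loop forever.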

> If you notice in your mdadm -E output you are taking a raid member with a
> preferred minor of 1, meaning it was previously a member of md1, and using it
> as a member of md0 (minor of 0). Are you sure it's us that's mixing up the
> arrays?
> 
That partition has never been used as part of md1.  It is way too small.  And if it is marked as such then something (maybe druid?) made that mark.  This setup has been running perfectly on this machine for around a year with no changes being made to it.  No drives have failed and the setup is exactly as it was when it was first constructed using Druid under F9.
Comment 14 David Lehman 2009-05-07 13:27:34 EDT
Created attachment 342892 [details]
updates with proposed fix

Please try this updates image and see if you still see this failure.
Comment 15 Gerry Reno 2009-05-07 13:32:57 EDT
(In reply to comment #12)
> (In reply to comment #11)
> > 
> > As far as formatting each individual __member__ partition.  You are talking a
> > lot of work here :-)  Druid should just __do_this__.  But I'll go try this and
> > see what happens.  
> 
What happens is that Druid won't let me edit the partitions unless I tear down the entire buildup: delete all the volumes, delete all the raid arrays.  Then I can edit the partitions.  And then you have to build the whole thing back up.  THAT'S just too much work.  Don't have time to do all that.


I'll check out the updates image in a bit here.
Comment 16 David Lehman 2009-05-07 13:38:04 EDT
(In reply to comment #15)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > 
> > > As far as formatting each individual __member__ partition.  You are talking a
> > > lot of work here :-)  Druid should just __do_this__.  But I'll go try this and
> > > see what happens.  
> > 
> What happens is that Druid won't let me edit the partitions unless I teardown
> the entire buildup.  Delete all the volumes, delete all the raid arrays.  Then
> I can edit the partitions.  And then you have to build the whole thing back up.
>  THAT'S just too much work.  Don't have time to do all that.

Agreed -- not worth it.

> 
> 
> I'll check out the updates image in a bit here.  

Ok.
Comment 17 David Lehman 2009-05-07 13:40:26 EDT
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #11)
> > ...
> > Basically because it isn't that simple. Consider raid-on-lvm, lvm-on-raid,
> > encrypted devices. Everything has to be assembled incrementally.
> > 
> So, you do 'mdadm -A -s'.  Oh, it failed, well then let's do 'vgchange -ay'.
> Ah, that succeeded.  Now let's go do 'mdadm -A -s' again.  Ah-ha, this time it
> assembles.  I think you are overcomplicating the picture.

Keep in mind this is not a user sitting at a console. And, if you think about it, what you're describing is basically incremental assembly anyway. It's just that you're talking about it without having actually tried to make it work for a large user-base. We're getting information from mdadm about what devices belong to what arrays. We're getting information from lvm about what pvs belong to what vgs. This stuff is working for many users with many varying configurations, including lvm-on-mdraid, so I don't think it's quite as broken as you would like to believe.

Having said that, any patches you produce to improve things are welcome.

> 
> > If you notice in your mdadm -E output you are taking a raid member with a
> > preferred minor of 1, meaning it was previously a member of md1, and using it
> > as a member of md0 (minor of 0). Are you sure it's us that's mixing up the
> > arrays?
> > 
> That partition has never been used as part of md1.  It is way too small.  And
> if it is marked as such then something (maybe druid?) made that mark.  This
> setup has been running perfectly on this machine for around a year with no
> changes being made to it.  No drives have failed and the setup is exactly as it
> was when it was first constructed using Druid under F9.  

When we find a member of an mdraid array, we use the metadata inside that device's raid superblock to see if it has a preferred minor. If it does, we will use that minor when assembling the array. That's what it is there for. If you have a configuration that relies on an mdadm.conf to override the metadata that exists in the members, perhaps you should update the superblocks in the members to reflect that configuration. We will not look for your old mdadm.conf, so we will not know what you have done that requires the config file to work properly. Perhaps this is the source of the problem?
Comment 18 Gerry Reno 2009-05-07 14:20:52 EDT
The /etc/mdadm.conf was generated by anaconda but basically it is the same output as 'mdadm -E -s'.  So the config file is really not doing anything.  It's probably been there since when the system was installed.  Attaching outputs...
Comment 19 Gerry Reno 2009-05-07 14:21:58 EDT
Created attachment 342906 [details]
mdadm -E -s
Comment 20 Gerry Reno 2009-05-07 14:23:11 EDT
Created attachment 342907 [details]
/etc/mdadm.conf
Comment 21 Gerry Reno 2009-05-07 14:41:21 EDT
I just looked at the man page, and 2.6 and later kernels should automatically update the preferred-minor upon the first write to an assembled array.  I think the wrong mark got into the /dev/sdb1 superblock because Druid tried to make that partition part of /dev/md1 instead of /dev/md0.  That is why, when I looked at the Custom Setup, not just /dev/md0 but three other arrays were all numbered incorrectly.  And there's no way that all the partitions in all these arrays had bad preferred-minors.  The kernel would automatically update the preferred-minor every time the system was booted.  So what happened is that the kernel updated the preferred-minors after Druid incorrectly assembled the arrays.
Comment 22 David Lehman 2009-05-07 14:55:54 EDT
Anaconda uses the preferred minor to determine what array to start, so how would it try to start md0 as md1 in the first place if that was not the preferred minor specified by the members' superblocks?

I have personally tested several mdraid configurations, as have many other users of F11, and everyone except you agrees that it generally works as expected.

It is possible that at this point you have inadvertently modified your member partitions' superblocks through failed install attempts such that they no longer contain the correct preferred minor. There were real issues in anaconda versions prior to 11.5.0.47 with discovery and assembly of pre-existing mdraid arrays that could have caused this.
Comment 23 Gerry Reno 2009-05-07 15:28:16 EDT
Even if there were failed attempts previously, once the system is rebooted back into the current installation, the kernel should be rewriting the correct preferred-minor into the superblocks.  Are you thinking that this isn't happening?  I'll manually reset the preferred-minors if necessary and retry the install.

BTW, I tried the updates.img but it didn't make any difference.  The install dies with error: cannot create /dev/md0 array.
Comment 24 Gerry Reno 2009-05-07 15:53:37 EDT
For existing setups:
I think it would be good if any existing setup were assembled in the same manner as the system assembles it, that is, by looking for UUIDs in the config file first.  If it finds them, it assembles the arrays from the UUIDs.  Druid should do the same.  If there is no config, or the config contains nothing like a UUID, then I think Druid should look at the preferred-minor, but only after looking at the UUID.  This would provide behavior more consistent with existing system assembly, and would take care of cases where the preferred-minor differs from the assembled array.
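The lookup order proposed here, UUID from a config file first and preferred minor only as a fallback, might look roughly like the following. This is a sketch of the suggestion, not of anaconda's behavior (comment 17 states that anaconda does not look for an old mdadm.conf). parse_array_lines and resolve_md_name are hypothetical names, and the UUIDs are placeholders.

```python
def parse_array_lines(conf_text):
    """Map UUID -> device from 'ARRAY /dev/mdN ... UUID=...' lines."""
    by_uuid = {}
    for line in conf_text.splitlines():
        words = line.split()
        if len(words) < 2 or words[0].upper() != "ARRAY":
            continue
        for word in words[2:]:
            if word.upper().startswith("UUID="):
                by_uuid[word[5:].lower()] = words[1]
    return by_uuid


def resolve_md_name(uuid, preferred_minor, by_uuid):
    """Prefer the config's name for this UUID, else use the superblock."""
    name = by_uuid.get(uuid.lower())
    if name is None and preferred_minor is not None:
        name = "/dev/md%d" % preferred_minor
    return name


CONF = "ARRAY /dev/md0 level=raid1 num-devices=2 UUID=aaaa:bbbb:cccc:dddd\n"
names = parse_array_lines(CONF)
print(resolve_md_name("aaaa:bbbb:cccc:dddd", 1, names))
print(resolve_md_name("1111:2222:3333:4444", 3, names))
```

In the first call the config wins even though the superblock says minor 1; in the second there is no config entry, so the preferred minor decides.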
Comment 25 David Lehman 2009-05-07 16:04:53 EDT
We don't go looking for config files from previous installations. We aren't about to start, either. We don't want to manage code to parse config files from various utilities like mdadm. That's a maintenance nightmare, not to mention it just isn't anaconda's job.


I would still be interested in the /tmp/anacdump.txt from your last attempt with the updates.
Comment 26 Gerry Reno 2009-05-07 19:30:11 EDT
If you don't want to look at mdadm.conf then you should at least try starting the arrays using 'mdadm -E -s' and see how the system assembles the arrays.  Going straight to the preferred-minors doesn't give you the same assemblage.  When you present the user with "Here's your system", it should be the same layout as it normally is, whether or not something has messed up the preferred-minors.  No doubt it was some failed install attempts on this machine that caused these incorrect marks in the superblocks.  I'd tried installing early versions of rawhide and F11, none of which would install.

Anyhow, I checked all the superblocks and found a couple other ones that had differing preferred-minors.  So I fixed all the superblocks.  Now anaconda showed me the "upgrade/install" screen which it was not doing previously.  So I took a chance and selected upgrade.  And lo and behold, it started doing the upgrade.  In fact, I was able to successfully complete the upgrade and the only problem I had was that X refuses to start now.  Remotely logging in I can see the system looks good otherwise.

As far as the /tmp/anacdump.txt file, I didn't know you needed anything from the updates.img attempt so I didn't save anything from the ramdisk and I don't see anything in /tmp on the system either.

So now after a lot of work I have one install completed with F11.  At least we got this one done.  We'll see how it goes on some other similar machines in the next few days.  I'll be inspecting all the superblocks ahead of time.
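A pre-install check like the one mentioned above could be scripted along these lines. The sketch assumes the preferred minor has already been read from each member's superblock (for instance with mdadm -E) and that the intended array for each member is known. find_minor_mismatches and the sample data are hypothetical.

```python
def find_minor_mismatches(members, expected):
    """List members whose preferred minor differs from the expected one.

    members:  {partition: preferred minor read from its superblock}
    expected: {partition: minor of the array it should belong to}
    """
    problems = []
    for part in sorted(members):
        want = expected.get(part)
        have = members[part]
        if want is not None and have != want:
            problems.append((part, have, want))
    return problems


# Made-up example: sdb1 belongs to md0 but its superblock says md1.
members = {"/dev/sda1": 0, "/dev/sdb1": 1, "/dev/sda2": 1, "/dev/sdb2": 1}
expected = {"/dev/sda1": 0, "/dev/sdb1": 0, "/dev/sda2": 1, "/dev/sdb2": 1}
for part, have, want in find_minor_mismatches(members, expected):
    print("%s: preferred minor %d, expected %d" % (part, have, want))
```

For 0.90-format superblocks, one way to correct a flagged member is to assemble the array with mdadm --assemble --update=super-minor, which rewrites the field to match the array being assembled.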
Comment 27 Gerry Reno 2009-05-07 19:31:43 EDT
(In reply to comment #26)
> the arrays using 'mdadm -E -s' 
I meant 'mdadm -A -s'
Comment 28 James Laska 2009-05-22 11:58:11 EDT
Created attachment 345114 [details]
Attached traceback automatically from anaconda.
Comment 29 James Laska 2009-05-22 11:59:45 EDT
Reproduced using anaconda-11.5.0.54 on ppc by creating 8 raid members (size 4096) then attempting a RAID1 '/' device.
Comment 30 Chris Lumens 2009-05-29 10:56:43 EDT
*** Bug 502663 has been marked as a duplicate of this bug. ***
Comment 31 Bug Zapper 2009-06-09 10:06:08 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 32 Brian 2009-06-11 16:01:20 EDT
Created attachment 347465 [details]
Attached traceback automatically from anaconda.
Comment 33 David Lehman 2010-01-18 11:10:03 EST
This bug was reported during the Fedora 11 development cycle. Since Fedora 12 has been available for some time now, I am closing this bug. If you see behavior similar to that reported here in Fedora 12, please open a new bug. Thanks.
