Bug 537329

Summary:

ISW (Intel BIOS) RAID sets not discovered correctly at boot time

Product:

[Fedora] Fedora

Reporter:

Adam Williamson <awilliam>

Component:

mdadm

Assignee:

Doug Ledford <dledford>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

medium

Docs Contact:

Priority:

low

Version:

rawhide

CC:

agk, art-rh, beland, davidz, dledford, fp, harald, hdegoede, iarlyy, kzak, mishu, notting, plautrba, thomas.moschny

Target Milestone:

---

Keywords:

CommonBugs, Triaged

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

https://fedoraproject.org/wiki/Common_F12_bugs#intel-bios-raid-postinstall

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2010-07-21 07:19:26 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

552342

Bug Blocks:

507684

Attachments:

Description	Flags
mdadm-autoimsm.conf	none
PATCH: fixing activation of post install created imsm arrays	none

Description Adam Williamson 2009-11-13 07:23:45 UTC

As discussed with Hans via email, our current storage initialization implementation does not correctly discover Intel BIOS RAID sets at boot time. This kind of RAID set is correctly handled by anaconda if it exists at install time, but if an array is created post-install, the system will just see the separate drives, not the set.

Hans wanted me to file a bug on this and assign it to him so he does not forget about the issue. Filed on initscripts as that's where the current handling lives.

For now, the workaround is to add the parameter 'iswnomd', which disables the F12 feature of handling such arrays via mdraid rather than dmraid; dmraid detection is not broken, so the array will be available when booting this way.

Comment 1 Bill Nottingham 2009-11-16 18:44:41 UTC

If an array is created post-install, wouldn't creating a proper mdadm.conf cause it to be detected properly at boot?

Comment 2 Adam Williamson 2009-11-17 02:17:40 UTC

I guess it would. Hans?

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 3 Hans de Goede 2009-11-17 08:27:56 UTC

Yes, it would; however, I have what I believe is a better proposal for
handling this.

Simply switch to always using incremental mdraid assembly from udev: basically, remove the mdadm call from rc.sysinit completely, and do the same for dracut
(adding harald to the CC).

So what I propose is:

1) Stop writing an mdadm.conf from anaconda (everything we do there
   should work out of the box with just mdadm -I).

2) If there is an existing mdadm.conf, copy it to dracut's /etc, so that
   mdadm -I can use it to determine the minor number of the mdraid set and
   even handle other special cases like a spare shared by multiple sets. If
   mdadm cannot handle these special cases in incremental mode, that is an
   mdadm bug, so we will simply file it as such when we encounter it.

3) In dracut, always use mdadm -I (currently, when there is an mdadm.conf, we don't).

4) Save the incremental assembly state somewhere so that the udev started by
   rc.sysinit can continue where udev from the initrd left off
   (the easiest way is to just patch mdadm to save its incremental map
    stuff under /dev/.mdadm).

5) Have the udev rules in the running system be the same as those in dracut
   (so always call mdadm -I when a new mdraid or isw member is found).

6) Remove the "manual" mdadm startup from rc.sysinit.


Having everything done by udev seems to be the direction we want to head in anyway,
so I think this is a good path to take, and it will automatically make new isw sets work without requiring mdadm.conf creation for them.
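The hand-off proposed above (save the incremental state so that udev on the real root can continue where udev in the initrd left off) can be pictured with a minimal Python sketch. Everything here is an assumption for illustration: the `Member` record, the JSON map file and the function names are hypothetical and do not reflect mdadm's real map format under /dev/.mdadm. The point is only that a persisted map lets a second udev instance finish a set the first one started.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Simplified superblock info for one RAID member (hypothetical model)."""
    device: str       # e.g. "/dev/sdb"
    array_uuid: str   # which set this member belongs to
    raid_disks: int   # how many members the set expects


def load_map(path):
    """Read the incremental-assembly map; a missing file means nothing seen yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {uuid: set(devs) for uuid, devs in json.load(f).items()}


def save_map(path, state):
    """Write the map back so a later udev instance can pick up where this one stopped."""
    with open(path, "w") as f:
        json.dump({uuid: sorted(devs) for uuid, devs in state.items()}, f)


def incremental(member, map_path):
    """Record one newly found member; return True once its set is complete.

    Following the idea behind 'mdadm -I', a set is only reported ready when
    every expected member has been seen, never earlier.
    """
    state = load_map(map_path)
    devs = state.setdefault(member.array_uuid, set())
    devs.add(member.device)
    save_map(map_path, state)
    return len(devs) >= member.raid_disks


if __name__ == "__main__":
    map_path = os.path.join(tempfile.mkdtemp(), "map.json")
    # Event handled by udev in the initrd: only one member present so far.
    print(incremental(Member("/dev/sda", "imsm-1", 2), map_path))  # False
    # Event handled later by udev on the real root: the map is reloaded from disk.
    print(incremental(Member("/dev/sdb", "imsm-1", 2), map_path))  # True
```

Keeping the state in a file rather than in process memory is what lets two separate udev runs (initrd and real root) cooperate on one set.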

This is all for F-13 of course (and maybe also RHEL-6)

Comment 4 Bill Nottingham 2009-11-17 16:14:20 UTC

That's what we did briefly in the F-10/F-11 timeframe, and the mdadm maintainers did not like it at all.

Comment 5 Hans de Goede 2009-11-17 16:23:03 UTC

Adding dledford to CC.

Doug, can you take a look at the setup proposed in comment #3 and at comment #4, and explain whether #4 is still true and, if so, why / what your objections are.

Comment 6 Doug Ledford 2009-11-25 22:26:35 UTC

There are several objections.

First, mdadm is run on boxes that have access to SAN environments where the machines find devices that are intended to be mounted on various different machines and which also use md raid setups. You absolutely can *NOT* assume that every single md raid device found by a box *should* be started. You need the mdadm.conf file to help identify those situations. We could argue that they need to create a custom, hand crafted mdadm.conf in those situations I suppose. However, first and foremost the idea that we can *always* do without an mdadm.conf is just flat wrong. So you can totally drop any idea that we can make this a confless setup and rip out any conf handling code, it's always going to have to be there. That being the case, the argument for not creating a conf file is weaker.
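To make the role of mdadm.conf concrete, here is a minimal Python sketch of a whitelist check: it collects the UUIDs from ARRAY lines and only approves arrays listed there. The parser is a simplification (it honours `#` comments and whitespace-indented continuation lines but ignores every keyword other than ARRAY), and `parse_array_uuids` / `should_start` are hypothetical names, not part of mdadm.

```python
def parse_array_uuids(conf_text):
    """Collect array UUIDs from the ARRAY lines of an mdadm.conf-style text.

    Simplified: '#' starts a comment, a line beginning with whitespace
    continues the previous line, and all other keywords are ignored.
    """
    logical_lines = []
    for raw in conf_text.splitlines():
        text = raw.split("#", 1)[0].strip()
        if not text:
            continue
        if raw[0].isspace() and logical_lines:
            logical_lines[-1] += " " + text   # continuation line
        else:
            logical_lines.append(text)

    uuids = set()
    for line in logical_lines:
        words = line.split()
        if words[0] != "ARRAY":
            continue
        for word in words[1:]:
            key, _, value = word.partition("=")
            if key.lower() == "uuid" and value:
                uuids.add(value.lower())
    return uuids


def should_start(array_uuid, conf_uuids):
    """Only start arrays the admin listed; anything else (e.g. a SAN LUN
    that belongs to another host) is left alone."""
    return array_uuid.lower() in conf_uuids


if __name__ == "__main__":
    conf = """\
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=3aaa0122:29827cfa:5331ad66:ca767371
ARRAY /dev/md/home metadata=1.2
   UUID=11111111:22222222:33333333:44444444 name=host:home
"""
    known = parse_array_uuids(conf)
    print(should_start("3AAA0122:29827CFA:5331AD66:CA767371", known))  # True
    print(should_start("deadbeef:00000000:00000000:00000000", known))  # False
```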

Second, you refer to copying mdadm.conf to dracut's /etc so mdadm -I can find the minor number of the device. This totally ignores the fact that mdadm devices are (and have been for quite a while) moving away from being numbered to being named. As long as you are busy thinking solely about numbered devices, you are missing the future *completely*. As of mdadm-3.1.1, the default superblock is a version 1.1 superblock that does not even contain the super-minor field and does contain the name field. The flip has been switched upstream (and we should have flipped it ourselves long ago but we didn't), so time to get on the bandwagon and drop the numbered device usage.

The incremental map is already in /dev and should transfer from initrd/dracut to real root (we had to fix that for F11 actually).

We tried making the udev rules be the only source of startup and it didn't work. If you want to track down why the mdadm rules were not being fired during udev startup in rc.sysinit, and why we had to either A) do the manual call to mdadm -A or B) call udev trigger to redo the disk events in order to get the devices started then be my guest. But in F11 it simply didn't work, events were being lost.

Another thing to keep in mind is that the mdadm -As --run command in rc.sysinit allows us to differentiate between boot-time necessary arrays and other random arrays. Specifically, it will start any array listed in mdadm.conf, and the --run parameter will cause it to try very hard to start the array, including starting degraded arrays. Our mdadm -I invocation, on the other hand, is not used until after rc.sysinit is complete, which means it will never be starting arrays used in /etc/fstab, and therefore will only be starting optional arrays that may have been plugged into the system. Because of that, we can (and do) choose to be more conservative and only start the array if all members are present, and not automatically start it if it's got enough members to start degraded but not clean. The user can, however, go ahead and start it degraded manually if they wish.
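The distinction drawn above between the boot-time `mdadm -As --run` path and the conservative `mdadm -I` path can be summarised as a small decision function. This is a hypothetical Python model, not mdadm code: `min_needed` stands for the fewest members that still give usable data for the array's RAID level, which the sketch takes as an input rather than computing.

```python
from enum import Enum


class Action(Enum):
    START = "start"
    START_DEGRADED = "start degraded"
    WAIT = "wait"


def decide(present, expected, min_needed, boot_time):
    """Choose what to do with an array given how many members were found.

    present    -- members found so far
    expected   -- members the array was created with
    min_needed -- fewest members that still give usable (degraded) data
    boot_time  -- True for the 'mdadm -As --run' path (arrays in mdadm.conf),
                  False for the conservative hotplug 'mdadm -I' path
    """
    if present >= expected:
        return Action.START            # clean start is fine on either path
    if boot_time and present >= min_needed:
        return Action.START_DEGRADED   # boot-critical: try hard, even degraded
    return Action.WAIT                 # hotplug: leave it for the user to decide


if __name__ == "__main__":
    print(decide(2, 2, 1, boot_time=False))  # Action.START
    print(decide(1, 2, 1, boot_time=False))  # Action.WAIT
    print(decide(1, 2, 1, boot_time=True))   # Action.START_DEGRADED
```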

To sum up my objections in a nutshell, I take objection to the idea that all arrays that a machine sees should be treated equally. The real world, and especially the server world, is not nearly that black and white. There are some arrays that we should never, ever fail to start. There are others that we should never start degraded automatically. This one-size-fits-all plan that you laid out is great for the installer team, and makes sense when you are only concerned about personal workstations, but it's the wrong thing to do in the server arena by a long shot. Software raid is a complex topic to get right, and you can't just wish that it were simple, decide to code things up as if it were simple, and have that make it simple; instead, that just makes it wrong.

Comment 7 Hans de Goede 2009-11-26 10:04:12 UTC

(In reply to comment #6)
> There are several objections.
> 
> First, mdadm is run on boxes that have access to SAN environments where the
> machines find devices that are intended to be mounted on various different
> machines and which also use md raid setups.  You absolutely can *NOT* assume
> that every single md raid device found by a box *should* be started.

Sorry, but I do not find this a valid reason not to simplify the mdadm storage activation and, more importantly, make it uniform with how other storage
subsystems do activation.

dmraid will scan and activate all devices it sees
lvm will scan and activate all devices it sees

Both of these have the same problem in a SAN environment, and the answer is
not to use their respective custom configuration files to limit scanning; the answer is to filter out SAN LUNs at a higher level.

We are working on a filter UI for anaconda where during installation one
can filter out disks that should not be used during the installation.

A likely continuation of this is to generate udev rules which will stop any device nodes from being created for these devices at all, which could then also
be used on the running system.

There are simply too many tools which just probe and do stuff to all disks they see.

Even with a normal config file, the way you advised to write it in your mail to the anaconda-devel-list, mdadm will still need to scan all disks (and all their partitions) in the SAN to see if there is a superblock and what the UUID is, with a 4000+ disk SAN this is going to cause a very significant startup delay.

So using mdadm.conf with "/sbin/mdadm -As --auto=yes --run" is not even a good
solution for the SAN case, and as such IMHO not a valid reason to not switch to
incremental assembly.


Also, please keep the bigger picture in mind here: currently storage activation is a mess, especially because there are ordering problems. For example, using an LV as an mdraid member currently won't work. I'm not saying that is a good idea, but it is an example of the ordering problems we are having. This is caused
by the static way our storage activation currently works.

Luckily there is light at the end of the tunnel: things seem to be moving to
a more event-driven model, and for consistency's sake it would be really good to
have mdraid move to this too, just like lvm is moving to udev-based device scanning.

> You need
> the mdadm.conf file to help identify those situations.  We could argue that
> they need to create a custom, hand crafted mdadm.conf in those situations I
> suppose.  However, first and foremost the idea that we can *always* do without
> an mdadm.conf is just flat wrong.  So you can totally drop any idea that we can
> make this a confless setup and rip out any conf handling code, it's always
> going to have to be there.  That being the case, the argument for not creating
> a conf file is weaker.
> 

If you want anaconda to keep writing an mdadm.conf, that is fine with me. I
thought we would no longer need it for the pretty standard mdraid uses which can be configured inside anaconda, but if you would prefer for it to stay, that is fine. What I would like to see is the
/sbin/mdadm -As --auto=yes --run
call removed from rc.sysinit, with all mdraid activation done through mdadm -I.

> Second, you refer to copying mdadm.conf to dracut's /etc so mdadm -I can find
> the minor number of the device.  This totally ignores the fact that mdadm
> devices are (and have been for quite a while) moving away from being numbered
> to being named.  As long as you are busy thinking solely about numbered
> devices, you are missing the future *completely*.  As of mdadm-3.1.1, the
> default superblock is a version 1.1 superblock that does not even contain the
> super-minor field and does contain the name field.  The flip has been switched
> upstream (and we should have flipped it ourselves long ago but we didn't), so
> time to get on the bandwagon and drop the numbered device usage.
> 

OK, search-and-replace "minor number" with "name"; the rest still holds: it
would be good to have mdadm.conf inside the dracut initrd to get the set name
as found under /dev/md.

> The incremental map is already in /dev and should transfer from initrd/dracut
> to real root (we had to fix that for F11 actually).
> 

Ah, I didn't know that; that is good to hear.

> We tried making the udev rules be the only source of startup and it didn't
> work.  If you want to track down why the mdadm rules were not being fired
> during udev startup in rc.sysinit, and why we had to either A) do the manual
> call to mdadm -A or B) call udev trigger to redo the disk events in order to
> get the devices started then be my guest.  But in F11 it simply didn't work,
> events were being lost.
> 

We've learned a lot about various udev peculiarities from dracut, so if
there are issues which are not caused by mdadm itself, then yes, I'm willing to track them down.

> Another thing to keep in mind is that the mdadm -As --run command in rc.sysinit
> allows us to differentiate between boot time necessary arrays and other random
> arrays.  Specifically, it will start any array listed in mdadm.conf and the
> --run parameter will cause it to try very hard to start the array, including
> starting degraded arrays.  Our mdadm -I option on the other hand is not used
> until after rc.sysinit is complete, which means it will never be starting
> arrays used in /etc/fstab, and therefore will only be starting optional arrays
> that maybe have been plugged into the system, and because of that we can (and
> do) choose to be more conservative and only start the array if all members are
> present and not to automatically start it if its got enough members to start
> degraded but not clean.  The user can, however, go ahead and start it degraded
> manually if they wish.
> 

This has already been solved in dracut: we wait for udev to settle (so for
all disks to be found / all drivers to finish probing) and then
force any arrays to run that have not already auto-started because they are complete. We can do the same in rc.sysinit, and to be on the safe side we could limit this to only forcing arrays in mdadm.conf to run.
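A minimal Python sketch of the dracut-style step described above, under stated assumptions: the wait for udev to settle (in practice something like `udevadm settle`) has already happened, `partial` is a hypothetical record of sets that incremental assembly saw but did not start, and only sets listed in mdadm.conf are forced, per the "safe side" limit suggested above. The function name and data layout are illustrative only.

```python
def arrays_to_force_run(partial, conf_uuids):
    """Pick the incomplete arrays to start degraded once udev has settled.

    partial    -- {array_uuid: (members_present, min_needed)} for sets that
                  incremental assembly saw but did not start (hypothetical layout)
    conf_uuids -- UUIDs listed in mdadm.conf; only these are forced
    """
    return sorted(
        uuid
        for uuid, (present, min_needed) in partial.items()
        if uuid in conf_uuids and present >= min_needed
    )


if __name__ == "__main__":
    partial = {"root": (1, 1), "scratch": (1, 1), "broken": (0, 1)}
    print(arrays_to_force_run(partial, conf_uuids={"root", "broken"}))  # ['root']
```

Sets that are complete never reach this function, because incremental assembly already started them; it only handles the leftovers.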

> To sum up my objections in a nutshell, I take objection to the idea that all
> arrays that a machine sees should be treated equally.  The real world, and
> especially the server world, is not nearly that black and white.  There are
> some arrays that we should never, ever fail to start.  There are others that we
> should never start degraded automatically.

I agree that "mdadm -I" should never start something degraded, and as said, we
can solve the "this array is needed to boot, so start it in degraded mode if
we cannot find all disks" case by doing so explicitly after udev has settled.

> This one size fits all plan that
> you laid out is great for the installer team

The installer does not care about how the storage gets activated after
the first reboot. I care about this because when people have a raid issue they
tend to look at me, independent of whether that issue is caused by anaconda, dracut or rc.sysinit.

> , and makes sense when you are only
> concerned about personal workstations, but it's the wrong thing to do in the
> server arena by a long shot.  Software raid is a complex topic to get right and
> you can't just wish that it were simple, decide to code things up as if it were
> simple, and have that make it simple, instead that just makes it wrong.  

Simple is not necessarily wrong. Your main argument against mdadm -I seems to be SANs, and as I've explained, that should be solved at another level in the stack: working around it by only activating mdraid sets found in mdadm.conf does not help lvm, nor does it stop the huge boot delay caused by
every storage tool scanning all the disks during boot.

Comment 8 David Zeuthen 2009-11-26 16:04:40 UTC

(In reply to comment #7)
> > First, mdadm is run on boxes that have access to SAN environments where the
> > machines find devices that are intended to be mounted on various different
> > machines and which also use md raid setups.  You absolutely can *NOT* assume
> > that every single md raid device found by a box *should* be started.
> 
> Sorry, but I do not find this a valid reason to simplify the mdadm storage
> activation and more importantly make it uniform with how other storage
> subsystems do activation.
> 
> dmraid will scan and activate all devices it sees
> lvm will scan and activatate all devices it sees
> 
> Both of these have the same problem in a SAN environment and the answer is
> not to use all there resp. custom configuration files to limit scanning, the
> answer is to filter out SAN lun's at a higher level.

I respectfully disagree. I think there are two things here:

 1. Deciding _when_ it's a good time to assemble a set of block
    devices (into e.g. a RAID array or LVM volume group)

 2. Deciding _if_ said set of block devices should be assembled
    at all

Historically, both 1. and 2. have been done via boot-up scripts, which is problematic because you may need to assemble from bits that also need to be assembled. I think all that Doug is asking for is that we only do 2. after checking a configuration file. Ditto for other things needing assembly, such as LVM.

The solution, I think, is to trigger assembly from udev rules (if assembly is wanted, e.g. as determined by looking up a config file). We need to do this anyway, as doing things at boot-up doesn't really work if we want things like (md-raid on LVM) and (LVM on md-raid) to work at the same time.

From a 50,000-foot point of view, doing assembly at start-up is just not going to work; it's a holdover from the days before disks were hotpluggable. We need the OS to be able to handle disks being attached and detached at any time. With things like udev we are 90% there; we need to do the last 10%.

> We are working on a filter UI for anaconda where during installation one
> can filter out disks, which should not be used during the installation.
> 
> A likely continuation of this is to generate udev rules which will stop any
> device nodes to get created for these devices at all, which could then also
> be used on the running system.

FYI, this is likely not to work, as applications will trawl sysfs and then malfunction if there's no device node. Also, if you continue to pursue this line of work, please do it upstream, e.g. on the linux-hotplug list. Also note that we have recently removed the functionality in udev for doing this:

http://git.kernel.org/?p=linux/hotplug/udev.git;a=commit;h=cdae488a3fbca5a61b3f8ea0651730cfa2da9cb0

for reasons stated here

 http://article.gmane.org/gmane.linux.hotplug.devel/15016
 http://article.gmane.org/gmane.linux.hotplug.devel/15021

Btw, as you know, there are already a couple of other ways of doing LUN filtering; I don't think we need a third (per-system) way, as it will complicate already complicated setups.

In reality I think most SAN admins are careful setting up LUN filtering but in the event they make a mistake... we really don't want to automatically assemble something and cause data loss.

Btw, I don't think local config files specifying what to assemble and what not to assemble are intrinsically bad - we just need a better UI (unless people are happy editing config files - which it seems like they are not) so it is easy for the administrator to specify what should be assembled and what shouldn't - and this is what Anaconda, Palimpsest and other efforts are about.

Hoping my efforts in being constructive are not in vain. Thanks, David.

Comment 9 Hans de Goede 2009-11-26 18:45:32 UTC

(In reply to comment #8)
> I respectfully disagree. I think there are two things here
> 
>  1. Deciding _when_ it's a good time to assemble a set of block
>     devices (into e.g. a RAID array or LVM volume group)
> 
>  2. Deciding _if_ said set of block devices should be assembled
>     at all
> 
> Historically, both 1. and 2. has been done via boot-up scripts which is
> problematic because you may need to assemble from bits that also needs to be
> assembled. I think all that Doug is asking for is that we only do 2. after
> checking a configuration file. Ditto for other things needing assembly - such
> as LVM.
> 

I agree that we need to decide whether or not a device should be assembled at
all; I just disagree that this should happen at the lvm / mdraid / dmraid / dmcrypt / whatever-storage-tool-of-the-day level.

Most of the time sysadmins want to say "ignore this entire disk", and they
don't want to have to enter this in a gazillion different configuration files.
Even if we put a nice UI around it, having to put this information in
multiple files is wrong <period>.

> The solution, I think, here is to trigger assembly from udev rules (if assembly
> is wanted, cf. looking up a config file). We need to do this anyway as it
> doesn't really work doing things at boot-up - that is, if we want things like
> (md-raid on LVM) and (LVM on md-raid) to work at the same time.
> 

I think we agree then: my solution too is to assemble from udev rules. The only
thing we seem to disagree on is what to do with unknown raid members (as in, the set is not listed in mdadm.conf); I say that by default we should assemble them.

The only case where this can cause issues is the SAN case, and I believe
we need a generic solution here which allows the admin to specify a filter
of which disks to use (or not to use).

> > We are working on a filter UI for anaconda where during installation one
> > can filter out disks, which should not be used during the installation.
> > 
> > A likely continuation of this is to generate udev rules which will stop any
> > device nodes to get created for these devices at all, which could then also
> > be used on the running system.
> 
> FYI, this is likely to not work as applications will trawl sysfs and then
> malfunction if there's no device node. Also, if you continue to pursue this
> line of work please do it upstream - e.g. on linux-hotplug list. Also note that
> we've recently have removing functionality in udev for doing this
> 
> http://git.kernel.org/?p=linux/hotplug/udev.git;a=commit;h=cdae488a3fbca5a61b3f8ea0651730cfa2da9cb0
> 
> for reasons stated here
> 
>  http://article.gmane.org/gmane.linux.hotplug.devel/15016
>  http://article.gmane.org/gmane.linux.hotplug.devel/15021
> 

OK, so we might need another way of doing this, but that does not change the point
that implementing the policy of whether or not to use a disk separately in
all the different config files is wrong. We need a single place to configure
this, and then all tools should obey it, preferably by making the disks not
available to them at all, like we do with partitions on (raw) disks which
are part of a BIOS RAID set.
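The "single place to configure this" idea can be sketched as one shared ignore list that every tool consults before it scans. The `DiskFilter` class, the glob patterns, the device paths and the tool list below are hypothetical; the sketch only shows why a single filter gives every tool the same view of the disks, instead of each tool keeping its own list.

```python
import fnmatch


class DiskFilter:
    """A single, shared list of disks that no storage tool should touch (hypothetical)."""

    def __init__(self, ignore_patterns):
        self.ignore_patterns = list(ignore_patterns)

    def allows(self, device):
        """True unless the device path matches one of the ignore globs."""
        return not any(fnmatch.fnmatch(device, pat) for pat in self.ignore_patterns)


def per_tool_view(devices, disk_filter, tools=("lvm", "mdadm", "dmraid", "cryptsetup")):
    """Every tool consults the same filter, so they all see the same disks."""
    return {tool: [d for d in devices if disk_filter.allows(d)] for tool in tools}


if __name__ == "__main__":
    devices = ["/dev/disk/by-id/ata-local-1", "/dev/disk/by-id/scsi-san-lun-7",
               "/dev/disk/by-id/ata-local-2"]
    shared = DiskFilter(["/dev/disk/by-id/scsi-san-*"])
    for tool, seen in per_tool_view(devices, shared).items():
        print(tool, seen)
```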

> In reality I think most SAN admins are careful setting up LUN filtering but in
> the event they make a mistake... we really don't want to automatically assemble
> something and cause data loss.
> 

Hmm, we may need to look at this on a case-by-case basis. Notice that
currently lvm does bring up everything in sight, so there is an
inconsistency here, one which I would like to fix.

Also note that anaconda will bring up everything under the sun when run
on a system (which is why we are adding LUN filtering).

Also try to think about scenarios like the livecd, where for a normal
workstation with software raid it is very convenient for the end
user if his raid sets are just there ready to use, just like normal filesystems
on his disk and lvm are.

Comment 10 David Zeuthen 2009-11-27 16:00:10 UTC

(In reply to comment #9)
> I agree that we need to decide whether or not a device should be assembled at
> all, I just disagree that this should happen at the lvm / mdraid / dmraid /
> dmcrypt / what ever storage tool of the day level.
> 
> As most of the time sysadmins want to say ignore this entire disk, and they
> don't want to have to enter this in a gazillion different configuration files,
> even if we put a nice UI around it, having to put this information in
> multiple files is wrong <period>.

Sounds great to me. Remember in Portland during an impromptu hallway-session with yourself and Scott (the one where I was shouting at Scott :-)? I proposed introducing a single config file + library for doing this. I think I called it /etc/mounts.conf but I guess a name like /etc/storage.conf is better. I don't know.

Anyway, in a nutshell this file would supplement (and in long term probably replace) /etc/fstab insofar that it would contain configuration for exactly what devices to assemble (e.g. mdraid, dmraid, lvm, dmcrypt) and what devices to mount (and where). There would be a library, say libstorageconf.so, that provides access for enumerating, adding and removing items from the config file. Ideally this would live in util-linux-ng. There would also be a set of tools for inspecting the /etc/storage.conf. The library/tools would need to answer questions like

 - can this thing be assembled in non-degraded mode?
 - can this thing be assembled in degraded mode?
 - should this thing be assembled at all?
 - are things like settings available to use for assembling the thing?
   (settings could also include LUKS pass-phrases stored on the local
    system - handy for use-cases where you trust your server but not
    your SAN. Settings would also include mount options, mdraid
    assembly options and so on)

and possibly others as well (with thing == dm-raid, md-raid, lvm, luks, multipath and so on).

With such an infrastructure available the udev rules would simply just use this config file (through a tool) to figure out if action should be taken.
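To illustrate the questions listed above, here is a minimal Python sketch of what a query layer over such a file might look like, with a udev helper that consults it before acting. /etc/storage.conf and libstorageconf.so are only proposals in this comment, so `Entry`, `StorageConf`, `on_udev_event`, the thing names and the field names are all hypothetical; whether a set can run degraded is reduced to a `min_needed` count supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    kind: str                      # "md-raid", "lvm", "luks", ...
    assemble: bool = False         # should this thing be assembled at all?
    allow_degraded: bool = False   # may it be assembled in degraded mode?
    settings: dict = field(default_factory=dict)  # mount/assembly options, ...


class StorageConf:
    """Answers the questions listed above for each configured thing (hypothetical)."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def should_assemble(self, thing):
        entry = self._entries.get(thing)
        return entry is not None and entry.assemble

    def assembly_mode(self, thing, present, expected, min_needed):
        """Return "clean", "degraded", or None (do not assemble)."""
        if not self.should_assemble(thing):
            return None
        if present >= expected:
            return "clean"
        if self._entries[thing].allow_degraded and present >= min_needed:
            return "degraded"
        return None

    def settings_for(self, thing):
        entry = self._entries.get(thing)
        return dict(entry.settings) if entry else {}


def on_udev_event(conf, thing, present, expected, min_needed):
    """What a helper called from a udev rule might do: ask the config, then act or not."""
    mode = conf.assembly_mode(thing, present, expected, min_needed)
    return "ignore" if mode is None else f"assemble {thing} ({mode})"


if __name__ == "__main__":
    conf = StorageConf({
        "md:Saturn": Entry("md-raid", assemble=True, allow_degraded=True,
                           settings={"mount": "ro"}),
        "md:Jupiter": Entry("md-raid", assemble=True),
    })
    print(on_udev_event(conf, "md:Saturn", 1, 2, 1))   # assemble md:Saturn (degraded)
    print(on_udev_event(conf, "md:Jupiter", 1, 2, 1))  # ignore
```

Unknown things fall through to "ignore", which matches the cautious default argued for in this thread; a different default would be a one-line change in `should_assemble`.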

In addition, mount(8) could use this, so that something like

 # mount MDRAID_NAME=Saturn /mnt/somewhere -oro

would assemble the md-raid array named "Saturn" (if not already assembled) and then mount it read-only at /mnt/somewhere. That would be really handy. Ditto for other things needing assembly.

I haven't thought a lot about this in detail (only been brainstorming about it) but I think something like this could work. Thoughts?

> Also try to think about scenarios like the livecd, where for a normal
> workstation with software raid it is very convenient for the end
> user if his raid sets are just there ready to use, just like normal
> filesystems on his disk and lvm are.  

Actually, it is easy to do this on Fedora 12: you can start/stop md-raid sets and unlock/lock LUKS devices from both the file manager (Nautilus) and the disk utility (Palimpsest):

http://people.freedesktop.org/~david/nautilus-palimpsest-assembly-shots/

Note that we show the icon for the mdraid array even when it is not assembled - we do this by looking at the meta-data for each component. In a way it doesn't really *matter*, UI-wise, whether the array is assembled or not - the user can simply click it and we'll assemble it if needed.

(Also, for degraded mdraid arrays (e.g. if a component is missing), you will also get prompted if you really want to start the array. Similar integration is planned for LVM and, ideally, dm-raid sets. And also for things like iSCSI you will be able to connect/disconnect to the LUN via Nautilus and Palimpsest.)

> Also note that anaconda will bring up everything under the sun while
> run on a system (which is why we are adding lun filtering).

Surely Anaconda can do this the same way Palimpsest/DKD does? I mean, for md-raid you'd query the udev database for md-raid components and use this to figure out which (whole or partial) md-raid arrays are available? The bespoke libstorageconf.so library could even make this easy, e.g. it could include helper functions to discover this.

I concede all this is a lot of work but I think long-term this is the direction that storage on Linux should be heading in.

Comment 11 Hans de Goede 2009-12-01 12:57:25 UTC

(In reply to comment #10)
> > Also note that anaconda will bring up everything under the sun while
> > run on a system (which is why we are adding lun filtering).
> 
> Surely Anaconda can do this the same way Palimpsest/DKD does? I mean, for
> md-raid you'd query the udev database for md-raid components and use this to
> figure out what available (whole or parts of) md-raid arrays are available?

We already do. And once we have found them that way, what do we do with
them? We bring them up (to see what is on there so we can show this to the user
in the partitioning GUI / when searching for existing installations to upgrade
or rescue).

Hence my comment of "anaconda will bring up everything under the sun while
run on a system"

And no, we cannot simply show the user a raid set icon and then only activate the raid set when clicked; we need to know what is on there:
1) To find existing installations to offer to upgrade them
2) As 1 but then for rescue
3) For autopartitioning purposes
4) So that people can specify existing raid devices, or even existing lv's
   on top of mdraid inside kickstart files

This is what anaconda has been doing for years; the only new thing is that we now do this udev-driven. And yes, this means we have the SAN issue, hence the storage (LUN / disk) filtering we are currently working on.

Note that in a lot of cases the proper answer to the SAN scenario is to use SAN-level ACLs so the machine only sees the disks it is supposed to touch. Only when that somehow cannot be done should storage filtering come into play.

Comment 12 David Zeuthen 2009-12-01 14:06:24 UTC

(In reply to comment #11)
> This is what anaconda has been doing for years, the only new thing is we now do
> this udev driven. And yes this means we have the SAN issue, hence the storage
> (lun / disk) filtering we are currently working on.

Right, I remember seeing (and liking) Mo's mock-ups for that. And I very much agree that it is the user experience we want - e.g. we want the installer to be smart: assemble things and automatically search for installations to upgrade. Because that's (partly) what the installer is for - upgrading installations.

On the other hand, a file manager or disk utility app (or in general, any app) won't need to automatically assemble anything, because it is not concerned with searching for previous installations to upgrade. So we won't need autoassembly for this thing. (Said apps do need to show the presence of devices/components, do need to offer the possibility to assemble said devices/components, and do need to provide an option to do this automatically at device detection or boot time and all that jazz.)

So... this does not imply that the OS always needs to autoassemble something from every storage device it sees. Anaconda should be able to just assemble whatever it needs after the user has gone through and selected which storage devices make sense for the box. E.g. the way I could see this working is:

 - The OS boots and all attached storage is detected by udev
   - no autoassembly is performed unless some config file
     specifically says so

 - Anaconda starts
   - user goes through the storage filtering UI
   - anaconda assembles only the bits that are on the selected storage devices
   - anaconda goes ahead and searches for installations to upgrade

I guess I'm really saying that the installer is the special case here - other parts of the OS won't need to e.g. search for previous OS installations - so autoassembly / automounting of anything under the sun is not needed (and I guess, generally not wanted).
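The two-phase flow above can be sketched as follows. The function names and the member-map layout are hypothetical; the sketch assumes member metadata has already been read from the udev database without assembling anything, and that the installer assembles a set only if all of its members are on disks the user kept in the filtering UI.

```python
def boot_time_assembly(found_sets, configured_sets):
    """At boot, assemble only sets a config file explicitly asks for."""
    return sorted(set(found_sets) & set(configured_sets))


def installer_assembly(members_by_set, selected_disks):
    """In the installer, assemble only sets whose members all sit on selected disks."""
    chosen = set(selected_disks)
    return sorted(
        name for name, disks in members_by_set.items()
        if disks and set(disks) <= chosen
    )


if __name__ == "__main__":
    members = {"md0": ["sda", "sdb"], "md1": ["sda", "sdy"], "san_vg": ["sdx"]}
    print(boot_time_assembly(members, configured_sets=["md0"]))        # ['md0']
    print(installer_assembly(members, selected_disks=["sda", "sdb"]))  # ['md0']
```

A set that straddles a selected and an unselected disk (md1 above) is left alone, which is the cautious reading of "bits that are on the selected storage devices".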

> Note that in a lot of cases the proper answer to the SAN scenario is use SAN
> level acl's so the machine only sees the disks it is supposed to touch. Only
> when that somehow cannot be done, storage filtering should come in to play.

Yeah, very much agree. 

But it's also about playing things safe: autoassembly and automounting are generally a very tricky business. It is, basically, a guessing game, and you need to deal with stale signatures, probing priority and signature ambiguities. And when you get it wrong, you generally destroy people's data.

FWIW, this whole thing first came up around 2003/2004 when we started automounting storage devices for real (prior to that we relied on whitelisted storage devices). Back then there was a lot of pushback against doing this and concerns about data loss - and even a few data loss cases when our early fs probing code got it wrong and mounted raid components. Nowadays, the default install only automounts USB and Firewire gizmos, never any filesystems from e.g. SATA, SAS and FC drives. If anything, I think the lesson learned is that we need to be careful and not automagically automount/autoassemble any device under the sun. It is just too dangerous.

(Sorry if this comment is a bit long / rambly!)

Comment 13 Doug Ledford 2009-12-01 16:26:26 UTC

(In reply to comment #7)
> (In reply to comment #6)
> > There are several objections.
> > 
> > First, mdadm is run on boxes that have access to SAN environments where the
> > machines find devices that are intended to be mounted on various different
> > machines and which also use md raid setups.  You absolutely can *NOT* assume
> > that every single md raid device found by a box *should* be started.
> 
> Sorry, but I do not find this a valid reason to simplify the mdadm storage
> activation and more importantly make it uniform with how other storage
> subsystems do activation.
> 
> dmraid will scan and activate all devices it sees

Irrelevant.  dmraid has never supported anything other than BIOS raid devices which never exist on SAN disks or other non-BIOS accessible disks, so it is always safe to assume you can and should start all dmraid devices.

> lvm will scan and activate all devices it sees
> 
> Both of these have the same problem in a SAN environment and the answer is
> not to use all their respective custom configuration files to limit scanning, the
> answer is to filter out SAN lun's at a higher level.

While that's certainly preferred, I don't think it can be assumed that lun filtering at a higher level is always possible, nor should we allow the fact that it's preferred to cause us to remove support for filtering devices out ourselves via other means.

With the addition of ddf and imsm superblock format support in mdadm, we actually added the ability to filter on what a person might consider more relevant details.  We can control auto assembly in the mdadm.conf file based upon the type of superblock we are looking at.  This line:

AUTO +ddf +imsm -0.90 -1.x

would cause mdadm -I to assemble all ddf or imsm based arrays whether they were listed in mdadm.conf or not, but to only assemble md raid superblock 0.90 or 1.x arrays if they have an ARRAY line in mdadm.conf.  This is actually preferable to lun filtering simply because we don't care, and shouldn't care, what device name/lun a ddf or imsm device has: we know it's a BIOS device and we should support it.  Likewise, we support our listed md raid devices, but this would filter out all non-listed md raid arrays that might exist on a SAN.  IMO, this is far preferable to static lun filtering.
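As a rough illustration of the rule described above, here is a minimal Python sketch of how a policy line such as "AUTO +ddf +imsm -0.90 -1.x" might be applied to a detected metadata type. It assumes the first-match-wins behaviour documented in mdadm.conf(5) ("all" matches anything, and no match at all means auto-assembly is allowed), and it treats arrays that have an ARRAY line as always assembled, as described in this comment. The "homehost" keyword is not modelled, and the function names are hypothetical, not part of mdadm.

```python
def parse_auto_line(line):
    """Split an 'AUTO ...' line into ordered (allow, pattern) pairs."""
    words = line.split()
    if not words or words[0] != "AUTO":
        raise ValueError("not an AUTO line: %r" % line)
    rules = []
    for word in words[1:]:
        if word[0] not in "+-":
            raise ValueError("AUTO entries need a + or - prefix: %r" % word)
        rules.append((word[0] == "+", word[1:]))
    return rules


def pattern_matches(pattern, metadata):
    """'all' matches anything; '1.x' is modelled as any 1.* version."""
    if pattern == "all":
        return True
    if pattern == "1.x":
        return metadata.startswith("1.")
    return pattern == metadata


def may_auto_assemble(rules, metadata, listed_in_conf=False):
    """Would an array with this metadata type be assembled?"""
    if listed_in_conf:
        # Arrays with an ARRAY line are assembled regardless of AUTO.
        return True
    for allow, pattern in rules:
        if pattern_matches(pattern, metadata):
            return allow  # the first matching entry decides
    return True  # no matching entry: auto-assembly is allowed


if __name__ == "__main__":
    rules = parse_auto_line("AUTO +ddf +imsm -0.90 -1.x")
    for md in ("imsm", "ddf", "0.90", "1.2"):
        print(md, may_auto_assemble(rules, md))
    # imsm True / ddf True / 0.90 False / 1.2 False
```

Under this model, a line such as "AUTO +imsm -all" would let only imsm arrays be auto-assembled, while listed arrays of any type would still come up.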

> We are working on a filter UI for anaconda where during installation one
> can filter out disks, which should not be used during the installation.
> 
> A likely continuation of this is to generate udev rules which will stop any
> device nodes to get created for these devices at all, which could then also
> be used on the running system.
> 
> There are simply too many tools which just probe and do stuff to all disks they
> see.
> 
> Even with a normal config file, the way you advised to write it in your mail to
> the anaconda-devel-list, mdadm will still need to scan all disks (and all their
> partitions) in the SAN to see if there is a superblock and what the UUID is,
> with a 4000+ disk SAN this is going to cause a very significant startup delay.

No, it wouldn't.  It would scan until it has started the devices listed in mdadm.conf and then stop.  If there is no mdadm.conf, it will essentially do nothing.  The --scan option to mdadm does not mean scan all devices, it means scan mdadm.conf for information not passed in on the command line (like the mdadm devices you want assembled).

> So using mdadm.conf with "/sbin/mdadm -As --auto=yes --run" is not even a good
> solution for the SAN case, and as such IMHO not a valid reason to not switch to
> incremental assembly.

It works just fine in the SAN case.

> Also please keep the bigger picture in mind here, currently storage activation
> is a mess, esp. as there are ordering problems, for example currently
> using an lv as an mdraid member won't work. I'm not saying that is a good idea,
> but it is an example of the ordering problems we are having. This is caused
> by the static way our storage activation currently works.

No argument.

> Luckily there is light at the end of the tunnel, things seem to be moving to
> a more event driven way, and for consistency sake it would be really good to
> have mdraid move to this too, just like lvm is moving to udev based device
> scanning.

Argument.  You can be event based, or you can do static activation like we have done in the past and simply rerun the activation loop until nothing new pops up.  Either way would solve the ordering issues between md/lvm/crypto/etc.

> > You need
> > the mdadm.conf file to help identify those situations.  We could argue that
> > they need to create a custom, hand crafted mdadm.conf in those situations I
> > suppose.  However, first and foremost the idea that we can *always* do without
> > an mdadm.conf is just flat wrong.  So you can totally drop any idea that we can
> > make this a confless setup and rip out any conf handling code, it's always
> > going to have to be there.  That being the case, the argument for not creating
> > a conf file is weaker.
> > 
> 
> If you want anaconda to keep writing an mdadm.conf, that is fine with me. I
> thought we would no longer need it for the pretty standard mdraid uses which
> can be configured inside anaconda,

I find this particular statement to be absurd.  Not because we shouldn't be able to do without an mdadm.conf, but because it is *precisely* "the standard mdraid uses which can be configured inside anaconda" that make us need one!  Anaconda uses old version 0.90 superblocks that only store the preferred minor number, no name, and no homehost.  Anaconda starts the numbering at 0 and works upwards.  So if we ever plug an array from one host into another for the purpose of, say, transferring data, then anaconda has created a situation in which there is the absolute *highest* probability of a name conflict, and has created things in such a way that we have the absolute *lowest* confidence that we could tell our own md0 array from an md0 we just plugged in!  To be honest, I feel like you are putting the cart before the horse.  Yes, md raid arrays could be automatically brought up, and yes, we could do away with the mdadm.conf file, but only after we've started creating arrays with a little intelligence and not making them all carbon copies of each other from machine to machine!

> but if you would prefer for it to stay that
> is fine. What I would like to see, is for the
> /sbin/mdadm -As --auto=yes --run
> call to be removed from rc.sysinit and do all mdraid activation with mdadm -I

It's not what I prefer.  It's what *needs* to be done because of how anaconda has created raid arrays for years!

> > Second, you refer to copying mdadm.conf to dracut's /etc so mdadm -I can find
> > the minor number of the device.  This totally ignores the fact that mdadm
> > devices are (and have been for quite a while) moving away from being numbered
> > to being named.  As long as you are busy thinking solely about numbered
> > devices, you are missing the future *completely*.  As of mdadm-3.1.1, the
> > default superblock is a version 1.1 superblock that does not even contain the
> > super-minor field and does contain the name field.  The flip has been switched
> > upstream (and we should have flipped it ourselves long ago but we didn't), so
> > time to get on the bandwagon and drop the numbered device usage.
> > 
> 
> Ok, search replace minor number with the name, the rest still holds, that
> it would be good to have mdadm.conf inside the dracut initrd to get the set
> name
> as found under /dev/md

Actually, no.  If you made the arrays with the proper homehost:name syntax on version 1.x superblocks, then you *wouldn't* need the mdadm.conf in dracut, only the hostname of the machine.  You could still use an mdadm.conf for extra protection against a host name change, but that's all you would need it for.

> Simple is not necessarily wrong, your main argument against mdadm -I seems to
> be SAN's

SANs and history, meaning that moving a hot plug array from machine to machine is currently not safe at all due to version 0.90 superblocks and over-dependence on minor numbers instead of names.

> and as I've explained that should be solved at another level in the
> stack, as working around this by only activating mdraid sets found in
> mdadm.conf does not help lvm, nor does it stop the huge boot delay caused by
> every storage tool scanning all the disks during boot.  

And activating everything under the sun when you've made everything under the sun identical for years on end without a config file to tell you which carbon copy you are actually looking for is dangerous.

Comment 14 Hans de Goede 2009-12-01 19:21:23 UTC

(In reply to comment #13)
> Irrelevant.  dmraid has never supported anything other than BIOS raid devices
> which never exist on SAN disks or other non-BIOS accessible disks, so it is
> always safe to assume you can and should start all dmraid devices.
> 

Ok, so then at least we are in agreement that we should make some changes to autostart isw raid sets even if not present in /etc/mdadm.conf, which is the main reason for the existence of this bug. As for my proposal to achieve this by doing major surgery to the way we activate mdraid handled raid sets, I take it back.

> With the addition of ddf and imsm superblock format support in mdadm, we
> actually added the ability to filter on what a person might consider more
> relevant details.  We can control auto assembly in the mdadm.conf file based
> upon the type of superblock we are looking at.  This line:
> 
> AUTO +ddf +imsm -0.90 -1.x
> 

Ah, this is interesting. Currently the main use for "mdadm -I" I've seen is
from udev rules, and then for example (this is an isw example, hence the
whole disk usage):
mdadm -I /dev/sda

Would the above line work in this case too, iow will sda only be incrementally assembled if it has imsm or ddf metadata and not if it has mdraid native metadata?

Although I don't think that is actually needed, I think we can simply solve this by adding the following line to a udev rule somewhere:

SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="isw_raid_member", \
  RUN+="/sbin/mdadm -I $env{DEVNAME}"

Does this sound acceptable to you?

> 
> And activating everything under the sun when you've made everything under the
> sun identical for years on end without a config file to tell you which carbon
> copy you are actually looking for is dangerous.  

I see your point; I had not taken into account the scenario of moving disks that are part of a native-metadata mdraid set between machines, and I agree that using
newer metadata formats + setting homehost will help here. I'm afraid it is not
a 100% solution, though, as people often leave the hostname set to localhost.

Comment 15 Doug Ledford 2009-12-01 19:58:45 UTC

(In reply to comment #14)
> (In reply to comment #13)
> > Irrelevant.  dmraid has never supported anything other than BIOS raid devices
> > which never exist on SAN disks or other non-BIOS accessible disks, so it is
> > always safe to assume you can and should start all dmraid devices.
> > 
> 
> Ok, so then at least we are in agreement that we should make some changes to
> autostart isw raid sets even if not present in /etc/mdadm.conf, which is the
> main reason for the existence of this bug.

Yes, I can agree to that.

> As for my proposal to achieve this
> by doing major surgery to the way we activate mdraid handled raid sets, I take
> it back.

Good, because that's where we disagreed ;-)

> > With the addition of ddf and imsm superblock format support in mdadm, we
> > actually added the ability to filter on what a person might consider more
> > relevant details.  We can control auto assembly in the mdadm.conf file based
> > upon the type of superblock we are looking at.  This line:
> > 
> > AUTO +ddf +imsm -0.90 -1.x
> > 
> 
> Ah, this is interesting. Currently the main use for "mdadm -I" I've seen is
> from udev rules, and then for example (this is an isw example, hence the
> whole disk usage):
> mdadm -I /dev/sda
> 
> Would the above line work in this case too, iow will sda only be incrementally
> assembled if it has imsm or ddf metadata and not if it has mdraid native
> metadata ?

Yes.

> Although I don't think that is actually needed, I think we can simply solve
> this by adding the following line to a udev rule somewhere:
> 
> SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="isw_raid_member", \
>   RUN+="/sbin/mdadm -I $env{DEVNAME}"
> 
> Does this sound acceptable to you?

If I understand you correctly, either sounds acceptable.

> newer metadata formats + setting homehost will help here. I'm afraid it is not
> a 100% solution though as people often leave the hostname set to localhost.  

In the event that someone doesn't set a hostname for a machine, it's always possible to assign the machine a unique, random hostname value, enter that in the mdadm.conf file as the HOMEHOST entry, use it in the --homehost field when creating arrays, and they will work as expected.  In addition, new arrays created by the user will also get the same homehost setting if you also enter the random homehost entry in the CREATE line in mdadm.conf.  So, while a set host name is preferred, we can make do with what amounts to a machine UUID of sorts.

Comment 16 arth 2009-12-01 22:39:49 UTC

(In reply to comment #13)
> 
> Irrelevant.  dmraid has never supported anything other than BIOS raid devices
> which never exist on SAN disks or other non-BIOS accessible disks, so it is
> always safe to assume you can and should start all dmraid devices.

Not necessarily.  A user may have removed one or more former dmraid devices from an array and want to use them as standalones or in a softraid.

Or a user may have used dd to make a verbatim copy of a disk, in order to be able to quickly revert.

Or (as happened to me), a server died, and the drives were salvaged and put into another machine.  They carry the markers, but should NOT be started as dmraid/mdraid devices.  To get these drives usable under Fedora, I had to calculate just where on the disk the markers would be, and dd from /dev/zero to zonk the blocks, otherwise the drives would be detected and claimed.

Scenarios where it is guaranteed to be safe may be few and far between.

Comment 17 Doug Ledford 2009-12-02 00:02:14 UTC

(In reply to comment #16)
> (In reply to comment #13)
> > 
> > Irrelevant.  dmraid has never supported anything other than BIOS raid devices
> > which never exist on SAN disks or other non-BIOS accessible disks, so it is
> > always safe to assume you can and should start all dmraid devices.
> 
> Not necessarily.  A user may have removed one or more former dmraid devices from
> an array and want to use them as standalones or in a softraid.

That's perfectly fine, and it's why both mdadm and dmraid supply commands to zero out the ddf or imsm superblocks and revert the drives back to a sane state.  However, to ignore those particular superblocks because a drive moved machines is like ignoring a partition table.  It's a standard; it doesn't make sense to ignore it.  It makes sense to wipe it out if you aren't using it any more.

> Or a user may have used dd to make a verbatim copy of a disk, in order to be
> able to quickly revert.

In which case they should have done the dd of the raid volume, not the bare drive, and that would have copied everything *but* the superblock and not caused a problem.

> Or (as happened to me), a server died, and the drives were salvaged and put
> into another machine.  They carry the markers, but should NOT be started as
> dmraid/mdraid devices.  To get these drives usable under Fedora, I had to
> calculate just where on the disk the markers would be, and dd from /dev/zero to
> zonk the blocks, otherwise the drives would be detected and claimed.
> 
> Scenarios where it is guaranteed to be safe may be few and far between.  

Again, the tools were there to clear the disk superblocks and have them treated as normal disks again.  Just because they can be used doesn't mean we should ignore them.  By far, the more common case is if they exist, they should be used.  Scenarios like you paint are rare and easily handled with the tools at hand.

Comment 18 David Zeuthen 2009-12-02 00:54:24 UTC

(In reply to comment #17)
> That's perfectly fine and why both mdadm and dmraid supply commands to zero out
> the ddf or imsm superblocks and revert the drives back to a sane state. 
> It's a standard, it doesn't make sense to
> ignore it.  It makes sense to wipe it out if you aren't using it any more.

In reality no-one clears superblocks.

> However, to ignore those particular superblocks because a drive moved machines
> is like ignoring a partition table.  

No one clears partition tables either. Or filesystem signatures.

People will just plug in old disks from old machines - it's not like they are thinking "oh, wait, I should clear these magic signatures" either when decommissioning the disk or just before plugging it in. 

And when people do plug the disks in, they will be unpleasantly surprised that they suddenly can't access the disks because of autoassembly.

What I really fail to understand is the "why". Why would we even open ourselves to the can of worms that autoassembly entails? We've already established we can't do it for everything - thus, doing it for only something is just confusing in addition to being dangerous. It has also been established that it is not necessary (the installer is fully capable of activating whatever it needs - we don't need the OS to mimic this behavior). And, further, it has been suggested how to create a much more user friendly workflow (e.g. 'mount MD_RAID_NAME="My DVR Backups" /mnt/video' cf comment 10).

(It's not that I care too much about this from the standpoint of the DKD/gnome-disk-utility stuff I'm working on - it really doesn't matter whether something is automatically assembled or not, since that whole thing is about providing a UI to turn things on/off. I'm more concerned about Fedora or RHEL w/ autoassembly suddenly breaking huge boxes and SANs, disrupting existing workflows and creating nasty surprises. So feel free to ignore me. )

    David

Comment 19 Hans de Goede 2010-01-04 18:44:18 UTC

Bill,

I'm going to attach an mdadm-autoimsm.conf, which contains a config telling mdadm to only autoassemble imsm sets and is otherwise empty. This is needed because with
a non-empty mdadm.conf, mdadm -As will only assemble sets listed in the
mdadm.conf, and we also want to assemble imsm sets newly created in the BIOS after installation.

I will also be attaching an initscripts patch which calls mdadm with this config file. I've tested that with this patch, Intel BIOS RAID sets which were created after installation get activated properly.

Re-assigning back to you for adding these changes to initscripts.

Regards,

Hans

Comment 20 Hans de Goede 2010-01-04 18:45:13 UTC

Created attachment 381614 [details]
mdadm-autoimsm.conf

Comment 21 Hans de Goede 2010-01-04 18:45:52 UTC

Created attachment 381615 [details]
PATCH: fixing activation of post install created imsm arrays

Comment 22 Hans de Goede 2010-01-04 18:46:51 UTC

Note that this patch depends on bug 552342 being fixed, as that fixes a bug in mdadm when the auto keyword is used in the config file.

Comment 23 Bill Nottingham 2010-01-04 20:01:23 UTC

Does that need to be a packaged conf file, as opposed to just some command line options?

Also, that specifically seems like the sort of thing that could (and should) be udev-triggered.

Comment 24 Hans de Goede 2010-01-05 08:32:54 UTC

(In reply to comment #23)
> Does that need to be a packaged conf file, as opposed to just some command line
> options?
> 

Yes (unless we patch mdadm to give it some cmdline options to get the same behavior).

mdadm has 2 ways of assembling:

1) Assemble what ever is in the config file (this happens when the config file
   is not empty wrt array statements)

2) Auto assembly

We want it to do 2 here, so there are 2 reasons to use a packaged config file:

- To get it to auto assemble we must have an empty config file (wrt array
  statements)
- We only want to do 2 here for imsm arrays, which can be specified using the
  AUTO keyword in the config file, but not on the cmdline.
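To make the two modes concrete, here is a hypothetical Python sketch of the "mdadm -As" behaviour as reported in this comment: if the config contains any ARRAY lines, only those arrays are assembled; otherwise every discovered array is a candidate, filtered by the AUTO line. Doug says later in this thread that AUTO is meant to apply even when ARRAY lines are present, so this models the reported behaviour, not the intended one. The config parsing is deliberately simplified (no continuation lines, no DEVICE handling), and the UUIDs "uuid-native" and "uuid-isw" are placeholders.

```python
def parse_conf(text):
    """Return (listed_uuids, auto_rules) from a simplified mdadm.conf text."""
    listed, rules = set(), []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        words = line.split()
        if words[0] == "ARRAY":
            for word in words[1:]:
                key, _, value = word.partition("=")
                if key.lower() == "uuid":
                    listed.add(value)
        elif words[0] == "AUTO":
            for word in words[1:]:
                if word[0] in "+-":
                    rules.append((word[0] == "+", word[1:]))
    return listed, rules


def auto_allowed(rules, metadata):
    """First matching AUTO entry wins; no match means allowed."""
    for allow, pattern in rules:
        if pattern in ("all", metadata) or (pattern == "1.x" and metadata.startswith("1.")):
            return allow
    return True


def assemble_scan(conf_text, discovered):
    """Model 'mdadm -As' as reported in this comment.

    discovered is a list of (uuid, metadata) pairs found on the disks;
    the return value lists the UUIDs that would be assembled.
    """
    listed, rules = parse_conf(conf_text)
    if listed:
        # Mode 1: ARRAY lines present, so only those arrays are assembled
        # and AUTO has no effect (the behaviour reported here).
        return [uuid for uuid, _ in discovered if uuid in listed]
    # Mode 2: no ARRAY lines, so every array is a candidate, filtered by AUTO.
    return [uuid for uuid, md in discovered if auto_allowed(rules, md)]


if __name__ == "__main__":
    found = [("uuid-native", "0.90"), ("uuid-isw", "imsm")]
    print(assemble_scan("AUTO +imsm -all\n", found))  # ['uuid-isw']
    print(assemble_scan("ARRAY /dev/md0 UUID=uuid-native\nAUTO +imsm -all\n", found))
    # ['uuid-native']: the post-install imsm set is skipped
```

Under this model, a second scan against a separate AUTO-only file (the approach taken by the attached mdadm-autoimsm.conf) picks up the imsm set that the first scan skips.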


> Also, that specifically seems like the sort of thing that could (and should) be
> udev-triggered.  

That was my first idea for handling this too, but using udev means doing incremental assembly. We started out doing that in dracut, but ended up
falling back to normal assembly (using mdadm.conf, and both udevadm settle and
modprobe scsi_wait_scan to make sure we have all the disks before trying assembly) as we hit various issues with incremental assembly.

Comment 25 Bill Nottingham 2010-01-05 16:28:58 UTC

I suppose you'll never have USB/firewire/etc. isw disks, but that's not going to be workable long-term.

Comment 26 Hans de Goede 2010-01-06 08:10:22 UTC

(In reply to comment #25)
> I suppose you'll never have USB/firewire/etc. isw disks, but that's not going
> to be workable long-term.  

Which part do you consider not workable ? The needing a special config file
(that can be fixed by an mdadm patch adding some cmdline options), or the not using udev ?

I was a proponent of doing all mdraid activation from udev myself, but as the
above discussion shows, that has been vetoed by the mdadm maintainer, and I must admit I have come around to more or less his view of things. As for only activating imsm mdraid sets from udev, I don't think that mixing and matching
udev based with non-udev based activation is a good plan.

For now (for both F-13 and RHEL-6) I would really like to see this fix included.
Among other things, this would fix the rather ugly hack where we still use dmraid (instead of mdraid) for Intel BIOS RAID sets from the livecd, because they
are not found otherwise. And of course it would fix the issue with detection of post-installation created sets.

Comment 27 David Zeuthen 2010-01-06 16:41:17 UTC

(In reply to comment #26)
> (In reply to comment #25)
> > I suppose you'll never have USB/firewire/etc. isw disks, but that's not going
> > to be workable long-term.  
> 
> Which part do you consider not workable ? The needing a special config file
> (that can be fixed by an mdadm patch adding some cmdline options), or the not
> using udev ?

I think Bill is referring to the fact that this means that we will be doing auto-assembly on *all* disks with ISW signatures - i.e. not only when attached to the onboard SATA controller (i.e. the controller the BIOS also sees)... but also when attached via USB/Firewire enclosures and, for that matter, visible from a SAN or attached via a big SAS enclosure.

Also, I think the "not workable long-term" comment refers to the fact that users, especially people doing hardware testing, often move a lot of disks around. For example, it's not far-fetched that a user might use five disks for testing ISW raid and then move these disks to e.g. a SAN. Personally I do this all the time. But I don't know if our user base is likely to do this. I *think* so but this is only a guess.

Again, ideally we wouldn't auto-assemble or auto-mount *anything* - and with the way we're going to handle multipathing in the future (see bug 548874), we are actually going to rely on only doing assembly when the user has *explicitly* requested it.

Anyway, I didn't mean to speak for Bill on this - it's possible he meant something else.

Comment 28 Bill Nottingham 2010-01-06 18:12:10 UTC

(In reply to comment #26)
> (In reply to comment #25)
> > I suppose you'll never have USB/firewire/etc. isw disks, but that's not going
> > to be workable long-term.  
> 
> Which part do you consider not workable ? The needing a special config file
> (that can be fixed by an mdadm patch adding some cmdline options), or the not
> using udev ?

Relying on 'udevadm settle' and 'scsi_wait_scan' before assembling does not work on USB, firewire, or similar hotplug buses where there's actually a delay that you can't track between when the host controller loads, and the disks appear.

It works in this specific case because ISW disks aren't (normally) connected via usb/firewire. But if this is going to be extended to formats on USB or other hotplug devices, and you want auto-assembly, you can't just run this command at this one place in rc.sysinit and expect it to reliably work.
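The timing problem described above can be illustrated with a toy simulation. This is not udev or mdadm code: the array names ("isw_onboard", "usb_mirror"), disk names, arrival times and the moment the static scan runs are all made up, and incremental_assembly simply waits for every member, whereas real incremental assembly has its own rules for when to start an array. The point is only that a single scan after "settle" sees whatever is present at that instant, while a per-device event handler sees each disk whenever it arrives.

```python
def one_shot_assembly(arrays, arrival, scan_time):
    """Static assembly: look once at scan_time, start arrays whose members are all present."""
    present = {disk for disk, t in arrival.items() if t <= scan_time}
    return sorted(name for name, members in arrays.items() if set(members) <= present)


def incremental_assembly(arrays, arrival):
    """Event-driven: handle 'add' events in arrival order and start an
    array once its last member shows up. Returns (name, start_time) pairs."""
    owner = {disk: name for name, members in arrays.items() for disk in members}
    seen = {name: set() for name in arrays}
    started = []
    for disk, t in sorted(arrival.items(), key=lambda item: item[1]):
        name = owner.get(disk)
        if name is None:
            continue  # not a RAID member, nothing to do
        seen[name].add(disk)
        if seen[name] == set(arrays[name]):
            started.append((name, t))
    return started


if __name__ == "__main__":
    arrays = {"isw_onboard": ["sda", "sdb"], "usb_mirror": ["sdc", "sdd"]}
    # Seconds after the host controllers load; the USB disks are slow to appear.
    arrival = {"sda": 1.0, "sdb": 1.1, "sdc": 4.0, "sdd": 6.5}
    print(one_shot_assembly(arrays, arrival, scan_time=3.0))  # ['isw_onboard']
    print(incremental_assembly(arrays, arrival))
    # [('isw_onboard', 1.1), ('usb_mirror', 6.5)]
```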

Comment 29 Hans de Goede 2010-01-06 19:21:29 UTC

(In reply to comment #28)
> (In reply to comment #26)
> > (In reply to comment #25)
> > > I suppose you'll never have USB/firewire/etc. isw disks, but that's not going
> > > to be workable long-term.  
> > 
> > Which part do you consider not workable ? The needing a special config file
> > (that can be fixed by an mdadm patch adding some cmdline options), or the not
> > using udev ?
> 
> Relying on 'udevadm settle' and 'scsi_wait_scan' before assembling does not
> work on USB, firewire, or similar hotplug buses where there's actually a delay
> that you can't track between when the host controller loads, and the disks
> appear.
> 
> It works in this specific case because ISW disks aren't (normally) connected
> via usb/firewire. But if this is going to be extended to formats on USB or
> other hotplug devices, and you want auto-assembly, you can't just run this
> command at this one place in rc.sysinit and expect it to reliably work.  

That is true, but that is already the case for disks with native mdadm metadata on them, and having USB disks with native mdadm metadata on them is actually a far more likely scenario than having imsm metadata on there.

The patch I propose is mirroring the behavior for existing mdraid sets, and since the new external metadata support makes imsm effectively also mdraid sets that seems like the right thing to do.

Yes this has all the issues of the existing mdadm activation code, but I promise
to make sure that if / when regular mdraid set activation moves to udev I'll test and write patches if necessary for imsm metadata mdraid sets.

Comment 30 Bill Nottingham 2010-01-07 20:33:13 UTC

From an efficiency/boot speed standpoint, would it make sense to make the test gate on [ -f /etc/whatever.conf ], and ship whatever.conf in the dmraid package?

Comment 31 Hans de Goede 2010-01-08 07:53:38 UTC

(In reply to comment #30)
> From an efficiency/boot speed standpoint, would it make sense to make the test
> gate on [ -f /etc/whatever.conf ], and ship whatever.conf in the dmraid
> package?  

That won't help much from efficiency/boot speed standpoint, as we install dmraid by default, and:

[hans@localhost ~]$ sudo rpm -e dmraid
        dmraid is needed by (installed) dracut-003-1.fc12.noarch
        dmraid is needed by (installed) dracut-tools-003-1.fc12.noarch

People who really want the last bit of boot speed, and are not using BIOS RAID, could add nomdraid on the kernel cmdline; that will not only save them from mdadm scanning the disks, but also from dmraid doing the same thing.

Comment 32 Bill Nottingham 2010-01-08 22:34:29 UTC

> That won't help much from efficiency/boot speed standpoint, as we install
> dmraid by default, and:
> 
> [hans@localhost ~]$ sudo rpm -e dmraid
>         dmraid is needed by (installed) dracut-003-1.fc12.noarch
>         dmraid is needed by (installed) dracut-tools-003-1.fc12.noarch
> 
> People who really want the last bit of boot speed, and are not using BIOS RAID
> could add nomdraid on the kernel cmdline, that will not only save them from
> mdadm scanning the disks, but also from dmraid doing the same thing.  

I suppose, although I'd like to think in the future the dependencies wouldn't be hardcoded. (Also, if the mdadm conf file changes with a new version of it/dmraid it would be nice to not have to release in sync). But, oh well.

More importantly, when is the fix for bug 552342 supposed to land?

Comment 33 Hans de Goede 2010-01-09 09:36:38 UTC

(In reply to comment #32)
> More importantly, when is the fix for bug 552342 supposed to land?    

Good point; I've just pinged Doug about this.

Comment 34 Hans de Goede 2010-01-13 08:21:31 UTC

*** Bug 542022 has been marked as a duplicate of this bug. ***

Comment 35 Bill Nottingham 2010-01-14 18:35:24 UTC

OK, I'm going back to the first principles of this bug.

The only arrays that are not currently automatically assembled are newly created arrays, where the admin has not written them into mdadm.conf, correct?

This is normal for all other mdadm metadata formats. Why would we add a special case for isw?

(Also, if we really wanted this, I think it's better to add it to a stock mdadm.conf.)

Comment 36 Hans de Goede 2010-01-15 08:36:03 UTC

(In reply to comment #35)
> OK, I'm going back to the first principles of this bug.
> 
> The only arrays that are not currently automatically assembled are newly
> created arrays, where the admin has not written them into mdadm.conf, correct?
> 
> This is normal for all other mdadm metadata formats. Why would we add a special
> case for isw?
> 

"all other" mdadm metadata formats boils down to native / normal software raid
mdadm metadata formats. This is not standard software RAID, but BIOS RAID, and for all other BIOS RAID metadata formats we automatically activate newly
created arrays without requiring user configuration. And in F-11 we did that too
for isw arrays, so not doing that now, technically is a regression.

> (Also, if we really wanted this, I think it's better to add it to a stock
> mdadm.conf.)    

Erm, no, because mdadm.conf's AUTO keyword only comes into play when there are
no arrays configured in mdadm.conf, when there are arrays listed, only those
listed get activated (and the AUTO keyword effectively gets ignored).

Thus we need a separate empty (except for the AUTO keyword) config file.

I know you don't like this change, and I agree it ain't pretty. But this is
the consequence of Linux having 2 different RAID subsystems and that is not
something which we are going to fix overnight.

Comment 38 Bill Nottingham 2010-01-15 19:52:52 UTC

(In reply to comment #36)
> > This is normal for all other mdadm metadata formats. Why would we add a special
> > case for isw?
> > 
> 
> "all other" mdadm metadata formats boils down to native / normal software raid
> mdadm metadata formats. This is not standard software RAID, but BIOS RAID, and
> for all other BIOS RAID metadata formats we automatically activate newly
> created arrays without requiring user configuration. And in F-11 we did that
> too
> for isw arrays, so not doing that now, technically is a regression.

Yes, but if we're moving to only activating specific RAID or
LVM sets in the future, it seems better to set a precedent.

Moreover, we aren't doing this for DDF, which is a similar BIOS container type (albeit one that likely nothing uses).

> > (Also, if we really wanted this, I think it's better to add it to a stock
> > mdadm.conf.)    
> 
> Erm, no, because mdadm.conf's AUTO keyword only comes into play when there are
> no arrays configured in mdadm.conf, when there are arrays listed, only those
> listed get activated (and the AUTO keyword effectively gets ignored).

That's not what the man page says. See the example:

 
       DEVICE /dev/sd[bcdjkl]1
       DEVICE /dev/hda1 /dev/hdb1

       # /dev/md0 is known by its UUID.
       ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
       # /dev/md1 contains all devices with a minor number of
       #   1 in the superblock.
       ARRAY /dev/md1 superminor=1
       # /dev/md2 is made from precisely these two devices
       ARRAY /dev/md2 devices=/dev/hda1,/dev/hdb1

       # /dev/md4 and /dev/md5 are a spare-group and spares
       #  can be moved between them
       ARRAY /dev/md4 uuid=b23f3c6d:aec43a9f:fd65db85:369432df
                  spare-group=group1
       ARRAY /dev/md5 uuid=19464854:03f71b1b:e0df2edd:246cc977
                  spare-group=group1
       # /dev/md/home is created if need to be a partitionable md array
       # any spare device number is allocated.
       ARRAY /dev/md/home UUID=9187a482:5dde19d9:eea3cc4a:d646ab8b
                  auto=part

       MAILADDR root
       PROGRAM /usr/sbin/handle-mdadm-events
       CREATE group=system mode=0640 auto=part-8
       HOMEHOST <system>
       AUTO +1.x -all

Maybe the man page is misleading, but according to it, it seems a much simpler solution would be to add a default mdadm.conf with AUTO +imsm -all.

Comment 39 Doug Ledford 2010-01-15 21:58:53 UTC

The AUTO keyword is supposed to work even if there are arrays listed. If it doesn't, then that's a bug. Regardless of that bug, though, the "oops on AUTO keyword" bug has been resolved in rawhide/RHEL 6.

Comment 40 Hans de Goede 2010-01-19 09:35:21 UTC

(In reply to comment #39)
> The AUTO keyword is supposed to work even if there are arrays listed.  

If that were true, it would be great: anaconda could then simply write this to mdadm.conf and we would be done. But that is not how it works at the moment; I verified this
(again) to be sure. And the documentation is a bit ambiguous on this.

Quoting from "man mdadm.conf":

"AUTO   A  list  of names of metadata format ..."

"When  mdadm  is auto-assembling an array, with via --assemble or
--incremental and it finds metadata of a given type,  it  checks
that metadata type against those listed in this line."

Note the "When  mdadm  is auto-assembling an array" in the above.

And from: "man mdadm":

"ASSEMBLE MODE ..."

"Auto Assembly

When  --assemble  is  used with --scan and no devices are listed, mdadm
will first attempt to assemble all the  arrays  listed  in  the  config
file.

In  no array at listed in the config (other than those marked <ignore>)
it will look through the available devices for possible arrays and will
try  to  assemble  anything  that it finds."


And this is the ambiguous part. Note that it says:
"will first attempt to assemble all the  arrays  listed  in  the  config file."

That suggests it will go on to assemble non-listed arrays afterwards, but the next paragraph says: "In  no array at listed in the config" (sic), which
I think should be read as: "If no arrays are listed in the config".

Note that the second paragraph describes what currently happens: mdadm will only
try to auto-assemble unlisted arrays if none are listed.
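To make the current behaviour concrete, here is a small Python model of it. This is a sketch under stated assumptions, not mdadm's actual code: the function names `auto_allows` and `assemble_current` are hypothetical, arrays are modelled as `(name, metadata_format)` pairs, and `auto_allows` follows our reading of the first-match rule in mdadm.conf(5) ('+' allows, '-' forbids, `all` matches anything, no match means allowed) while ignoring the `homehost` token.

```python
def auto_allows(auto_line: str, fmt: str) -> bool:
    """Simplified model of checking a metadata format against an AUTO line.

    The first token whose name equals fmt (or "all") decides: '+' allows,
    '-' forbids. If no token matches, auto-assembly is allowed.
    The "homehost" token is not modelled.
    """
    for token in auto_line.split()[1:]:  # skip the "AUTO" keyword itself
        sign, name = token[0], token[1:]
        if name in (fmt, "all"):
            return sign == "+"
    return True


def assemble_current(auto_line, found, listed):
    """Model of the current behaviour: AUTO only matters if nothing is listed.

    found:  list of (array_name, metadata_format) pairs seen on disk
    listed: set of array names that have ARRAY lines in mdadm.conf
    """
    if listed:
        # Any ARRAY line at all: only the listed arrays are assembled.
        return [name for name, _ in found if name in listed]
    # No ARRAY lines: fall back to AUTO-driven auto-assembly.
    return [name for name, fmt in found if auto_allows(auto_line, fmt)]


found = [("Volume0", "imsm"), ("md0", "1.x")]
print(assemble_current("AUTO +imsm -all", found, set()))    # ['Volume0']
print(assemble_current("AUTO +imsm -all", found, {"md0"}))  # ['md0']
```

In the second call the unlisted imsm set is skipped as soon as any ARRAY line exists, which is how a post-install imsm array ends up not being activated.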

I would be more than happy to create a patch to make it always
auto-assemble non-listed arrays when run as "mdadm -As", or at least
to do that when the AUTO keyword is present. But it would be good to
first know exactly how mdadm should behave, and to have upstream buy-in
for that.

Comment 41 Doug Ledford 2010-01-19 18:21:53 UTC

What makes sense is to do the listed test on a per-metadata-format basis. So, if AUTO is +imsm +ddf +1.x -all (as a convoluted example) and we have some 1.x-based arrays listed on ARRAY lines, then I would expect it to auto-assemble all imsm arrays, all ddf arrays, only the listed 1.x arrays, and no 0.90 arrays. I say this should be the way it works because imsm and ddf arrays can be created outside of mdadm's control, and we want to handle that gracefully even if we are being good and making sure all of our 1.x arrays are listed in the mdadm.conf file. That's my take on it, and what I would argue upstream. I would simply make the man page match this logic rather than trying to figure out what the man page really means ;-)
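The per-format rule proposed above can be modelled in a few lines of Python. This is a hypothetical sketch, not a patch to mdadm: `auto_allows` is a simplified first-match reading of the AUTO line from mdadm.conf(5), `assemble_per_format` is an invented name, and ARRAY lines are modelled as a dict from array name to metadata format (real ARRAY lines identify arrays by UUID and other tags).

```python
def auto_allows(auto_line: str, fmt: str) -> bool:
    # Simplified first-match model of the AUTO line: '+' allows, '-' forbids,
    # "all" matches any format, and no match at all means allowed.
    for token in auto_line.split()[1:]:  # skip the "AUTO" keyword itself
        sign, name = token[0], token[1:]
        if name in (fmt, "all"):
            return sign == "+"
    return True


def assemble_per_format(auto_line, found, listed):
    """AUTO is ignored only for formats that appear on some ARRAY line.

    found:  list of (array_name, metadata_format) pairs seen on disk
    listed: dict mapping array names on ARRAY lines to their metadata format
    """
    listed_formats = set(listed.values())
    result = []
    for name, fmt in found:
        if name in listed:
            result.append(name)  # explicitly listed arrays are always assembled
        elif fmt not in listed_formats and auto_allows(auto_line, fmt):
            result.append(name)  # unlisted, and no ARRAY line uses this format
    return result


found = [("Volume0", "imsm"), ("ddf0", "ddf"), ("md1", "1.x"),
         ("md2", "1.x"), ("md3", "0.90")]
print(assemble_per_format("AUTO +imsm +ddf +1.x -all", found, {"md1": "1.x"}))
# ['Volume0', 'ddf0', 'md1']
```

With the example AUTO line and one listed 1.x array, the model assembles all imsm and ddf arrays, only the listed 1.x array, and no 0.90 arrays, matching the expectation stated in the comment.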

Comment 42 Hans de Goede 2010-01-19 19:00:47 UTC

(In reply to comment #41)
> What make sense is to do the listed test on a per metadata format basis.  So,
> if auto is +imsm +ddf +1.x -all (as a convoluted example) and we have some 1.x
> based arrays listed on ARRAY lines, then I would expect it to autoassemble all
> imsm arrays, all ddf arrays, only the listed 1.x arrays, and no 0.90 arrays.  I
> say this should be the way it works on the basis that imsm and ddf arrays can
> be created outside of mdadm's control and we want to handle that gracefully
> even if we are being good and making sure all of our 1.x arrays are listed in
> the mdadm.conf file.  That's my take on it, and what I would argue upstream.  I
> would simply make the man page match this logic versus trying to figure out
> what the man page really means ;-)    

Hi,

Thanks for the quick response. I'm not sure that would be very useful, though.
I would like to be able to add the line:

AUTO +imsm -all

to mdadm.conf and have that mean: auto-assemble all imsm arrays found (whether or not they are listed), and for all other formats only assemble arrays that are
listed. That is different from what you are advocating.
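For comparison, the semantics requested here can be sketched the same way. Again this is a hypothetical model rather than mdadm code: `assemble_listed_or_auto` is an invented name, `auto_allows` is the same simplified first-match reading of the AUTO line, and ARRAY lines are reduced to a set of array names.

```python
def auto_allows(auto_line: str, fmt: str) -> bool:
    # Simplified first-match model of the AUTO line: '+' allows, '-' forbids,
    # "all" matches any format, and no match at all means allowed.
    for token in auto_line.split()[1:]:  # skip the "AUTO" keyword itself
        sign, name = token[0], token[1:]
        if name in (fmt, "all"):
            return sign == "+"
    return True


def assemble_listed_or_auto(auto_line, found, listed):
    """Listed arrays are always assembled; unlisted ones whenever AUTO allows.

    found:  list of (array_name, metadata_format) pairs seen on disk
    listed: set of array names that have ARRAY lines in mdadm.conf
    """
    return [name for name, fmt in found
            if name in listed or auto_allows(auto_line, fmt)]


found = [("Volume0", "imsm"), ("Volume1", "imsm"),
         ("md0", "1.x"), ("md1", "1.x")]
print(assemble_listed_or_auto("AUTO +imsm -all", found, {"Volume0", "md0"}))
# ['Volume0', 'Volume1', 'md0']
```

The difference from the per-format rule shows up with `Volume1`: here it is assembled because AUTO allows imsm, whereas under the per-format rule it would be skipped because another imsm array already has an ARRAY line.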

What you are advocating would work too, but would require writing:

AUTO +imsm +1.x -all

for systems with "regular" mdraid sets, and writing:

AUTO +imsm -all

for systems without them. That is not hard to do, but if someone later
adds a regular RAID set, he not only needs to add the ARRAY line for the array but
also has to change the AUTO line, and that might cause some grief.

Regards,

Hans

Comment 43 Doug Ledford 2010-01-20 19:00:56 UTC

It's not that much different. The primary difference between what you are saying and what I'm saying is that you want the AUTO keyword to force auto-assembly even if some arrays of the particular type are present in the config file, while what I'm saying is that the AUTO keyword only applies if none of the particular type are in the config file. In other words, it's a requirement that the user either list all of their arrays in mdadm.conf, or none of them, not just a partial set. You are advocating for a partial set being OK. I disagree.

And as far as what to write in the file goes, there is no need to distinguish between systems with and without 1.x superblock arrays; just always write +imsm +1.x -all, and if the user doesn't currently have any version 1.x arrays, who cares. When they add them later, they will either not put them in mdadm.conf, in which case they will get started via the AUTO setting, or they will, and the AUTO setting will no longer matter, but the array will get started all the same. There is no need to change the AUTO line, nor any extra work on the part of the user. In fact, it would be the first time that they weren't required to enter the array in the mdadm.conf file (assuming no other 1.x arrays are there already).

It really only boils down to documenting this: AUTO only works when no arrays of the given type are listed in the mdadm.conf file. So AUTO +<format> will autostart arrays of that type unless you specify an array of that type on an ARRAY line; in that case, every array of that type that you wish to be started needs an ARRAY line in mdadm.conf, as the AUTO keyword is ignored for that array type.

Comment 44 Hans de Goede 2010-01-27 11:11:07 UTC

(In reply to comment #43)
> It's not that much different.  The primary difference in what you are saying
> and what I'm saying is that you want the AUTO keyword to force auto assembly
> even if some of the particular type are present in the config file while what
> I'm saying is the AUTO keyword only applies if none of the particular type are
> in the config file.  In other words, it's a requirement that the user either
> list all of their arrays in mdadm.conf, or none of them, not just a partial
> set.  You are advocating for a partial set being OK.  I disagree.
> 

OK,

After giving this more thought, the way you suggest this should work will work
as well; anaconda then simply needs to stop writing imsm sets to mdadm.conf
completely, which is easy.

> And as far as what to write in the file, there is no need to distinguish
> between with or without 1.x superblock items, just always write +imsm +1.x -all
> and if the user doesn't currently have any version 1.x arrays, who cares.  When
> they add them later they will either not put them in the mdadm.conf in which
> case they will get started via the AUTO setting, or they will and the AUTO
> setting will no longer matter, but the array will get started all the same. 
> There is no need to change the AUTO line nor any extra work on the part of the
> user.  In fact, it would be the first time that they weren't required to entire
> the array in the mdadm.conf file (assuming no other 1.x arrays are there
> already).
>

Hmm, this contradicts your earlier SAN arguments against blindly
auto-assembling arrays. I think I will go for the safe option and not put
AUTO +1.x in the anaconda-written config when no regular arrays are found.

Could you write a patch implementing the AUTO keyword behavior you suggested and get it in place in time for RHEL 6?

Comment 45 Christopher Beland 2010-02-12 22:14:05 UTC

Nominating as F13Target since this is listed on Common Bugs.

Comment 46 Doug Ledford 2010-07-20 17:03:48 UTC

Hans, I believe this is long since fixed.  If you agree, could you close this bug out?

Comment 47 Hans de Goede 2010-07-21 07:19:26 UTC

(In reply to comment #46)
> Hans, I believe this is long since fixed.  If you agree, could you close this
> bug out?    

I agree, closing.