Red Hat Bugzilla – Bug 469466
Backtrace when trying to Live install with disks that were mdraid
Last modified: 2013-01-09 22:25:25 EST
This bug was filed automatically by anaconda.
Created attachment 322157
Attached traceback automatically from anaconda.
Steps to reproduce:
Create raid sets (/boot raid 1, / raid 0) during an install.
Start a Live install (KDE) and choose default partitioning scheme.
I'm going to test with other live images.
This happens from the normal live install too.
When this happens, do you have swaps/lvm/raid activated? We do try to deactivate some stuff before starting the install, but we may need to add more to the "what we do" list.
Oh yeah, thought that'd be shown in the logs.
There seems to be an inactive md array setup, and I'm assuming that's what is conflicting with the installer.
md_d0 : inactive sdb1(S)
unused devices: <none>
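For anyone wanting to check for the same leftover state before starting the installer, here is a minimal sketch that picks inactive arrays out of /proc/mdstat-style text. The function name inactive_md_arrays is hypothetical, and the parsing is based only on the two-line sample above plus the usual "name : state members" layout, so treat it as an illustration rather than a complete mdstat parser.

```python
import re


def inactive_md_arrays(mdstat_text):
    """Return {array_name: [member, ...]} for arrays marked inactive.

    Hypothetical helper: it only understands lines shaped like
    "md_d0 : inactive sdb1(S)".
    """
    inactive = {}
    for line in mdstat_text.splitlines():
        # Array lines start with the md device name, then " : <state> <members>".
        match = re.match(r"^(md\S*)\s*:\s*(\S+)\s*(.*)$", line)
        if match and match.group(2) == "inactive":
            # Strip role suffixes such as "(S)" or "[0]" from member names.
            members = [re.sub(r"[\[(].*$", "", m) for m in match.group(3).split()]
            inactive[match.group(1)] = members
    return inactive


sample = "md_d0 : inactive sdb1(S)\nunused devices: <none>\n"
print(inactive_md_arrays(sample))
```

On a live system the text would come from reading /proc/mdstat instead of the sample string.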
ah! i hit this bug trying to install on my eee pc 901. does this mean that i can avoid the error by doing a reformat of the install drives before clicking on "install to hard drive"?
also, not sure if this is a clue or another bug but i tried the default install options with 'Encrypt system' ticked and got a different error:
cryptodev.py line 194 RuntimeError Device has no UUID.
also i've tried re-creating the same RAID-based layout but whenever i try to create the actual md devices anaconda rearranges the partitions so they're both on the same drive. again, don't know if that's a symptom - it worked fine on F10 beta (obviously - that's how it got there in the first place)
(i was using the gnome i686 livecd - F10 preview)
i have 'inactive' things appearing in my /proc/mdstat too but i think they're only what's left over from my old raid arrays after one failed 'install to hard drive'.
they show up as "md: <bind> ..." messages in dmesg.
running liveinst from the command line produces a number of "FATAL: Module xxxx not found." messages - xxxx in [ext3, dm_mod, dm_zero, dm_mirror, dm_snapshot] - and a "0 logical volume(s) in volume group "VolGroup00" now active" message.
i've tried removing all partitions by running fdisk as root. rebooting to pick up those changes. dmesg messages have gone and mdstat is 'empty'. same error.
used dd if=/dev/zero to clear the first 100M of sda and sdb. rebooted.
well, it's different: "Would you like to initialize this drive..." ... but then the same error.
it's 2am - an ideal time to give up on this for now i think!
doing a 'hard disk install' using /images/boot.iso from the DVD plus the contents of the DVD itself appears to be working.
once it's done i'll try the install from the livecd again - see if it's happier with a system previously formatted by the F10 Preview installer.
right. well, that worked - i've installed F10 preview from the livecd. so i'm very confused. any ideas anyone?
just to recap, as far as i understand it - steps to reproduce:
1. install F10 beta with software RAID.
2. attempt to install F10 preview from livecd - fails.
3. completely erase disk contents
4. attempt to install F10 preview from livecd - fails.
5. install F10 preview from DVD image - succeeds.
6. attempt to install F10 preview from livecd - succeeds!
gone 2am again. grrrr.
So I don't think we're going to find a real fix for this in time for F10. We have a workaround if you're going to be wiping the disks, so it's not a show stopper, just rather inconvenient. Moving to target, and will hopefully hook up with some docs writers to get release notes for this.
just to be clear - "wiping the disks" did NOT work around the problem. it's "install ... from DVD image" which was the workaround. if people can't do that then they're still stuck unless another solution can be found.
i'm going to attempt to reproduce steps 1-4 from my comment 10 now - i'm deeply suspicious/concerned about step 4.
I'm pretty sure it's enough to create partitions on the disk that differ from the previous ones. Removing the old partitions via parted, or wiping with dd, won't change what is detectable by things looking for a raid signature. You'd have to create new partitions in different places, and that should get you around it.
ah, cool. i'm up to step 2 so i'll try that instead of step 3.
right. well, i didn't do _exactly_ what you suggested - creating different partitions. instead i thought that the problem might be RAID 'superblocks' still present at the _end_ of the disk (recall i only wiped the first 100M before) - so i took the time to do a complete erase.
dd if=/dev/zero of=/dev/sda bs=1M
took a couple of minutes for 4G
dd if=/dev/zero of=/dev/sdb bs=1M
took about 16 minutes for 16G
(i had tried using mdadm --zero-superblock but it said it couldn't write to the partitions)
a reboot was required after wiping the disks but after that an install from the liveCD worked okay.
so, yes, i think we do have a workaround without needing the DVD install method.
For release notes, how does this text work for you guys?
Installing from Live Images on Systems Using Software RAID
A bug in the installer program on Fedora Live images can prevent proper installation on systems with pre-existing software RAID arrays. Users with these configurations should install from the Fedora installation DVD if possible. To perform a fresh installation from the Live image, intrepid users can choose one of the following options:
1. Wipe the disks using 'dd' or a similar command. Then reboot and install in the usual fashion.
2. Repartition the disks using different sizes for the software RAID partitions, which will prevent the original partitions from being detected and triggering the bug.
WARNING: Back up important data before attempting either of the workarounds for Live image installation on a software RAID system.
I don't think step 1 will be enough in some cases. Best to just suggest step 2.
All right, new draft is:
Installing from Live Images on Systems Using Software RAID
A bug in the installer program on Fedora Live images can prevent proper
installation on systems with pre-existing software RAID arrays. Users with
these configurations should install from the Fedora installation DVD if
possible. To perform a fresh installation from the Live image, intrepid users
can first repartition the disks using different sizes for the software RAID
partitions. This procedure prevents the original partitions from being detected and triggering the bug.
WARNING: Back up important data before attempting this workaround for Live image installation on a software RAID system.
hmm. i'm not convinced that step 2 will always be enough either. if the raid superblock is at the end of the partition couldn't it still be found if you repartition in such a way that the last partition still ends in the same place?
e.g. i have a 20G disk made up of 0.1G + 19.9G partitions. both are raid with superblock at the end. i repartition the disk to 0.2G + 19.8G - isn't the superblock for the larger partition in the same place?
obviously i might be hugely misunderstanding the way superblocks work. and the way this bug works.
also i can't think how using dd to clear every last bit of the disk could possibly fail. have i misunderstood that too?
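To make the question above concrete, here is a rough sketch of the arithmetic. It assumes version 0.90 md metadata, which is placed in the last 64 KiB-aligned 64 KiB block of the member device; the thread does not say which metadata version was in use, and other versions put the superblock elsewhere. The sector numbers are made up for illustration and are not the reporter's actual layout.

```python
# Assumption: version 0.90 md metadata. Other metadata versions differ.
MD_RESERVED_SECTORS = 128  # 64 KiB expressed in 512-byte sectors


def superblock_offset(size_sectors):
    """Sector offset of a 0.90 superblock inside a device of this size."""
    return (size_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS


def absolute_superblock_sector(start, end):
    """Disk sector of the superblock for a partition spanning [start, end)."""
    return start + superblock_offset(end - start)


DISK_END = 41943040  # hypothetical 20 GiB disk, in sectors

# Hypothetical old layout: second partition starts 101 MiB into the disk.
old = absolute_superblock_sector(206848, DISK_END)
# Hypothetical new layout: same end, start moved 100 MiB further in.
new = absolute_superblock_sector(411648, DISK_END)

print(old, new, old == new)
```

With these numbers both layouts land on the same disk sector, which is the situation described above: the last partition keeps its end, so the old superblock location lines up with where a new one would be looked for. Whether the installer then treats the leftover data as a valid array is a separate question this sketch does not answer.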
oh, if we do include both options can we make it an unordered (bullet) list please?
(In reply to comment #19)
> also i can't think how using dd to clear every last bit of the disk could
> possibly fail. have i misunderstood that too?
I don't understand why it would fail either, but I trust that Jesse will tell us... ;-)
I had to get zero-day content out to our translators to give them a shot, so this will fall by the wayside for that deadline. However, let's get the text clear here, and add it to the wiki's common F10 bugs page:
That should be sufficient to offer guidance, and might even be more appropriate anyway, as this is not an intentional change in behavior but an actual bug.
dd to the tail end may work, I haven't tested that particular scenario.
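For reference, a tail-only wipe could be sketched as below. This is untested against this bug, the function name zero_tail and the sizes are hypothetical, and the demonstration runs on a scratch file rather than a real disk. Pointing it at a device such as /dev/sda would destroy data, and a reboot would presumably still be needed afterwards as with the full wipe.

```python
import os
import tempfile


def zero_tail(path, tail_bytes, chunk=1024 * 1024):
    """Overwrite the last tail_bytes of path with zeros; return bytes written."""
    with open(path, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)  # total size of the file or device
        start = max(0, size - tail_bytes)
        dev.seek(start)
        remaining = size - start
        while remaining > 0:
            n = min(chunk, remaining)
            dev.write(b"\0" * n)
            remaining -= n
        dev.flush()
        os.fsync(dev.fileno())  # push the zeros out to the underlying storage
    return size - start


# Demonstration on a scratch file standing in for a disk.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"x" * 1000)
    scratch_path = scratch.name

written = zero_tail(scratch_path, 100)
with open(scratch_path, "rb") as check:
    data = check.read()
os.remove(scratch_path)
print(written, data[:900] == b"x" * 900, data[900:] == b"\0" * 100)
```

How much of the tail would need clearing depends on where the old partitions ended, so clearing only the very end of the disk may miss superblocks belonging to partitions that ended earlier.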
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.
More information and reason for this action is here:
looks like this problem is still around in F11 alpha. with the same workaround. (dd to the end and reboot).
Is this still an issue in post-F11 beta rawhide? As you know, there's been a whole lot of storage work going on, especially in the RAID area.
possibly. now i'm hitting bug 491729 instead (which can be solved with the same workaround). so maybe this one's gone - or maybe we'll hit it again as soon as 491729's fixed.
I don't seem to have this problem anymore. I think this is fixed. I'm closing.