Bug 469466

Summary: Backtrace when trying to Live install with disks that were mdraid
Product: Fedora
Component: anaconda
Version: 10
Hardware: x86_64
OS: Linux
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Reporter: Jesse Keating <jkeating>
Assignee: Anaconda Maintenance Team <anaconda-maint-list>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: anaconda-maint-list, cje, dcantrell, stickster
Flags: jkeating: fedora_requires_release_note+
Whiteboard: anaconda_trace_hash:77fe14c8f3e35cf2934cecfc0a602dc744e0756c8da5f9ef2576303e769fe638
Doc Type: Bug Fix
Last Closed: 2009-05-20 01:24:27 UTC
Bug Blocks: 151189, 438944

Attachments:
Attached traceback automatically from anaconda.

Description Jesse Keating 2008-11-01 02:35:40 UTC
This bug was filed automatically by anaconda.

Comment 1 Jesse Keating 2008-11-01 02:35:44 UTC
Created attachment 322157 [details]
Attached traceback automatically from anaconda.

Comment 2 Jesse Keating 2008-11-01 02:37:32 UTC
Steps to reproduce:

Create raid sets (/boot raid 1, / raid 0) during an install.

Start a Live install (KDE) and choose default partitioning scheme.

I'm going to test with other live images.

Comment 3 Jesse Keating 2008-11-01 02:41:13 UTC
This happens from the normal live install too.

Comment 4 Jeremy Katz 2008-11-03 15:33:39 UTC
When this happens, do you have swaps/lvm/raid activated?  We do try to deactivate some stuff before starting the install, but we may need to add more to the "what we do" list.
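
For reference, the sort of pre-install deactivation being described would look roughly like this from a shell (a sketch only, not the actual anaconda code path; the volume group name is just an example):

swapoff -a                 # turn off any active swap
lvchange -an VolGroup00    # deactivate logical volumes in the example volume group
mdadm --stop --scan        # stop any assembled md arrays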

Comment 5 Jesse Keating 2008-11-03 15:55:06 UTC
Oh yeah, thought that'd be shown in the logs.

There seems to be an inactive md array set up, and I'm assuming that's what is conflicting with the installer.

Personalities : 
md_d0 : inactive sdb1[1](S)
      20964672 blocks
       
unused devices: <none>
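
(For anyone wanting to clear a stale array like this by hand, something along these lines should do it -- a sketch using the device names from the output above, so adjust for your own layout; the member's superblock can only be zeroed once the array is stopped:

mdadm --stop /dev/md_d0            # deactivate the stale array
mdadm --zero-superblock /dev/sdb1  # wipe the raid superblock on the member partition
)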

Comment 6 cje 2008-11-09 01:07:05 UTC
ah!  i hit this bug trying to install on my eee pc 901.  does this mean that i can  avoid the error by doing a reformat of the install drives before clicking on "install to hard drive"?

also, not sure if this is a clue or another bug but i tried the default install options with 'Encrypt system' ticked and got a different error:
cryptodev.py line 194 RuntimeError Device has no UUID.

also i've tried re-creating the same RAID-based layout but whenever i try to create the actual md devices anaconda rearranges the partitions so they're both on the same drive.  again, don't know if that's a symptom - it worked fine on F10 beta (obviously - that's how it got there in the first place)

Comment 7 cje 2008-11-09 01:08:43 UTC
(i was using the gnome i686 livecd - F10 preview)

Comment 8 cje 2008-11-09 02:00:17 UTC
i have 'inactive' things appearing in my /proc/mdstat too but i think they're only what's left over from my old raid arrays after one failed 'install to hard drive'.

they show up as "md: <bind> ..." messages in dmesg.

running liveinst from the command line produces a number of "FATAL: Module xxxx not found." messages - xxxx in [ext3, dm_mod, dm_zero, dm_mirror, dm_snapshot] - and a "0 logical volume(s) in volume group "VolGroup00" now active" message.

i've tried removing all partitions by running fdisk as root, then rebooting to pick up those changes.  the dmesg messages have gone and mdstat is 'empty'.  same error.

used dd if=/dev/zero to clear the first 100M of sda and sdb.  rebooted.
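
a command of this shape clears the first 100 MiB (the exact invocation below is a reconstruction for illustration, not a log of what was actually run):

dd if=/dev/zero of=/dev/sda bs=1M count=100
dd if=/dev/zero of=/dev/sdb bs=1M count=100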

well, it's different: "Would you like to initialize this drive..." ... but then the same error.

it's 2am - an ideal time to give up on this for now i think!

Comment 9 cje 2008-11-10 22:21:25 UTC
doing a 'hard disk install' using /images/boot.iso from the DVD plus the contents of the DVD itself appears to be working.

once it's done i'll try the install from the livecd again - see if it's happier with a system previously formatted by the F10 Preview installer.

Comment 10 cje 2008-11-11 02:24:58 UTC
right.  well, that worked - i've installed F10 preview from the livecd.  so i'm very confused.  any ideas anyone?

just to recap, as far as i understand it - steps to reproduce:

1. install F10 beta with software RAID.
2. attempt to install F10 preview from livecd - fails.
3. completely erase disk contents
4. attempt to install F10 preview from livecd - fails.
5. install F10 preview from DVD image - succeeds.
6. attempt to install F10 preview from livecd - succeeds!

gone 2am again.  grrrr.

Comment 11 Jesse Keating 2008-11-14 23:45:32 UTC
So I don't think we're going to find a real fix for this in time for F10.  We have a workaround if you're going to be wiping the disks, so it's not a show stopper, just rather inconvenient.  Moving to target, and will hopefully hook up with some docs writers to get release notes for this.

Comment 12 cje 2008-11-15 00:40:03 UTC
just to be clear - "wiping the disks" did NOT work around the problem.  it's "install ... from DVD image" which was the workaround.  if people can't do that then they're still stuck unless another solution can be found.

i'm going to attempt to reproduce steps 1-4 from my comment 10 now - i'm deeply suspicious/concerned about step 4.

Comment 13 Jesse Keating 2008-11-15 00:44:06 UTC
I'm pretty sure it's enough to create partitions on the disk that are different from the previous ones.  Wiping the disk by removing the partitions via parted, or using dd, won't change what tools scanning for a raid signature can still detect.  You'd have to create new partitions that are in different places, and that should get you around it.
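
As a sketch of what that could look like (the sizes and syntax are placeholders only, parted syntax varies by version, and this destroys the existing partition table, so back up first):

parted -s /dev/sda mklabel msdos
parted -s /dev/sda mkpart primary 1MB 301MB
parted -s /dev/sda mkpart primary 301MB 19000MB

The idea is just to pick sizes so that both the start and the end of each new partition land somewhere different from the old ones.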

Comment 14 cje 2008-11-15 01:55:48 UTC
ah, cool.  i'm up to step 2 so i'll try that instead of step 3.

Comment 15 cje 2008-11-15 04:36:44 UTC
right.  well, i didn't do _exactly_ what you suggested - creating different partitions.  instead i thought that the problem might be RAID 'superblocks' still present at the _end_ of the disk (recall i only wiped the first 100M before) - so i took the time to do a complete erase.

dd if=/dev/zero of=/dev/sda bs=1M
took a couple of minutes for 4G
dd if=/dev/zero of=/dev/sdb bs=1M
took about 16 minutes for 16G

(i had tried using mdadm --zero-superblock but it said it couldn't write to the partitions)

a reboot was required after wiping the disks but after that an install from the liveCD worked okay.

so, yes, i think we do have a workaround without needing the DVD install method.

Comment 16 Paul W. Frields 2008-11-21 03:31:14 UTC
For release notes, how does this text work for you guys?

'''
Installing from Live Images on Systems Using Software RAID

A bug in the installer program on Fedora Live images can prevent proper installation on systems with pre-existing software RAID arrays.  Users with these configurations should install from the Fedora installation DVD if possible.  To perform a fresh installation from the Live image, intrepid users can choose one of the following options:

1.  Wipe the disks using 'dd' or a similar command.  Then reboot and install in the usual fashion.

2.  Repartition the disks using different sizes for the software RAID partitions, which will prevent the original partitions from being detected and triggering the bug.

WARNING:  Back up important data before attempting either of the workarounds for Live image installation on a software RAID system.
'''

Comment 17 Jesse Keating 2008-11-21 17:46:27 UTC
I don't think step 1 will be enough in some cases.  Best to just suggest step 2.

Comment 18 Paul W. Frields 2008-11-21 18:27:42 UTC
All right, new draft is:

'''
Installing from Live Images on Systems Using Software RAID

A bug in the installer program on Fedora Live images can prevent proper
installation on systems with pre-existing software RAID arrays.  Users with
these configurations should install from the Fedora installation DVD if
possible.  To perform a fresh installation from the Live image, intrepid users
can first repartition the disks using different sizes for the software RAID
partitions.  This procedure prevents the original partitions from being detected and triggering the bug.

WARNING:  Back up important data before attempting this workaround for Live image installation on a software RAID system.
'''

Comment 19 cje 2008-11-21 18:47:16 UTC
hmm.  i'm not convinced that step 2 will always be enough either.  if the raid superblock is at the end of the partition couldn't it still be found if you repartition in such a way that the last partition still ends in the same place?

e.g. i have a 20G disk made up of 0.1G + 19.9G partitions.  both are raid with superblock at the end.  i repartition the disk to 0.2G + 19.8G - isn't the superblock for the larger partition in the same place?

obviously i might be hugely misunderstanding the way superblocks work.  and the way this bug works.

also i can't think how using dd to clear every last bit of the disk could possibly fail.  have i misunderstood that too?
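
to put rough numbers on the 20G example above (a sketch only -- it assumes old-style 0.90 metadata, where the superblock sits 64KiB-aligned near the end of the member device, at about (size_in_sectors & ~127) - 128 sectors from the start of the partition; partition starts rounded to 100/200 MiB):

disk_end=41943040                # 20 GiB disk, in 512-byte sectors
for start in 204800 409600; do   # second partition starting at 100 MiB vs 200 MiB
    size=$(( disk_end - start ))
    sb=$(( start + (size & ~127) - 128 ))
    echo "start=$start -> superblock at absolute sector $sb"
done

with these round numbers both layouts put the superblock at the same absolute sector, so if the last partition still ends in the same place the old superblock can indeed sit exactly where the new partition would be probed.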

Comment 20 cje 2008-11-21 18:54:42 UTC
oh, if we do include both options can we make it an unordered (bullet) list please?

Comment 21 Paul W. Frields 2008-11-22 03:36:50 UTC
(In reply to comment #19)
> also i can't think how using dd to clear every last bit of the disk could
> possibly fail.  have i misunderstood that too?

I don't understand why it would fail either, but I trust that Jesse will tell us... ;-)

Comment 22 Paul W. Frields 2008-11-22 05:11:57 UTC
I had to get zero-day content out to our translators to give them a shot, so this will fall by the wayside for that deadline.  However, let's get the text clear here, and add it to the wiki's common F10 bugs page:

https://fedoraproject.org/wiki/Common_F10_bugs

That should be sufficient to offer guidance, and might even be more appropriate anyway, as this is not an intentional change in behavior but an actual bug.

Comment 23 Jesse Keating 2008-11-23 20:44:42 UTC
dd to the tail end may work; I haven't tested that particular scenario.
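
An untested sketch of what "dd to the tail end" could mean, for a raid partition that runs to the end of the disk (the device name is only an example, and this is destructive, so double-check the target first):

size=$(blockdev --getsize64 /dev/sdb)   # disk size in bytes
# overwrite roughly the last 4 MiB, which covers a 0.90 superblock ~64 KiB before the end
dd if=/dev/zero of=/dev/sdb bs=1M seek=$(( size / 1048576 - 4 ))

dd will complain about running out of space when it reaches the end of the device; that is expected here.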

Comment 24 Bug Zapper 2008-11-26 04:35:54 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 25 cje 2009-02-20 02:59:13 UTC
looks like this problem is still around in F11 alpha.  with the same workaround. (dd to the end and reboot).

Comment 26 Chris Lumens 2009-04-08 20:02:15 UTC
Is this still an issue in post-F11 beta rawhide?  As you know, there's been a whole lot of storage work going on, especially in the RAID area.

Comment 27 cje 2009-04-15 03:49:38 UTC
possibly.  now i'm hitting bug 491729 instead (which can be solved with the same workaround).  so maybe this one's gone - or maybe we'll hit it again as soon as 491729's fixed.

Comment 28 Jesse Keating 2009-05-20 01:24:27 UTC
I don't seem to have this problem anymore.  I think this is fixed.  I'm closing.