Red Hat Bugzilla – Bug 443451
Could not get Fedora 9 to install to hard disk
Last modified: 2013-01-09 23:40:31 EST
Description of problem:
Version-Release number of selected component (if applicable):
Fedora 9 Preview Release
Steps to Reproduce:
1. Download the F9 Preview release
2. Try to install

Actual results: installer gets an unhandled exception
Expected results: installer installs the system
In Preview, whenever I press the Next button after finishing all the partitioning,
RAID, and LVM configuration, nothing happens. It will not go to the next screen.
The install is DOA at that point.
One time the Next button gave me an unhandled exception error and asked to file
a bug, but it refused to find my USB key, and when I tried Remote, it said there
was a problem writing the file. So I scp'd a file using the same credentials,
same machine, and same filename from a console without any problem; I do not know
what it was having a problem doing. No further details were provided. So I
replugged my USB key and tried Disk again, and then I got an assertion:
(ped_partition_is_active(part) at disk.c:1186 in function
So I downloaded the F9 Live CD and tried to install to the hard drive from it.
The installer on the Live CD was able to put the system on the hard disks, but
then the system would not boot. I selected every hard disk in turn as the boot
disk and the system still would not boot.
See these mailing list threads for more details on the issues trying to get
Fedora 9 installed:
Here is a smolt for the hardware:
FWIW, I'm installing the boot loader on /dev/md0, and my setup looks like:
VolGroup00 LogVol00 /dev/md1 swap
VolGroup01 LogVol00 /dev/md2 / ext3
VolGroup02 LogVol00 /dev/md3 /var/media ext3
/dev/md0 RAID1 /boot ext3 sdg1,sdh1
/dev/md1 RAID1 LVM PV sdg2,sdh2
/dev/md2 RAID1 LVM PV sdg3,sdh3
/dev/md3 RAID5 LVM PV sda1,sdb1,sdc1,sdd1,sde1
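For reference, here is a rough sketch of how that layout would be built by hand with mdadm and LVM2. The device names are the ones from the table above and are specific to this machine; the commands are printed rather than executed (via the echo wrapper) because they are destructive:

```shell
# Dry-run sketch of the RAID/LVM layout above. Device names come from the
# reporter's table; adjust for your system. Remove the 'run' wrapper (and
# the 'run ' prefixes) to execute for real -- these commands destroy data.
run() { echo "$@"; }

run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdg1 /dev/sdh1
run mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdg2 /dev/sdh2
run mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdg3 /dev/sdh3
run mdadm --create /dev/md3 --level=5 --raid-devices=5 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
run pvcreate /dev/md1 /dev/md2 /dev/md3
run vgcreate VolGroup00 /dev/md1   # swap
run vgcreate VolGroup01 /dev/md2   # /
run vgcreate VolGroup02 /dev/md3   # /var/media
```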
The installer will either crash just after you declare all the partitions, RAID,
and LVM, or it will crash when you are at the bootloader screen. It always
happens when you push the Next button.
And again, this machine has three SATA controllers. F7 and F8 would not even
install because they could only see two of the three controllers. F9 at least
sees all the controllers, but the installer is crashing.
Created attachment 303234 [details]
Screenshot of unhandled exception.
If you scroll down a little farther in that dialog, you'll see an end to the
backtrace followed by a line that looks something like this:
SystemError: None type has no attribute 'strip'
Or whatever. It could be nothing like that, but the point is there will be some
sort of message right before the dump of all the internal variables. Could you
also grab a picture of that?
Created attachment 303328 [details]
Screenshot of just before the variable dump
Did you run the mediacheck against your CD?
Created attachment 303387 [details]
Screenshot of console sessions showing Input/output errors.
I tried switching to a console to manually partition the drives, and none of the
drive tools can access the drives. They all give an Input/output error. Watching
the drive lights closely, every time I try one of the drive tool commands the
CD-ROM light starts blinking instead of a drive light.
Jeremy, yes I ran media check, and I just ran it again; both times it said the media checked out OK.
Can you try booting with 'libata.dma=0' and see if that helps?
Created attachment 303396 [details]
Screenshot-1 of unhandled exception - w/libata.dma=0
Created attachment 303398 [details]
Screenshot-2 of unhandled exception - w/libata.dma=0
Created attachment 303400 [details]
Screenshot-3 of unhandled exception - w/libata.dma=0
Adding libata.dma=0 to the install gives a different unhandled exception.
The kernel definitely isn't liking reading from the disc. Odd that mediacheck
succeeds. Any chance that you can try burning again, preferably at a slower
speed, and see if it changes anything?
This happens for both the beta and preview disks. I burned them using k3b at 4x
and they both verified under k3b as well as with mediacheck. I'll burn another
copy just to see but I do not think it is the media.
Burned another copy of Preview and got some mixed results. This copy verified
in k3b. It also verifies with mediacheck. But at the end of mediacheck when it
asks you for another disk or if you want to continue, when you click Continue
all you get is an error box that says Error with an OK button. So I click OK
and wait and then the box pops up again, and again... So I cold booted the
system and reran the install this time without selecting mediacheck. This time
I was able to get through all the formatting and bootloader screens (of course
I'm getting very fast at doing this now) and the system began installing the
software to the disks. But at the end when I click Reboot, it just sat there
with a black screen and the hard drive light on for 10 minutes before it finally
rebooted and then when it got to the point when it should have loaded the
bootloader all the screen shows is GRUB.
So it looks like the installer is having some type of problem reading this
DVD/CD-ROM drive. Maybe it is a spindown issue; I've gotten so much faster at
going through the screens that maybe that helped. But that still doesn't explain
the weird 'Error' box after the successful mediacheck.
The other issue about the bootloader not being found is the same behavior I
noticed when I was able to transfer the live image to the disk and it too had
the same problem.
I switched the boot disk over to the other mirror disk in the /dev/md0 array and
again got grub but this time it was the grub> prompt.
The installer should install grub on both of the partitions for /dev/md0. Maybe
it is doing this but is getting some of the parameters wrong.
Can these issues be fixed before the release of 9? I have a brand new system
that can't install any recent version of Fedora. Please let me know if I can
help test anything else.
Is there any way to handcraft the mbr for the bootloader? Although it's a
struggle w/anaconda I can sometimes get the files to install to the hard disk
but the bootloader just isn't right or doesn't get installed right and so I
always end up with just the GRUB prompt. If I could hand install a good
bootloader then at least I could get the system working.
Created attachment 304355 [details]
dmesg from f9 rescue console in chroot
Created attachment 304356 [details]
f9 rescue console session of some commands in chroot
Jeremy, I posted some info from F9 rescue mode. It looks as though the install
is there, except that nothing is installed under /boot. That is certainly why I
get the GRUB prompt with nothing available to load. The installer says at the
end that it is installing the bootloader, but it seems it didn't actually do it.
Created attachment 304358 [details]
f9 rescue console session of some commands in chroot
Created attachment 304359 [details]
Just some add'l comments about how disk druid handles sata:
As you go through install attempts, you see on each reboot that Disk Druid
presents the SATA drives as SCSI and the device names change around. What was
sda on the first attempt becomes sdd on the second attempt and maybe sdc on the
third attempt, etc. This makes things very confusing, especially when you have
eight drives you are attempting to partition. Just for comparison, I looked at
Debian/Ubuntu and found that their installer shows the SCSI names as well as the
device names, so between install attempts scsi-18 is always scsi-18 even though
the device name assigned to it changes. The drives are ordered by their SCSI
name, and this makes things a lot more sane.
I've been doing a lot of install attempts this morning with F9 preview and I've
gotten at least 10 different unhandled exceptions for different parts of the
code. It looks like a lot of exception handling needs to be added to the code.
Here are some results from install testing that I did today:
Scenario 1: very simple case
Partitions: 3 (either Linux or LVM/Linux)
bootloader in MBR
BIOS drive order: Drive 1 first
Installation succeeds but there is strange error on boot:
Cannot find resume device /dev/sdh2
(that is my swap partition, should it be looking there?)
I also see this error every so often:
/dev/sdc1 lseek <some huge number> failed: invalid argument
Scenario 2: RAID (add another drive)
Drive 1 and 2:
Partitions: 3 (all software RAID)
bootloader in /dev/md0
BIOS drive order: Drive 1 first
Formatting the / file system takes 10 minutes!
Installing the bootloader takes 3 minutes!
Reboot results in either GRUB or grub>
Scenario 3: LVM over RAID (add more drives)
Drive 1 and 2:
Partitions: 3 (boot=software RAID1, swap=LVM/RAID1, /=LVM/RAID1)
Partitions: 1 (LVM/RAID5)
bootloader in /dev/md0
BIOS drive order: Drive 1 first
Gets all sorts of Unhandled Exceptions and an assertion error.
Maybe 1 in 10 tries you can actually get the software to load on the disks
but then it always fails on the reboot with GRUB or grub>
When you are in 'custom setup' and you delete an existing VolGroup, then go to
one of the RAID entries that was contained in that VolGroup and try to edit it
so you can redeclare its type from LVM to ext3, it acts as if it allows you to
do this, but when you try to commit the partitioning you find that it did not
actually do it.
When you are shown the bootloader choice, it is not obvious that you can select
the BIOS drive order by clicking the Manage device button. And when that
drive-order list is shown, it is nothing like what my BIOS thinks the drive
order is. Is anaconda reading the BIOS to get this information? If so, it does
not work. Or is this just a question for the user, requiring them to select the
order of the drives as they know them to be in their BIOS?
If you're getting unhandled exceptions *please* save them and post them. Without
the files we can only guess what you're hitting.
Have you tried to install from another DVD/CD-ROM drive (USB attached or
whatever)? This might simply be a hardware issue? Worth the shot, anyway, IMO.
I have posted some of the unhandled exceptions here. The bug report wizard does
not work: when you go to select a drive to save the bug, all there is is a weird
thin line when you click the dropdown. The network save does not work either; I
fill it in and then it says it cannot do it. So getting bugs filed is not easy
with this installer. Some of the unhandled exceptions that I remember were:
Errno 5 Input/output error
global "repo" not defined, or something equivalent
As far as my dvd drive, I have been using it all the time to burn DVD/CD
without problems. It is not the issue. I can get other distros installed and
F9 installed without problem in the very simple case of just plain linux
partition on one drive. It's when you setup a more complex case involving RAID
that the installer starts having problems.
Created attachment 304473 [details]
Test session of performing partitioning, creating arrays, volume groups, filesystems and mbr without problem on my machine.
The test session in the attachment shows where I am able to successfully create
some rather complex LVM-over-RAID setups involving multiple RAID-1 and RAID-5
arrays as well as multiple LVM volume groups on top of them, all without
problem. These are the same things that Anaconda should be able to do, and has
been able to do in the past, but now cannot.
Several times I noticed references to 'Mactel' in some of the unhandled
exceptions. If this means that Anaconda thinks the hardware is a Mac, then that
is not correct.
Created attachment 304498 [details]
lvresize unhandled exception - after working with VG1 then it cannot find VG1
I set up all the formatting, arrays, logical volumes, and filesystems myself
manually, verified that all of this was working just fine (including the
filesystems), and then fed this to anaconda. It complained that VG1 had an LV
that was 4 MB too large, so in the GUI I reduced the size of the LV by 4 MB.
Then I just added the necessary mount points, told anaconda to do no formatting,
and clicked Next. It begins with 'Checking filesystem on
/dev/VolGroup01/LogVol00'. This runs for quite a while until finally the
unhandled exception occurs in the lvresize.
The lvresize should not even have been necessary. From looking at the log, both
the VG and the LV were the proper size to begin with. Somehow Anaconda thought
that the LV was too large for the VG, but from the log they were the same size,
and Anaconda should have accepted this.
It looks like the exception occurred because Anaconda stopped the arrays during
the resize and of course that pulled the rug out from under everything.
Investigating the persistent GRUB, grub> problem:
I used rescue mode to look at the install. I chrooted into /mnt/sysimage, and
one thing I immediately notice is that there is almost nothing under /dev; only
one of my three volume groups is to be found. The root device
/dev/VolGroup01/LogVol00 is there, but that is it. No boot device, no swap
device. Maybe this has something to do with rescue mode only, but it does not
seem right, and when you try to run 'grub' in the chroot, which usually works
just fine, it does not work right at all.
grub> root (hd0,0)
Error 21: Selected disk does not exist.
# ls boot
So with no boot device (/dev/md0) under /dev there is no way to mount the
partition on /boot.
So I back out of the chroot and first mount the boot device:
# mount -t ext3 /dev/md0 /mnt/sysimage/boot
and try grub from the rescue env:
grub> root (hd0,0)
Filesystem type unknown, partition type 0xfd
grub> setup (hd0)
Error 17: Cannot mount selected partition
So Anaconda has left some things out that need to be here in order to get the
bootloader installed with a RAID | LVM/RAID setup.
Here are some other bugs/issues I've found:
In Disk Druid, when I checked my VGs, the VG over my RAID-5 array showed:
Used 80%: 975,835,823
Free 20%: 211,143,120
Total : 1,186,978,943
That total is more than the entire capacity of the RAID-5 array. The array
consists of 5 devices and 1 spare device. Each device is a 250GB drive, which
nets out at about 244GB available each. The capacity of a RAID-5 array is the
total of all the non-spare devices less one device, so in my case that would be
4 devices, or about 975GB. So it appears that Anaconda is counting the spare
device in its calculations, which is wrong.
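As a quick sanity check, the expected-capacity arithmetic described above can be sketched like this (sizes in GB, counts taken from the comment):

```shell
# RAID-5 usable capacity: (total members - spares - 1 parity) * member size.
members=6        # 5 active devices + 1 spare
spares=1
size_gb=244      # roughly 244 GB usable per 250 GB drive
usable=$(( (members - spares - 1) * size_gb ))
echo "${usable} GB"    # 976 GB, close to the ~975 GB figure above
```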
Another issue: when the installer gets ready to install the software, it just
appears to freeze. So I went into console mode, and what is going on is that the
arrays are resyncing. Apparently Anaconda waits for this to complete, which on
big arrays can take as long as an hour, so the install appears locked up. But if
you wait, it will eventually begin to install the software. Some message should
be presented to the user if Anaconda is going to wait on some kind of
processing, so the user does not think the whole install is locked up.
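When the installer appears frozen like this, switching to a console (Ctrl+Alt+F2 during install) and checking the md state shows whether a resync is the cause. A guarded sketch:

```shell
# /proc/mdstat reports per-array resync progress and an estimated finish time.
# The guard keeps this harmless on systems with no md arrays.
if [ -r /proc/mdstat ]; then
    cat /proc/mdstat    # look for a line like "resync = 37.4% ... finish=52.1min"
else
    echo "no md arrays active on this system"
fi
```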
Investigating more on grub:
When I get the grub> prompt on the failed boot here is what I find there:
grub> find /grub/grub.conf
grub> cat /grub/device.map
grub> cat /grub/grub.conf
# grub.conf generated by anaconda
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# kernel /vmlinuz-version ro root=/dev/VolGroup01/LogVol00
# initrd /initrd-version.img
title Fedora (2.6.25-0.234.rc9.git1.fc9.i686)
kernel /vmlinuz-2.6.25-0.234.rc9.git1.fc9.i686 ro
root=UUID=6b9ef3bc-13db-4447-93a6-d991c09bb02 rhgb quiet
The problem here is that /dev/sde is the spare device in the RAID-1 array and
therefore holds no files yet because it is inactive. grub.conf is showing it as
the root device but there are no files on that device.
How I got the install to finally boot:
at the failed boot grub>
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1) <== this apparently did not work: got 'could not mount'
entered rescue mode
mounted the boot device
edited /boot/grub/device.map and moved the spare raid device to the bottom
edited /boot/grub/grub.conf and changed (hd2,0) to (hd0,0) everywhere.
Success, finally boots.
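A condensed sketch of that repair, assuming the two md0 mirror members are BIOS disks hd0 and hd1 (verify against /boot/grub/device.map first). The command sequence is printed with cat so it can be reviewed; to apply it from rescue mode, pipe it into `grub --batch` instead:

```shell
# Install grub's stage1 on both mirror members of the /boot RAID-1 so either
# disk can boot alone. The (hd0,0)/(hd1,0) mappings are assumptions; check
# /boot/grub/device.map on your system before running this for real.
cat <<'EOF'
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
EOF
```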
So, with RAID | LVM setups, it takes me about 10-15 tries to actually get
anaconda to load the software onto the hard disks and then it takes all this
messing around with grub and grub files in the boot and rescue modes to finally
get F9 to boot. This is the first time of all the Fedora releases that Anaconda
has given me this much trouble completing a (hacked) install with RAID and LVM.
Well, something is still not right with the way Anaconda installed the software
because when you enter rescue mode it always tells you that it could not find
all of the system and some parts may be mounted under /mnt/sysimage. So I think
pieces are still missing.
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
There are so many issues and comments in this bug that it's absolutely
impossible to figure out what is broken and what is not. Your original report
appeared to be either burn errors or kernel errors reading the burnt disc.
Please file separate bug reports for other issues. Thanks.
There are serious issues identified in this bug and I spent a great deal of
time documenting these issues about Anaconda and RAID problems with many
screenshots and other documentation. I don't have hours to spend reposting this
amount of stuff to other bugs.
There were no burn errors. I've verified each of the discs several times and
have used them to install F9 in some simple non-RAID setups in VMs. The
problems reading the DVD drive are definitely kernel issues, and I opened
another bug regarding DVD r/w problems with the kernel. But the rest is about
how seriously deficient Anaconda is in dealing with RAID setups.
By hand I created multiple RAID setups successfully to prove that it was the
installer and not some other aspect of the machine, or configs or whatever. Yes
there are multiple aspects to the problem in this bug. But they all belong
together because I think you will eventually find common causes to these issues.
1. There are a plethora of unhandled exceptions just waiting like landmines to
bite people.
2. My hardware appeared at times to be identified as Mactel, which doesn't seem
right; it's a PC, not a Mac.
3. The installer cannot successfully install any RAID setup, which may just come
down to the fact that it is installing the bootloader in the wrong partition.
Gerry, I personally tested a setup with two sata disks that was raid 1 /boot,
raid 6 (two partitions on each disk) /usr, raid 0 (two partitions on each disk)
lvm PV, the volume group itself had swap and / on it. The installer handles
this without complaint and the boot loader works after the fact.
There are some oddities when dealing with the LVM device mapper, particularly
when there is existing LVM metadata on the disks. For an accurate test I would
highly recommend completely blanking the disks (recreate the partition tables a
few times with different designs, no LVM, and finally dd over the partition
table) and starting an install from scratch with all the disks uninitialized.
That should avoid any gremlins that fall out of the LVM tools when dealing with
existing LVM metadata.
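A sketch of that blanking step. The device name is a placeholder, and the commands are echoed as a dry run because they destroy data:

```shell
# DESTRUCTIVE if executed for real; the echo wrapper makes this a dry run.
DISK=/dev/sdX            # placeholder: substitute one of the target disks
run() { echo "$@"; }     # drop the wrapper (and 'run ') to actually execute

run mdadm --zero-superblock ${DISK}1          # clear stale md metadata
run pvremove -ff ${DISK}1                     # clear stale LVM metadata
run dd if=/dev/zero of=$DISK bs=512 count=1   # wipe the MBR/partition table
```

Repeat for each disk that will take part in the new install.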
And Chris is quite right, this single bug report has turned into a dumping
ground by you of a plethora of perceived issues and retries. We'll try to look
through the unhandled exceptions that have files as those may be easy to find
and wrap around, but the other issues are way too muddled to work through. We
appreciate the time you've spent in reporting them, and we'll do our best.
Please try the above scenario I've requested and if it fails, please please file
a new bug and provide the failure reports. We'll work on these one issue at a time.
I understand that this is a tough bug report. I didn't intend it to be that
way. It's just how it went as I was working through the installation.
Throughout this whole install on this machine there were numerous incidents
where it appeared the installer was having trouble reading devices. This could
mean it is more of a kernel problem than anything. Yet after the install and the
updated kernel, everything seems to be working fine on this machine once I
handcrafted a bootloader and installed it in the correct partition.
A thought that came to mind is that I always install RAID with at least one
spare device, and maybe these spares have played a role somehow in what I saw.
I don't have an immediate machine available to replay the installation. I now
have some services running on the machine that was used in this bug so I can't
use it now. But if I get time maybe I can try some of this in a VM.
While I hate to continue on this bug...
Gerry, do you use a spare disk for your /boot ? Raid 1 with a spare? I don't
think that's a test case we cover currently, perhaps we should.
Yes, for every array, including boot, swap, root, ... I learned from years of
experience with RAID that you're much safer with spares readily available. You
do not want to be scrambling to replace disks while an array is degraded. All
too often the drives were installed at the same time, and they all get ready to
fail at about the same time. That is why I use spares on all my arrays.
I had a similar problem (and have had it since FC7). I was finally able to
install FC9 by unplugging all of my unneeded IDE devices. Previously, I had 2
HDs, 1 CDRW, 1 DVD-ROM. I unplugged the unneeded HD and the CDRW and rejumpered
the DVD-ROM and other HD, and things installed fine.
It makes no sense at all to me, but that's what happened. My "unhandled
exception" was on tune2fs. It said that it couldn't find "/dev/Vol0" or
something like that.
As a latecomer to this bug, your RAID problems reflect similar problems I had with FC7 and FC8 and now FC10. (I skipped FC9 because of anaconda agony.)
With FC10 and software RAID, I learned that Anaconda may have zapped my MBR. I initially got just GRUB when booting the installed FC10. This was repaired by rewriting it, first using the repair DVD:
grub> find /boot/grub/stage1
My logic here was that the RAID1 disks sda and sdb both had grub installed in their MBR. If one disk fails, one still wants a viable grub on the remaining disk.
Continuing with my FC10 trials: on boot, the installed FC10 would only give me the raw grub> prompt and nothing more. By keying in the kernel and initrd lines (TAB completion is very helpful here), I was able to get the new FC10 booted.
To move on from just grub>, it was necessary to re-install the stage1 and stage2 stuff:
grub> install ...stage1... ..stage2..
(I don't have the grub manual in front of me.)
After these components of grub were installed (in the MBR?), grub would then give me the grub.conf options and life was regular.
With my install of FC7 and FC8 (one of the two), I noticed that having extra drives (in my case it was a firewire disk used for the backup), would screw up Anaconda. Anaconda seems very brittle in that regard. Removing all extra disks was very helpful. However, Anaconda required a full RAID1 disk complement (two disks). It would not do anything with a one disk RAID array. Again - very brittle behavior.
There is a wishy-washy statement that labels are required, but if you have RAID or LVM, maybe not... I just removed all of the labels and UUIDs from grub.conf and fstab. So far, FC10 with software RAID does not complain and boots fine.
I currently have one system left to move to FC10 (with hardware RAID). It does not boot at all under FC10. I am currently writing about this in another bugzilla report (Bug #474399).
We have made extensive changes to the partitioning code for F11 beta, such that it is very difficult to tell whether your bug is still relevant or not. Please test with either the latest rawhide you have access to or F11 and let us know whether you are still seeing this problem. Thanks for the bug report.
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '9'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 9's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 9 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
The best I've been able to do, even with the F11 anaconda, is this: I first have to manually make sure that every superblock is perfectly matched to its array with regard to preferred minors, and then, and only then, am I able to successfully install Fedora using anaconda. A normal Linux system has tolerance for mismatches in preferred minors, but anaconda requires perfect matching, which in my opinion is far too restrictive and not reflective of normal mdadm behavior.
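For anyone hitting the same wall, checking and normalizing the preferred minors can be sketched like this (applies to 0.90-format superblocks; device and array names are placeholders, and the commands are echoed as a dry run):

```shell
# Check the minor recorded in each member's superblock, then reassemble the
# array with --update=super-minor to rewrite it to match the md number.
run() { echo "$@"; }    # dry run; drop the wrapper to execute for real

run mdadm --examine /dev/sdg1     # look for the "Preferred Minor : N" line
run mdadm --stop /dev/md0
run mdadm --assemble /dev/md0 --update=super-minor /dev/sdg1 /dev/sdh1
```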
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.