Red Hat Bugzilla – Bug 443451
Could not get Fedora 9 to install to hard disk
Last modified: 2013-01-09 23:40:31 EST
Description of problem:
Version-Release number of selected component (if applicable):
Fedora 9 Preview Release
Steps to Reproduce:
1. Download the F9 Preview release
2. Try to install

Actual results: installer gets an unhandled exception
Expected results: installer installs the system
In Preview, whenever I press the Next button after finishing all the partitioning,
RAID, and LVM configuration, nothing happens. It will not go to the next screen.
The install is DOA at that point.
One time the Next button gave me an unhandled exception error and asked to file
a bug, but it refused to find my USB key, and when I tried Remote, it said there
was a problem writing the file. So I scp'd a file using the same credentials,
same machine, and same filename from a console without any problem; I do not know
what it was having a problem doing. No further details were provided. So I
replugged my USB key and tried Disk again, and then I got an assertion:
(ped_partition_is_active(part) at disk.c:1186 in function
So I downloaded the F9 Live CD and tried to install to the hard drive from it.
The installer on the Live CD was able to put the system on the hard disks, but
then the system would not boot. I selected every hard disk in turn as the boot
disk and the system still would not boot.
See these mailing list threads for more details on the issues trying to get
Fedora 9 installed:
Here is a smolt for the hardware:
FWIW, I'm installing the boot loader on /dev/md0, and my setup looks like:
VolGroup00 LogVol00 /dev/md1 swap
VolGroup01 LogVol00 /dev/md2 / ext3
VolGroup02 LogVol00 /dev/md3 /var/media ext3
/dev/md0 RAID1 /boot ext3 sdg1,sdh1
/dev/md1 RAID1 LVM PV sdg2,sdh2
/dev/md2 RAID1 LVM PV sdg3,sdh3
/dev/md3 RAID5 LVM PV sda1,sdb1,sdc1,sdd1,sde1
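For reference, here is a rough sketch of how that layout would be built by hand with mdadm and LVM2. The device names are the ones from the table above and are specific to this machine; the commands are printed rather than executed (via the echo wrapper) because they are destructive:

```shell
# Dry-run sketch of the RAID/LVM layout above. Device names come from the
# reporter's table; adjust for your system. Remove the 'run' wrapper (and
# the 'run ' prefixes) to execute for real -- these commands destroy data.
run() { echo "$@"; }

run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdg1 /dev/sdh1
run mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdg2 /dev/sdh2
run mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdg3 /dev/sdh3
run mdadm --create /dev/md3 --level=5 --raid-devices=5 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
run pvcreate /dev/md1 /dev/md2 /dev/md3
run vgcreate VolGroup00 /dev/md1   # swap
run vgcreate VolGroup01 /dev/md2   # /
run vgcreate VolGroup02 /dev/md3   # /var/media
```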
The installer will either crash just after you declare all the partitions, RAID,
and LVM, or it will crash when you are at the bootloader screen. It always
happens when you push the Next button.
And again, this machine has three SATA controllers. F7 and F8 would not even
install because they could only see two of the three controllers. F9 at least
sees all the controllers, but the installer is crashing.
Created attachment 303234 [details]
Screenshot of unhandled exception.
If you scroll down a little farther in that dialog, you'll see an end to the
backtrace followed by a line that looks something like this:
SystemError: None type has no attribute 'strip'
Or whatever. It could be nothing like that, but the point is there will be some
sort of message right before the dump of all the internal variables. Could you
also grab a picture of that?
Created attachment 303328 [details]
Screenshot of just before the variable dump
Did you run the mediacheck against your CD?
Created attachment 303387 [details]
Screenshot of console sessions showing Input/output errors.
I tried switching to a console to manually partition the drives, and none of the
drive tools can access the drives. They all give an Input/output error. Watching
the drive lights closely, every time I try one of the drive tool commands the
CD-ROM light starts blinking instead of a drive light.
Jeremy, yes I ran media check, and I just ran it again; both times it said the media checked out OK.
Can you try booting with 'libata.dma=0' and see if that helps?
Created attachment 303396 [details]
Screenshot-1 of unhandled exception - w/libata.dma=0
Created attachment 303398 [details]
Screenshot-2 of unhandled exception - w/libata.dma=0
Created attachment 303400 [details]
Screenshot-3 of unhandled exception - w/libata.dma=0
Adding libata.dma=0 to the install gives a different unhandled exception.
The kernel definitely isn't liking reading from the disc. Odd that mediacheck
succeeds. Any chance that you can try burning again, preferably at a slower
speed, and see if it changes anything?
This happens for both the beta and preview disks. I burned them using k3b at 4x
and they both verified under k3b as well as with mediacheck. I'll burn another
copy just to see but I do not think it is the media.
Burned another copy of Preview and got some mixed results. This copy verified
in k3b. It also verifies with mediacheck. But at the end of mediacheck when it
asks you for another disk or if you want to continue, when you click Continue
all you get is an error box that says Error with an OK button. So I click OK
and wait and then the box pops up again, and again... So I cold booted the
system and reran the install this time without selecting mediacheck. This time
I was able to get through all the formatting and bootloader screens (of course
I'm getting very fast at doing this now) and the system began installing the
software to the disks. But at the end when I click Reboot, it just sat there
with a black screen and the hard drive light on for 10 minutes before it finally
rebooted and then when it got to the point when it should have loaded the
bootloader all the screen shows is GRUB.
So it looks like the installer is having some type of problem reading this
DVD/CD-ROM drive. Maybe it is a spindown issue; I've gotten so much faster at
going through the screens that maybe that helped. But that still doesn't explain
the weird 'Error' box after the successful mediacheck.
The other issue about the bootloader not being found is the same behavior I
noticed when I was able to transfer the live image to the disk and it too had
the same problem.
I switched the boot disk over to the other mirror disk in the /dev/md0 array and
again got grub but this time it was the grub> prompt.
The installer should install grub on both of the partitions for /dev/md0. Maybe
it is doing this but is getting some of the parameters wrong.
Can these issues be fixed before the release of 9? I have a brand new system
that can't install any recent version of Fedora. Please let me know if I can
help test anything else.
Is there any way to handcraft the mbr for the bootloader? Although it's a
struggle w/anaconda I can sometimes get the files to install to the hard disk
but the bootloader just isn't right or doesn't get installed right and so I
always end up with just the GRUB prompt. If I could hand install a good
bootloader then at least I could get the system working.
Created attachment 304355 [details]
dmesg from f9 rescue console in chroot
Created attachment 304356 [details]
f9 rescue console session of some commands in chroot
Jeremy, I posted some info from F9 rescue mode. It looks as though the install
is there, except that nothing is installed under /boot. That is certainly why I
get the GRUB prompt with nothing available to load. The installer says at the
end that it is installing the bootloader, but it seems it didn't actually do it.
Created attachment 304358 [details]
f9 rescue console session of some commands in chroot
Created attachment 304359 [details]
Just some add'l comments about how disk druid handles sata:
As you go through install attempts, you see on each reboot that Disk Druid
presents the SATA drives as SCSI and the device names change around. What was
sda on the first attempt becomes sdd on the second attempt and maybe sdc on the
third attempt, etc. This makes things very confusing, especially when you have
eight drives you are attempting to partition. Just for comparison, I looked at
Debian/Ubuntu and found that their installer shows the SCSI names as well as the
device names, so between install attempts scsi-18 is always scsi-18 even though
the device name assigned to it changes. The drives are ordered by their SCSI
name, and this makes things a lot more sane.
I've been doing a lot of install attempts this morning with F9 preview and I've
gotten at least 10 different unhandled exceptions for different parts of the
code. It looks like a lot of exception handling needs to be added to the code.
Here are some results from install testing that I did today:
Scenario 1: very simple case
Partitions: 3 (either Linux or LVM/Linux)
bootloader in MBR
BIOS drive order: Drive 1 first
Installation succeeds but there is strange error on boot:
Cannot find resume device /dev/sdh2
(that is my swap partition, should it be looking there?)
I also see this error every so often:
/dev/sdc1 lseek <some huge number> failed: invalid argument
Scenario 2: RAID (add another drive)
Drive 1 and 2:
Partitions: 3 (all software RAID)
bootloader in /dev/md0
BIOS drive order: Drive 1 first
Formatting the / file system takes 10 minutes!
Installing the bootloader takes 3 minutes!
Reboot results in either GRUB or grub>
Scenario 3: LVM over RAID (add more drives)
Drive 1 and 2:
Partitions: 3 (boot=software RAID1, swap=LVM/RAID1, /=LVM/RAID1)
Partitions: 1 (LVM/RAID5)
bootloader in /dev/md0
BIOS drive order: Drive 1 first
Gets all sorts of Unhandled Exceptions and an assertion error.
Maybe 1 in 10 tries you can actually get the software to load on the disks
but then it always fails on the reboot with GRUB or grub>
When you are in 'custom setup' and you delete an existing VolGroup, then go to
one of the RAID entries that was contained in that VolGroup and try to edit it
so you can redeclare its type from LVM to ext3, it acts as if it allows you to
do this, but when you try to commit the partitioning you find that it did not
actually do it.
When you are shown the bootloader choice, it is not obvious that you can select
the BIOS drive order by clicking the Manage device button. And when that
drive-order list is shown, it is nothing like what my BIOS thinks the drive
order is. Is anaconda reading the BIOS to get this information? If so, it does
not work. Or is this just a question for the user, requiring them to select the
order of the drives as they know them to be in their BIOS?
If you're getting unhandled exceptions *please* save them and post them. Without
the files we can only guess what you're hitting.
Have you tried to install from another DVD/CD-ROM drive (USB attached or
whatever)? This might simply be a hardware issue? Worth the shot, anyway, IMO.
I have posted some of the unhandled exceptions here. The bug report wizard does
not work: when you go to select a drive to save the bug, all there is is a weird
thin line when you click the dropdown. The network save does not work either; I
fill it in and then it says it cannot do it. So getting bugs filed is not easy
with this installer. Some of the unhandled exceptions that I remember were:
Errno 5 Input/output error
global "repo" not defined, or something equivalent
As far as my dvd drive, I have been using it all the time to burn DVD/CD
without problems. It is not the issue. I can get other distros installed and
F9 installed without problem in the very simple case of just plain linux
partition on one drive. It's when you setup a more complex case involving RAID
that the installer starts having problems.
Created attachment 304473 [details]
Test session of performing partitioning, creating arrays, volume groups, filesystems and mbr without problem on my machine.
The test session in the attachment shows where I am able to successfully create
some rather complex LVM-over-RAID setups involving multiple RAID-1 and RAID-5
arrays as well as multiple LVM volume groups on top of them, all without
problem. These are the same things that Anaconda should be able to do, and has
been able to do in the past, but now cannot.
Several times I noticed references to 'Mactel' in some of the unhandled
exceptions. If this means that Anaconda thinks the hardware is a Mac, then that
is not correct.
Created attachment 304498 [details]
lvresize unhandled exception - after working with VG1 then it cannot find VG1
I set up all the formatting, arrays, logical volumes, and filesystems myself
manually, verified that all of this was working just fine (including the
filesystems), and then fed this to anaconda. It complained that VG1 had an LV
that was 4 MB too large, so in the GUI I reduced the size of the LV by 4 MB.
Then I just added the necessary mount points, told anaconda to do no formatting,
and clicked Next. It begins with 'Checking filesystem on
/dev/VolGroup01/LogVol00'. This runs for quite a while until finally the
unhandled exception occurs in the lvresize.
The lvresize should not even have been necessary. From looking at the log, both
the VG and the LV were the proper size to begin with. Somehow Anaconda thought
that the LV was too large for the VG, but from the log they were the same size,
and Anaconda should have accepted this.
It looks like the exception occurred because Anaconda stopped the arrays during
the resize and of course that pulled the rug out from under everything.
Investigating the persistent GRUB, grub> problem:
I used rescue mode to look at the install. I chrooted into /mnt/sysimage, and
one thing I immediately notice is that there is almost nothing under /dev; only
one of my three volume groups is to be found. The root device
/dev/VolGroup01/LogVol00 is there, but that is it. No boot device, no swap
device. Maybe this has something to do with rescue mode only, but it does not
seem right, and when you try to run 'grub' in the chroot, which usually works
just fine, it does not work right at all.
grub> root (hd0,0)
Error 21: Selected disk does not exist.
# ls boot
So with no boot device (/dev/md0) under /dev there is no way to mount the
partition on /boot.
So I back out of the chroot and first mount the boot device:
# mount -t ext3 /dev/md0 /mnt/sysimage/boot
and try grub from the rescue env:
grub> root (hd0,0)
Filesystem type unknown, partition type 0xfd
grub> setup (hd0)
Error 17: Cannot mount selected partition
So Anaconda has left some things out that need to be here in order to get the
bootloader installed with a RAID | LVM/RAID setup.
Here are some other bugs/issues I've found:
In Disk Druid, when I checked my VGs, the VG over my RAID-5 array showed:
Used 80%: 975,835,823
Free 20%: 211,143,120
Total : 1,186,978,943
That total is more than the entire capacity of the RAID-5 array. The array
consists of 5 devices and 1 spare device. Each device is a 250GB drive, which
nets out at about 244GB available each. The capacity of a RAID-5 array is the
total of all the non-spare devices less one device, so in my case that would be
4 devices, or about 975GB. So it appears that Anaconda is counting the spare
device in its calculations, which is wrong.
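As a quick sanity check, the expected-capacity arithmetic described above can be sketched like this (sizes in GB, counts taken from the comment):

```shell
# RAID-5 usable capacity: (total members - spares - 1 parity) * member size.
members=6        # 5 active devices + 1 spare
spares=1
size_gb=244      # roughly 244 GB usable per 250 GB drive
usable=$(( (members - spares - 1) * size_gb ))
echo "${usable} GB"    # 976 GB, close to the ~975 GB figure above
```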
Another issue: when the installer gets ready to install the software, it just
appears to freeze. So I went into console mode, and what is going on is that the
arrays are resyncing. Apparently Anaconda waits for this to complete, which on
big arrays can take as long as an hour, so the install appears locked up. But if
you wait, it will eventually begin to install the software. Some message should
be presented to the user if Anaconda is going to wait on some kind of
processing, so the user does not think the whole install is locked up.
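When the installer appears frozen like this, switching to a console (Ctrl+Alt+F2 during install) and checking the md state shows whether a resync is the cause. A guarded sketch:

```shell
# /proc/mdstat reports per-array resync progress and an estimated finish time.
# The guard keeps this harmless on systems with no md arrays.
if [ -r /proc/mdstat ]; then
    cat /proc/mdstat    # look for a line like "resync = 37.4% ... finish=52.1min"
else
    echo "no md arrays active on this system"
fi
```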
Investigating more on grub:
When I get the grub> prompt on the failed boot here is what I find there:
grub> find /grub/grub.conf
grub> cat /grub/device.map
grub> cat /grub/grub.conf
# grub.conf generated by anaconda
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# kernel /vmlinuz-version ro root=/dev/VolGroup01/LogVol00
# initrd /initrd-version.img
title Fedora (2.6.25-0.234.rc9.git1.fc9.i686)
kernel /vmlinuz-2.6.25-0.234.rc9.git1.fc9.i686 ro
root=UUID=6b9ef3bc-13db-4447-93a6-d991c09bb02 rhgb quiet
The problem here is that /dev/sde is the spare device in the RAID-1 array and
therefore holds no files yet because it is inactive. grub.conf is showing it as
the root device but there are no files on that device.
How I got the install to finally boot:
at the failed boot grub>
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1) <== this apparently did not work: got 'could not mount'
entered rescue mode
mounted the boot device
edited /boot/grub/device.map and moved the spare raid device to the bottom
edited /boot/grub/grub.conf and changed (hd2,0) to (hd0,0) everywhere.
Success, finally boots.
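A condensed sketch of that repair, assuming the two md0 mirror members are BIOS disks hd0 and hd1 (verify against /boot/grub/device.map first). The command sequence is printed with cat so it can be reviewed; to apply it from rescue mode, pipe it into `grub --batch` instead:

```shell
# Install grub's stage1 on both mirror members of the /boot RAID-1 so either
# disk can boot alone. The (hd0,0)/(hd1,0) mappings are assumptions; check
# /boot/grub/device.map on your system before running this for real.
cat <<'EOF'
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
EOF
```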
So, with RAID | LVM setups, it takes me about 10-15 tries to actually get
anaconda to load the software onto the hard disks and then it takes all this
messing around with grub and grub files in the boot and rescue modes to finally
get F9 to boot. This is the first time of all the Fedora releases that Anaconda
has given me this much trouble completing a (hacked) install with RAID and LVM.
Well, something is still not right with the way Anaconda installed the software
because when you enter rescue mode it always tells you that it could not find
all of the system and some parts may be mounted under /mnt/sysimage. So I think
pieces are still missing.
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
There are so many issues and comments in this bug that it's absolutely
impossible to figure out what is broken and what is not. Your original report
appeared to be either burn errors or kernel errors reading the burnt disc.
Please file separate bug reports for other issues. Thanks.
There are serious issues identified in this bug and I spent a great deal of
time documenting these issues about Anaconda and RAID problems with many
screenshots and other documentation. I don't have hours to spend reposting this
amount of stuff to other bugs.
There were no burn errors. I've verified each of the discs several times and
have used them to install F9 in some simple non-RAID setups in VMs. The
problems reading the DVD drive are definitely kernel issues, and I opened
another bug regarding DVD r/w problems with the kernel. But the rest is about
how seriously deficient Anaconda is in dealing with RAID setups.
By hand I created multiple RAID setups successfully to prove that it was the
installer and not some other aspect of the machine, or configs or whatever. Yes
there are multiple aspects to the problem in this bug. But they all belong
together because I think you will eventually find common causes to these issues.
1. There are a plethora of unhandled exceptions just waiting like landmines to
bite people.
2. My hardware appeared at times to be identified as Mactel, which doesn't seem
right; it's a PC, not a Mac.
3. The installer cannot successfully install any RAID setup, which may just come
down to the fact that it is installing the bootloader in the wrong partition.
Gerry, I personally tested a setup with two sata disks that was raid 1 /boot,
raid 6 (two partitions on each disk) /usr, raid 0 (two partitions on each disk)
lvm PV, the volume group itself had swap and / on it. The installer handles
this without complaint and the boot loader works after the fact.
There are some oddities when dealing with the LVM device mapper, particularly
when there is existing LVM metadata on the disks. For an accurate test I would
highly recommend completely blanking the disks (recreate the partition tables a
few times with different designs, no LVM, and finally dd over the partition
table) and starting an install from scratch with all the disks uninitialized.
That should avoid any gremlins that fall out of the LVM tools when dealing with
existing LVM metadata.
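A sketch of that blanking step. The device name is a placeholder, and the commands are echoed as a dry run because they destroy data:

```shell
# DESTRUCTIVE if executed for real; the echo wrapper makes this a dry run.
DISK=/dev/sdX            # placeholder: substitute one of the target disks
run() { echo "$@"; }     # drop the wrapper (and 'run ') to actually execute

run mdadm --zero-superblock ${DISK}1          # clear stale md metadata
run pvremove -ff ${DISK}1                     # clear stale LVM metadata
run dd if=/dev/zero of=$DISK bs=512 count=1   # wipe the MBR/partition table
```

Repeat for each disk that will take part in the new install.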
And Chris is quite right, this single bug report has turned into a dumping
ground by you of a plethora of perceived issues and retries. We'll try to look
through the unhandled exceptions that have files as those may be easy to find
and wrap around, but the other issues are way too muddled to work through. We
appreciate the time you've spent in reporting them, and we'll do our best.
Please try the above scenario I've requested and if it fails, please please file
a new bug and provide the failure reports. We'll work on these one issue at a time.
I understand that this is a tough bug report. I didn't intend it to be that
way. It's just how it went as I was working through the installation.
Throughout this whole install on this machine there were numerous incidents
where it appeared the installer was having trouble reading devices. This could
mean it is more of a kernel problem than anything. Yet after the install and the
updated kernel, everything seems to be working fine on this machine once I
handcrafted a bootloader and installed it in the correct partition.
A thought that came to mind is that I always install RAID with at least one
spare device, and maybe these spares have played a role somehow in what I saw.
I don't have an immediate machine available to replay the installation. I now
have some services running on the machine that was used in this bug so I can't
use it now. But if I get time maybe I can try some of this in a VM.
While I hate to continue on this bug...
Gerry, do you use a spare disk for your /boot ? Raid 1 with a spare? I don't
think that's a test case we cover currently, perhaps we should.
Yes, for every array, including boot, swap, root, ... I learned from years of
experience with RAID that you're much safer with spares readily available. You
do not want to be scrambling to replace disks while an array is degraded. All
too often the drives were installed at the same time, and they all get ready to
fail at about the same time. That is why I use spares on all my arrays.
I had a similar problem (and have had it since FC7). I was finally able to
install FC9 by unplugging all of my unneeded IDE devices. Previously, I had 2
HDs, 1 CDRW, 1 DVD-ROM. I unplugged the unneeded HD and the CDRW and rejumpered
the DVD-ROM and other HD, and things installed fine.
It makes no sense at all to me, but that's what happened. My "unhandled
exception" was on tune2fs. It said that it couldn't find "/dev/Vol0" or
something like that.
As a latecomer to this bug, your RAID problems reflect similar problems I had with FC7 and FC8 and now FC10. (I skipped FC9 because of anaconda agony.)
With FC10 and software RAID, I learned that Anaconda may have zapped my MBR. I initially got just GRUB when booting the installed FC10. This was repaired by rewriting it, first using the repair DVD:
grub> find /boot/grub/stage1
My logic here was that the RAID1 disks sda and sdb both had grub installed in their MBR. If one disk fails, one still wants a viable grub on the remaining disk.
Continuing with my FC10 trials: on boot, the installed FC10 would only give me the raw grub> prompt and nothing more. By keying in the kernel and initrd lines (TAB completion is very helpful here), I was able to get the new FC10 booted.
To move on from just grub>, it was necessary to re-install the stage1 and stage2 stuff:
grub> install ...stage1... ..stage2..
(I don't have the grub manual in front of me.)
After these components of grub were installed (in the MBR?), grub would then give me the grub.conf options and life was regular.
With my install of FC7 and FC8 (one of the two), I noticed that having extra drives (in my case it was a firewire disk used for the backup), would screw up Anaconda. Anaconda seems very brittle in that regard. Removing all extra disks was very helpful. However, Anaconda required a full RAID1 disk complement (two disks). It would not do anything with a one disk RAID array. Again - very brittle behavior.
There is a wishy-washy statement that labels are required, but if you have RAID or LVM, maybe not... I just removed all of the labels and UUIDs from grub.conf and fstab. So far, FC10 with software RAID does not complain and boots fine.
I currently have one system left to move to FC10 (with hardware RAID). It does not boot at all under FC10. I am currently writing about this in another bugzilla report (Bug #474399).
We have made extensive changes to the partitioning code for F11 beta, such that it is very difficult to tell whether your bug is still relevant or not. Please test with either the latest rawhide you have access to or F11 and let us know whether you are still seeing this problem. Thanks for the bug report.
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '9'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 9's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 9 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
The best I've been able to do, even with the F11 anaconda, is this: I first have to manually make sure that every superblock is perfectly matched to its array with regard to preferred minors, and then, and only then, am I able to successfully install Fedora using anaconda. A normal Linux system has tolerance for mismatches in preferred minors, but anaconda requires perfect matching, which in my opinion is far too restrictive and not reflective of normal mdadm behavior.
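For anyone hitting the same wall, checking and normalizing the preferred minors can be sketched like this (applies to 0.90-format superblocks; device and array names are placeholders, and the commands are echoed as a dry run):

```shell
# Check the minor recorded in each member's superblock, then reassemble the
# array with --update=super-minor to rewrite it to match the md number.
run() { echo "$@"; }    # dry run; drop the wrapper to execute for real

run mdadm --examine /dev/sdg1     # look for the "Preferred Minor : N" line
run mdadm --stop /dev/md0
run mdadm --assemble /dev/md0 --update=super-minor /dev/sdg1 /dev/sdh1
```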
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.