Bug 214494
Summary: | The kernel cannot be booted after installing. | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | IBM Bug Proxy <bugproxy>
Component: | anaconda | Assignee: | Anaconda Maintenance Team <anaconda-maint-list>
Status: | CLOSED CURRENTRELEASE | |
Severity: | medium | |
Priority: | medium | |
Version: | 6 | CC: | cagney, dhowells, dwmw2, gal, jgirouar, marksmit, wtogami
Keywords: | Reopened | |
Hardware: | powerpc | |
OS: | Linux | |
Fixed In Version: | yum-3.0.2 | Doc Type: | Bug Fix
Last Closed: | 2007-02-26 18:42:51 UTC | |
Description
IBM Bug Proxy
2006-11-07 21:00:39 UTC
Created attachment 140606 [details]
fc6_working.tar
----- Additional Comments From TBLOTH.com 2006-11-07 16:44 EDT -------
FC6 install logs - working

----- Additional Comments From TBLOTH.com 2006-11-07 16:45 EDT -------
Aloha, please find above my first bunch of logs for the WORKING boot. The current firmware is Global Firmware (PHYP) MB245_300_000.

How you guys came to the conclusion that this should be filed against grub is beyond me. Can you boot without "rhgb quiet" and get a log of the boot sequence that way? For now, I'm reassigning this to kernel. If the logs show something else, we can reassign it again.

Created attachment 140718 [details]
FC6 install logs - not working
Please find attached the logs of the failed boot after a successful installation.
--------
Elapsed time since release of system processors: 0 mins 50 secs
Config file read, 1024 bytes
Welcome
Welcome to yaboot version 1.3.13 (Red Hat 1.3.13-2.fc6)
Enter "help" to get some basic usage information
boot: linux
Please wait, loading kernel...
Elf32 kernel loaded...
Loading ramdisk...
ramdisk loaded at 02700000, size: 1469 Kbytes
OF stdout device is: /vdevice/vty@30000000
command line: ro console=hvc0 rhgb quiet root=LABEL=/1
memory layout at init:
alloc_bottom : 02870000
alloc_top : 30000000
alloc_top_hi : ff000000
rmo_top : 30000000
ram_top : ff000000
Looking for displays
instantiating rtas at 0x0764c000 ... done
00000000 : boot cpu 00000000
00000001 : starting cpu hw idx 00000001... done
00000002 : starting cpu hw idx 00000002... done
00000003 : starting cpu hw idx 00000003... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x02b71000 -> 0x02b7229b
Device tree struct 0x02b73000 -> 0x02b8f000
Calling quiesce ...
returning from prom_init
--------
This problem seems to only occur on 32 bit and not 64 bit.
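The yaboot line "Elf32 kernel loaded..." in the log above is the visible symptom, so it may help to check the ELF class of the images in /boot directly. The following is only a minimal sketch, assuming the kernel image is a plain ELF file (as the yaboot message suggests); the function names are made up for illustration and are not part of any Fedora tool. It reads the EI_CLASS byte of the ELF identification header, where 1 means 32-bit and 2 means 64-bit.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def elf_class(header: bytes) -> int:
    """Return 32 or 64 for an ELF header, based on the EI_CLASS byte (offset 4)."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    ei_class = header[4]
    if ei_class == 1:  # ELFCLASS32
        return 32
    if ei_class == 2:  # ELFCLASS64
        return 64
    raise ValueError(f"unknown EI_CLASS value {ei_class}")


def elf_class_of_file(path: str) -> int:
    """Hypothetical helper: read the first five bytes of a file and classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

On an affected system one could call elf_class_of_file() on each kernel image under /boot (the exact file names depend on the installed kernel packages) and compare the result with what the hardware is expected to boot.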
----- Additional Comments From marksmit.com 2006-11-13 21:16 EDT -------
My bug got dup'd to this one. Pardon if I am restating what is already known, but I could not find it summarized in this bug.

My power5 ppc64 machines and added ones at OSDL are experiencing this problem. Not all lpars hit this problem, but what separates the successful ones from the recreates is the kernel it is trying to boot. All of the lpars that succeed are booting Elf64. All of the lpars that recreate this bug are attempting to boot Elf32.

I have an OpenPower 710 that has one lpar succeeding and another lpar recreating this bug. By booting in rescue mode and copying the successful kernel and initrd from /boot onto the victim, this bug no longer recreates - and it says it is booting Elf64.

This bug does not recreate on the Oct12 snapshot of rawhide, but does recreate on the FC6-gold that followed a few days later. What puzzles me is what is causing the install process to install a 32bit kernel in one lpar and the 64bit kernel on another lpar in the same system.

----- Additional Comments From marksmit.com 2006-11-13 21:37 EDT -------
I can work around the problem with:
  rpm -Uhv kernel-2.6.18-1.2798.fc6.ppc64.rpm --force
I copied that rpm over to the lpar when I was network-booted in rescue mode. I found that copying over the kernel and initrd (from a working system) got me booted, but all the modules, including networks, were busted until I reinstalled the kernel rpm and rebooted.

----- Additional Comments From marksmit.com 2006-11-13 22:45 EDT -------
Interesting recreate: the original victim is booted using the workaround just posted. I then ran yum update to install the Nov10 FC6 updates just available. Upon reboot, the bug is recreated, this time with the new kernel, and it is attempting to load Elf32 again. The 2nd lpar in this system - the lpar that never recreated - successfully did the same yum update, but it installed and boots Elf64.
If any logs from the yum update would assist, please let me know.

Refiling against anaconda. We shouldn't be installing 32-bit kernels on 64-bit hardware. Please reproduce with anaconda debugging (however that's done). If yum also installs the wrong kernel, please file a separate bug against yum.

Do we have an lpar that is exhibiting this with FC6?

Not in-house. I haven't seen this on any LPAR FC6 installs. Feel free to blow away uranus.cambridge in an attempt to reproduce though, assuming dhowells concurs. Janice, have you seen this on any of the Westford machines?

I've no objection to uranus being reinstalled.

Paul, I have not installed FC6 on the typical systems in the pool. I did install a cellblade the other day, and this problem did not appear. For that system, FC6 gold (/vol/engineering/redhat/released/FC-6/GOLD) was installed. My notes show that you were the last person who requested squad6-lp1. I would expect this to be available for testing. I noticed that someone installed 5.0 on this about 5 days ago. Was this you?
Janice

*** Bug 213092 has been marked as a duplicate of this bug. ***

*** Bug 214902 has been marked as a duplicate of this bug. ***

I would like to offer "kiwi", the OpenPower 710, to help. It is at IBM, but I can make it available for debugging. I just recently reinstalled FC6 and this continues to recreate in one lpar and not recreate in a 2nd lpar on the same machine (I cannot tell what is causing the difference). The failing lpar has had the workaround applied and is now up on the network. I can give access to Janice and let her investigate, or can collect logs, etc. and attach them.

----- Additional Comments From marksmit.com 2006-12-21 00:31 EDT -------
On my VIO client that recreates easily with FC6-gold and updates, I am now observing a successful "yum update" and reboot with the Dec 15 FC6 updates repository. Specifically, reinstalling with FC6-gold - it recreates. On Dec 15, updated to kernel-2.6.18-1.2798.fc6.ppc64.rpm and it recreated.
On Dec 20, updated to kernel-2.6.18-1.2868.fc6.ppc64.rpm and it no longer recreates. (Note: when updating, I ran "yum update", so it was not just the kernel but a whole slew of updates being applied.) Is anyone else seeing a similar change? I will re-attempt the recreate, including another FC6-gold install, just to be sure.

----- Additional Comments From marksmit.com 2006-12-22 01:52 EDT -------
Recreate conclusions:

The most significant selections are
a. software selection - in addition to the default "core" offering, check the boxes to include "development system" and "web server"
b. yum install kernel-2.6.18-1.2849.fc6 recreated on systems already running the ppc64 kernel (however the sampling is limited to lpars that previously recreated)

After dozens of FC6-gold recreate attempts - manual reinstalls of 2 lpars:
1. OpenPower 710 lpar: 0.5 proc units, 2GB mem, vscsi - 1 disk, virtual ethernet
2. p5-550: 4 dedicated procs, 8GB mem, IPR scsi disks, e1000 ethernet, lpar contains all system resources
(i.e. very different lpar configs, but it recreates on both)

network boot (yaboot, vmlinuz and ramdisk from ppc/ppc64 dir on server)
network install: server has expanded RPMs into a dir; not using ISOs
a. Most installs were done on one (first) disk (i.e. /dev/sda); the default auto-partition scheme was always accepted, once I de-selected all disks but sda.
b. "remove-all partitions" on sda versus "remove-Linux partitions" (default offering) on sda does not seem to matter
c. One sample was recreated using all the IPR disks (sda-sdr), accepting the default partitioning offering.

text versus vnc does not matter
nfs versus ftp (anonymous) does not seem to matter
static eth versus dhcp eth does not matter
kickstart files were not used

recreates using yum install:
There were a few nfs install attempts that succeeded (attempting to install just the "core" default software selection, for example) where the proper ppc64 kernel did install (i.e. no bug recreate).
In those cases, I could still do a specific yum install that would install the 32bit ppc kernel. Use the (older) Nov12 kernel in the yum repository thus:
  yum install kernel-2.6.18-1.2849.fc6

----- Additional Comments From marksmit.com 2006-12-22 09:16 EDT -------
Installed both, picking only the "core" software offered by default. Both successfully installed the ppc64 kernel. Upon boot after install:
  yum install kernel (recreates, installing the 4MB ppc 32bit kernel)
  yum update kernel (succeeds, installing the 6MB ppc64 kernel)

----- Additional Comments From marksmit.com 2006-12-22 16:42 EDT -------
Even on a system that I cannot get to recreate this problem at install, I run:
  yum install kernel-2.6.18-1.2849.fc6
and it will recreate. In the transaction, it only offers to download and install one package. Whereas
  yum install kernel
will pull the newer kernel version, but offer to download 2 packages - the ppc and ppc64 packages - and the resulting install is the correct ppc64 kernel. So in theory this should be recreatable on any ppc64 system installed with FC6. Just run:
  yum install kernel-2.6.18-1.2849.fc6

----- Additional Comments From marksmit.com 2006-12-22 16:46 EDT -------
Oops, correction to previous post: on this system that does not recreate this problem at install,
  yum install kernel-2.6.18-1.2849.fc6
recreates, but so does
  yum install kernel
It will offer to download 2 packages in the 2nd case, but still chooses to install the ppc 32bit one.

Fixed with yum 3.0.2 or later

----- Additional Comments From marksmit.com 2007-01-12 17:23 EDT -------
This still recreates for the victim that originally recreated at install (FC6-gold installs 32bit kernel).
The scenario that recreates:
# yum install kernel-2.6.18-1.2849.fc6
This recreated with yum-3.0.1-2.fc6.noarch.rpm, and again when I enabled /etc/yum.repos.d/fedora-updates-testing.repo and ran
# yum update yum
to get yum-3.0.3-1.fc6.noarch.rpm.

Reopening per comment 25 as this problem was recreated using yum-3.0.3-1.fc6.noarch.rpm

----- Additional Comments From rosalesa@us.ibm.com (prefers email at rosalesa@austin.ibm.com) 2007-02-08 13:11 EDT -------
Reopening as this was recreated in: yum-3.0.3-1.fc6.noarch.rpm

----- Additional Comments From jklewis.com 2007-02-13 14:27 EDT -------
I have this problem on just one of my Cell blades; it has a hardware revision of 40. My Cells that install and run FC6 Gold just fine have a revision of 31. Will that help solve this problem?

Also, while trying to recover this manually (how is that done BTW?), I now can't boot anything, not even kernels that used to boot. I get "Not a valid ELF image" on every kernel. Did I somehow mess up my yaboot.conf file? I have a copy of it if needed.

Unless I missed something, this defect should have a much higher severity level.

changed:
What | Removed | Added
---|---|---
CC | | jklewis.com

------- Additional Comments From jklewis.com 2007-02-13 18:45 EDT -------
Please ignore what I said about "Not a valid ELF image" in Comment #44. Boneheaded mistake made by me in yaboot.conf. My system is finally up and running fine.

I was able to boot into Rescue mode and scp the relevant kernel RPM over. After installing it I had files in /boot with both fc6smp and fc6 in the name. Using the "file" command I was able to find which ones were 64 bit (the fc6 ones), and after updating yaboot.conf the install proceeded normally.

So, it's not very clear where we are on this defect. Something, I don't know what, is obviously installing the wrong kernel, but ONLY on certain systems.
I currently have some Cell blades that install fine, and one that doesn't, so if I can help with this let me know.

Huh? Comment #25 won't do anything at all to help the fact that the install was wrong to begin with.

----- Additional Comments From jklewis.com 2007-02-26 13:59 EDT -------
It's still not clear to me where we are on this one, and also why the severity is not higher. Has anyone been able to reproduce this in Fedora 7?

I had mentioned earlier that I had one Cell blade that exhibits this wrong behavior, and several that don't. Unfortunately, the blade that has the install failure is no longer working properly and I have to send it in for repair.
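
For anyone triaging similar reports: throughout this thread the wrong kernel is recognizable by the architecture suffix of the package (ppc instead of ppc64). The sketch below is only an illustration, not how yum or anaconda select packages. It assumes the usual name-version-release.arch.rpm file naming; the function names and the ppc file name in the usage note are hypothetical.

```python
def rpm_arch(filename: str) -> str:
    """Return the arch field of an RPM file name (name-version-release.arch.rpm)."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM file name: {filename!r}")
    stem = filename[: -len(".rpm")]
    # The arch is the last dot-separated field before the .rpm suffix.
    _, sep, arch = stem.rpartition(".")
    if not sep or not arch:
        raise ValueError(f"no arch field in: {filename!r}")
    return arch


def is_wrong_kernel_arch(filename: str, machine: str) -> bool:
    """Hypothetical check: True if the package arch differs from the machine arch."""
    return rpm_arch(filename) != machine
```

As a usage example, is_wrong_kernel_arch("kernel-2.6.18-1.2798.fc6.ppc64.rpm", "ppc64") is False, while an illustrative 32-bit file name such as "kernel-2.6.18-1.2849.fc6.ppc.rpm" would be flagged on a ppc64 machine. The machine string is assumed to be what `uname -m` reports.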