Bug 470905 - anaconda installs the wrong kernel for i686 xen guests
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: anaconda
Version: rawhide
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Anaconda Maintenance Team
QA Contact: Fedora Extras Quality Assurance
Blocks: F10Target F11VirtBlocker
Reported: 2008-11-10 15:10 EST by John Poelstra
Modified: 2013-01-09 23:54 EST
Doc Type: Bug Fix
Last Closed: 2009-03-24 14:18:21 EDT
stickster: fedora_requires_release_note?


Attachments
domain-builder-ng.log (28.08 KB, text/plain)
2008-11-11 12:41 EST, John Poelstra
xend.log (89.60 KB, text/plain)
2008-11-11 12:41 EST, John Poelstra
virt-manager.log (125.53 KB, text/plain)
2008-11-11 12:42 EST, John Poelstra

Description John Poelstra 2008-11-10 15:10:48 EST
Description of problem:
After successfully installing today's rawhide (Fedora 10) on a Xen guest and rebooting, the guest will not start and virt-manager returns a traceback.

Version-Release number of selected component (if applicable):
$ rpm -qa | egrep 'xen|virt' | sort
kernel-xen-2.6.18-121.el5
kernel-xen-2.6.18-122.el5
libvirt-0.3.3-14.el5
libvirt-python-0.3.3-14.el5
python-virtinst-0.300.2-11.el5
virt-manager-0.5.3-10.el5
virt-viewer-0.0.2-2.el5
xen-3.0.3-73.el5
xen-libs-3.0.3-73.el5

$ uname -a
Linux screamer 2.6.18-122.el5xen #1 SMP Mon Nov 3 18:49:46 EST 2008 i686 i686 i386 GNU/Linux


Additional info:
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/engine.py", line 514, in run_domain
    vm.startup()
  File "/usr/share/virt-manager/virtManager/domain.py", line 379, in startup
    self.vm.create()
  File "/usr/lib/python2.4/site-packages/libvirt.py", line 228, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: virDomainCreate() failed POST operation failed: (xend.err "Error creating domain: (2, 'Invalid kernel', 'elf_xen_note_check: ERROR: Will only load images built for the generic loader or Linux images')")
Comment 1 John Poelstra 2008-11-10 17:55:12 EST
FWIW... a guest install of the latest RHEL5.3 beta installs and boots fine
Comment 2 Daniel Berrange 2008-11-11 05:16:14 EST
Please provide the log files for

  /root/.virt-manager/virt-manager.log
  /var/log/xen/xend.log
  /var/log/xen/domain-builder-ng.log
Comment 3 John Poelstra 2008-11-11 12:41:24 EST
Created attachment 323205
domain-builder-ng.log
Comment 4 John Poelstra 2008-11-11 12:41:52 EST
Created attachment 323206
xend.log
Comment 5 John Poelstra 2008-11-11 12:42:20 EST
Created attachment 323207
virt-manager.log
Comment 6 John Poelstra 2008-11-11 12:42:48 EST
Logs added.  Curious.... could this be a Fedora bug?
Comment 7 Chris Lalancette 2008-11-17 10:04:31 EST
Well, what it seems like is that the wrong kernel was installed for some reason.  I've definitely installed F-10 guests on RHEL-5.3 before, so something weird is going on.  This has the hallmarks of installing the non-PAE kernel inside the guest, which doesn't, I believe, have pv_ops turned on.  John, can you mount the guest disk loopback and find out which kernel anaconda installed?

Chris Lalancette
Comment 8 Mark McLoughlin 2008-11-17 10:10:44 EST
Yeah, that's the error you get if you try to boot a non-PAE i386 Fedora kernel in a Xen guest. See also bug #471268
Comment 9 John Poelstra 2008-11-18 12:58:35 EST
Here is what grub.conf looks like for the guest: 

default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Client (2.6.18-122.el5xen)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-122.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        initrd /initrd-2.6.18-122.el5xen.img
Comment 10 John Poelstra 2008-11-18 13:01:05 EST
Disregard comment #9; that information is from the wrong image.
Comment 11 John Poelstra 2008-11-18 13:07:55 EST
Here is grub.conf from F10 guest that was installed on RHEL5.3 dom0

default=0
timeout=0
chaintimeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Fedora (2.6.27.4-79.fc10.i686)
        root (hd0,0)
        kernel /vmlinuz-2.6.27.4-79.fc10.i686 ro root=UUID=3dfba3e5-65bc-4466-9499-f3bac2782f86 rhgb quiet
        initrd /initrd-2.6.27.4-79.fc10.i686.img
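
For illustration, the check requested in comment #7 (which kernel did anaconda install?) can be sketched in a few lines of Python run against the guest's grub.conf. This is only a sketch under stated assumptions: the helper names default_kernel_version and is_pae_kernel are hypothetical, each grub entry is assumed to have exactly one "kernel /vmlinuz-..." line, and a PAE kernel is assumed to be recognizable by a version string ending in "PAE", as in the uname output in comment #21.

```python
import re


def default_kernel_version(grub_conf_text):
    """Return the kernel version of the default grub entry, or None.

    Assumes legacy grub.conf syntax with one 'kernel /vmlinuz-<version>'
    line per title entry (true for the guest config quoted above).
    """
    default_index = 0
    kernels = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("default="):
            value = stripped.split("=", 1)[1]
            # 'default=saved' and similar non-numeric values are ignored.
            if value.isdigit():
                default_index = int(value)
            continue
        match = re.match(r"kernel\s+/vmlinuz-(\S+)", stripped)
        if match:
            kernels.append(match.group(1))
    if default_index < len(kernels):
        return kernels[default_index]
    return None


def is_pae_kernel(version):
    """Assumption: PAE kernel version strings end in 'PAE'."""
    return version.endswith("PAE")
```

Fed the grub.conf above, default_kernel_version should report 2.6.27.4-79.fc10.i686, and is_pae_kernel should report False for it, which matches the "Invalid kernel" failure when the guest is booted paravirtualized.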
Comment 12 Chris Lalancette 2008-11-18 13:35:48 EST
OK, yeah.  That does confirm that anaconda chose the wrong kernel.  Odd, because I had done an F-10 install a couple of months ago and it chose the right one.  This is *probably* an anaconda bug against F-10.

Chris Lalancette
Comment 13 Chris Lalancette 2008-11-19 05:08:57 EST
OK.  I'm not exactly sure how the anaconda people would want to fix this, but the problem is in yuminstall.py, selectBestKernel(), here:

        # FIXME: this is a bit of a hack.  we shouldn't hard-code and
        # instead check by provides.  but alas.
        for k in ("kernel", "kernel-smp", "kernel-PAE"):
            if len(self.ayum.tsInfo.matchNaevr(name=k)) > 0:
                self.selectModulePackages(anaconda, k)
                foundkernel = True

        if not foundkernel and (isys.smpAvailable() or isys.htavailable()):
            try:
                ksmp = getBestKernelByArch("kernel-smp", self.ayum)
            except PackageSackError:
                ksmp = None
                log.debug("no kernel-smp package")

            if ksmp and ksmp.returnSimple("arch") == kpkg.returnSimple("arch"):
                foundkernel = True
                log.info("selected kernel-smp package for kernel")
                self.ayum.install(po=ksmp)
                self.selectModulePackages(anaconda, ksmp.name)

                if len(self.ayum.tsInfo.matchNaevr(name="gcc")) > 0:
                    log.debug("selecting kernel-smp-devel")
                    self.selectPackage("kernel-smp-devel.%s" % (kpkg.arch,))

        if not foundkernel and isys.isPaeAvailable():
            try:
                kpae = getBestKernelByArch("kernel-PAE", self.ayum)
            except PackageSackError:
                kpae = None
                log.debug("no kernel-PAE package")

            if kpae and kpae.returnSimple("arch") == kpkg.returnSimple("arch"):
                foundkernel = True
                log.info("select kernel-PAE package for kernel")
                self.ayum.install(po=kpae)
                self.selectModulePackages(anaconda, kpae.name)

                if len(self.ayum.tsInfo.matchNaevr(name="gcc")) > 0:
                    log.debug("selecting kernel-PAE-devel")
                    self.selectPackage("kernel-PAE-devel.%s" % (kpkg.arch,))

        if not foundkernel:
            log.info("selected kernel package for kernel")
            self.ayum.install(po=kpkg)
            self.selectModulePackages(anaconda, kpkg.name)

            if len(self.ayum.tsInfo.matchNaevr(name="gcc")) > 0:
                log.debug("selecting kernel-devel")
                self.selectPackage("kernel-devel.%s" % (kpkg.arch,))

Basically, we get out of that first loop without setting foundkernel to True (I'm not entirely sure what that first loop does).  Then we do the isys.smpAvailable() check, which is true, but that branch fails because there is no kernel-smp package available.  Next, we do the isys.isPaeAvailable() check, but this doesn't fire because we are using < 4G of memory here.  Finally, we fall through to the default, which is just to select "kernel".  This is confirmed by the anaconda logs:

08:58:03 INFO    : moving (1) to step postselection
08:58:03 DEBUG   : no kernel-smp package
08:58:03 INFO    : selected kernel package for kernel
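
The fall-through described above can be modeled with plain values, which makes it easier to see why "kernel" wins. This is a simplified sketch, not the anaconda code: choose_kernel and its parameters are hypothetical, the first loop over already-selected packages and the arch comparison are left out, and the isys checks are replaced by booleans passed in by the caller.

```python
def choose_kernel(available, smp_or_ht, pae_detected):
    """Mimic the selection order of the quoted selectBestKernel() excerpt.

    available    -- set of kernel package names present in the repository
    smp_or_ht    -- stands in for isys.smpAvailable() or isys.htavailable()
    pae_detected -- stands in for isys.isPaeAvailable()
    """
    # Branch 1: SMP/HT hardware, but only if a kernel-smp package exists.
    if smp_or_ht and "kernel-smp" in available:
        return "kernel-smp"
    # Branch 2: PAE kernel, but only if the PAE check fires.
    if pae_detected and "kernel-PAE" in available:
        return "kernel-PAE"
    # Default: the plain kernel package.
    return "kernel"
```

With the situation from this bug (no kernel-smp package, PAE check not firing), choose_kernel({"kernel", "kernel-PAE"}, True, False) falls through to "kernel", even though kernel-PAE is available.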

So, one way to fix this (suggested by Mark McLoughlin) would be to change:

        if not foundkernel and isys.isPaeAvailable():

to something like:

        if not foundkernel and (isys.isPaeAvailable() or running_PAE_kernel):

Where running_PAE_kernel would be set to True in the case that the anaconda installer is running on a PAE kernel.  We could determine running_PAE_kernel by doing "os.uname()[2]", and looking for the substring PAE.  I'm sure there are other solutions, but nothing is springing to mind at the moment.
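
A minimal sketch of that idea, written as standalone Python rather than as a patch to yuminstall.py. The function names running_pae_kernel and should_try_pae are hypothetical, and the optional release argument exists only so the check can be exercised without actually booting a PAE kernel; the substring test is the one proposed above.

```python
import os


def running_pae_kernel(release=None):
    """Return True if the running kernel's release string mentions PAE.

    By default the release string comes from os.uname()[2], as proposed
    above; a string can be passed in explicitly for testing.
    """
    if release is None:
        release = os.uname()[2]
    return "PAE" in release


def should_try_pae(found_kernel, pae_available, release=None):
    """Stand-in for the modified condition:

    not foundkernel and (isys.isPaeAvailable() or running_PAE_kernel)
    """
    return (not found_kernel) and (pae_available or running_pae_kernel(release))
```

For a release string such as 2.6.29-0.258.rc8.git2.fc11.i686.PAE the condition holds even when the memory-based PAE check does not fire, so the kernel-PAE branch would be tried before falling back to "kernel".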

Chris Lalancette
Comment 14 Jesse Keating 2008-11-19 11:09:21 EST
What happens if you manually add say kernel-PAE to a kickstart file?  I suspect you'll get the right kernel (or both) and this would be an OKish workaround for F10.
Comment 15 Mark McLoughlin 2008-11-19 11:26:15 EST
(In reply to comment #14)
> What happens if you manually add say kernel-PAE to a kickstart file?  I suspect
> you'll get the right kernel (or both) and this would be an OKish workaround for
> F10.

We haven't tried it yet, but yeah that would probably be our workaround if it doesn't get fixed for F10.
Comment 16 Paul W. Frields 2008-11-19 12:10:12 EST
Should we add a release note for this for the 0-day update?
Comment 17 John Poelstra 2008-11-19 12:29:02 EST
I haven't done a kickstart install in a long time and I think it is an unreasonably high requirement for a "workaround".  This is also a regression from Fedora 9.
Comment 18 Jesse Keating 2008-11-19 12:42:41 EST
Given that you can't run Fedora as a Xen host, which means you need to have $SOMETHING_ELSE installed to run into this, I think it's unreasonable to consider this a blocker bug.  For now, using a kickstart script is likely a workable workaround.  In the near future we could have this fixed with an updates.img.

The class of people who are going to have access to a Xen-capable setup, but not know how to do a basic kickstart install (especially given the Fedora and RHEL documentation on doing them), is going to be quite small in my opinion.  I do believe we should have some release notes for this issue, be they notes about using kickstart, using an updates.img, or both.
Comment 19 Will Woods 2008-11-24 17:34:17 EST
Release note added:

https://fedoraproject.org/wiki/Common_F10_bugs#Fedora_10_i686_Xen_guest_won.27t_boot

As I understand it, this only affects 32-bit Xen hosts, which are far less common than 64-bit hosts. Still, it would probably be good if we had an updates.img for this problem soon.
Comment 20 Bug Zapper 2008-11-26 00:09:08 EST
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 21 Daniel Berrange 2009-03-24 14:18:21 EDT
Installed a new guest on a RHEL-5 Xen host with:

# virt-install --paravirt --location http://download.fedora.redhat.com/pub/fedora/linux/development/i386/os/  --name f11i686xen --file /var/lib/xen/images/f11i686.img --file-size 5 --vnc --ram 900  --noautoconsole


And post-install the guest has

# uname -r
2.6.29-0.258.rc8.git2.fc11.i686.PAE


So anaconda is installing the correct kernel now.
Comment 22 Bill McGonigle 2009-05-26 06:29:55 EDT
If the kernel choices showed up in the Base package selection list, the user could just fix this as he goes.  I recently installed a virtual machine with 2GB using HVM and the standard anaconda install method, so I got the non-PAE kernel.  The workaround was to boot the machine again with HVM, yum install the PAE kernel, adjust the grub timeout and default kernel, and then finally boot the machine PVM.  One side effect of moving Xen to PAE is that one kernel can't boot both HVM and PVM anymore, so perhaps if anaconda knew something about the HVM VM's fingerprint it could just install both, anticipating a common usage scenario.
