Bug 470905

Summary: anaconda installs the wrong kernel for i686 xen guests
Product: [Fedora] Fedora
Reporter: John Poelstra <poelstra>
Component: anaconda
Assignee: Anaconda Maintenance Team <anaconda-maint-list>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: medium
Version: rawhide
CC: anaconda-maint-list, berrange, bill-bugzilla.redhat.com, clalance, dcantrell, markmc, mishu, stickster, tcallawa, wwoods, xen-maint
Target Milestone: ---
Flags: stickster: fedora_requires_release_note?
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-03-24 18:18:21 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 438944, 480593
Attachments:
Description            Flags
domain-builder-ng.log  none
xend.log               none
virt-manager.log       none

Description John Poelstra 2008-11-10 20:10:48 UTC
Description of problem:
After successfully installing today's rawhide (Fedora 10) on a Xen guest and rebooting, the guest will not start and virt-manager returns a traceback.

Version-Release number of selected component (if applicable):
$ rpm -qa | egrep 'xen|virt' | sort
kernel-xen-2.6.18-121.el5
kernel-xen-2.6.18-122.el5
libvirt-0.3.3-14.el5
libvirt-python-0.3.3-14.el5
python-virtinst-0.300.2-11.el5
virt-manager-0.5.3-10.el5
virt-viewer-0.0.2-2.el5
xen-3.0.3-73.el5
xen-libs-3.0.3-73.el5

$ uname -a
Linux screamer 2.6.18-122.el5xen #1 SMP Mon Nov 3 18:49:46 EST 2008 i686 i686 i386 GNU/Linux


Additional info:
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/engine.py", line 514, in run_domain
    vm.startup()
  File "/usr/share/virt-manager/virtManager/domain.py", line 379, in startup
    self.vm.create()
  File "/usr/lib/python2.4/site-packages/libvirt.py", line 228, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: virDomainCreate() failed POST operation failed: (xend.err "Error creating domain: (2, 'Invalid kernel', 'elf_xen_note_check: ERROR: Will only load images built for the generic loader or Linux images')")

Comment 1 John Poelstra 2008-11-10 22:55:12 UTC
FWIW... a guest install of the latest RHEL5.3 beta installs and boots fine

Comment 2 Daniel Berrangé 2008-11-11 10:16:14 UTC
Please provide the log files for

  /root/.virt-manager/virt-manager.log
  /var/log/xen/xend.log
  /var/log/xen/domain-builder-ng.log

Comment 3 John Poelstra 2008-11-11 17:41:24 UTC
Created attachment 323205
domain-builder-ng.log

Comment 4 John Poelstra 2008-11-11 17:41:52 UTC
Created attachment 323206
xend.log

Comment 5 John Poelstra 2008-11-11 17:42:20 UTC
Created attachment 323207
virt-manager.log

Comment 6 John Poelstra 2008-11-11 17:42:48 UTC
Logs added.  Curious.... could this be a Fedora bug?

Comment 7 Chris Lalancette 2008-11-17 15:04:31 UTC
Well, what it seems like is that the wrong kernel was installed for some reason.  I've definitely installed F-10 guests on RHEL-5.3 before, so something weird is going on.  This has the hallmarks of installing the non-PAE kernel inside the guest, which doesn't, I believe, have pv_ops turned on.  John, can you mount the guest disk loopback and find out which kernel anaconda installed?

Chris Lalancette

Comment 8 Mark McLoughlin 2008-11-17 15:10:44 UTC
Yeah, that's the error you get if you try to boot a non-PAE i386 Fedora kernel in a Xen guest. See also bug #471268

Comment 9 John Poelstra 2008-11-18 17:58:35 UTC
Here is what grub.conf looks like for the guest: 

default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Client (2.6.18-122.el5xen)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-122.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        initrd /initrd-2.6.18-122.el5xen.img

Comment 10 John Poelstra 2008-11-18 18:01:05 UTC
Disregard comment #9; the information is from the wrong image.

Comment 11 John Poelstra 2008-11-18 18:07:55 UTC
Here is grub.conf from F10 guest that was installed on RHEL5.3 dom0

default=0
timeout=0
chaintimeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Fedora (2.6.27.4-79.fc10.i686)
        root (hd0,0)
        kernel /vmlinuz-2.6.27.4-79.fc10.i686 ro root=UUID=3dfba3e5-65bc-4466-9499-f3bac2782f86 rhgb quiet
        initrd /initrd-2.6.27.4-79.fc10.i686.img

Comment 12 Chris Lalancette 2008-11-18 18:35:48 UTC
OK, yeah.  That does confirm that anaconda chose the wrong kernel.  Odd, because I had done an F-10 install a couple of months ago and it chose the right one.  This is *probably* an anaconda bug against F-10.

Chris Lalancette

Comment 13 Chris Lalancette 2008-11-19 10:08:57 UTC
OK.  I'm not exactly sure how the anaconda people would want to fix this, but the problem is in yuminstall.py, selectBestKernel(), here:

        # FIXME: this is a bit of a hack.  we shouldn't hard-code and
        # instead check by provides.  but alas.
        for k in ("kernel", "kernel-smp", "kernel-PAE"):
            if len(self.ayum.tsInfo.matchNaevr(name=k)) > 0:
                self.selectModulePackages(anaconda, k)
                foundkernel = True

        if not foundkernel and (isys.smpAvailable() or isys.htavailable()):
            try:
                ksmp = getBestKernelByArch("kernel-smp", self.ayum)
            except PackageSackError:
                ksmp = None
                log.debug("no kernel-smp package")

            if ksmp and ksmp.returnSimple("arch") == kpkg.returnSimple("arch"):
                foundkernel = True
                log.info("selected kernel-smp package for kernel")
                self.ayum.install(po=ksmp)
                self.selectModulePackages(anaconda, ksmp.name)

                if len(self.ayum.tsInfo.matchNaevr(name="gcc")) > 0:
                    log.debug("selecting kernel-smp-devel")
                    self.selectPackage("kernel-smp-devel.%s" % (kpkg.arch,))

        if not foundkernel and isys.isPaeAvailable():
            try:
                kpae = getBestKernelByArch("kernel-PAE", self.ayum)
            except PackageSackError:
                kpae = None
                log.debug("no kernel-PAE package")

            if kpae and kpae.returnSimple("arch") == kpkg.returnSimple("arch"):
                foundkernel = True
                log.info("select kernel-PAE package for kernel")
                self.ayum.install(po=kpae)
                self.selectModulePackages(anaconda, kpae.name)

                if len(self.ayum.tsInfo.matchNaevr(name="gcc")) > 0:
                    log.debug("selecting kernel-PAE-devel")
                    self.selectPackage("kernel-PAE-devel.%s" % (kpkg.arch,))

        if not foundkernel:
            log.info("selected kernel package for kernel")
            self.ayum.install(po=kpkg)
            self.selectModulePackages(anaconda, kpkg.name)

            if len(self.ayum.tsInfo.matchNaevr(name="gcc")) > 0:
                log.debug("selecting kernel-devel")
                self.selectPackage("kernel-devel.%s" % (kpkg.arch,))

Basically, we get out of that first loop without setting foundkernel to True (I'm not entirely sure what that first loop does).  Then we do the isys.smpAvailable() check, which is true, but that branch fails because there are no kernel-smp packages available.  Next, we do the isys.isPaeAvailable() check, but this doesn't fire because we are using < 4G of memory here.  Finally, we fall through to the default, which is just to select "kernel".  This is confirmed by the anaconda logs:

08:58:03 INFO    : moving (1) to step postselection
08:58:03 DEBUG   : no kernel-smp package
08:58:03 INFO    : selected kernel package for kernel
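
For illustration only (this is not anaconda code), here is a simplified, self-contained model of the cascade above. The name select_kernel and its parameters are hypothetical stand-ins for the isys checks and the repository contents, and the architecture comparison and -devel handling are left out. It also assumes the first loop looks for a kernel package already placed in the transaction, which is a reading of matchNaevr() rather than something confirmed in this bug.

```python
def select_kernel(available, smp_available, pae_available, preselected=()):
    """Return the package name a simplified selectBestKernel() would pick.

    available     -- set of kernel package names present in the repository
    smp_available -- stand-in for isys.smpAvailable() or isys.htavailable()
    pae_available -- stand-in for isys.isPaeAvailable()
    preselected   -- kernel packages already in the transaction (assumption)
    """
    # First loop: a kernel that is already in the transaction wins.
    for name in ("kernel", "kernel-smp", "kernel-PAE"):
        if name in preselected:
            return name
    # SMP branch: only fires if a kernel-smp package actually exists.
    if smp_available and "kernel-smp" in available:
        return "kernel-smp"
    # PAE branch: only fires if the PAE check returns true.
    if pae_available and "kernel-PAE" in available:
        return "kernel-PAE"
    # Fall-through: plain "kernel", which is what the log above shows.
    return "kernel"


# The situation in this bug: no kernel-smp package, PAE check false.
print(select_kernel({"kernel", "kernel-PAE"}, smp_available=True, pae_available=False))
```

With these inputs the model falls through to "kernel", matching the "selected kernel package for kernel" log line.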

So, one way to fix this (suggested by Mark McLoughlin) would be to change:

        if not foundkernel and isys.isPaeAvailable():

to something like:

        if not foundkernel and (isys.isPaeAvailable() or running_PAE_kernel):

Where running_PAE_kernel would be set to True in the case that the anaconda installer is running on a PAE kernel.  We could determine running_PAE_kernel by doing "os.uname()[2]", and looking for the substring PAE.  I'm sure there are other solutions, but nothing is springing to mind at the moment.
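
A minimal sketch of that check, assuming the PAE kernel's release string contains the substring "PAE" (as in a release like 2.6.29-0.258.rc8.git2.fc11.i686.PAE). The helper name running_pae_kernel and its optional release argument are hypothetical and exist only to make the idea testable; this is not the fix that went into anaconda.

```python
import os


def running_pae_kernel(release=None):
    """Return True if the kernel release string carries a "PAE" marker.

    release -- release string to inspect; defaults to the running kernel's
               release, i.e. os.uname()[2].
    """
    if release is None:
        release = os.uname()[2]
    return "PAE" in release


# Checking two release strings explicitly instead of the running kernel.
print(running_pae_kernel("2.6.29-0.258.rc8.git2.fc11.i686.PAE"))
print(running_pae_kernel("2.6.27.4-79.fc10.i686"))
```

The result would then be OR-ed into the existing condition, as in the proposed "isys.isPaeAvailable() or running_PAE_kernel" line. A plain substring match is crude, so it would misfire on any release string that happens to contain "PAE" for another reason.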

Chris Lalancette

Comment 14 Jesse Keating 2008-11-19 16:09:21 UTC
What happens if you manually add say kernel-PAE to a kickstart file?  I suspect you'll get the right kernel (or both) and this would be an OKish workaround for F10.

Comment 15 Mark McLoughlin 2008-11-19 16:26:15 UTC
(In reply to comment #14)
> What happens if you manually add say kernel-PAE to a kickstart file?  I suspect
> you'll get the right kernel (or both) and this would be an OKish workaround for
> F10.

We haven't tried it yet, but yeah that would probably be our workaround if it doesn't get fixed for F10.

Comment 16 Paul W. Frields 2008-11-19 17:10:12 UTC
Should we add a release note for this for the 0-day update?

Comment 17 John Poelstra 2008-11-19 17:29:02 UTC
I haven't done a kickstart install in a long time and I think it is an unreasonably high requirement for a "workaround".  This is also a regression from Fedora 9.

Comment 18 Jesse Keating 2008-11-19 17:42:41 UTC
Given that you can't run Fedora as a Xen host, which means you need to have $SOMETHING_ELSE installed to run into this, I think it's unreasonable to consider this a blocker bug.  For now, using a kickstart script is likely a workable workaround.  In the near future we could have this fixed with an updates.img

The class of people that are going to have access to a Xen capable setup, but not know how to do a basic kickstart install (especially provided the Fedora and RHEL documentation on doing them) is going to be quite small in my opinion.  I do believe we should have some release notes for this issue, be they notes about using kickstart, using an updates.img, or both.

Comment 19 Will Woods 2008-11-24 22:34:17 UTC
Release note added:

https://fedoraproject.org/wiki/Common_F10_bugs#Fedora_10_i686_Xen_guest_won.27t_boot

As I understand it, this only affects 32-bit Xen hosts, which are far less common than 64-bit hosts. Still, it would probably be good if we had an updates.img for this problem soon.

Comment 20 Bug Zapper 2008-11-26 05:09:08 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 21 Daniel Berrangé 2009-03-24 18:18:21 UTC
Installed a new guest on a RHEL-5 Xen host with:

# virt-install --paravirt --location http://download.fedora.redhat.com/pub/fedora/linux/development/i386/os/  --name f11i686xen --file /var/lib/xen/images/f11i686.img --file-size 5 --vnc --ram 900  --noautoconsole


And post-install the guest has

# uname -r
2.6.29-0.258.rc8.git2.fc11.i686.PAE


So anaconda is installing the correct kernel now.

Comment 22 Bill McGonigle 2009-05-26 10:29:55 UTC
If the kernel choices showed up in the Base package selection list the user could just fix this as he goes.  I recently installed a virtual machine with 2GB using HVM and the standard anaconda install method, so I got the non-PAE kernel.  The workaround was to boot the machine again with HVM and then yum install the PAE kernel, adjust the yum timeout and default kernel, and then finally boot the machine PVM.  One side effect of moving Xen to PAE is that one kernel can't boot HVM and PVM anymore, so perhaps if anaconda knew something about the HVM VM's fingerprint it could just install both, anticipating a common usage scenario.
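
As a rough aid for the "adjust the default kernel" step in that workaround: GRUB legacy's default= setting counts title stanzas from zero, so the value to use is the position of the PAE entry in grub.conf. The sketch below finds that position. The function name find_default_index is hypothetical, and the sample text is made up for illustration (the PAE stanza in particular is an assumption about what yum would add).

```python
def find_default_index(grub_conf_text, needle="PAE"):
    """Return the zero-based index of the first title stanza whose title
    line contains needle, or None if no stanza matches."""
    index = 0
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("title"):
            if needle in stripped:
                return index
            index += 1
    return None


# Hypothetical grub.conf with a non-PAE entry followed by a PAE entry.
SAMPLE = """default=0
timeout=5
title Fedora (2.6.27.4-79.fc10.i686)
        root (hd0,0)
        kernel /vmlinuz-2.6.27.4-79.fc10.i686 ro
title Fedora (2.6.27.4-79.fc10.i686.PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.27.4-79.fc10.i686.PAE ro
"""

print(find_default_index(SAMPLE))
```

The returned index is the number to put after default= so that the guest boots the PAE kernel.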