Bug 515831

Summary: Switch bzImage from LZMA back to gzip compression so Xen can load Fedora kernels again
Product: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED RAWHIDE
Severity: high
Priority: high
Reporter: John Poelstra <poelstra>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: clalance, dcantrell, ijc, itamar, jeremy, kernel-maint, markmc, orion, pbonzini, robatino, xen-maint
Flags: poelstra: fedora_requires_release_note+
Doc Type: Bug Fix
Last Closed: 2009-08-12 13:54:59 UTC
Bug Blocks: 498968, 507676    

Description John Poelstra 2009-08-05 21:20:40 UTC
Description of problem:
virt-install fails with a traceback when installing Fedora 12 Alpha as a Xen paravirtualized guest

Version-Release number of selected component (if applicable):
# rpm -qa | egrep 'xen|virt' | sort
kernel-xen-2.6.18-157.el5
kernel-xen-2.6.18-159.el5
kernel-xen-2.6.18-160.el5
kernel-xen-devel-2.6.18-160.el5
libvirt-0.6.3-17.el5
libvirt-python-0.6.3-17.el5
python-virtinst-0.400.3-5.el5
virt-manager-0.6.1-8.el5
virt-viewer-0.0.2-3.el5
xen-3.0.3-93.el5
xen-libs-3.0.3-93.el5


How reproducible:
100%

Steps to Reproduce:
1. Attempt an HTTP install of the Fedora 12 Alpha candidate as a paravirtualized guest (see the virt-install command below)


Additional info:
Fedora 11 installs just fine


# virt-install \
    --paravirt \
    --name f12 \
    --ram 800 \
    --disk path=/home/f12-2009-08-04.img \
    --vnc \
    --location http://192.168.1.51/f12 \
    --network=bridge:xenbr0 \
    --vcpus=2


Starting install...
Retrieving file .treeinfo...      | 1.4 kB  00:00
Retrieving file vmlinuz-PAE...    | 2.8 MB  00:00
Retrieving file initrd-PAE.img... |  30 MB  00:02
POST operation failed: xend_post: error from xen daemon: (xend.err "Error creating domain: (2, 'Invalid kernel', 'xc_dom_find_loader: no loader found\\n')")
Domain installation may not have been
 successful.  If it was, you can restart your domain
 by running 'virsh start f12'; otherwise, please
 restart your installation.
ERROR    POST operation failed: xend_post: error from xen daemon: (xend.err "Error creating domain: (2, 'Invalid kernel', 'xc_dom_find_loader: no loader found\\n')")
Traceback (most recent call last):
  File "/usr/sbin/virt-install", line 861, in ?
    main()
  File "/usr/sbin/virt-install", line 759, in main
    start_time, guest.start_install)
  File "/usr/sbin/virt-install", line 814, in do_install
    dom = install_func(conscb, progresscb, wait=(not wait))
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 541, in start_install
    return self._do_install(consolecb, meter, removeOld, wait)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 633, in _do_install
    self.domain = self.conn.createLinux(install_xml, 0)
  File "/usr/lib/python2.4/site-packages/libvirt.py", line 974, in createLinux
    if ret is None:raise libvirtError('virDomainCreateLinux() failed', conn=self)
libvirtError: POST operation failed: xend_post: error from xen daemon: (xend.err "Error creating domain: (2, 'Invalid kernel', 'xc_dom_find_loader: no loader found\\n')")

Comment 1 Mark McLoughlin 2009-08-06 14:35:20 UTC
(In reply to comment #0)

> Additional info:
> Fedora 11 installs just fine

Ah, that suggests it's an F-12 bug then.

Have we somehow lost Xen pv_ops support, or ...?

Comment 2 Mark McLoughlin 2009-08-11 09:04:24 UTC
From:

  http://www.redhat.com/archives/fedora-xen/2009-August/msg00001.html

  xc_dom_parse_image: called
  xc_dom_find_loader: trying ELF-generic loader ... failed
  xc_dom_find_loader: trying Linux bzImage loader ... failed
  xc_dom_find_loader: trying multiboot-binary loader ... failed
  xc_dom_core.c:495: panic: xc_dom_find_loader: no loader found
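The log shows the domain builder asking each loader in turn whether it recognizes the kernel image, and panicking when none does. Below is a minimal Python model of that first-match probing. The loader names come from the log; the probe functions are hypothetical stand-ins (the bzImage one is deliberately oversimplified to "accept only gzip"), not libxc's actual code.

```python
from typing import Callable, Sequence, Tuple

# A loader is a (name, probe) pair; probe() returns True if it can handle the image.
Loader = Tuple[str, Callable[[bytes], bool]]


class NoLoaderFound(Exception):
    """Raised when every loader rejects the kernel image."""


def find_loader(image: bytes, loaders: Sequence[Loader]) -> str:
    """Return the name of the first loader whose probe accepts the image."""
    for name, probe in loaders:
        print(f"xc_dom_find_loader: trying {name} loader ... ", end="")
        if probe(image):
            print("OK")
            return name
        print("failed")
    raise NoLoaderFound("xc_dom_find_loader: no loader found")


# Hypothetical stand-in probes, for illustration only.
def probe_elf(image: bytes) -> bool:
    return image[:4] == b"\x7fELF"  # ELF magic number


def probe_gzip_bzimage(image: bytes) -> bool:
    # Simplification: treat the image as the bzImage payload and accept it
    # only if it is gzip-compressed, the one format the old loader handled.
    return image[:2] == b"\x1f\x8b"


def probe_multiboot(image: bytes) -> bool:
    return False  # not modelled here


LOADERS = [
    ("ELF-generic", probe_elf),
    ("Linux bzImage", probe_gzip_bzimage),
    ("multiboot-binary", probe_multiboot),
]
```

Feeding it an LZMA-style payload (leading bytes `5d 00`) makes all three probes fail and raises `NoLoaderFound`, which is the failure pattern in the log.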

Comment 3 Mark McLoughlin 2009-08-11 10:50:07 UTC
Okay, so it turns out the problem is

  # CONFIG_KERNEL_GZIP is not set
  # CONFIG_KERNEL_BZIP2 is not set
  CONFIG_KERNEL_LZMA=y

the change was introduced by:

  * Mon Jul 06 2009 Chuck Ebbert <cebbert>
  - Use LZMA for kernel compression on X86.

Xen's bzImage loader can only handle a gzip-compressed payload, not an LZMA-compressed one.

Switching to LZMA means that Fedora kernels cannot be booted by any deployed or upstream version of Xen; we really should switch this back to gzip.

Moving this back to the F12Alpha blocker list; we can easily make this change in time for Alpha.
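One way to catch this kind of regression before a build ships is to check which kernel compression the .config selects. A small Python sketch, assuming the `CONFIG_KERNEL_*` option names shown above; the helper names are hypothetical and not part of any Fedora kernel tooling:

```python
import re
from typing import Iterable, Optional

# x86 kernel compression choices as they appear in the config above.
KERNEL_COMPRESSION_OPTIONS = ("CONFIG_KERNEL_GZIP", "CONFIG_KERNEL_BZIP2", "CONFIG_KERNEL_LZMA")

# Matches an enabled option such as "CONFIG_KERNEL_LZMA=y";
# "# CONFIG_... is not set" comment lines never match.
_ENABLED = re.compile(r"^(CONFIG_[A-Z0-9_]+)=y\s*$")


def kernel_compression(config_lines: Iterable[str]) -> Optional[str]:
    """Return the enabled CONFIG_KERNEL_* compression option, or None."""
    for line in config_lines:
        m = _ENABLED.match(line.strip())
        if m and m.group(1) in KERNEL_COMPRESSION_OPTIONS:
            return m.group(1)
    return None


def xen_bootable(config_lines: Iterable[str]) -> bool:
    """True if the kernel uses gzip, the only payload format Xen's loader handled here."""
    return kernel_compression(config_lines) == "CONFIG_KERNEL_GZIP"
```

Run against the three config lines above, `kernel_compression` reports `CONFIG_KERNEL_LZMA` and `xen_bootable` returns False.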

Comment 4 Mark McLoughlin 2009-08-11 17:51:52 UTC
Okay, we had a bit of a discussion on #fedora-virt; updating here since it's on the alpha blocker list

The gist of the conversation was:

  - Switching from gzip to lzma means that Fedora 12 won't be usable on any
    currently existing Xen deployments

  - This is analogous to the switch from vmlinuz to bzImage we made in Fedora 9,
    except there was a patch to support this upstream well in advance of the 
    change and this patch was at least adopted by RHEL in reasonable time

  - The difference here is that there isn't even upstream support yet, but we
    plan to resolve that

  - Suggestion is to delay switching to lzma until Fedora 13 so that upstream
    and enterprise distros have time to react

  - One argument against that is that Fedora shouldn't be held hostage to Xen's
    deficiencies, although since Fedora is changing an ABI here which was
    specifically added for Xen, I think it makes sense to give Xen time to catch
    up

  - Another argument is that lzma reduces the kernel size (e.g. vmlinuz 3.5M to
    2.8M) and this could help livecd, but I think that's a fairly minor size
    gain for dropping the ability to run on Xen

Comment 5 Chris Lalancette 2009-08-11 18:05:21 UTC
(In reply to comment #4)

Thanks Mark.  As we discussed earlier today on IRC, I am working on a patch to make the Xen domain builder understand bzip2 and lzma, which I will post upstream and for RHEL-5 when it's ready.  Delaying the switch to F-13 should give us time to get it into the product, and give upstream time to merge it (maybe even into the stable 3.3 and 3.4 branches, depending on how invasive the patch is).

Chris Lalancette

Comment 6 Jeremy Fitzhardinge 2009-08-11 18:18:45 UTC
At one point I had some patches to allow the domain builder to load a bzImage as-is and use its internal decompressor, specifically to deal with the case of the algorithm changing.  It turned out to be fairly complex in a pretty fragile piece of code, and there was general pushback because "the algorithm isn't changing" (about a year ago).

Incorporating lzma into the domain builder should be pretty straightforward, at least for usermode.  But I think the same libxc code is used by Xen for dom0 loading?

It would be nice, for completeness, to include support for the other possible algorithms too, though I'm not sure there's any reason to use them over lzma.
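Teaching a loader to accept several formats comes down to dispatching on the payload's leading magic bytes. Below is a user-space Python sketch of that idea using only the standard library's `gzip`, `bz2` and `lzma` modules. It is not the libxc patch, and the function names are hypothetical. The `5d 00` LZMA signature is simply the header bytes seen on Fedora kernels in comment 8, not a formal magic number.

```python
import bz2
import gzip
import lzma

# Leading bytes used to recognize each payload format.
MAGICS = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\x5d\x00", "lzma"),  # header bytes of a legacy .lzma stream with default settings
)


def detect_compression(payload: bytes) -> str:
    """Name the compression format from the payload's leading bytes."""
    for magic, name in MAGICS:
        if payload.startswith(magic):
            return name
    return "unknown"


def decompress_payload(payload: bytes) -> bytes:
    """Decompress a gzip, bzip2 or legacy-LZMA payload."""
    kind = detect_compression(payload)
    if kind == "gzip":
        return gzip.decompress(payload)
    if kind == "bzip2":
        return bz2.decompress(payload)
    if kind == "lzma":
        # Legacy "LZMA alone" container (not .xz).
        return lzma.decompress(payload, format=lzma.FORMAT_ALONE)
    raise ValueError(f"unsupported kernel compression: {payload[:2].hex()}")
```

Checking magic bytes first keeps each decompressor independent, so adding another algorithm later only means adding one table entry and one branch.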

Comment 7 Jesse Keating 2009-08-11 22:36:51 UTC
A build of the f12-alpha kernel with LZMA compression disabled has been done.  We're testing it now for f12-alpha tagging.

Comment 8 Mark McLoughlin 2009-08-12 13:54:59 UTC
Looks like kernel-2.6.31-0.125.4.2.rc5.git2.fc12 has been tagged and it contains this:

* Tue Aug 11 2009 Kyle McMartin <kyle>
- private-f12-2_6_31_rc5-imeanit: LZMA. OFF. I MEAN IT.

Just checking:

 - with the x86_64 kernel, the compressed payload (and so its magic number)
   starts at file offset 0x4069

   that's 0x200 (boot sector) + ((setup_sects=0x1e) * 0x200) + (payload_offset=0x269)
   = 0x200 + 0x3c00 + 0x269 = 0x4069

 - hexdump -C -n 2 -s 0x4069 vmlinuz-2.6.31-0.125.4.2.rc5.git2.fc12.x86_64
   00004069  1f 8b                                             |..|

 - hexdump -C -n 2 -s 0x4069 vmlinuz-2.6.31-0.118.rc5.fc12.x86_64
   00004069  5d 00                                             |].|

 - 1f 8b is gzip, 5d 00 is lzma
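The same check can be scripted instead of computing the hexdump offset by hand. Below is a small Python sketch that reads the setup header fields and reports the payload format. It assumes a bzImage with boot protocol 2.08 or later, which is when `payload_offset` was added. The field offsets come from the x86 Linux boot protocol document; the function names are hypothetical.

```python
import struct

# Setup-header offsets from the x86 Linux boot protocol.
SETUP_SECTS_OFF = 0x1F1     # u8: number of 512-byte setup sectors (0 means 4)
HEADER_MAGIC_OFF = 0x202    # b"HdrS"
VERSION_OFF = 0x206         # u16: boot protocol version
PAYLOAD_OFFSET_OFF = 0x248  # u32: payload offset, protocol 2.08+
SECTOR = 0x200

PAYLOAD_MAGICS = {b"\x1f\x8b": "gzip", b"\x5d\x00": "lzma", b"BZ": "bzip2"}


def payload_file_offset(image: bytes) -> int:
    """Return the file offset of the compressed kernel payload in a bzImage."""
    if image[HEADER_MAGIC_OFF:HEADER_MAGIC_OFF + 4] != b"HdrS":
        raise ValueError("not a bzImage: missing HdrS signature")
    (version,) = struct.unpack_from("<H", image, VERSION_OFF)
    if version < 0x0208:
        raise ValueError(f"boot protocol {version:#06x} has no payload_offset field")
    setup_sects = image[SETUP_SECTS_OFF] or 4
    (payload_offset,) = struct.unpack_from("<I", image, PAYLOAD_OFFSET_OFF)
    # Boot sector, then the setup sectors, then payload_offset into the protected-mode code.
    return SECTOR + setup_sects * SECTOR + payload_offset


def payload_compression(image: bytes) -> str:
    """Name the payload's compression from its first two bytes."""
    start = payload_file_offset(image)
    return PAYLOAD_MAGICS.get(bytes(image[start:start + 2]), "unknown")
```

If the header values match the ones computed above, `payload_compression(open(path, "rb").read())` should report gzip for the 0.125.4.2 kernel and lzma for the 0.118 kernel.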

For anyone interested, I've filed a bz asking for lzma support in RHEL5 - bug #517049

Comment 9 Mark McLoughlin 2009-08-20 16:03:49 UTC
This got disabled for the F12 Alpha build, but wasn't disabled in devel/ until now:

* Thu Aug 20 2009 Mark McLoughlin <markmc>
- Disable LZMA for xen (#515831)

Comment 10 Mark McLoughlin 2009-08-21 07:40:05 UTC
Upstream Xen now supports lzma- and bzip2-compressed kernels. We're going to add these patches to the Fedora 12 xen package. See bug #518551

Comment 11 Mark McLoughlin 2009-08-21 07:41:17 UTC
(In reply to comment #10)
> See bug #518551

Sorry, it's bug #518588