Bug 461532 - /proc/xen on bare-metal and FV guests causes multiple issues
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.3
Hardware: All Linux
Priority: medium
Severity: high
Target Milestone: beta
Target Release: ---
Assigned To: Don Dutile
QA Contact: Martin Jenner
Keywords: Regression, TestBlocker
Duplicates: 461658 461823
Reported: 2008-09-08 17:19 EDT by Mike Gahagan
Modified: 2010-03-14 17:32 EDT

Doc Type: Bug Fix
Last Closed: 2009-01-20 15:09:55 EST


Attachments
Remove /proc/xen entries being generated by pv-on-hvm subsys on bare-metal kernels (1.61 KB, text/plain)
2008-09-11 14:21 EDT, Don Dutile
Description Mike Gahagan 2008-09-08 17:19:13 EDT
Description of problem:
kernel-xen does not boot on a system set up to be dom0

Version-Release number of selected component (if applicable):
2.6.18-109 (and earlier, definitely down to 105 if not older)

How reproducible:
always

Steps to Reproduce:
1. Install kernel-xen on a bare-metal system.
Notice the invalid grub stanza for xen:
"""
title Red Hat Enterprise Linux Server (2.6.18-109.el5)
	kernel /vmlinuz-2.6.18-109.el5 ro root=/dev/VolGroup00/LogVol00 crashkernel=128M@16M
	initrd /initrd-2.6.18-109.el5.img
"""
This results in grub giving an invalid file format error, and the system does not boot.
2. Alternatively, run the /kernel/basic_sanity/boot_rhel5_kernels/ test from RHTS.

Actual results:

xen kernel fails to boot

Expected results:
a grub.conf stanza that looks something like:
title Red Hat Enterprise Linux Server (2.6.18-109.el5xen)
	kernel /xen.gz-2.6.18-109.el5
	module /vmlinuz-2.6.18-109.el5xen ro root=/dev/VolGroup00/LogVol00 crashkernel=128M@16M
	module /initrd-2.6.18-109.el5xen.img

which then boots the kernel normally

Additional info:

I took a look at this code in the kernel.spec file:

if [ -e /proc/xen/xsd_kva -o ! -d /proc/xen ]; then
        /sbin/new-kernel-pkg --package kernel-xen --mkinitrd --depmod --install \
            --multiboot=/%{image_install_path}/xen.gz-%{KVERREL} %{KVERREL}xen || exit $?
else
        /sbin/new-kernel-pkg --package kernel-xen --mkinitrd --depmod --install \
            %{KVERREL}xen || exit $?
fi


[root@test177 SPECS]# ll /proc/xen/*
-rw-r--r-- 1 root root 0 Sep  8 17:12 /proc/xen/balloon


If I run this command by hand I get the proper grub stanza:
 /sbin/new-kernel-pkg --package kernel-xen --mkinitrd --depmod --install --multiboot=/boot/xen.gz-2.6.18-109.el5 2.6.18-109.el5xen
Comment 1 Chris Lalancette 2008-09-09 03:19:37 EDT
Hm, I just tried it here, and ran into no such problem.  First, I had a Xen kernel running (2.6.18-107-xen), and installed -109-xen, and got:

title Red Hat Enterprise Linux Server (2.6.18-109.el5xen)
        root (hd0,2)
        kernel /xen.gz-2.6.18-109.el5 com1=115200,8n1
        module /vmlinuz-2.6.18-109.el5xen ro root=/dev/VolGroup00/RHEL5x86_64 console=tty0 console=ttyS0,115200
        module /initrd-2.6.18-109.el5xen.img

Then I rebooted into a 2.6.18-92 kernel (no xen), and installed -109-xen, and got:

title Red Hat Enterprise Linux Server (2.6.18-109.el5xen)
        root (hd0,2)
        kernel /xen.gz-2.6.18-109.el5 com1=115200,8n1
        module /vmlinuz-2.6.18-109.el5xen ro root=/dev/VolGroup00/RHEL5x86_64 console=tty0 console=ttyS0,115200
        module /initrd-2.6.18-109.el5xen.img

(i.e. it was exactly the same).  There was an error from the RPM post scripts about the sha256sum stuff, but that is harmless and should be fixed soon.

Can I have some more details about your setup, including architecture, etc?

Chris Lalancette
Comment 2 Mike Gahagan 2008-09-09 13:48:38 EDT
I'm seeing this on x86_64 and i386 systems. I don't know whether it is related, but as far as I know the RHTS systems I see this on have never had the xen kernel installed. They are loaded with the vanilla kernel, and the boot_rhel5_kernels test itself installs each kernel variant one by one, boots it, runs various tests, and then moves on to the next kernel. For example, here is roughly what happens in one of my typical kernel test workflows:

1.) Install 5.2 on rhts system (done by rhts)
2.) Install the vanilla kernel from a repo I provide (includes all kernel variants for a given kernel version in 1 repo)
3.) boot system with kernel from step 2
4.) Run a bunch of regression, stress tests etc.
5.) The boot_rhel5_kernels test is typically the last one that runs and here is what it does:
    1. Downloads kernel from brew (using wget)
    2. Installs the kernel package (rpm -ivh)
    3. boots system to new kernel
    4. runs some sanity tests (module loading, gdb, dmesg capture etc)
    5. Repeat step 1 with next kernel in the list.

The fact that I'm not running the -92 kernel when I load these newer xen kernels might be significant; I'll look into this in a bit.
Comment 3 Mike Gahagan 2008-09-09 14:13:57 EDT
OK, when I install the -92 kernel and then install the -109xen kernel, here is what I get. Everything appears to work fine; note the missing /proc/xen directory:

[root@test177 x86_64]# ll /proc/xen
ls: /proc/xen: No such file or directory
[root@test177 x86_64]# cat /etc/grub.conf 
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/sda
default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-109.el5xen)
	kernel /xen.gz-2.6.18-109.el5
	module /vmlinuz-2.6.18-109.el5xen ro root=/dev/VolGroup00/LogVol00 crashkernel=128M@16M
	module /initrd-2.6.18-109.el5xen.img
Comment 4 Mike Gahagan 2008-09-09 14:25:13 EDT
Note: comment 3 was with the system running the -92 vanilla kernel.

Here are the results I get when installing the -109 xen kernel while running the -109 vanilla kernel:

[root@test177 x86_64]# ll /proc/xen/
total 0
-rw-r--r-- 1 root root 0 Sep  9 14:28 balloon
[root@test177 x86_64]# rpm -ivh kernel-xen-2.6.18-109.el5.x86_64.rpm
Preparing...                ########################################### [100%]
   1:kernel-xen             ########################################### [100%]
[root@test177 x86_64]# uname -r
2.6.18-109.el5
[root@test177 x86_64]# cat /etc/grub.conf 
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/sda
default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-109.el5xen)
	kernel /vmlinuz-2.6.18-109.el5xen ro root=/dev/VolGroup00/LogVol00 crashkernel=128M@16M
	initrd /initrd-2.6.18-109.el5xen.img
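
The difference between the two grub.conf files can be checked mechanically. The following is only a rough sketch: find_bad_xen_stanzas is a hypothetical helper name, not part of grubby or new-kernel-pkg, and it assumes that xen stanzas are recognizable by a title ending in "xen)" and that a correct one loads a xen.gz-* image on its kernel line, as in the stanzas quoted in this report.

```shell
# Hypothetical helper (not part of any shipped tool): print the title of
# every stanza whose title ends in "xen)" but whose kernel line does not
# load the xen.gz hypervisor image.
find_bad_xen_stanzas() {
    awk '
        # A new stanza starts at each title line; remember whether it is a xen one.
        /^title / { in_xen = ($0 ~ /xen\)[ \t]*$/); title = $0; next }
        # The first kernel line of a xen stanza should point at xen.gz-*.
        in_xen && $1 == "kernel" {
            if ($2 !~ /\/xen\.gz-/) print title
            in_xen = 0
        }
    ' "$1"
}
```

Running find_bad_xen_stanzas /etc/grub.conf against the file above should print the 2.6.18-109.el5xen title, while the file from comment 3 should produce no output.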
Comment 5 Chris Lalancette 2008-09-09 16:18:40 EDT
Ah!  OK, now I understand.  We added the PV-on-HVM drivers recently, and they now unconditionally create a /proc/xen directory (even on bare-metal).  The existence of that directory fools the %post scripts into thinking the system is a domU, so they take that path instead of the dom0 path.

Now it makes more sense.  OK, we'll have to fix it up one way or the other; thanks for clarifying.

Chris Lalancette
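
To make the failure mode concrete, here is a minimal sketch of the decision the %post test from the description makes. The function name grub_path_for and its directory argument are assumptions added so the logic can be run against a scratch directory; the test expression itself is copied from the kernel.spec snippet quoted earlier.

```shell
# Hypothetical wrapper around the kernel.spec test; the directory is passed
# in as $1 so the logic can be tried without touching the real /proc/xen.
grub_path_for() {
    proc_xen="$1"
    if [ -e "$proc_xen/xsd_kva" -o ! -d "$proc_xen" ]; then
        # dom0 (xsd_kva present) or no /proc/xen at all: the spec passes
        # --multiboot, so grub.conf gets the xen.gz kernel line.
        echo multiboot
    else
        # /proc/xen exists without xsd_kva: treated as a domU, so a plain
        # kernel/initrd stanza is written.
        echo plain
    fi
}
```

With a directory that contains only balloon, as on the affected bare-metal systems, this prints "plain", which corresponds to the stanza without the hypervisor line.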
Comment 7 Chris Lalancette 2008-09-10 17:15:00 EDT
*** Bug 461658 has been marked as a duplicate of this bug. ***
Comment 8 Bill Burns 2008-09-11 09:56:40 EDT
*** Bug 461823 has been marked as a duplicate of this bug. ***
Comment 10 Mark McLoughlin 2008-09-11 14:11:51 EDT
Okay, the issues we've seen caused by this so far include:

 1) kernel-xen %post relies on /proc/xen not existing on bare-metal in
    order to know to add the HV in grub.conf

 2) kdump gets confused by /proc/xen; sounds like this would be an issue
    on both bare-metal and FV guests?

 3) anaconda sees /proc/xen and thinks it's a PV DomU. Again an issue for
    both bare-metal and FV

We may want to change both anaconda and kdump as follows:

  -  if [ -d /proc/xen ] ; then
  +  if [ -d /proc/xen -a -f /proc/xen/capabilities ] ; then

so that if we do ever have /proc/xen on FV guests using pv-on-hvm, things don't break again.
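
As a rough illustration of the stricter check proposed above, the sketch below wraps it in a function. The name is_xen_pv_env and the optional directory argument are hypothetical, added only so the test can be exercised outside a real guest; this is not the actual anaconda or kdump code.

```shell
# Hypothetical helper: succeed only when the directory exists AND contains
# a capabilities file, as suggested for anaconda and kdump above.
is_xen_pv_env() {
    proc_xen="${1:-/proc/xen}"   # defaults to the real path when no argument is given
    [ -d "$proc_xen" ] && [ -f "$proc_xen/capabilities" ]
}
```

A caller could then write "if is_xen_pv_env; then ... fi". A /proc/xen that holds only balloon, as seen on the bare-metal systems in this report, would not pass.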
Comment 11 Don Dutile 2008-09-11 14:21:52 EDT
Created attachment 316469 [details]
Remove /proc/xen entries being generated by pv-on-hvm subsys on bare-metal kernels

Posted patch
Comment 12 Prarit Bhargava 2008-09-11 14:33:12 EDT
Adding nhorman so he sees comment #10.

P.
Comment 13 Mike Gahagan 2008-09-12 15:58:53 EDT
It looks like the 2.6.18-111.el5dz_test kernel resolves this issue. 

http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=28971

note: I cancelled the job when it was clear the bare-metal systems could boot the xen kernel. The case where boot_rhel5_kernels tries to boot a bare-metal kernel on a xen guest is a known issue with the test, although it might become a moot point pretty soon.
Comment 14 Don Zickus 2008-09-12 21:48:11 EDT
in kernel-2.6.18-113.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 17 Mike Gahagan 2008-10-22 15:12:33 EDT
confirmed fix in the -120 kernel.
Comment 18 Ryan Lerch 2008-11-06 20:03:21 EST
This bug has been marked for inclusion in the Red Hat Enterprise Linux 5.3
Release Notes.

To aid in the development of relevant and accurate release notes, please fill
out the "Release Notes" field above with the following 4 pieces of information:


Cause:   What actions or circumstances cause this bug to present.

Consequence:  What happens when the bug presents.

Fix:   What was done to fix the bug.

Result:  What now happens when the actions or circumstances above occur. (NB:
this is not the same as 'the bug doesn't present anymore')
Comment 20 Don Dutile 2008-11-11 10:51:25 EST
This bug was introduced in 5.3, didn't exist in 5.2, and was repaired before
beta.

Thus, it doesn't require a release note.
Comment 21 Ryan Lerch 2008-11-11 15:58:21 EST
thanks don!
removing this bug from the 5.3 release notes tracker.
Comment 23 errata-xmlrpc 2009-01-20 15:09:55 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
