Bug 866735 - RHEL7.0 20121012.1 can't be installed on RHEL5.9 kernel-xen host through PXE/DVD method
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.0
Hardware: Unspecified Unspecified
Priority: urgent Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
QA Contact: Virtualization Bugs
Whiteboard: xen
Keywords: Regression, TestBlocker
Duplicates: 882859
Depends On:
Blocks: 761591
Reported: 2012-10-15 22:45 EDT by Wei Shi
Modified: 2013-01-31 01:19 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-01-25 03:33:26 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
screenshot through PXE installation method (17.44 KB, image/png)
2012-10-15 22:47 EDT, Wei Shi

Description Wei Shi 2012-10-15 22:45:50 EDT
Description of problem:
The RHEL7.0 20121012.1 tree can't be installed by either the PXE or the DVD method on RHEL5.9 and RHEL5.8 xen hosts.

Version-Release number of selected component (if applicable):
Host : RHEL5.9 20121010.1 x86_64 AMD
       2.6.18-343.el5xen
       xen-3.0.3-142.el5

Guest: RHEL7.0 20121012.1(kernel-3.6.0-0.28.el7) HVM x86_64
       

How reproducible:
100%

Steps to Reproduce:
DVD method:
1. Make a RHEL7.0 20121012.1 DVD installation file
2. Launch the guest with the prepared DVD file and a prepared raw image (see attachment xm-cdrom.conf)

PXE method:
1. Make a RHEL7.0 20121012.1 installation PXE profile (see attachments RHEL-7.0-Server-x64-20121012.1.distro, ks-RHEL7-Server-x64-20121012.1-manual-2012-10-15.cfg, RHEL7.0-Server-x64-20121012.1-manual-2012-10-15.profile)
2. Launch the guest, booting from the network

Actual results:
  For the DVD method, the guest stops after the installer's grub screen; this can also be reproduced when installing from boot.iso. It fails at the very beginning of the installation phase:
    [0.000000] Cannot get hvm parameter 18: -22!

  For the PXE method, the guest stops before launching anaconda; see attachment (PXE.png)

Expected results:
  Guest installation should launch normally.

Additional info:
  Tested PASS with kernel-3.3.0-0.20.el7, RHEL7.0-20120711.2 on RHEL5.8 release
  Tested PASS with kernel-3.3.0-0.20.el7, RHEL7.0-20120711.2 on RHEL5.9 20121010.1
  Tested Fail with kernel-3.6.0-0.28.el7, RHEL7.0 20121012.1 on RHEL5.8 release.
  Tested Fail with kernel-3.6.0-0.28.el7, RHEL7.0 20121012.1 on RHEL5.9 20121010.1.
Comment 1 Wei Shi 2012-10-15 22:47:49 EDT
Created attachment 627915
screenshot through PXE installation method
Comment 3 Andrew Jones 2012-10-17 06:48:34 EDT
I reproduced this.

Running 'xenctx $GUEST | grep rip' a few times I see that we're looping. Doing a 'xenctx --stack-trace $GUEST' and decoding the addrs with the vmlinux from the kernel debuginfo and addr2line yields

[<ffffffff81042e76>] (native_safe_halt [arch/x86/include/asm/irqflags.h:50])
[<ffffffff81893ed8>] (_sdata [??:0])
[<ffffffff8101b79f>] (arch_safe_halt [arch/x86/include/asm/paravirt.h:111]) (default_idle [arch/x86/kernel/process.c:495])
[<ffffffff81893fd8>] (_sdata [??:0])
[<ffffffff81991180>] (__stop___verbose [??:0])
[<ffffffff81893f08>] (_sdata [??:0])
[<ffffffff8101c4ce>] (cpu_idle [arch/x86/kernel/process.c:460])
[<ffffffff81893ef8>] (_sdata [??:0])
[<ffffffff819f6000>] (?? [??:0])
[<ffffffff81893f18>] (_sdata [??:0])
[<ffffffff815ac65e>] (rest_init [init/main.c:386])
[<ffffffff81893f68>] (_sdata [??:0])
[<ffffffff819afc1f>] (start_kernel [init/main.c:638])
[<ffffffff819af672>] (unknown_bootoption [init/main.c:251])
[<ffffffff81893f58>] (_sdata [??:0])
[<ffffffff81a002e0>] (real_mode_blob_end [??:0])
[<ffffffff81ac3000>] (idt_table [??:0])
[<ffffffff81893fa8>] (_sdata [??:0])
[<ffffffff81893f88>] (_sdata [??:0])
[<ffffffff819af356>] (x86_64_start_reservations [arch/x86/kernel/head64.c:123])
[<ffffffff81893fa8>] (_sdata [??:0])
[<ffffffff81893fe8>] (_sdata [??:0])
[<ffffffff819af45a>] (x86_64_start_kernel [arch/x86/kernel/head64.c:94])
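
A rough sketch of the decoding step described above, assuming the raw trace carries return addresses in the bracketed '[<...>]' form shown and that a vmlinux with debug info is at hand. The function names and the vmlinux path are placeholders, not something from the original report:

```shell
# Pull the bracketed 16-digit return addresses out of a stack dump on stdin,
# printing one hex address per line.
extract_addrs() {
    grep -o '\[<[0-9a-f]\{16\}>]' | grep -o '[0-9a-f]\{16\}'
}

# Resolve addresses from stdin to function names and file:line.
# $1: path to the vmlinux from the kernel debuginfo package (assumed location).
# -f prints function names, -i also lists inlined callers.
decode_addrs() {
    xargs addr2line -f -i -e "$1"
}

# Intended use on the Xen host (not run here):
#   xenctx --stack-trace "$GUEST" | extract_addrs | decode_addrs /path/to/vmlinux
```

Addresses that land in data sections will still resolve to '??:0', as several entries in the trace above do.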

cpu_idle immediately caught my eye since I know it likes MONITOR/MWAIT, but we mask those on Xen, leading us to default_idle and native_safe_halt, which doesn't seem to work for us. But it used to (3.3.0-0.20.el7). So what changed?

$ git log --oneline kernel-3.3.0-0.20.el7..HEAD arch/x86/kernel/process.c | grep -v 'Merge'
c767a54 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
1dcc8d7 x86, fpu: drop the fpu state during thread exit
55ccf3f fork: move the real prepare_to_copy() users to arch_dup_task_struct()
c6ae41e x86: replace percpu_xxx funcs with this_cpu_xxx
57da8b9 x86: Avoid double stack traces with show_regs()
38e7c57 x86: Use common threadinfo allocator
85f7f65 x86: Use kick_all_cpus_sync()
19209bb x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly
4504689 x86: Use generic init_task
f636520 x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facility
f05e798 Disintegrate asm/system.h for X86
1361b83 i387: Split up <asm/i387.h> into exported and internal interfaces
4845465 x86/tracing: Denote the power and cpuidle tracepoints as _rcuidle()

Nothing too suspicious at first glance. I'll look closer.
Comment 4 Andrew Jones 2012-10-17 08:57:47 EDT
(In reply to comment #3)
> $ git log --oneline kernel-3.3.0-0.20.el7..HEAD arch/x86/kernel/process.c |
> grep -v 'Merge'

My grep above wasn't specific enough; it should have been -v 'Merge branch'. Talk about bad luck... Changing it pops out a more suspicious commit:

commit 90e240142bd31ff10aeda5a280a53153f4eff004
Author: Richard Weinberger <richard@nod.at>
Date:   Sun Mar 25 23:00:04 2012 +0200

    x86: Merge the x86_32 and x86_64 cpu_idle() functions

This patch changes how we exit idle, and even introduces the code in which
cpu_idle [arch/x86/kernel/process.c:460] lives.
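
For illustration, the slip with 'grep -v Merge' is that it also drops ordinary commits whose subject merely contains the word. A minimal sketch of a narrower filter, assuming merge commits carry the default "Merge branch ..." subject; drop_merge_commits is a hypothetical helper name:

```shell
# Filter `git log --oneline` output on stdin: drop only lines whose subject
# starts with "Merge branch", and keep commits that just mention "Merge".
drop_merge_commits() {
    grep -v '^[0-9a-f]\{7,\} Merge branch'
}

# Intended use (not run here):
#   git log --oneline kernel-3.3.0-0.20.el7..HEAD arch/x86/kernel/process.c | drop_merge_commits
```

git log also accepts --no-merges, which excludes commits by parent count rather than by subject text and so avoids the pattern-matching problem altogether.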
Comment 5 Andrew Jones 2012-10-17 09:20:25 EDT
(In reply to comment #4)
> This patch changes how we exit idle, and even introduces the code in which
> cpu_idle [arch/x86/kernel/process.c:460] lives.

Never mind. This patch actually does a pretty clean move of cpu_idle from process_64.c to process.c, so it's probably not the culprit.
Comment 6 Miroslav Rezanina 2012-10-19 09:23:37 EDT
This is caused by "xen/pv-on-hvm kexec: shutdown watches from old kernel" patch. See upstream commit cb6b6df111e46b9d0f79eb971575fd50555f43f4 ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=cb6b6df111e46b9d0f79eb971575fd50555f43f4 ) that fixes the problem.
Comment 7 Wei Shi 2012-11-28 01:19:31 EST
Same problem on our new RHEL7.0 tree : RHEL-7.0-20121120.1
Comment 8 Wei Shi 2012-11-29 21:27:54 EST
Just tracking with our new tree:
  Same problem on our new RHEL7.0 tree : RHEL-7.0-20121129.0
Comment 9 Miroslav Rezanina 2012-12-03 04:10:28 EST
*** Bug 882859 has been marked as a duplicate of this bug. ***
Comment 10 Wei Shi 2012-12-17 20:43:55 EST
Just tracking with our new tree:
  Same problem on our new RHEL7.0 tree : RHEL-7.0-20121217.0
Comment 11 Wei Shi 2013-01-24 21:13:53 EST
Just tracking with our new tree:
 Still can not install our new RHEL7.0 tree : RHEL-7.0-20130120.0

 But the problem seems a little different. First, the output has changed, and the boot now gets past that step:
Before: "[0.000000] Cannot get hvm parameter 18: -22!"

Now:    "[0.000000] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!"

 After that, progress stops at:
  "dracut-pre-udev[190]: //lib/dracut/hooks/pre-udev/30-anaconda-modprobe.sh: line32:   224 Segmentation fault    modprobe $m &>/dev/null"

  It's similar to https://bugzilla.redhat.com/show_bug.cgi?id=894360#c0 (fedora 18), but the parameter "clearcpuid=156" does not work for rhel7.
Comment 12 Andrew Jones 2013-01-25 03:33:26 EST
(In reply to comment #11)
> Just tracking with our new tree:
>  Still can not install our new RHEL7.0 tree : RHEL-7.0-20130120.0
>

I was able to install rhel7 from RHEL-7.0-20130117.n.1-Server-x86_64-dvd1.iso on an AMD Opteron 6386 SE machine.

 
>  But the problem seems a little bit different, first, the output is
> different and it can jump over the step :
> Before: "[0.000000] Cannot get hvm parameter 18: -22!"
> 
> Now:    "[0.000000] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!"

This message (slightly changed in the later kernel) is harmless.

> 
>  after that, the progress stopped at :
>   "dracut-pre-udev[190]:
> //lib/dracut/hooks/pre-udev/30-anaconda-modprobe.sh: line32:   224
> Segmentation fault    modprobe $m &>/dev/null"
> 
>   It's similar to https://bugzilla.redhat.com/show_bug.cgi?id=894360#c0
> (fedora 18), but parameter "clearcpuid=156" cannot work for rhel7.

This does look like 894360. Why do you say clearcpuid=156 doesn't work? I believe it should. Did you try adding it to the kernel command line after starting the install by pressing tab at the grub prompt? I actually always do text installs, so after starting a graphical install I press tab, then add 'console=ttyS0,115200n8 text' to the kernel command line. Then after pressing enter on the graphical window, I connect to the guest's console from another terminal on the host.

In any case the issue this bug is addressing appears to be resolved. New issues should go to new bugs. Closing as currentrelease.
Comment 13 Wei Shi 2013-01-31 01:19:20 EST
(In reply to comment #12)
> In any case the issue this bug is addressing appears to be resolved. New
> issues should go to new bugs. Closing as currentrelease.

Retested on rhel7.0 20130120.0 with "clearcpuid=156"; it works as a workaround for this bug.
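
For reference, the number given to clearcpuid= is a feature-flag index of the form word*32+bit, as laid out in the kernel's arch/x86/include/asm/cpufeature.h. A small sketch of the arithmetic; decode_cpufeature is a hypothetical helper name used only for this illustration:

```shell
# Split a clearcpuid= feature number into its (word, bit) pair.
# The kernel stores 32 feature flags per word.
decode_cpufeature() {
    echo "word=$(( $1 / 32 )) bit=$(( $1 % 32 ))"
}

decode_cpufeature 156   # prints: word=4 bit=28
```

Word 4 holds the CPUID leaf 1 ECX flags, and bit 28 there is defined as X86_FEATURE_AVX (4*32+28), so clearcpuid=156 hides AVX from the guest kernel.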
