Bug 866735

Summary: RHEL7.0 20121012.1 can't be installed on RHEL5.9 kernel-xen host through PXE/DVD method
Product: Red Hat Enterprise Linux 7
Reporter: Wei Shi <wshi>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Priority: urgent
Version: 7.0
CC: bfan, drjones, dyuan, leiwang, lersek, lsu, moli, mrezanin, mzhan, qguan, rwu, ydu, yuzhou
Target Milestone: rc
Keywords: Regression, TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: xen
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2013-01-25 08:33:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 761591
Attachments:
screenshot through PXE installation method (flags: none)

Description Wei Shi 2012-10-16 02:45:50 UTC
Description of problem:
The RHEL7.0 20121012.1 tree can't be installed by either the PXE or the DVD method on RHEL5.9 and RHEL5.8 xen hosts.

Version-Release number of selected component (if applicable):
Host : RHEL5.9 20121010.1 x86_64 AMD
       2.6.18-343.el5xen
       xen-3.0.3-142.el5

Guest: RHEL7.0 20121012.1(kernel-3.6.0-0.28.el7) HVM x86_64
       

How reproducible:
100%

Steps to Reproduce:
DVD method:
1. make RHEL7.0 20121012.1 DVD installation file
2. launch guest with a prepared DVD file and prepared raw image (see attachment xm-cdrom.conf)

PXE method:
1. make RHEL7.0 20121012.1 installation PXE profile (see attachment RHEL-7.0-Server-x64-20121012.1.distro, ks-RHEL7-Server-x64-20121012.1-manual-2012-10-15.cfg, RHEL7.0-Server-x64-20121012.1-manual-2012-10-15.profile)
2. launch the guest, booting from the network

Actual results:
  For the DVD method, the guest stops after the install grub screen; this can also be reproduced when installing from boot.iso. It fails at the very beginning of the installation phase:
    [0.000000] Cannot get hvm parameter 18: -22!

  For the PXE method, the guest stops before anaconda launches; see the attachment (PXE.png).

Expected results:
  Guest installation should launch normally.

Additional info:
  Tested PASS with kernel-3.3.0-0.20.el7, RHEL7.0-20120711.2 on RHEL5.8 release
  Tested PASS with kernel-3.3.0-0.20.el7, RHEL7.0-20120711.2 on RHEL5.9 20121010.1
  Tested FAIL with kernel-3.6.0-0.28.el7, RHEL7.0 20121012.1 on RHEL5.8 release
  Tested FAIL with kernel-3.6.0-0.28.el7, RHEL7.0 20121012.1 on RHEL5.9 20121010.1

Comment 1 Wei Shi 2012-10-16 02:47:49 UTC
Created attachment 627915 [details]
screenshot through PXE installation method

Comment 3 Andrew Jones 2012-10-17 10:48:34 UTC
I reproduced this.

Running 'xenctx $GUEST | grep rip' a few times, I see that we're looping. Running 'xenctx --stack-trace $GUEST' and decoding the addresses with addr2line and the vmlinux from the kernel debuginfo yields:

[<ffffffff81042e76>] (native_safe_halt [arch/x86/include/asm/irqflags.h:50])
[<ffffffff81893ed8>] (_sdata [??:0])
[<ffffffff8101b79f>] (arch_safe_halt [arch/x86/include/asm/paravirt.h:111]) (default_idle [arch/x86/kernel/process.c:495])
[<ffffffff81893fd8>] (_sdata [??:0])
[<ffffffff81991180>] (__stop___verbose [??:0])
[<ffffffff81893f08>] (_sdata [??:0])
[<ffffffff8101c4ce>] (cpu_idle [arch/x86/kernel/process.c:460])
[<ffffffff81893ef8>] (_sdata [??:0])
[<ffffffff819f6000>] (?? [??:0])
[<ffffffff81893f18>] (_sdata [??:0])
[<ffffffff815ac65e>] (rest_init [init/main.c:386])
[<ffffffff81893f68>] (_sdata [??:0])
[<ffffffff819afc1f>] (start_kernel [init/main.c:638])
[<ffffffff819af672>] (unknown_bootoption [init/main.c:251])
[<ffffffff81893f58>] (_sdata [??:0])
[<ffffffff81a002e0>] (real_mode_blob_end [??:0])
[<ffffffff81ac3000>] (idt_table [??:0])
[<ffffffff81893fa8>] (_sdata [??:0])
[<ffffffff81893f88>] (_sdata [??:0])
[<ffffffff819af356>] (x86_64_start_reservations [arch/x86/kernel/head64.c:123])
[<ffffffff81893fa8>] (_sdata [??:0])
[<ffffffff81893fe8>] (_sdata [??:0])
[<ffffffff819af45a>] (x86_64_start_kernel [arch/x86/kernel/head64.c:94])
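
The decoding step can be sketched roughly as follows. This is only an illustration, not the exact commands used for the trace above: it assumes the xenctx output has been saved to a file and that kernel addresses appear in it as 16-digit hex words starting with ffffffff8. The function names and the vmlinux path are hypothetical; addr2line's -f, -i and -e options are the standard binutils ones.

```shell
# Print every 16-digit hex word that looks like a kernel address
# (ffffffff8xxxxxxx), one per line with a 0x prefix. Reads the trace on stdin.
extract_addrs() {
    grep -oE 'ffffffff8[0-9a-f]{7}' | sed 's/^/0x/'
}

# Resolve the addresses to function names and file:line pairs.
# $1 is the path to a vmlinux with debug info (the location is an assumption;
# it depends on where the kernel debuginfo package was unpacked).
decode_addrs() {
    extract_addrs | xargs addr2line -f -i -e "$1"
}
```

With the trace saved to a file, something like 'decode_addrs /path/to/vmlinux < trace.txt' would then print one decoded entry per address. Stack words that point into data rather than code still match the pattern, which is why entries such as _sdata [??:0] show up in the decoded trace.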

cpu_idle immediately caught my eye since I know it likes MONITOR/MWAIT, but we mask those on Xen, leading us to default_idle and native_safe_halt, which doesn't seem to work for us. But it used to (3.3.0-0.20.el7). So what changed?

$ git log --oneline kernel-3.3.0-0.20.el7..HEAD arch/x86/kernel/process.c | grep -v 'Merge'
c767a54 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
1dcc8d7 x86, fpu: drop the fpu state during thread exit
55ccf3f fork: move the real prepare_to_copy() users to arch_dup_task_struct()
c6ae41e x86: replace percpu_xxx funcs with this_cpu_xxx
57da8b9 x86: Avoid double stack traces with show_regs()
38e7c57 x86: Use common threadinfo allocator
85f7f65 x86: Use kick_all_cpus_sync()
19209bb x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly
4504689 x86: Use generic init_task
f636520 x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facility
f05e798 Disintegrate asm/system.h for X86
1361b83 i387: Split up <asm/i387.h> into exported and internal interfaces
4845465 x86/tracing: Denote the power and cpuidle tracepoints as _rcuidle()

Nothing too suspicious at first glance. I'll look closer.

Comment 4 Andrew Jones 2012-10-17 12:57:47 UTC
(In reply to comment #3)
> $ git log --oneline kernel-3.3.0-0.20.el7..HEAD arch/x86/kernel/process.c |
> grep -v 'Merge'

My grep above wasn't specific enough; it should have been -v 'Merge branch'. The loose pattern also dropped any ordinary commit with "Merge" in its subject, talk about bad luck... Changing it pops out a more suspicious commit:

commit 90e240142bd31ff10aeda5a280a53153f4eff004
Author: Richard Weinberger <richard>
Date:   Sun Mar 25 23:00:04 2012 +0200

    x86: Merge the x86_32 and x86_64 cpu_idle() functions

This patch changes how we exit idle, and even introduces the code in which
cpu_idle [arch/x86/kernel/process.c:460] lives.
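
The filtering mistake can be reproduced without a kernel tree. The sketch below uses two sample --oneline entries: the first is the commit quoted above (hash abbreviated to seven characters), the second is a made-up merge commit whose hash and branch name are hypothetical. The helper names are for illustration only.

```shell
# Two sample "git log --oneline" entries. The second line is invented
# purely for the demonstration (hypothetical hash and branch name).
sample_log() {
    printf '%s\n' \
        "90e2401 x86: Merge the x86_32 and x86_64 cpu_idle() functions" \
        "abcdef0 Merge branch 'x86/example' into example"
}

# Original filter: drops every line containing "Merge", real commits included.
filter_loose() { grep -v 'Merge'; }

# Narrower filter: drops only lines whose subject starts with "Merge branch".
filter_strict() { grep -vE '^[0-9a-f]+ Merge branch'; }

# grep exits with status 1 when it prints nothing, hence the "|| true".
sample_log | filter_loose || true   # prints nothing
sample_log | filter_strict          # prints only the 90e2401 line
```

Matching on the subject text is fragile either way. git log also has a --no-merges option that excludes merge commits by their parent count, so no text filter is needed at all.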

Comment 5 Andrew Jones 2012-10-17 13:20:25 UTC
(In reply to comment #4)
> This patch changes how we exit idle, and even introduces the code in which
> cpu_idle [arch/x86/kernel/process.c:460] lives.

Never mind, this patch actually does a pretty clean move of cpu_idle from process_64.c to process.c, so it's probably not the culprit.

Comment 6 Miroslav Rezanina 2012-10-19 13:23:37 UTC
This is caused by "xen/pv-on-hvm kexec: shutdown watches from old kernel" patch. See upstream commit cb6b6df111e46b9d0f79eb971575fd50555f43f4 ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=cb6b6df111e46b9d0f79eb971575fd50555f43f4 ) that fixes the problem.

Comment 7 Wei Shi 2012-11-28 06:19:31 UTC
Same problem on our new RHEL7.0 tree : RHEL-7.0-20121120.1

Comment 8 Wei Shi 2012-11-30 02:27:54 UTC
Just tracking with our new tree:
  Same problem on our new RHEL7.0 tree : RHEL-7.0-20121129.0

Comment 9 Miroslav Rezanina 2012-12-03 09:10:28 UTC
*** Bug 882859 has been marked as a duplicate of this bug. ***

Comment 10 Wei Shi 2012-12-18 01:43:55 UTC
Just tracking with our new tree:
  Same problem on our new RHEL7.0 tree : RHEL-7.0-20121217.0

Comment 11 Wei Shi 2013-01-25 02:13:53 UTC
Just tracking with our new tree:
 Still cannot install our new RHEL7.0 tree : RHEL-7.0-20130120.0

 But the problem now looks a little different. First, the message has changed, and the boot gets past that step:
Before: "[0.000000] Cannot get hvm parameter 18: -22!"

Now:    "[0.000000] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!"

 After that, the installation stopped at:
  "dracut-pre-udev[190]: //lib/dracut/hooks/pre-udev/30-anaconda-modprobe.sh: line32:   224 Segmentation fault    modprobe $m &>/dev/null"

  It's similar to https://bugzilla.redhat.com/show_bug.cgi?id=894360#c0 (fedora 18), but the parameter "clearcpuid=156" does not work for rhel7.

Comment 12 Andrew Jones 2013-01-25 08:33:26 UTC
(In reply to comment #11)
> Just tracking with our new tree:
>  Still can not install our new RHEL7.0 tree : RHEL-7.0-20130120.0
>

I was able to install rhel7 from RHEL-7.0-20130117.n.1-Server-x86_64-dvd1.iso on an AMD Opteron 6386 SE machine.

 
>  But the problem seems a little bit different, first, the output is
> different and it can jump over the step :
> Before: "[0.000000] Cannot get hvm parameter 18: -22!"
> 
> Now:    "[0.000000] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!"

This message (slightly changed in the later kernel) is harmless.

> 
>  after that, the progress stopped at :
>   "dracut-pre-udev[190]:
> //lib/dracut/hooks/pre-udev/30-anaconda-modprobe.sh: line32:   224
> Segmentation fault    modprobe $m &>/dev/null"
> 
>   It's similar to https://bugzilla.redhat.com/show_bug.cgi?id=894360#c0
> (fedora 18), but parameter "clearcpuid=156" cannot work for rhel7.

This does look like 894360. Why do you say clearcpuid=156 doesn't work? I believe it should. Did you try adding it to the kernel command line after starting the install by pressing tab at the grub prompt? I actually always do text installs, so after starting a graphical install I press tab, then add 'console=ttyS0,115200n8 text' to the kernel command line. Then after pressing enter on the graphical window, I connect to the guest's console from another terminal on the host.
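
One way to rule out a typo at the grub prompt is to check, from a shell inside the booted guest, that the parameter really reached the kernel. A minimal sketch, where has_param is a hypothetical helper and /proc/cmdline is the standard location of the running kernel's command line:

```shell
# Succeeds if the kernel command line in $1 contains the exact parameter $2
# as a whole, space-separated word.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use inside the guest (not run here):
#   has_param "$(cat /proc/cmdline)" clearcpuid=156 && echo "clearcpuid applied"
```

Padding the command line with spaces keeps a longer value such as clearcpuid=1560 from being mistaken for a match.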

In any case the issue this bug is addressing appears to be resolved. New issues should go to new bugs. Closing as currentrelease.

Comment 13 Wei Shi 2013-01-31 06:19:20 UTC
(In reply to comment #12)
> In any case the issue this bug is addressing appears to be resolved. New
> issues should go to new bugs. Closing as currentrelease.

Retested on rhel7.0 20130120.0 with "clearcpuid=156": it is a workaround for this bug.