Bug 866735
| Summary: | RHEL7.0 20121012.1 can't be installed on RHEL5.9 kernel-xen host through PXE/DVD method | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Wei Shi <wshi> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.0 | CC: | bfan, drjones, dyuan, leiwang, lersek, lsu, moli, mrezanin, mzhan, qguan, rwu, ydu, yuzhou |
| Target Milestone: | rc | Keywords: | Regression, TestBlocker |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | xen | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-01-25 08:33:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 761591 | | |
| Attachments: | | | |

Created attachment 627915
Screenshot from the PXE installation method

I reproduced this. Running 'xenctx $GUEST | grep rip' a few times I see that we're looping. Doing a 'xenctx --stack-trace $GUEST' and decoding the addrs with the vmlinux from the kernel debuginfo and addr2line yields

    [<ffffffff81042e76>] (native_safe_halt [arch/x86/include/asm/irqflags.h:50])
    [<ffffffff81893ed8>] (_sdata [??:0])
    [<ffffffff8101b79f>] (arch_safe_halt [arch/x86/include/asm/paravirt.h:111]) (default_idle [arch/x86/kernel/process.c:495])
    [<ffffffff81893fd8>] (_sdata [??:0])
    [<ffffffff81991180>] (__stop___verbose [??:0])
    [<ffffffff81893f08>] (_sdata [??:0])
    [<ffffffff8101c4ce>] (cpu_idle [arch/x86/kernel/process.c:460])
    [<ffffffff81893ef8>] (_sdata [??:0])
    [<ffffffff819f6000>] (?? [??:0])
    [<ffffffff81893f18>] (_sdata [??:0])
    [<ffffffff815ac65e>] (rest_init [init/main.c:386])
    [<ffffffff81893f68>] (_sdata [??:0])
    [<ffffffff819afc1f>] (start_kernel [init/main.c:638])
    [<ffffffff819af672>] (unknown_bootoption [init/main.c:251])
    [<ffffffff81893f58>] (_sdata [??:0])
    [<ffffffff81a002e0>] (real_mode_blob_end [??:0])
    [<ffffffff81ac3000>] (idt_table [??:0])
    [<ffffffff81893fa8>] (_sdata [??:0])
    [<ffffffff81893f88>] (_sdata [??:0])
    [<ffffffff819af356>] (x86_64_start_reservations [arch/x86/kernel/head64.c:123])
    [<ffffffff81893fa8>] (_sdata [??:0])
    [<ffffffff81893fe8>] (_sdata [??:0])
    [<ffffffff819af45a>] (x86_64_start_kernel [arch/x86/kernel/head64.c:94])

cpu_idle immediately caught my eye since I know it likes MONITOR/MWAIT, but we mask those on Xen, leading us to default_idle and native_safe_halt, which doesn't seem to work for us. But it used to (3.3.0-0.20.el7). So what changed?

    $ git log --oneline kernel-3.3.0-0.20.el7..HEAD arch/x86/kernel/process.c | grep -v 'Merge'
    c767a54 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
    1dcc8d7 x86, fpu: drop the fpu state during thread exit
    55ccf3f fork: move the real prepare_to_copy() users to arch_dup_task_struct()
    c6ae41e x86: replace percpu_xxx funcs with this_cpu_xxx
    57da8b9 x86: Avoid double stack traces with show_regs()
    38e7c57 x86: Use common threadinfo allocator
    85f7f65 x86: Use kick_all_cpus_sync()
    19209bb x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly
    4504689 x86: Use generic init_task
    f636520 x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facility
    f05e798 Disintegrate asm/system.h for X86
    1361b83 i387: Split up <asm/i387.h> into exported and internal interfaces
    4845465 x86/tracing: Denote the power and cpuidle tracepoints as _rcuidle()

Nothing too suspicious at first glance. I'll look closer.

(In reply to comment #3)
> $ git log --oneline kernel-3.3.0-0.20.el7..HEAD arch/x86/kernel/process.c |
> grep -v 'Merge'

My grep above wasn't specific enough: it should have been -v 'Merge branch'. Talk about bad luck... Changing it pops out a more suspicious commit

    commit 90e240142bd31ff10aeda5a280a53153f4eff004
    Author: Richard Weinberger <richard>
    Date: Sun Mar 25 23:00:04 2012 +0200

        x86: Merge the x86_32 and x86_64 cpu_idle() functions

This patch changes how we exit idle, and even introduces the code in which cpu_idle [arch/x86/kernel/process.c:460] lives.

(In reply to comment #4)
> This patch changes how we exit idle, and even introduces the code in which
> cpu_idle [arch/x86/kernel/process.c:460] lives.

Never mind, this patch actually does a pretty clean move of cpu_idle from process_64.c to process.c, so it's probably not the culprit.

This is caused by the "xen/pv-on-hvm kexec: shutdown watches from old kernel" patch.
See upstream commit cb6b6df111e46b9d0f79eb971575fd50555f43f4 ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=cb6b6df111e46b9d0f79eb971575fd50555f43f4 ) that fixes the problem.

Same problem on our new RHEL7.0 tree : RHEL-7.0-20121120.1

Just tracking with our new tree:
Same problem on our new RHEL7.0 tree : RHEL-7.0-20121129.0

*** Bug 882859 has been marked as a duplicate of this bug. ***

Just tracking with our new tree:
Same problem on our new RHEL7.0 tree : RHEL-7.0-20121217.0

Just tracking with our new tree:
Still can not install our new RHEL7.0 tree : RHEL-7.0-20130120.0

But the problem seems a little bit different. First, the output is different and it can jump over the step :
Before: "[0.000000] Cannot get hvm parameter 18: -22!"
Now: "[0.000000] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!"

After that, the progress stopped at :
"dracut-pre-udev[190]: //lib/dracut/hooks/pre-udev/30-anaconda-modprobe.sh: line32: 224 Segmentation fault modprobe $m &>/dev/null"

It's similar to https://bugzilla.redhat.com/show_bug.cgi?id=894360#c0 (fedora 18), but parameter "clearcpuid=156" cannot work for rhel7.

(In reply to comment #11)
> Just tracking with our new tree:
> Still can not install our new RHEL7.0 tree : RHEL-7.0-20130120.0
>

I was able to install rhel7 from RHEL-7.0-20130117.n.1-Server-x86_64-dvd1.iso on an AMD Opteron 6386 SE machine.

> But the problem seems a little bit different, first, the output is
> different and it can jump over the step :
> Before: "[0.000000] Cannot get hvm parameter 18: -22!"
>
> Now: "[0.000000] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!"

This message (slightly changed with the later kernel) is harmless.

>
> after that, the progress stopped at :
> "dracut-pre-udev[190]:
> //lib/dracut/hooks/pre-udev/30-anaconda-modprobe.sh: line32: 224
> Segmentation fault modprobe $m &>/dev/null"
>
> It's similar to https://bugzilla.redhat.com/show_bug.cgi?id=894360#c0
> (fedora 18), but parameter "clearcpuid=156" cannot work for rhel7.

This does look like 894360. Why do you say clearcpuid=156 doesn't work? I believe it should. Did you try adding it to the kernel command line after starting the install by pressing tab at the grub prompt? I actually always do text installs, so after starting a graphical install I press tab, then add 'console=ttyS0,115200n8 text' to the kernel command line. Then after pressing enter on the graphical window, I connect to the guest's console from another terminal on the host.

In any case the issue this bug is addressing appears to be resolved. New issues should go to new bugs. Closing as currentrelease.

(In reply to comment #12)
> In any case the issue this bug is addressing appears to be resolved. New
> issues should go to new bugs. Closing as currentrelease.

Retested on rhel7.0 20130120.0 with "clearcpuid=156"; it is a workaround for this bug.

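For reference, a minimal sketch of how the "clearcpuid=156" value from the comments above decodes. It assumes the kernel's usual feature numbering of word*32+bit from arch/x86/include/asm/cpufeature.h. To the best of my knowledge word 4 there corresponds to CPUID leaf 1 ECX, where bit 28 is AVX, but that mapping is not stated in this bug and should be checked against the kernel tree in use.

```shell
# Split a clearcpuid= feature number into its cpufeature word and bit.
# Assumption: feature numbers are laid out as word*32+bit.
feature=156
word=$((feature / 32))
bit=$((feature % 32))
echo "clearcpuid=${feature} -> word ${word}, bit ${bit}"
```

The value is added to the guest kernel command line in the same way as the 'console=ttyS0,115200n8 text' options described above: press tab at the boot prompt and append it.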
Description of problem:
The RHEL7.0 20121012.1 tree can't be installed by either the PXE or the DVD method on RHEL5.9 and RHEL5.8 xen hosts.

Version-Release number of selected component (if applicable):
Host : RHEL5.9 20121010.1 x86_64 AMD 2.6.18-343.el5xen xen-3.0.3-142.el5
Guest: RHEL7.0 20121012.1(kernel-3.6.0-0.28.el7) HVM x86_64

How reproducible:
100%

Steps to Reproduce:
DVD method:
1. Make a RHEL7.0 20121012.1 DVD installation file.
2. Launch the guest with the prepared DVD file and a prepared raw image (see attachment xm-cdrom.conf).

PXE method:
1. Make a RHEL7.0 20121012.1 installation PXE profile (see attachments RHEL-7.0-Server-x64-20121012.1.distro, ks-RHEL7-Server-x64-20121012.1-manual-2012-10-15.cfg, RHEL7.0-Server-x64-20121012.1-manual-2012-10-15.profile).
2. Launch the guest and boot it from the network.

Actual results:
For the DVD method, the install stops after the install grub screen. This can also be reproduced when installing from boot.iso. It fails at the very beginning of the installation phase:
[0.000000] Cannot get hvm parameter 18: -22!
For the PXE method, the install stops before launching anaconda; see attachment (PXE.png).

Expected results:
Guest installation should launch normally.

Additional info:
Tested PASS with kernel-3.3.0-0.20.el7, RHEL7.0-20120711.2 on RHEL5.8 release.
Tested PASS with kernel-3.3.0-0.20.el7, RHEL7.0-20120711.2 on RHEL5.9 20121010.1.
Tested FAIL with kernel-3.6.0-0.28.el7, RHEL7.0 20121012.1 on RHEL5.8 release.
Tested FAIL with kernel-3.6.0-0.28.el7, RHEL7.0 20121012.1 on RHEL5.9 20121010.1.
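
A note on the `git log` filtering from the comments: the sketch below illustrates why `grep -v 'Merge'` hid the commit whose subject itself contains the word "Merge". The sample text is an assumption made up for illustration: the first line uses the abbreviated hash and subject of commit 90e240142bd31ff10aeda5a280a53153f4eff004 from this bug, while the second line is a hypothetical merge commit, not real output.

```shell
# Hypothetical sample of `git log --oneline` output (second line is invented).
log_sample='90e2401 x86: Merge the x86_32 and x86_64 cpu_idle() functions
0000000 Merge branch example-topic (hypothetical merge commit)'

# Too broad: drops every subject containing "Merge", including the real commit.
broad=$(printf '%s\n' "$log_sample" | grep -v 'Merge' || true)

# Narrower: drops only subjects containing "Merge branch".
narrow=$(printf '%s\n' "$log_sample" | grep -v 'Merge branch' || true)

printf 'broad:  [%s]\nnarrow: [%s]\n' "$broad" "$narrow"

# In a real tree, `git log --no-merges --oneline <range> -- <path>` filters
# merge commits by parent count and avoids matching on subject text at all.
```

With the broad pattern nothing survives from this sample, while the narrower pattern keeps the cpu_idle() commit.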