Bug 918239
Summary: | kernel-2.6.32-358.0.1 doesn't boot at virtual machine on Xen Cloud Platform | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Constantin Dunayev <constantin_> |
Component: | kernel | Assignee: | Andrew Jones <drjones> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 6.4 | CC: | andrew.cooper3, br, bsarathy, cazhang, dhoward, drjones, htrippaers, jfock, lcui, leiwang, lmiksik, nicolas.breuer, pasik, pasteur, vkuznets, wshi |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | xen | ||
Fixed In Version: | kernel-2.6.32-375.el6 | Doc Type: | Bug Fix |
Doc Text: |
When the Red Hat Enterprise Linux 6 kernel runs as a virtual machine guest, it performs boot-time detection of the hypervisor in order to enable hypervisor-specific optimizations. Red Hat Enterprise Linux 6.4 introduces detection and optimization for the Microsoft Hyper-V hypervisor. Previously, Hyper-V was detected first. However, because some Xen hypervisors can emulate Hyper-V, this could lead to a boot failure when that emulation was not exact. A patch has been applied to ensure that detection of Xen is always attempted before detection of Hyper-V, resolving this issue.
|
Last Closed: | 2013-11-21 16:45:11 UTC | Type: | Bug |
Bug Blocks: | 653816, 923204 |
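The Doc Text above describes a probe-order problem. The following minimal Python simulation of ordered hypervisor probing makes it concrete. It is an illustration under stated assumptions, not kernel code:

- The CPUID contents are a hand-written mock.
- The probe functions are hypothetical and much simpler than the real ones (the kernel also checks the maximum leaf number).
- Only two hypervisors are modelled.

The simulation relies on two facts. Hyper-V reports the signature "Microsoft Hv" at CPUID leaf 0x40000000. Xen reports "XenVMMXenVMM", but it shifts its own leaves up to 0x40000100 when Viridian (Hyper-V emulation) is enabled. As a result, a guest that probes for Hyper-V first sees what looks like Hyper-V.

```python
# Simulated hypervisor detection order (illustration only, not kernel code).
# The CPUID "table" is a hand-written mock mapping leaf -> vendor signature.

XEN_SIG = "XenVMMXenVMM"
HYPERV_SIG = "Microsoft Hv"


def is_xen(cpuid):
    # Xen's leaves may sit at 0x40000000 or be shifted in 0x100 steps
    # (e.g. to 0x40000100 when Viridian emulation is enabled), so scan.
    return any(cpuid.get(base) == XEN_SIG
               for base in range(0x40000000, 0x40010000, 0x100))


def is_hyperv(cpuid):
    # Simplified: only the signature at the base leaf is checked.
    return cpuid.get(0x40000000) == HYPERV_SIG


def detect(cpuid, probes):
    """Return the name of the first probe that matches, or "none"."""
    for name, probe in probes:
        if probe(cpuid):
            return name
    return "none"


# Xen HVM guest with Viridian enabled: Hyper-V signature at the base
# leaf, Xen's own signature pushed up to 0x40000100.
xen_viridian = {0x40000000: HYPERV_SIG, 0x40000100: XEN_SIG}

old_order = [("hyperv", is_hyperv), ("xen", is_xen)]  # Hyper-V probed first
new_order = [("xen", is_xen), ("hyperv", is_hyperv)]  # Xen probed first

print(detect(xen_viridian, old_order))  # prints: hyperv (misdetection)
print(detect(xen_viridian, new_order))  # prints: xen
```

Reordering the probe list corresponds to what the title of upstream commit 24a42bae ("x86, hyper: Change hypervisor detection order"), discussed in the thread, describes. A genuine Hyper-V host still matches under the new order, because the Xen scan simply finds nothing there.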
Description
Constantin Dunayev
2013-03-05 18:44:27 UTC
Tested with 3 VMs (HVM, set up from the "RHEL 6 i686" template, initial distribution CentOS 6.2).

2.6.32-279 works well with EL 6.4 packages.

Oracle's kernel-uek-2.6.39-400 from OL 6.4 also works.

Hmm, we need more information. Is there no way to force a crash and get a core? Is all logging enabled to the console (add ignore_loglevel to the kernel cmdline)? The other option is to bisect further by trying more working/non-working kernels until we get down to a single version. At that point we can check the git logs to make some guesses.

There is no way to get any log; kernel 2.6.32-358 does not start at all after boot. However, the hypervisor admin tool shows 100% virtual CPU load.

2.6.32-220: all versions worked
2.6.32-279: all versions worked
2.6.32-358: does not work (tested CentOS & Oracle builds)
2.6.32-358.0.1: does not work (tested CentOS & Oracle builds)

If you can give me i686 binaries of all versions between -279 and -358, I will test them.

Tested on other server hardware: -358.0.1 x86_64 (CentOS) does not boot on XCP 1.6; the virtual CPU shows 3% load.

I'm ready to test some i686 binary kernel packages (probably with Xen-related changes) between -279 and -358.

Tested Scientific Linux and Springdale builds of -358.0.1: same result, does not start.

OK, so these are all HVM guests. So you don't get *any* messages from the guest kernel? Did you try setting up a serial console for the domU kernel by editing the kernel cmdline options from grub (ignore_loglevel console=ttyS0,115200)? Obviously, also remove any "quiet" options.

I have reproduced this issue with the RHEL 6.4 netboot ISO on XenServer trunk. Symptoms are:
xentop indicates that the VCPU is spinning at 100%.
The last message on the console is "Switching to clocksource hyperv_clocksource", which sounds disastrously wrong on a Xen system.

Interestingly, at the point of failure there were two GPFs in Xen's vmx_msr_read_intercept, which are caught by the ASM FIXTABLE.
I will apply some more debugging to find out which MSR the guest is attempting to use.

From the serial log:
(XEN) [2013-03-08 15:04:41] traps.c:3175: GPF (0000): ffff82c4801d8a68 -> ffff82c480232c9e
(XEN) [2013-03-08 15:04:41] vmx_msr_read_intercept+0x2db/0x370 -> vmac+0x757/0xa2a
(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020
(XEN) [2013-03-08 15:04:41] traps.c:3175: GPF (0000): ffff82c4801d8a68 -> ffff82c480232c9e
(XEN) [2013-03-08 15:04:41] vmx_msr_read_intercept+0x2db/0x370 -> vmac+0x757/0xa2a
(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020

The faulting MSR is 0x40000020 in both cases, which in a guest relates to the Viridian extensions.

Indeed, `xe vm-param-set vm=$VM platform:viridian=false` works around the issue.

I have set `xe vm-param-set uuid=...... platform:viridian=false` and the VM starts normally. Here is a copy-paste from its console:
[root@vm ~]# uname -r
2.6.32-358.0.1.el6.i686
Thanks to Andrew Cooper.

Thanks Andrew! It looks like we need to backport:
commit 24a42bae6852d27ae569757f5415c91538e6a255
Author: Anupam Chanda <achanda>
Date: Fri Jul 8 11:42:50 2011 -0700
x86, hyper: Change hypervisor detection order

The patch I pointed out in comment 15 would work fine, but it would also require that we backport the use of the hypervisor_x86 interface for Xen. We already have that interface for VMware and Hyper-V, but we never moved Xen into it, because the interface didn't exist when RHEL 6 Xen HVM init was worked out. The backport would still be quite simple and would bring us closer to upstream, but it carries some risk: it would move HVM init earlier in setup, which has not been tested on RHEL.

Anyway, I now see another recent patch related to this issue:
commit db34bbb767bdfa1ebed7214b876fe01c5b7ee457
Author: K. Y. Srinivasan <kys>
Date: Sun Feb 3 17:22:38 2013 -0800
X86: Add a check to catch Xen emulation of Hyper-V

This patch is meant for kernels that don't have Xen support compiled in (where the hypervisor_x86 ordering therefore doesn't help). However, it also resolves this bug, and it has zero risk. I'll go with this one so we can more easily get the patch into z-stream too.

A patched RHEL 6 kernel has been verified by the reporter:
> Hi, Andrew!
> I have tested your kernel just now.
> It works. Thanks!
>
> copypaste from VM's console:
>
> [root@nagios ~]# uname -r
> 2.6.32-358.el6_bz918239_hyperv_checkxen.i686
>
> platform:viridian is set to true in vm parameters.
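The patch chosen above, upstream commit db34bbb ("Add a check to catch Xen emulation of Hyper-V"), takes a different route from reordering. As we understand it, the Hyper-V probe itself refuses to match when a Xen signature is visible, so the order in which probes run stops mattering. The Python sketch below illustrates that idea only. The names mirror the kernel helpers xen_cpuid_base() and ms_hyperv_platform(), but these are simplified stand-ins, and the CPUID table is a hypothetical mock.

```python
# Sketch of the "refuse Hyper-V if Xen is visible" check (illustration only).

XEN_SIG = "XenVMMXenVMM"
HYPERV_SIG = "Microsoft Hv"


def xen_cpuid_base(cpuid):
    """Return the leaf carrying Xen's signature, or 0 if Xen is not found."""
    for base in range(0x40000000, 0x40010000, 0x100):
        if cpuid.get(base) == XEN_SIG:
            return base
    return 0


def ms_hyperv_platform(cpuid):
    # Xen can emulate Hyper-V for Windows guests, so bail out first
    # if any Xen signature is present, whatever order probes run in.
    if xen_cpuid_base(cpuid):
        return False
    return cpuid.get(0x40000000) == HYPERV_SIG


xen_viridian = {0x40000000: HYPERV_SIG, 0x40000100: XEN_SIG}
real_hyperv = {0x40000000: HYPERV_SIG}

print(hex(xen_cpuid_base(xen_viridian)))  # prints: 0x40000100
print(ms_hyperv_platform(xen_viridian))   # prints: False
print(ms_hyperv_platform(real_hyperv))    # prints: True
```

Because the check lives inside the Hyper-V probe rather than in the probe ordering, it also covers kernels built without Xen guest support. According to the thread, that is the case the upstream commit targets.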
Hi, Constantin: Could you help us verify this bug once its status changes to ON_QA, using the fixed package listed in the "Fixed In Version" field?

Yes, of course.

Hi guys, I've got the same problem as Constantin. I upgraded CentOS 6.3 to 6.4 (kernel *-358) on a XenServer VM. I get a black screen just after the kernel begins booting, and the virtual CPU goes to 100% load. The old kernel works fine. Andrew, the patch from Comment #13 works for me, thanks.

This bug does not affect CentOS/RHEL 6.4 PVMs.

Hello, any updates about the fix? Thanks.

*** Bug 888702 has been marked as a duplicate of this bug. ***

kernel-2.6.32-358.6.1 (CentOS) works OK. Bug fixed.

I confirmed it's fixed.

Patch(es) available in kernel-2.6.32-375.el6

Hello, I have the same problem with RHEL 6.4 and kernel 2.6.32-279.el6.x86_64 on XenServer 6.1. After some time my server freezes, and only a hard reboot lets me start my VM again.

(In reply to comment #35)
You need kernel 2.6.32-358.5.1.el6 or later.

Hello, thank you for the reply. I am now using kernel 2.6.32-356. My system runs for more than a day, then my server freezes. Memory usage is above 95%. Can you help me?

(In reply to Fock Johann from comment #37)
This is a different issue, and most likely neither a Xen nor a kernel problem. You can get help determining which application(s) are consuming all the memory, and why, by contacting your usual customer support representative.

I reproduced this bug with:
Host: Xen 4.3.0 on 3.11.0-0.rc6.git4.1.fc21.x86_64
Guest: RHEL 6.4 with kernel-2.6.32-358.el6.i686
Config file including `viridian = 1`
After `xl create`, the guest hangs. When checking CPU with `xl top`, the guest shows 100% CPU usage.

It was verified that the bug is fixed in kernel-2.6.32-375.el6.i686:
Host: Xen 4.3.0 on 3.11.0-0.rc6.git4.1.fc21.x86_64
Guest: RHEL 6.4 with kernel-2.6.32-375.el6.i686
Config file including `viridian = 1`
The guest boots normally with these settings.

Bug also verified by the reporter per c30 & c31.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1645.html