Bug 918239 - kernel-2.6.32-358.0.1 doesn't boot in a virtual machine on Xen Cloud Platform
Summary: kernel-2.6.32-358.0.1 doesn't boot in a virtual machine on Xen Cloud Platform
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: i686
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Andrew Jones
QA Contact: Virtualization Bugs
URL:
Whiteboard: xen
Duplicates: 888702
Depends On:
Blocks: 653816 923204
 
Reported: 2013-03-05 18:44 UTC by Constantin Dunayev
Modified: 2013-11-21 16:45 UTC (History)

Fixed In Version: kernel-2.6.32-375.el6
Doc Type: Bug Fix
Doc Text:
When the Red Hat Enterprise Linux 6 kernel runs as a virtual machine, it performs boot-time detection of the hypervisor in order to enable hypervisor-specific optimizations. Red Hat Enterprise Linux 6.4 introduces detection and optimization for the Microsoft Hyper-V hypervisor. Previously, Hyper-V was detected first. However, because some Xen hypervisors can attempt to emulate Hyper-V, this could lead to a boot failure when that emulation was not exact. A patch has been applied to ensure that the attempt to detect Xen is always made before Hyper-V, resolving this issue.
Clone Of:
Environment:
Last Closed: 2013-11-21 16:45:11 UTC
Target Upstream Version:




Links
System: Red Hat Product Errata
ID: RHSA-2013:1645
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6 kernel update
Last Updated: 2013-11-20 22:04:18 UTC

Description Constantin Dunayev 2013-03-05 18:44:27 UTC
Description of problem: After upgrading a CentOS 6.3 virtual server on Xen Cloud Platform 1.6 (probably also Citrix XenServer 6.1) to EL 6.4 packages, the new kernel (2.6.32-358) does not start (tested the CentOS-built kernel and the one from the Oracle Linux repo). Older versions (2.6.32-279, 2.6.32-220) worked perfectly.
This may not be a CentOS or Oracle bug...

Version-Release number of selected component (if applicable):
2.6.32-358.0.1 (CentOS & Oracle Linux)

How reproducible:
reboot VM with new kernel

Steps to Reproduce:
1. install 2.6.32-358 kernel
2. reboot
3. select new kernel from grub  
  
Actual results:
virtual machine hangs immediately after boot from grub, XCP shows 100% virtual CPU load

Expected results:
normal boot

Additional info:
kernel 2.6.32-358.0.1 (CentOS) works well on bare hardware

Comment 2 Constantin Dunayev 2013-03-05 19:43:03 UTC
tested with 3 VMs (HVM, set up from "RHEL 6 i686" template, initial distribution - CentOS 6.2)
2.6.32-279 works well with EL 6.4 packages
Oracle's kernel-uek-2.6.39-400 from OL 6.4 also works

Comment 3 Andrew Jones 2013-03-05 19:58:41 UTC
Hmm, we need more information. Is there no way to force a crash and get a core? Is all logging enabled to the console (add ignore_loglevel to the kernel cmdline)? The other option is to bisect further by trying more working/not-working kernels until we get down to a single version. At that point we can check the git logs to make some guesses.
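
The bisection suggested above can be sketched as a binary search over build numbers. This is only a minimal sketch under stated assumptions: `first_bad_build` and `boots_ok` are hypothetical names, the intermediate builds between -279 and -358 are assumed to be available as installable packages, and `boots_ok` stands in for manually booting the VM with a given kernel.

```python
def first_bad_build(builds, boots_ok):
    """Return the first build in `builds` for which boots_ok() is False.

    Assumes `builds` is sorted, builds[0] is known good, builds[-1] is
    known bad, and the regression was introduced exactly once in between.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_ok(builds[mid]):
            good = mid   # regression was introduced after this build
        else:
            bad = mid    # regression is in this build or an earlier one
    return builds[bad]


# Hypothetical example: pretend the regression landed in build 300.
builds = list(range(279, 359))   # -279 (known good) through -358 (known bad)
print(first_bad_build(builds, lambda build: build < 300))
```

With 80 candidate builds this needs roughly seven test boots rather than one per build.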

Comment 4 Constantin Dunayev 2013-03-06 06:44:08 UTC
There is no way to get any log: kernel 2.6.32-358 does not start at all after boot. However, the hypervisor admin tool shows 100% virtual CPU load.

Comment 5 Constantin Dunayev 2013-03-06 06:46:30 UTC
2.6.32-220 all versions worked
2.6.32-279 all versions worked

Comment 6 Constantin Dunayev 2013-03-06 06:56:07 UTC
2.6.32-358 does not work (tested CentOS & Oracle builds)
2.6.32-358.0.1 does not work (tested CentOS & Oracle builds)

Comment 7 Constantin Dunayev 2013-03-06 06:59:51 UTC
If you can give me i686 binaries of all versions between -279 and -358, I will test them.

Comment 8 Constantin Dunayev 2013-03-07 06:40:49 UTC
tested on another server hardware: -358.0.1 x86_64 (CentOS) does not boot on XCP 1.6; virtual CPU shows 3% load

Comment 9 Constantin Dunayev 2013-03-07 06:51:34 UTC
I'm ready to test some i686 binary kernel packages (probably with Xen related changes) between -279 and -358

Comment 10 Constantin Dunayev 2013-03-08 11:03:17 UTC
tested Scientific Linux and Springdale builds of -358.0.1 - the same result - does not start

Comment 11 Pasi Karkkainen 2013-03-08 12:59:23 UTC
Ok, so these are all HVM guests. So you don't get *any* messages from the guest kernel? 

Did you try setting up a serial console for the domU kernel by editing the kernel cmdline options from grub (ignore_loglevel console=ttyS0,115200)? And obviously remove any "quiet" options.
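
As an illustration of the cmdline change described above, here is a small sketch that rewrites a kernel command line string. The function name `debug_cmdline` and the sample command line are made up for the example; only the options `ignore_loglevel`, `console=ttyS0,115200` and `quiet` come from this comment.

```python
def debug_cmdline(cmdline):
    """Drop 'quiet' and append the serial-console debugging options."""
    drop = {"quiet"}
    add = ["ignore_loglevel", "console=ttyS0,115200"]
    args = [arg for arg in cmdline.split() if arg not in drop]
    for arg in add:
        if arg not in args:   # avoid adding an option twice
            args.append(arg)
    return " ".join(args)


# Hypothetical grub kernel line arguments, for illustration only.
print(debug_cmdline("ro root=/dev/mapper/vg-root rhgb quiet"))
```

The result still has to be entered by hand at the grub prompt (or in grub.conf); the sketch only shows which arguments change.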

Comment 12 Andrew Cooper 2013-03-08 14:32:50 UTC
I have reproduced this issue with the RHEL 6.4 netboot ISO on XenServer trunk.

Symptoms are:

Xentop indicates that the VCPU is spinning at 100%

The last message on the console is

"Switching to clocksource hyperv_clocksource"

which sounds disastrously wrong on a Xen system.

Interestingly, at the point of failure there were two GPFs in Xen's vmx_msr_read_intercept, which are caught by the ASM FIXTABLE. I will add some more debugging to find out which MSR the guest is attempting to use.

Comment 13 Andrew Cooper 2013-03-08 15:17:21 UTC
From the serial log:

(XEN) [2013-03-08 15:04:41] traps.c:3175: GPF (0000): ffff82c4801d8a68 -> ffff82c480232c9e
(XEN) [2013-03-08 15:04:41]      vmx_msr_read_intercept+0x2db/0x370 ->  vmac+0x757/0xa2a
(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020
(XEN) [2013-03-08 15:04:41] traps.c:3175: GPF (0000): ffff82c4801d8a68 -> ffff82c480232c9e
(XEN) [2013-03-08 15:04:41]      vmx_msr_read_intercept+0x2db/0x370 ->  vmac+0x757/0xa2a
(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020

Both faults are reads of MSR 0x40000020, which from the guest's point of view belongs to the Viridian extensions.

Indeed,

xe vm-param-set vm=$VM platform:viridian=false

works around the issue.
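
For anyone wanting to triage a similar serial log, the following sketch extracts the faulting MSR numbers and maps them to Hyper-V synthetic MSR names. The helper names are invented for this example, and the name table is a short excerpt written from the Linux Hyper-V header definitions as I understand them, so treat it as an assumption to double-check. In that table 0x40000020 is the time reference counter, which would be consistent with the hang right after "Switching to clocksource hyperv_clocksource".

```python
import re

# Assumed excerpt of Hyper-V synthetic MSR numbers (Linux Hyper-V headers).
HYPERV_MSRS = {
    0x40000000: "HV_X64_MSR_GUEST_OS_ID",
    0x40000001: "HV_X64_MSR_HYPERCALL",
    0x40000002: "HV_X64_MSR_VP_INDEX",
    0x40000020: "HV_X64_MSR_TIME_REF_COUNT",
}

FAULT_RE = re.compile(r"Fault reading msr (0x[0-9a-fA-F]+)")


def faulting_msrs(log_text):
    """Count how often each MSR appears in 'Fault reading msr' lines."""
    counts = {}
    for match in FAULT_RE.finditer(log_text):
        msr = int(match.group(1), 16)
        counts[msr] = counts.get(msr, 0) + 1
    return counts


def describe(msr):
    """Return the assumed symbolic name of a Hyper-V MSR, or 'unknown'."""
    return HYPERV_MSRS.get(msr, "unknown")


# Log lines copied from the serial log in this comment.
SAMPLE = """\
(XEN) [2013-03-08 15:04:41] traps.c:3175: GPF (0000): ffff82c4801d8a68 -> ffff82c480232c9e
(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020
(XEN) [2013-03-08 15:04:41] traps.c:3175: GPF (0000): ffff82c4801d8a68 -> ffff82c480232c9e
(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020
"""

for msr, count in faulting_msrs(SAMPLE).items():
    print(hex(msr), describe(msr), count)
```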

Comment 14 Constantin Dunayev 2013-03-08 16:03:47 UTC
I have set 

xe vm-param-set uuid=...... platform:viridian=false
and the VM starts normally.

Here is a copy-paste from its console:

[root@vm ~]# uname -r
2.6.32-358.0.1.el6.i686
Thanks to Andrew Cooper

Comment 15 Andrew Jones 2013-03-11 10:05:24 UTC
Thanks Andrew!

It looks like we need to backport

commit 24a42bae6852d27ae569757f5415c91538e6a255
Author: Anupam Chanda <achanda@nicira.com>
Date:   Fri Jul 8 11:42:50 2011 -0700

    x86, hyper: Change hypervisor detection order
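
Why the detection order matters can be shown with a toy model. This is a simplified sketch, not the kernel code: the `detect` function and the detector tuples are hypothetical, and the guest is reduced to the set of CPUID hypervisor signatures it exposes. The one assumption taken from this bug is that a Xen HVM guest with Viridian enabled presents a Hyper-V signature in addition to its own.

```python
# Signature strings a guest can read from the CPUID hypervisor leaves.
XEN_SIG = "XenVMMXenVMM"
HYPERV_SIG = "Microsoft Hv"


def detect(signatures, detectors):
    """Return the name of the first detector that matches, else None."""
    for name, matches in detectors:
        if matches(signatures):
            return name
    return None


# Each detector is (name, predicate over the set of visible signatures).
hyperv = ("Hyper-V", lambda sigs: HYPERV_SIG in sigs)
xen = ("Xen HVM", lambda sigs: XEN_SIG in sigs)

# A Xen HVM guest with Viridian enabled exposes both signatures.
xen_with_viridian = {HYPERV_SIG, XEN_SIG}

print(detect(xen_with_viridian, [hyperv, xen]))   # Hyper-V probed first
print(detect(xen_with_viridian, [xen, hyperv]))   # Xen probed first
```

Because the first match wins, probing Xen before Hyper-V is enough to keep the guest from enabling Hyper-V-specific code paths on Xen.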

Comment 17 Andrew Jones 2013-03-11 13:06:27 UTC
The patch I pointed out in comment 15 would work fine, but it would also require that we backport the use of the hypervisor_x86 interface for Xen. We already have that interface for VMware and Hyper-V, but we never got Xen into it, as it didn't exist at the time the RHEL 6 Xen HVM init was worked out. The patch to backport all this would still be quite simple, and would get us closer to matching upstream, but there's some risk: it would move HVM init earlier in setup, which hasn't been tested for RHEL. Anyway, I now see another recent patch related to this issue:

commit db34bbb767bdfa1ebed7214b876fe01c5b7ee457
Author: K. Y. Srinivasan <kys@microsoft.com>
Date:   Sun Feb 3 17:22:38 2013 -0800

    X86: Add a check to catch Xen emulation of Hyper-V

This patch is meant for kernels that don't have Xen support compiled in (and thus the hypervisor_x86 ordering doesn't help). However, it will also resolve this bug, and it has zero risk. I'll go with this one so we can more easily get the patch into z-stream too.
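
The idea behind this second patch can be modelled roughly as follows. This is a simplified Python sketch rather than the actual C change: CPUID is replaced by a dictionary from leaf number to signature string, `is_hyperv` and its `check_xen` switch are illustrative names, and the real checks involve more conditions than shown here (for example the CPU's hypervisor feature bit). `xen_cpuid_base` is named after the kernel helper, which scans the hypervisor leaf range in steps of 0x100 for the Xen signature.

```python
XEN_SIG = "XenVMMXenVMM"
HYPERV_SIG = "Microsoft Hv"


def xen_cpuid_base(leaves):
    """Return the leaf where the Xen signature is found, or 0 if absent."""
    for base in range(0x40000000, 0x40010000, 0x100):
        if leaves.get(base) == XEN_SIG:
            return base
    return 0


def is_hyperv(leaves, check_xen=True):
    """Model of Hyper-V detection; check_xen=False mimics the unpatched kernel."""
    if check_xen and xen_cpuid_base(leaves):
        return False   # Xen emulating Hyper-V: do not treat it as Hyper-V
    return leaves.get(0x40000000) == HYPERV_SIG


# Xen with Viridian: Hyper-V signature at the first leaf, Xen's shifted up.
xen_viridian = {0x40000000: HYPERV_SIG, 0x40000100: XEN_SIG}

print(is_hyperv(xen_viridian, check_xen=False))   # unpatched behaviour
print(is_hyperv(xen_viridian))                    # with the Xen check
```

The check lives inside the Hyper-V detection itself, so it helps regardless of where Xen sits in the detection order.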

Comment 18 Andrew Jones 2013-03-12 08:50:31 UTC
A patched rhel6 kernel has been verified by the reporter

> Hi, Andrew!
> I have tested your kernel just now.
> It works. Thanks!
> 
> copypaste from VM's console:
> 
> [root@nagios ~]# uname -r
> 2.6.32-358.el6_bz918239_hyperv_checkxen.i686
> 
> platform:viridian is set to true in vm parameters.

Comment 21 Wei Shi 2013-03-12 10:09:40 UTC
Hi, Constantin:
  Could you help us verify this bug once its status changes to ON_QA, using the fixed package listed in the "Fixed In Version" field?

Comment 22 Constantin Dunayev 2013-03-12 10:17:14 UTC
Yes, of course

Comment 23 Eryk 2013-03-14 21:22:01 UTC
Hi guys,
I've got the same problem as Constantin. I upgraded CentOS 6.3 to 6.4 (kernel *-358) on a XenServer VM. I get a black screen just after the kernel starts booting, and the virtual CPU goes to 100% load. The old kernel works fine.
Andrew, the workaround from Comment #13 works for me, thanks.

Comment 24 Constantin Dunayev 2013-03-18 08:45:07 UTC
This bug does not affect CentOS/RHEL 6.4 PVMs.

Comment 28 belcenter 2013-03-28 15:08:05 UTC
Hello

Any updates about the fix?

Thanks

Comment 29 Andrew Jones 2013-04-04 13:48:59 UTC
*** Bug 888702 has been marked as a duplicate of this bug. ***

Comment 30 Constantin Dunayev 2013-04-24 09:58:07 UTC
kernel-2.6.32-358.6.1 (CentOS) works ok. Bug fixed.

Comment 31 belcenter 2013-04-24 10:23:25 UTC
I confirmed it's fixed

Comment 32 Jarod Wilson 2013-05-08 20:31:31 UTC
Patch(es) available on kernel-2.6.32-375.el6

Comment 35 Fock Johann 2013-05-16 07:40:28 UTC
Hello
I have the same problem with RHEL 6.4 and kernel 2.6.32-279.el6.x86_64 on XenServer 6.1. After a while my server freezes.
Only with a hard reboot can I start my VM again.

Comment 36 Andrew Jones 2013-05-16 07:48:32 UTC
(In reply to comment #35)
> Hello
> I have the same problem with RHEL 6.4 and kernel 2.6.32-279.el6.x86_64 on
> XenServer 6.1. After a while my server freezes.
> Only with a hard reboot can I start my VM again.

You need kernel 2.6.32-358.5.1.el6 or later.
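
A quick way to check a `uname -r` string against that threshold is sketched below. It is a simplification that only makes sense for RHEL 6 style 2.6.32 kernels: the function names are hypothetical, the minimum release is simply the one quoted in this comment, and the comparison looks only at the leading numeric release fields instead of implementing full RPM version comparison.

```python
# Minimum release quoted in this comment (2.6.32-358.5.1.el6).
MIN_FIXED = (358, 5, 1)


def release_tuple(kernel_release):
    """Leading numeric release fields, e.g.
    '2.6.32-358.0.1.el6.i686' -> (358, 0, 1)."""
    release = kernel_release.rsplit("-", 1)[1]
    fields = []
    for part in release.split("."):
        if not part.isdigit():
            break   # stop at 'el6', the arch, or any custom suffix
        fields.append(int(part))
    return tuple(fields)


def has_fix(kernel_release):
    """True if the release is at or above the quoted minimum."""
    return release_tuple(kernel_release) >= MIN_FIXED


for rel in ("2.6.32-279.el6.x86_64", "2.6.32-358.0.1.el6.i686",
            "2.6.32-358.6.1.el6.x86_64", "kernel-2.6.32-375.el6"):
    print(rel, has_fix(rel))
```

Scratch builds with custom suffixes are not recognised by this check and would need to be judged by hand.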

Comment 37 Fock Johann 2013-05-18 20:34:48 UTC
Hello
Thank you for the reply.

I am now using kernel 2.6.32-356.

My system works for more than one day, then my server freezes.

My memory is more than 95% used.

Can you help me?

Comment 38 Andrew Jones 2013-05-20 06:20:35 UTC
(In reply to Fock Johann from comment #37)
> Hello
> Thank you for the reply.
> 
> I am now using kernel 2.6.32-356.
> 
> My system works for more than one day, then my server freezes.
> 
> My memory is more than 95% used.
> 
> Can you help me?

This is a different issue, and is most likely not a Xen or kernel-related problem. You can get support to help you determine which application(s) are consuming all the memory, and why, by contacting your usual customer support representative.

Comment 40 Can Zhang 2013-08-28 02:42:25 UTC
I reproduced this bug with:

Host: Xen 4.3.0 on 3.11.0-0.rc6.git4.1.fc21.x86_64
Guest: RHEL6.4 with kernel-2.6.32-358.el6.i686
And config file including `viridian = 1`

After `xl create`, the guest hangs. When checking CPU using `xl top`, the guest shows 100% CPU usage.

I verified that the bug has been fixed in kernel 2.6.32-375.el6.i686:

Host: Xen 4.3.0 on 3.11.0-0.rc6.git4.1.fc21.x86_64
Guest: RHEL6.4 with kernel-2.6.32-375.el6.i686
And config file including `viridian = 1`

The guest boots normally in these settings.

Comment 41 Wei Shi 2013-08-28 06:55:01 UTC
Bug also verified by the reporter per c30 & c31

Comment 42 errata-xmlrpc 2013-11-21 16:45:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1645.html

