Bug 918239 - kernel-2.6.32-358.0.1 doesn't boot at virtual machine on Xen Cloud Platform
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: i686 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Andrew Jones
QA Contact: Virtualization Bugs
Whiteboard: xen
Keywords: ZStream
Duplicates: 888702
Depends On:
Blocks: 653816 923204
Reported: 2013-03-05 13:44 EST by Constantin Dunayev
Modified: 2013-11-21 11:45 EST

See Also:
Fixed In Version: kernel-2.6.32-375.el6
Doc Type: Bug Fix
Doc Text:
When the Red Hat Enterprise Linux 6 kernel runs as a virtual machine, it performs boot-time detection of the hypervisor in order to enable hypervisor-specific optimizations. Red Hat Enterprise Linux 6.4 introduces detection and optimization for the Microsoft Hyper-V hypervisor. Previously, Hyper-V was detected first; however, because some Xen hypervisors can attempt to emulate Hyper-V, this could lead to a boot failure when that emulation was not exact. A patch has been applied to ensure that the attempt to detect Xen is always done before Hyper-V, resolving this issue.
Last Closed: 2013-11-21 11:45:11 EST
Type: Bug


Description Constantin Dunayev 2013-03-05 13:44:27 EST
Description of problem: After upgrading a CentOS 6.3 virtual server on Xen Cloud Platform 1.6 (probably also Citrix XenServer 6.1) to the EL 6.4 packages, the new kernel (2.6.32-358) does not start (tested the CentOS-built kernel and the one from the Oracle Linux repo). Older versions (2.6.32-279, 2.6.32-220) worked perfectly.
Maybe this is not a CentOS or Oracle bug...

Version-Release number of selected component (if applicable):
2.6.32-358.0.1 (CentOS & Oracle Linux)

How reproducible:
reboot VM with new kernel

Steps to Reproduce:
1. install 2.6.32-358 kernel
2. reboot
3. select new kernel from grub  
  
Actual results:
virtual machine hangs immediately after boot from grub, XCP shows 100% virtual CPU load

Expected results:
normal boot

Additional info:
kernel 2.6.32-358.0.1 (CentOS) works well on bare hardware
Comment 2 Constantin Dunayev 2013-03-05 14:43:03 EST
tested with 3 VMs (HVM, set up from "RHEL 6 i686" template, initial distribution - CentOS 6.2)
2.6.32-279 works well with EL 6.4 packages
Oracle's kernel-uek-2.6.39-400 from OL 6.4 also works
Comment 3 Andrew Jones 2013-03-05 14:58:41 EST
Hmm, we need more information. Is there no way to force a crash and get a core? Is all logging enabled to the console (add ignore_loglevel to the kernel cmdline)? The other option is to bisect further by trying more working/not-working kernels until we get down to a single version. At that point we can check the git logs to make some guesses.
Comment 4 Constantin Dunayev 2013-03-06 01:44:08 EST
There is no way to get any log: kernel 2.6.32-358 does not start at all after boot. However, the hypervisor admin tool shows 100% virtual CPU load.
Comment 5 Constantin Dunayev 2013-03-06 01:46:30 EST
2.6.32-220 all versions worked
2.6.32-279 all versions worked
Comment 6 Constantin Dunayev 2013-03-06 01:56:07 EST
2.6.32-358 does not work (tested CentOS & Oracle builds)
2.6.32-358.0.1 does not work (tested CentOS & Oracle builds)
Comment 7 Constantin Dunayev 2013-03-06 01:59:51 EST
If you can give me i686 binaries of all versions between -279 and -358, I will test them.
Comment 8 Constantin Dunayev 2013-03-07 01:40:49 EST
tested on another server hardware: -358.0.1 x86_64 (CentOS) does not boot on XCP 1.6; virtual CPU shows 3% load
Comment 9 Constantin Dunayev 2013-03-07 01:51:34 EST
I'm ready to test some i686 binary kernel packages (probably with Xen related changes) between -279 and -358
Comment 10 Constantin Dunayev 2013-03-08 06:03:17 EST
tested Scientific Linux and Springdale builds of -358.0.1 - the same result - does not start
Comment 11 Pasi Karkkainen 2013-03-08 07:59:23 EST
Ok, so these are all HVM guests. So you don't get *any* messages from the guest kernel? 

Did you try setting up a serial console for the domU kernel by editing the kernel cmdline options from grub (ignore_loglevel console=ttyS0,115200)? And obviously remove any "quiet" options.
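A minimal sketch of the command-line change suggested above, assuming a POSIX shell. `debug_cmdline` is a hypothetical helper name, not part of grub or of any distribution tool, and the sample command line is made up. It only rewrites a string; the result still has to be entered at the grub prompt or in grub.conf by hand.

```sh
#!/bin/sh
# Hypothetical helper: rewrite a kernel command line for serial-console debugging.
debug_cmdline() {
    out=""
    # Split the command line on whitespace (left unquoted on purpose).
    for word in $1; do
        case $word in
            quiet) ;;                            # drop: it suppresses boot messages
            ignore_loglevel|console=ttyS0,*) ;;  # drop: re-added exactly once below
            *) out="$out $word" ;;
        esac
    done
    out="$out ignore_loglevel console=ttyS0,115200"
    # Strip the single leading space left by the concatenation above.
    echo "${out# }"
}

# Example with an illustrative command line.
debug_cmdline "ro root=/dev/mapper/vg-root rhgb quiet"
```

With the sample input this prints the same options minus `quiet`, followed by `ignore_loglevel console=ttyS0,115200`.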
Comment 12 Andrew Cooper 2013-03-08 09:32:50 EST
I have reproduced this issue with the RHEL 6.4 netboot ISO on XenServer trunk.

Symptoms are:

Xentop indicates that the VCPU is spinning at 100%.

The last message on the console is

"Switching to clocksource hyperv_clocksource"

which sounds disastrously wrong on a Xen system.

Interestingly, at the point of failure there were two GPFs in Xen's vmx_msr_read_intercept, which are caught by the ASM FIXTABLE.  I will apply some more debugging to find out which MSR is being accessed.
Comment 13 Andrew Cooper 2013-03-08 10:17:21 EST
From the serial log:

(XEN) [2013-03-08 15:04:41] traps.c:3175: GPF (0000): ffff82c4801d8a68 -> ffff82c480232c9e
(XEN) [2013-03-08 15:04:41]      vmx_msr_read_intercept+0x2db/0x370 ->  vmac+0x757/0xa2a
(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020
(XEN) [2013-03-08 15:04:41] traps.c:3175: GPF (0000): ffff82c4801d8a68 -> ffff82c480232c9e
(XEN) [2013-03-08 15:04:41]      vmx_msr_read_intercept+0x2db/0x370 ->  vmac+0x757/0xa2a
(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020

Both faulting MSR reads are of 0x40000020, which from a guest's point of view would be to do with the viridian extensions.

Indeed,

xe vm-param-set vm=$VM platform:viridian=false

works around the issue.
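For longer logs, the faulting MSRs can be tallied with a small filter. This is only a sketch: `faulting_msrs` is a hypothetical name, and it assumes the "Fault reading msr 0x..." line format shown above, which comes from the extra debugging added for this investigation and may not appear in a stock Xen build.

```sh
#!/bin/sh
# Summarise "Fault reading msr 0x..." lines from a Xen serial log on stdin.
# Prints one line per distinct MSR in the form "<msr> <count>".
faulting_msrs() {
    # Keep only the MSR number from matching lines, then count duplicates.
    sed -n 's/.*Fault reading msr \(0x[0-9a-fA-F]*\).*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Example using two of the lines from the log above.
printf '%s\n' \
    '(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020' \
    '(XEN) [2013-03-08 15:04:41] Fault reading msr 0x40000020' |
    faulting_msrs
```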
Comment 14 Constantin Dunayev 2013-03-08 11:03:47 EST
I have set 

xe vm-param-set uuid=...... platform:viridian=false
and the VM starts normally.

Here is a copy-paste from its console:

[root@vm ~]# uname -r
2.6.32-358.0.1.el6.i686
Thanks to Andrew Cooper
Comment 15 Andrew Jones 2013-03-11 06:05:24 EDT
Thanks Andrew!

It looks like we need to backport

commit 24a42bae6852d27ae569757f5415c91538e6a255
Author: Anupam Chanda <achanda@nicira.com>
Date:   Fri Jul 8 11:42:50 2011 -0700

    x86, hyper: Change hypervisor detection order
Comment 17 Andrew Jones 2013-03-11 09:06:27 EDT
The patch I pointed out in comment 15 would work fine, but it would also require that we backport the use of the hypervisor_x86 interface for Xen. We have that interface for VMware and Hyper-V already, but we never got Xen into it, as it didn't exist at the time RHEL 6 Xen HVM init was worked out. The patch to backport all this would still be quite simple, and would get us closer to matching upstream, but there's some risk: it would move HVM init up earlier in setup, which hasn't been tested for RHEL. Anyway, I now see another recent patch related to this issue:

commit db34bbb767bdfa1ebed7214b876fe01c5b7ee457
Author: K. Y. Srinivasan <kys@microsoft.com>
Date:   Sun Feb 3 17:22:38 2013 -0800

    X86: Add a check to catch Xen emulation of Hyper-V

This patch is meant for kernels that don't have xen support compiled in (and thus the hypervisor_x86 ordering doesn't help), however it will also resolve this bug and it has zero risk. I'll go with this one so we can more easily get the patch into z-stream too.
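The idea behind that check can be modelled outside the kernel. The following is a toy model in shell, not the kernel code: the function names are hypothetical, and it assumes (going by the patch title and the description in this bug) that the fix refuses to treat the platform as Hyper-V whenever a Xen signature is also advertised. The two strings are the CPUID vendor signatures used by Hyper-V and Xen respectively.

```sh
#!/bin/sh
# Toy model of hypervisor detection based on advertised CPUID signatures.
XEN_SIG="XenVMMXenVMM"
HYPERV_SIG="Microsoft Hv"

# has_sig NEEDLE SIG...: succeed if NEEDLE is among the advertised signatures.
has_sig() {
    needle=$1
    shift
    for s in "$@"; do
        [ "$s" = "$needle" ] && return 0
    done
    return 1
}

# Behaviour before the fix (assumed): trust the Hyper-V signature alone.
hyperv_detected_old() {
    has_sig "$HYPERV_SIG" "$@"
}

# Behaviour after the fix (assumed): Xen presence vetoes Hyper-V detection.
hyperv_detected_new() {
    has_sig "$XEN_SIG" "$@" && return 1
    has_sig "$HYPERV_SIG" "$@"
}

# A Xen HVM guest with viridian enabled advertises both signatures.
if hyperv_detected_old "$HYPERV_SIG" "$XEN_SIG"; then
    echo "old check: treated as Hyper-V"
fi
if ! hyperv_detected_new "$HYPERV_SIG" "$XEN_SIG"; then
    echo "new check: not treated as Hyper-V"
fi
```

A genuine Hyper-V guest, which advertises only the Hyper-V signature, is still detected by both variants in this model.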
Comment 18 Andrew Jones 2013-03-12 04:50:31 EDT
A patched rhel6 kernel has been verified by the reporter

> Hi, Andrew!
> I have tested your kernel just now.
> It works. Thanks!
> 
> copypaste from VM's console:
> 
> [root@nagios ~]# uname -r
> 2.6.32-358.el6_bz918239_hyperv_checkxen.i686
> 
> platform:viridian is set to true in vm parameters.
Comment 21 Wei Shi 2013-03-12 06:09:40 EDT
Hi, Constantin:
  Could you help us verify this bug once its status changes to ON_QA, using the fixed package listed in the "Fixed In Version" field?
Comment 22 Constantin Dunayev 2013-03-12 06:17:14 EDT
Yes, of course
Comment 23 Eryk 2013-03-14 17:22:01 EDT
Hi guys,
I've got the same problem as Constantin. I upgraded CentOS 6.3 to 6.4 (kernel *-358) on a XenServer VM. I get a black screen just after the kernel starts booting, and the virtual CPU goes to 100% load. The old kernel works fine.
Andrew, the workaround from comment #13 works for me, thanks.
Comment 24 Constantin Dunayev 2013-03-18 04:45:07 EDT
This bug does not affect CentOS/RHEL 6.4 PVMs.
Comment 28 belcenter 2013-03-28 11:08:05 EDT
Hello

Any updates about the fix?

Thanks
Comment 29 Andrew Jones 2013-04-04 09:48:59 EDT
*** Bug 888702 has been marked as a duplicate of this bug. ***
Comment 30 Constantin Dunayev 2013-04-24 05:58:07 EDT
kernel-2.6.32-358.6.1 (CentOS) works ok. Bug fixed.
Comment 31 belcenter 2013-04-24 06:23:25 EDT
I confirmed it's fixed
Comment 32 Jarod Wilson 2013-05-08 16:31:31 EDT
Patch(es) available on kernel-2.6.32-375.el6
Comment 35 Fock Johann 2013-05-16 03:40:28 EDT
Hello
I have the same problem with RHEL 6.4 and kernel 2.6.32-279.el6.x86_64 on XenServer 6.1. After a while my server freezes.
Only with a hard reboot can I start my VM again.
Comment 36 Andrew Jones 2013-05-16 03:48:32 EDT
(In reply to comment #35)
> Hello
> I have the same problem with RHEL 6.4 and kernel 2.6.32-279.el6.x86_64 on
> XenServer 6.1. After a while my server freezes.
> Only with a hard reboot can I start my VM again.

You need kernel 2.6.32-358.5.1.el6 or later.
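A rough way to check a guest against the versions mentioned in this bug is sketched below. `classify_kernel` is a hypothetical helper; its thresholds are taken only from this thread (-358 and -358.0.1 reported broken, -358.5.1 or later per the comment above, and -375 per "Fixed In Version"), and it does not implement real RPM version comparison.

```sh
#!/bin/sh
# Classify a RHEL 6 kernel release string (as printed by `uname -r`).
# Prints one of: predates-bug, affected, fixed, unknown.
classify_kernel() {
    case $1 in
        2.6.32-*) ;;
        *) echo "unknown"; return 0 ;;
    esac
    rel=${1#2.6.32-}      # e.g. "358.0.1.el6.i686"
    rel=${rel%%.el*}      # e.g. "358.0.1"
    # Split the dotted build number into positional fields.
    set -- $(echo "$rel" | tr '.' ' ')
    build=${1:-0}; minor=${2:-0}; patch=${3:-0}
    # Refuse anything that is not purely numeric.
    case $build$minor$patch in
        *[!0-9]*) echo "unknown"; return 0 ;;
    esac
    if [ "$build" -lt 358 ]; then
        echo "predates-bug"   # e.g. -220 and -279, reported as working
    elif [ "$build" -ge 375 ]; then
        echo "fixed"          # "Fixed In Version" of this bug
    elif [ "$build" -gt 358 ]; then
        echo "unknown"        # builds between -358 and -375 are not covered here
    elif [ "$minor" -gt 5 ] || { [ "$minor" -eq 5 ] && [ "$patch" -ge 1 ]; }; then
        echo "fixed"          # -358.5.1 or later
    else
        echo "affected"
    fi
}

classify_kernel "$(uname -r)"
```

Custom test builds such as the `_bz918239_hyperv_checkxen` kernel from comment 18 cannot be told apart by build number, so this sketch would still report them as affected.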
Comment 37 Fock Johann 2013-05-18 16:34:48 EDT
Hello
Thank you for the reply.


I am now using kernel 2.6.32-356.

My system works for more than one day, then my server freezes.

My memory is more than 95% used.


Can you help me?
Comment 38 Andrew Jones 2013-05-20 02:20:35 EDT
(In reply to Fock Johann from comment #37)
> Hello
> Thank you for the reply.
> 
> 
> I am now using kernel 2.6.32-356.
> 
> My system works for more than one day, then my server freezes.
> 
> My memory is more than 95% used.
> 
> 
> Can you help me?

This is a different issue, and is most likely not a Xen or kernel-related problem. You can get help determining which application(s) are consuming all the memory, and why, by contacting your usual customer support representative.
Comment 40 Can Zhang 2013-08-27 22:42:25 EDT
I reproduced this bug with:

Host: Xen 4.3.0 on 3.11.0-0.rc6.git4.1.fc21.x86_64
Guest: RHEL6.4 with kernel-2.6.32-358.el6.i686
And config file including `viridian = 1`

After `xl create`, the guest hangs. When checking CPU using `xl top`, the guest has a 100% CPU usage.

It could be verified that the bug has been fixed in kernel 2.6.32-375.el6.i686:

Host: Xen 4.3.0 on 3.11.0-0.rc6.git4.1.fc21.x86_64
Guest: RHEL6.4 with kernel-2.6.32-375.el6.i686
And config file including `viridian = 1`

The guest boots normally in these settings.
Comment 41 Wei Shi 2013-08-28 02:55:01 EDT
Bug also verified by the reporter per c30 & c31
Comment 42 errata-xmlrpc 2013-11-21 11:45:11 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1645.html
