Bug 212561 - Intel test machine (Conroe) hangs on reboot with VMX enabled
Summary: Intel test machine (Conroe) hangs on reboot with VMX enabled
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen   
Version: 5.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Xen Maintainance List
QA Contact:
Depends On: 201298
Blocks: 492190
Reported: 2006-10-27 13:56 UTC by Stephen Tweedie
Modified: 2009-05-01 20:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-03-24 18:21:57 UTC

Description Stephen Tweedie 2006-10-27 13:56:35 UTC
+++ This bug was initially created as a clone of Bug #201298 +++

The Intel test machines of model number D3C51SYBANCNRB fail to reboot under Xen
if VMX is enabled in the BIOS.  The reboot code of Xen is identical to that of
Linux, in particular, it attempts a reboot via the keyboard reset line, and if
that fails, it tries to create a triple fault by zeroing the IDT.

This works perfectly if VMX is disabled in the BIOS.  This also works if
start_vmx is not called from Xen.  In fact, the reboot still works if start_vmx
is followed immediately by a call to stop_vmx.

Booting Xen with reboot=b (which uses a different reboot strategy where it jumps
into a BIOS address in real mode) causes it to reboot correctly even with VMX
enabled.
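
The fallback order described above can be modelled in a few lines of
userspace C.  This is only a sketch of the decision logic, not Xen or Linux
code: the enum, the struct and the function names are all hypothetical, and
the "machine" is a simulation that simply records which methods are able to
reset it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reboot methods named in the report.  These identifiers are hypothetical;
 * they are not taken from the Xen or Linux sources. */
enum reboot_method { VIA_KEYBOARD, VIA_TRIPLE_FAULT, VIA_BIOS, REBOOT_HUNG };

/* Simulated machine: records which methods actually manage to reset it. */
struct machine {
    bool kbd_reset_works;
    bool triple_fault_works;
    bool bios_works;
};

static bool method_works(const struct machine *m, enum reboot_method method)
{
    switch (method) {
    case VIA_KEYBOARD:     return m->kbd_reset_works;
    case VIA_TRIPLE_FAULT: return m->triple_fault_works;
    case VIA_BIOS:         return m->bios_works;
    default:               return false;
    }
}

/* Try each method in order and return the first one that resets the machine.
 * Default order: keyboard reset line, then triple fault.
 * With reboot=b: jump into the BIOS only. */
enum reboot_method try_reboot(const struct machine *m, bool reboot_eq_b)
{
    static const enum reboot_method default_order[] = {
        VIA_KEYBOARD, VIA_TRIPLE_FAULT
    };
    static const enum reboot_method bios_order[] = { VIA_BIOS };

    const enum reboot_method *order = reboot_eq_b ? bios_order : default_order;
    size_t count = reboot_eq_b ? 1 : 2;

    assert(m != NULL);
    for (size_t i = 0; i < count; i++) {
        if (method_works(m, order[i]))
            return order[i];
    }
    return REBOOT_HUNG;  /* nothing worked: the hang seen with VMX enabled */
}
```

In this model the failing Conroe is a machine where only the BIOS method
works, so the default order ends in REBOOT_HUNG while reboot=b succeeds.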

What I'd like to know from Intel is

1) Why does the aforementioned reboot strategy (keyboard + triple fault, a
standard reboot strategy in Linux for many years) fail once VMX has been
enabled?
2) If you think this reboot strategy is flawed, what reboot strategy do you
recommend that can work across the full range of x86 hardware that Linux/Xen

-- Additional comment from yunfeng.zhao@intel.com on 2006-08-04 11:54 EST --
We also have the same problem.
Native RHEL4/SLES10 reboots on Conroe without any problems, but xen0
does not.
To reboot xen0 on Conroe we have to press the reset button manually.
We thought it was a hardware fault on Conroe.

I will recheck this issue and see whether it is a bug in Xen VT.


-- Additional comment from bstein@redhat.com on 2006-08-14 11:11 EST --
Yunfeng - any update?

-- Additional comment from sct@redhat.com on 2006-08-21 10:23 EST --
Also fails for me on the same product code.  Attaching successful and failing
boot logs in case they are useful.

-- Additional comment from sct@redhat.com on 2006-08-21 10:26 EST --
Created an attachment (id=134562)
Boot logs from kernel-2.6.17-1.2573.fc6PAE

Reboot/poweroff work correctly on this kernel.

-- Additional comment from sct@redhat.com on 2006-08-21 10:27 EST --
Created an attachment (id=134563)
Boot logs from kernel-2.6.17-1.2573.fc6xen

Reboot/poweroff always hang on this kernel.

-- Additional comment from herbert.xu@redhat.com on 2006-09-03 20:31 EST --
Thanks for the dmesgs Stephen.  At this point I'm only interested in poweroff
since your reboot issue can be explained by the fact that Xen enables VMX while
baremetal does not.

I can see two differences between your setup and mine.  Firstly, your ACPI BIOS
is different from mine, and secondly, I need to test using the same kernel as
you to see whether that makes my bare-metal poweroff work consistently.

I've observed an interesting phenomenon with poweroff on my machine.  It seems to
work if I leave it either on or off for an extended period of time.  However, it
only works once.  That is, if I power it back on after a successful poweroff and
immediately try to halt, it fails to power off.

So could you do an experiment for me? See if you could do three or four
successive poweroffs (on baremetal of course) and let me know whether they all

-- Additional comment from sct@redhat.com on 2006-09-04 06:25 EST --
Using the 1.2600 PAE kernel, bare metal poweroff failed on the first attempt.  

Strange, as it used to work reliably; but then again I haven't tried it
recently, as all my recent work has been using a -xen kernel on that box, and
I've just got used to having to poweroff manually with that kernel.

-- Additional comment from herbert.xu@redhat.com on 2006-09-05 03:45 EST --
Created an attachment (id=135530)
Patch to work around reboot issue.

This patch works around the reboot issue by rebooting through the BIOS if VMX
is detected to be on.  It works on my machine.  Please let me know if it allows
your machines to reboot.
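
The attachment itself is not reproduced in this report, so the following is
only a sketch of the idea, written as standalone C.  It assumes "VMX is
detected to be on" means the VMXE bit (bit 13) is set in CR4; the actual
patch may test something else.  The enum and function names are
hypothetical, and the CR4 value is passed in as a parameter because reading
CR4 is a privileged operation that cannot be done from a userspace program.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4 bit 13 is the VMX-enable (VMXE) bit on Intel processors. */
#define CR4_VMXE (UINT64_C(1) << 13)

/* Hypothetical names for the two reboot paths discussed in this bug. */
enum reboot_mode {
    REBOOT_DEFAULT,  /* keyboard reset line, then triple fault */
    REBOOT_BIOS      /* jump into the BIOS in real mode, as with reboot=b */
};

/* Assumption: "VMX is on" is taken to mean CR4.VMXE is set. */
static bool vmx_is_on(uint64_t cr4)
{
    return (cr4 & CR4_VMXE) != 0;
}

/* Force the BIOS path when VMX is on; otherwise honour the requested mode. */
enum reboot_mode pick_reboot_mode(uint64_t cr4, enum reboot_mode requested)
{
    if (vmx_is_on(cr4))
        return REBOOT_BIOS;
    return requested;
}
```

An explicit reboot=b request still goes through the BIOS in this sketch
whether or not VMX is on, since the requested mode is only overridden in
one direction.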

-- Additional comment from bstein@redhat.com on 2006-10-26 15:54 EST --
Is this patch not included in 3.0.3?  If it is included, please close it.

-- Additional comment from herbert.xu@redhat.com on 2006-10-26 18:21 EST --
This patch is not part of 3.0.3.  I haven't submitted it yet because I'm waiting
for confirmation that it works on a machine other than mine.  Thanks.

Comment 1 Chris Lalancette 2007-10-04 18:43:44 UTC
I just tried the equivalent of your patch on my Conroe that was giving me
problems.  In particular, I put "reboot=b" on the hypervisor command line, which
forces the machine to reboot through the BIOS.  With that option, I was able to
successfully reboot the machine using "reboot", whereas before that did not work.
I'm still looking at the Xen shutdown code, though; they do disable VMX before
trying to reboot, but of course there may be a BIOS bug that is preventing it
from working properly.

Chris Lalancette

Comment 2 Chris Lalancette 2007-10-04 20:06:46 UTC
There's actually more to it than this, though.  If I "echo b >
/proc/sysrq-trigger", the machine *does* reboot properly.  Looking at the HV
code, there is really little to no difference with respect to what happens in a
"reboot".  However, the dom0 code is different, in that with a "reboot" it does
a "kernel_restart_prepare()", while with the SysRq, it actually just immediately
hypercalls.  If I comment out "device_shutdown()" in kernel_restart_prepare(),
the reboot does actually take place properly.  So it seems like one of the
->shutdown() routines in the drivers is causing the problem, although where the
problem would be is an open question.

Chris Lalancette

Comment 3 Chris Lalancette 2007-10-04 21:46:34 UTC
OK, I tracked this into the e1000 driver now.  If I unload the e1000 driver
before "reboot", the system reboots fine.  Also, if I make e1000_shutdown() do
nothing instead of e1000_suspend(), it also reboots fine.  e1000_suspend() does
a number of things to bring the hardware down, finally finishing with
"pci_set_power_state(pdev, pci_choose_state(pdev, state));", where state is
passed in as PMSG_SUSPEND which equates to PCI_D3hot.  If I comment out this
line of code in e1000_main.c, reboot actually works properly.

Chris Lalancette
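
The experiments in comments 2 and 3 can be summarised as a small userspace
model in C.  It is not driver code: the struct, the enums and the function
names are hypothetical, and the rule "the reboot hangs if the NIC was put
into D3hot" encodes only what was observed on this one machine, not a known
root cause.

```c
#include <stdbool.h>

/* Only the two PCI power states that matter to the observations. */
enum pci_power { PCI_POWER_D0, PCI_POWER_D3HOT };

/* "reboot" runs device shutdown first; SysRq-b hypercalls immediately. */
enum restart_path { PATH_REBOOT_CMD, PATH_SYSRQ_B };

struct nic {
    bool driver_loaded;    /* false models unloading e1000 before reboot */
    enum pci_power power;
};

/* Stand-in for the driver's shutdown hook.  skip_power_down models
 * commenting out the pci_set_power_state() call. */
static void nic_shutdown(struct nic *n, bool skip_power_down)
{
    if (n->driver_loaded && !skip_power_down)
        n->power = PCI_POWER_D3HOT;
}

/* Returns true if the reboot completes in this model. */
bool restart(struct nic *n, enum restart_path path, bool skip_power_down)
{
    if (path == PATH_REBOOT_CMD)
        nic_shutdown(n, skip_power_down);  /* the device_shutdown() step */

    /* Observation from this report only: reboot hangs after D3hot. */
    return n->power != PCI_POWER_D3HOT;
}
```

Each of the four workarounds reported above (SysRq-b, unloading the driver,
an empty shutdown hook, removing the power-state call) corresponds to a
combination of arguments for which restart() returns true.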

Comment 4 Bill Burns 2008-04-16 20:20:27 UTC
Is this still an issue or can we close this?

Comment 5 Jeremy Katz 2009-03-24 18:21:57 UTC
Closing due to lack of information/activity
