201298 – Intel test machine (Conroe) hangs on reboot with VMX enabled

Summary: Intel test machine (Conroe) hangs on reboot with VMX enabled

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel-xen
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Eduardo Habkost
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:	bzcl34nup
Depends On:
Blocks:	212561

Reported:	2006-08-04 05:14 UTC by Herbert Xu
Modified:	2009-12-14 20:41 UTC
CC List:	7 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-05-07 00:43:40 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Boot logs from kernel-2.6.17-1.2573.fc6PAE (32.28 KB, text/plain) 2006-08-21 14:26 UTC, Stephen Tweedie	no flags
Boot logs from kernel-2.6.17-1.2573.fc6xen (31.85 KB, text/plain) 2006-08-21 14:27 UTC, Stephen Tweedie	no flags
Patch to work around reboot issue. (514 bytes, patch) 2006-09-05 07:45 UTC, Herbert Xu	no flags

Description Herbert Xu 2006-08-04 05:14:57 UTC

The Intel test machines with model number D3C51SYBANCNRB fail to reboot under
Xen if VMX is enabled in the BIOS.  The reboot code of Xen is identical to that
of Linux: it first attempts a reboot via the keyboard controller reset line and,
if that fails, tries to force a triple fault by zeroing the IDT.
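The fallback order described above can be sketched in C as a chain of reset attempts. This is an illustrative model, not Xen's or Linux's actual reboot code: `struct reboot_step`, `try_reboot`, the `sim_*` hooks and `default_reboot` are hypothetical names, and the simulated hooks simply encode the behaviour reported in this bug (neither reset path takes effect while VMX is on). Only the 8042 port and command values are real x86 constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Real x86 values, shown for reference only; this sketch never touches
 * hardware. Writing 0xFE to the 8042 keyboard controller's command port
 * pulses the CPU reset line. */
#define KBD_CONTROLLER_PORT 0x64
#define KBD_CMD_PULSE_RESET 0xFE

/* A reset hook returns true if the machine actually reset. In a real
 * kernel these are privileged operations; here they are hypothetical
 * stand-ins so the fallback order can be exercised in user space. */
typedef bool (*reset_fn)(void);

struct reboot_step {
    const char *name;
    reset_fn attempt;
};

/* Try each strategy in order. Returns the index of the step that reset
 * the machine, or -1 if every step returned (the "hang" in this bug). */
int try_reboot(const struct reboot_step *steps, size_t n)
{
    assert(steps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (steps[i].attempt())
            return (int)i;
    }
    return -1;
}

/* Simulated machine: models the reports on D3C51SYBANCNRB boards, where
 * neither reset path takes effect while VMX is on. Illustrative only. */
bool sim_vmx_on = false;

/* Stands in for writing KBD_CMD_PULSE_RESET to KBD_CONTROLLER_PORT. */
static bool sim_kbd_reset(void) { return !sim_vmx_on; }

/* Stands in for: load an IDT with limit 0, then execute int3. The
 * exception cannot be delivered, nor can the resulting #GP or #DF, so
 * the CPU triple-faults and enters shutdown, which the chipset normally
 * turns into a reset. */
static bool sim_triple_fault(void) { return !sim_vmx_on; }

const struct reboot_step default_reboot[] = {
    { "keyboard controller reset", sim_kbd_reset },
    { "triple fault",              sim_triple_fault },
};
const size_t default_reboot_len =
    sizeof default_reboot / sizeof default_reboot[0];
```

With `sim_vmx_on` false, `try_reboot(default_reboot, default_reboot_len)` returns 0 (the keyboard reset took effect); with it true, both steps return and the call yields -1, which corresponds to the hang reported here.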

This works perfectly if VMX is disabled in the BIOS.  It also works if
start_vmx is never called from Xen.  In fact, the reboot still works if start_vmx
is followed immediately by a call to stop_vmx.
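Whether a given machine has "VMX enabled in the BIOS" can be read from the IA32_FEATURE_CONTROL MSR (address 0x3A): bit 0 is the lock bit and bit 2 allows VMX outside SMX operation. Below is a minimal decoding sketch, assuming the raw 64-bit MSR value has already been obtained (for example as root through Linux's msr driver, `/dev/cpu/N/msr`); the enum and `decode_bios_vmx` are hypothetical names.

```c
#include <stdint.h>

#define MSR_IA32_FEATURE_CONTROL        0x3A
#define FEATURE_CONTROL_LOCKED          (1ULL << 0)
#define FEATURE_CONTROL_VMX_INSIDE_SMX  (1ULL << 1)
#define FEATURE_CONTROL_VMX_OUTSIDE_SMX (1ULL << 2)

enum bios_vmx_state {
    BIOS_VMX_DISABLED,  /* locked with VMX not allowed: VMXON would #GP */
    BIOS_VMX_ENABLED,   /* locked with VMX allowed outside SMX */
    BIOS_VMX_UNLOCKED   /* not locked: the OS may still set the bits itself */
};

/* Decode a raw IA32_FEATURE_CONTROL value (hypothetical helper). */
enum bios_vmx_state decode_bios_vmx(uint64_t feature_control)
{
    if (!(feature_control & FEATURE_CONTROL_LOCKED))
        return BIOS_VMX_UNLOCKED;
    if (feature_control & FEATURE_CONTROL_VMX_OUTSIDE_SMX)
        return BIOS_VMX_ENABLED;
    return BIOS_VMX_DISABLED;
}
```

A value of 0x5 (locked, VMX allowed outside SMX) decodes as `BIOS_VMX_ENABLED`; 0x1 (locked, VMX not allowed) is the kind of value a BIOS with VMX switched off would typically leave behind.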

Booting Xen with reboot=b (which uses a different reboot strategy where it jumps
into a BIOS address in real mode) causes it to reboot correctly even with VMX
enabled.

What I'd like to know from Intel is:

1) Why does the aforementioned reboot strategy (keyboard reset + triple fault, a
standard reboot strategy in Linux for many years) fail once VMX has been enabled?
2) If you think this reboot strategy is flawed, what reboot strategy do you
recommend that works across the full range of x86 hardware that Linux/Xen
supports?

Comment 1 Zhao Yunfeng 2006-08-04 15:54:40 UTC

We have the same problem.
Native RHEL4/SLES10 can be rebooted on Conroe without any problems, but xen0
cannot be rebooted.
To reboot xen0 on Conroe we have to press the reset button manually.
We thought it was a hardware fault in Conroe.

I will recheck this issue and see whether it's a Xen VT bug.

thanks
Yunfeng

Comment 2 Brian Stein 2006-08-14 15:11:36 UTC

Yunfeng - any update?

Comment 3 Stephen Tweedie 2006-08-21 14:23:37 UTC

Also fails for me on the same product code.  Attaching successful and failing
boot logs in case they are useful.

Comment 4 Stephen Tweedie 2006-08-21 14:26:14 UTC

Created attachment 134562 [details]
Boot logs from kernel-2.6.17-1.2573.fc6PAE

Reboot/poweroff work correctly on this kernel.

Comment 5 Stephen Tweedie 2006-08-21 14:27:34 UTC

Created attachment 134563 [details]
Boot logs from kernel-2.6.17-1.2573.fc6xen

Reboot/poweroff always hang on this kernel.

Comment 6 Herbert Xu 2006-09-04 00:31:18 UTC

Thanks for the dmesgs Stephen.  At this point I'm only interested in poweroff
since your reboot issue can be explained by the fact that Xen enables VMX while
baremetal does not.

I can see two differences between your setup and mine.  Firstly, your ACPI BIOS
is different from mine, and secondly, you are running a different kernel, so I
need to test with the same kernel as you to see whether that makes my bare-metal
poweroff work consistently.

I've observed an interesting phenomenon with poweroff on my machine.  It seems to
work if I leave it either on or off for an extended period of time.  However, it
only works once.  That is, if I power it back on after a successful poweroff and
immediately try to halt, it fails to power off.

So could you do an experiment for me?  Try three or four successive poweroffs
(on bare metal, of course) and let me know whether they all succeed.

Comment 7 Stephen Tweedie 2006-09-04 10:25:23 UTC

Using the 1.2600 PAE kernel, bare metal poweroff failed on the first attempt.  

Strange, as it used to work reliably; but then again I haven't tried it
recently, as all my recent work has been using a -xen kernel on that box, and
I've just got used to having to power off manually with that kernel.

Comment 8 Herbert Xu 2006-09-05 07:45:41 UTC

Created attachment 135530 [details]
Patch to work around reboot issue.

This patch works around the reboot issue by rebooting through the BIOS if VMX
is detected to be on.  It works on my machine.  Please let me know whether it
allows your machines to reboot.
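The patch itself (attachment 135530) is not reproduced here; the comment only says it reboots through the BIOS when VMX is on. A minimal sketch of that decision follows, assuming the hypervisor can tell whether the CPU is still in VMX operation. The `vmx_on` input, `enum reboot_method`, `cr4_vmxe_set` and `pick_reboot_method` are hypothetical. CR4.VMXE (bit 13) is shown as one available signal, but it is only a necessary condition: software must set it before VMXON, and VMXOFF does not clear it, so a hypervisor would more likely track VMXON/VMXOFF itself (which also matches the observation that start_vmx followed by stop_vmx reboots fine).

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_VMXE (1ULL << 13)  /* CR4 bit 13: VMX enable */

enum reboot_method {
    REBOOT_KBD_THEN_TRIPLE_FAULT,  /* default: 8042 reset, then triple fault */
    REBOOT_BIOS                    /* as with reboot=b: jump into the BIOS in real mode */
};

/* CR4.VMXE must be set before VMXON, but VMXOFF does not clear it, so
 * this is a necessary, not sufficient, sign of VMX operation. */
bool cr4_vmxe_set(uint64_t cr4)
{
    return (cr4 & X86_CR4_VMXE) != 0;
}

/* Workaround as described in the patch comment: if VMX is on, bypass the
 * default path and reboot through the BIOS. If VMX is off, the configured
 * method (default or reboot=b) is used unchanged. */
enum reboot_method pick_reboot_method(enum reboot_method configured,
                                      bool vmx_on)
{
    if (vmx_on)
        return REBOOT_BIOS;
    return configured;
}
```

The design keeps the long-standing keyboard/triple-fault path as the default everywhere else and only diverts to the BIOS path in the one state where it was observed to fail.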

Comment 9 Brian Stein 2006-10-26 19:54:48 UTC

Is this patch not included in 3.0.3?  If it is, please close this bug.

Comment 10 Herbert Xu 2006-10-26 22:21:47 UTC

This patch is not part of 3.0.3.  I haven't submitted it yet because I'm waiting
for confirmation that it works on a machine other than mine.  Thanks.

Comment 11 Red Hat Bugzilla 2007-07-24 23:56:34 UTC

change QA contact

Comment 12 Eduardo Habkost 2007-09-28 16:49:27 UTC

Is this problem still present on Xen 3.1?

It would be interesting to check whether the problem also happens when KVM is
used to enable VMX.

Comment 13 Herbert Xu 2007-09-29 00:53:57 UTC

The motherboard in question was upgraded long ago, so unfortunately I'm no
longer in a position to reproduce this.

Comment 14 Chris Lalancette 2007-10-01 13:14:59 UTC

Eduardo,
     I also have a Conroe/Mequon, and with RHEL-5.1, the issue still happens.  A
couple of interesting points:

1)  danpb noticed that inside the dom0, if you do "echo b >
/proc/sysrq-trigger", the box *will* reboot successfully.

2)  Based on 1), I took a quick gander at the shutdown code in dom0 and the HV.
In terms of the dom0, there is nothing of much interest; it really just traps
out to the HV with a shutdown event to do the shutdown.  However, I didn't see
any significant differences between the "crash" case above and a "shutdown -h now".

If you want me to do any additional testing, I'm happy to do it; I've just sort
of put it on the back burner since it doesn't seem all that important.

Chris Lalancette

Comment 15 Bug Zapper 2008-04-03 17:55:20 UTC

Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

Comment 16 Bug Zapper 2008-05-07 00:43:38 UTC

This bug has been in NEEDINFO for more than 30 days since feedback was
first requested. As a result we are closing it.

If you can reproduce this bug in the future against a maintained Fedora
version please feel free to reopen it against that version.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp
