The Intel test machines of model number D3C51SYBANCNRB fail to reboot under Xen if VMX is enabled in the BIOS. The reboot code of Xen is identical to that of Linux: it attempts a reboot via the keyboard reset line, and if that fails, it tries to create a triple fault by zeroing the IDT. This works perfectly if VMX is disabled in the BIOS. It also works if start_vmx is not called from Xen. In fact, the reboot still works if start_vmx is followed immediately by a call to stop_vmx. Booting Xen with reboot=b (which uses a different reboot strategy where it jumps into a BIOS address in real mode) causes it to reboot correctly even with VMX enabled. What I'd like to know from Intel is: 1) Why does the aforementioned reboot strategy (keyboard + triple fault, a standard reboot strategy for many years in Linux) fail when VMX has been enabled? 2) If you think this reboot strategy is flawed, what reboot strategy do you recommend that can work across the full range of x86 hardware that Linux/Xen supports?
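For readers following along, the fallback order described above (keyboard reset line first, then triple fault) can be sketched roughly as below. This is only an illustration of the control flow, not the Xen or Linux code: the names `reboot_step_fn`, `try_reboot_steps` and both stubs are hypothetical, and the stubs return a flag where the real steps would touch hardware and never return on success.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type: a reboot step reports whether the machine would
 * have reset. Real reboot code never returns on success; the stubs
 * below only stand in for the hardware accesses. */
typedef bool (*reboot_step_fn)(void);

/* Try each step in order. Returns the index of the first step that
 * succeeds, or -1 if every step fails (the hang seen in this report). */
int try_reboot_steps(const reboot_step_fn *steps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (steps[i]())
            return (int)i;
    }
    return -1;
}

/* Stub: pretend the keyboard controller reset line did nothing. */
bool kbd_reset_stub_fails(void) { return false; }

/* Stub: pretend zeroing the IDT and faulting reset the machine. */
bool triple_fault_stub_ok(void) { return true; }
```

With the two stubs passed in that order, the second step is the one reported as succeeding; on the affected machines with VMX on, the observed behaviour corresponds to neither step taking effect.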
We also have the same problem. Native RHEL4/SLES10 could be rebooted on Conroe without any problems, but xen0 could not be rebooted. To reboot xen0 on Conroe we have to press the reset button manually. We thought it was a hardware fault of Conroe. I will recheck this issue and see whether it is a bug in Xen VT. Thanks, Yunfeng
Yunfeng - any update?
Also fails for me on the same product code. Attaching successful and failing boot logs in case they are useful.
Created attachment 134562: Boot logs from kernel-2.6.17-1.2573.fc6PAE. Reboot/poweroff work correctly on this kernel.
Created attachment 134563: Boot logs from kernel-2.6.17-1.2573.fc6xen. Reboot/poweroff always hang on this kernel.
Thanks for the dmesgs, Stephen. At this point I'm only interested in poweroff, since your reboot issue can be explained by the fact that Xen enables VMX while baremetal does not. I can see two differences between your setup and mine. Firstly, your ACPI BIOS is different from mine, and secondly, I need to test using the same kernel as you to see if that could make my baremetal poweroff work consistently. I've observed an interesting phenomenon with poweroff on my machine. It seems to work if I leave the machine either on or off for an extended period of time. However, it only works once. That is, if I power it back on after a successful poweroff and immediately try to halt, it fails to power off. So could you do an experiment for me? See if you can do three or four successive poweroffs (on baremetal, of course) and let me know whether they all succeed.
Using the 1.2600 PAE kernel, bare metal poweroff failed on the first attempt. Strange, as it used to work reliably; but then again I haven't tried it recently, as all my recent work has been using a -xen kernel on that box, and I've just got used to having to poweroff manually with that kernel.
Created attachment 135530: Patch to work around reboot issue. This patch works around the reboot issue by rebooting through the BIOS if VMX is detected to be on. It works on my machine. Please let me know if it allows your machines to reboot.
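The attachment itself is not reproduced in this thread, so the following is only a rough sketch of the decision the patch is described as making: fall back to the BIOS reboot path when VMX is on. It assumes VMX state is judged from the CR4.VMXE bit (bit 13), which may not be how the actual patch detects it, and all names here (`reboot_method`, `vmx_is_on`, `choose_reboot_method`) are hypothetical. The CR4 value is passed in as a parameter so the logic can be exercised outside ring 0.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the patch. */
enum reboot_method {
    REBOOT_KBD_THEN_TRIPLE_FAULT, /* default: keyboard reset, then zeroed IDT */
    REBOOT_BIOS                   /* same path as booting with reboot=b */
};

/* CR4.VMXE is bit 13 of CR4. Using it as the "VMX is on" test is an
 * assumption made for this sketch. */
#define X86_CR4_VMXE (1UL << 13)

bool vmx_is_on(unsigned long cr4)
{
    return (cr4 & X86_CR4_VMXE) != 0;
}

/* Pick the BIOS path if the user asked for it or if VMX is on;
 * otherwise keep the long-standing default strategy. */
enum reboot_method choose_reboot_method(unsigned long cr4, bool user_forced_bios)
{
    if (user_forced_bios || vmx_is_on(cr4))
        return REBOOT_BIOS;
    return REBOOT_KBD_THEN_TRIPLE_FAULT;
}
```

Keeping the default strategy whenever VMX is off matches the observation earlier in this report that keyboard reset plus triple fault works on these machines as long as VMX is not enabled.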
Is this patch not included in 3.0.3? If it is included, please close this bug.
This patch is not part of 3.0.3. I haven't submitted it yet because I'm waiting for confirmation that it works on a machine other than mine. Thanks.
change QA contact
Is this problem still present on Xen 3.1? It would be interesting to check if the problem happens when KVM is used to enable VMX, too.
The motherboard in question was upgraded long ago, so unfortunately I'm no longer in a position to reproduce this.
Eduardo, I also have a Conroe/Mequon, and with RHEL-5.1, the issue still happens. A couple of interesting points: 1) danpb noticed that inside the dom0, if you do "echo b > /proc/sysrq-trigger", the box *will* reboot successfully. 2) Based on 1), I took a quick gander at the shutdown code in dom0 and the HV. In terms of the dom0, there is not too much interesting; it really just traps out to the HV with a shutdown event to do the shutdown. However, I didn't see any significant differences between the "crash" case above and a "shutdown -h now". If you want me to do any additional testing, I'm happy to do it; I've just sort of put it on the back burner since it doesn't seem all that important. Chris Lalancette
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
This bug has been in NEEDINFO for more than 30 days since feedback was first requested. As a result we are closing it. If you can reproduce this bug in the future against a maintained Fedora version please feel free to reopen it against that version. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp