Bug 570496

Summary: can't boot rhel6 Xen FV guests from iso
Product: Red Hat Enterprise Linux 6 Reporter: Andrew Jones <drjones>
Component: syslinux Assignee: Peter Jones <pjones>
Status: CLOSED CURRENTRELEASE QA Contact: Release Test Team <release-test-team-automation>
Severity: urgent Docs Contact:
Priority: low    
Version: 6.0 CC: apevec, atodorov, borgan, drjones, hpa, jforbes, minovotn, pbonzini, rlerch, sprabhu, syeghiay, xen-maint
Target Milestone: beta   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: syslinux-3.86-1.1 Doc Type: Bug Fix
Doc Text:
Red Hat Enterprise Linux 6 Beta can not be installed as a fully virtualized Xen guest.
Story Points: ---
Clone Of:
: 580945 (view as bug list) Environment:
Last Closed: 2010-07-02 20:54:03 UTC Type: ---
Bug Depends On: 580945    
Bug Blocks: 563347    
Attachments:
Description Flags
Patched used to test booting with reverted GDT none
Proposed patch none
Steps to reproduce and debug the issue on CentOS none

Description Andrew Jones 2010-03-04 14:47:46 UTC
Xen must emulate BIOS when booting fully virtualized guests. The rhel5 xen tools are based on Xen 3.0.3 and use a component called vmxassist to emulate real-mode instructions for Intel's VMX CPUs. The documentation for vmxassist states that it traded off completeness for speed. Fedora 12's syslinux (3.75-4) creates isolinux.bin files that boot fine with vmxassist; however, rev. 3.83-1.1 packaged with rhel6, and whatever rev. is packaged with rawhide, do not.

Turning on debug in vmxassist gives output that indicates vmxassist is assuming things about the GDT that have changed. Looking in syslinux we see that a relatively big change to the GDT was made with commits 41f91081 and a7dd6590.

I've attempted to revert the GDT change, but the guest still fails to boot. I'm continuing some experiments to see if I can further understand what exactly is going wrong. Also, while it's possible that vmxassist is making bad assumptions, meaning it has the bug, or is just an incomplete emulator, IMO we should fix this in isolinux. Otherwise all existing Xen servers would need to upgrade the xen tools in order to support installing later linux distributions.

For rhel 6.0 we have these options (depending on the results of our investigation):
1. patch this quickly in syslinux
2. patch this quickly in vmxassist
3. revert the rev. of syslinux to the one used in f12

I don't believe (2) is the right choice for the reason stated above, and it's likely too intrusive at this point in Xen's life as well. Hopefully we can do (1) now, otherwise (1) can be done for 6.1 and (3) now.

Comment 1 H. Peter Anvin 2010-03-04 16:33:01 UTC
I would really like to know what the constraints that vmxassist expects look like. It's quite possible that it's easy enough to accommodate, and if so, I would like to work around it in the upstream Syslinux code.

Does anyone have a clue or know for sure?

Comment 2 Bill Burns 2010-03-04 17:43:52 UTC
Andrew should be able to provide the data when back online, but note that he is past end of day for today.

Comment 3 Andrew Jones 2010-03-05 09:45:35 UTC
(In reply to comment #1)
> I would really like to know what the constraints that vmxassist expects look
> like. It's quite possible that it's easy enough to accommodate, and if so, I
> would like to work around it in the upstream Syslinux code.
> 
> Does anyone have a clue or know for sure?

Unfortunately I don't know for sure, and it looks to be more complicated than I originally thought. However, this is the output I got when turning debug on in vmxassist, which led me down the GDT path:

(XEN) HVM1: Booting from CD-Rom...
(XEN) HVM1: 0x000F2E83: 0xF000:0x2E83 (0) external interrupt 8
(XEN) HVM1: 0x000F9E18: 0xF000:0x9E18 (0) opc 0xC3
(XEN) HVM1: 0x000F2E83: 0xF000:0x2E83 (0) external interrupt 8
(XEN) HVM1: 0x000F9E18: 0xF000:0x9E18 (0) opc 0xC3
(XEN) HVM1: 0x0000A205: 0x0:0xA205 (0) %cs:
(XEN) HVM1: 0x0000A205: 0x0:0xA205 (0) data32
(XEN) HVM1: 0x0000A207: 0x0:0xA207 (0) lgdt 0xAC20 <47, 0xAC20>
(XEN) HVM1: 0x0000A20C: 0x0:0xA20C (0) movl %cr0, %eax
(XEN) HVM1: 0x0000A20F: 0x0:0xA20F (0) opc 0xC
(XEN) HVM1: 0x0000A211: 0x0:0xA211 (0) movl %eax, %cr0
(XEN) HVM1: 0x0000A214: 0x0:0xA214 (1) <VM86_REAL_TO_PROTECTED>
(XEN) HVM1: 0x0000A214: 0x0:0xA214 (1) jmpl 0x20:0xA219
(XEN) HVM1: should never reach here in function address():
(XEN) HVM1:     entry=0x00009B000000FFFF, mode=3, seg=0x00000010, offset=0x000D03E0
(XEN) HVM1:
(XEN) HVM1: Halt called from %eip 0xD41DA

I also added a function to dump the GDT and got this:

(XEN) HVM1: [0x0] = 0x00000000AC20002F, base 0xAC20, limit 0x2F
(XEN) HVM1: [0x8] = 0x0000890005800067, base 0x580, limit 0x67
(XEN) HVM1: [0x10] = 0x00009B000000FFFF, base 0x0, limit 0xFFFF
(XEN) HVM1: [0x18] = 0x000093000000FFFF, base 0x0, limit 0xFFFF
(XEN) HVM1: [0x20] = 0x00CF9B000000FFFF, base 0x0, limit 0xFFFFFFFF
(XEN) HVM1: [0x28] = 0x00CF93000000FFFF, base 0x0, limit 0xFFFFFFFF

So it looks like it's trying to use an offset (0xD03E0) greater than the limit (0xFFFF) of the segment at selector 0x10. That also corresponds with the "should never reach here" message, which comes from this code in the address translation part of vmxassist:

        if (entry_high & 0x8000 &&
                ((entry_high & 0x800000 && off >> 12 <= seg_limit) ||
                (!(entry_high & 0x800000) && off <= seg_limit)))
                return seg_base + off;

        panic("should never reach here in function address():\n\t"
                  "entry=0x%08x%08x, mode=%d, seg=0x%08x, offset=0x%08x\n",
                  entry_high, entry_low, mode, seg, off);
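
A minimal sketch of how the dumped descriptors decode, assuming the standard x86 segment-descriptor layout. The names `gdt_desc`, `decode_gdt_entry` and `offset_in_segment` are hypothetical and are not taken from vmxassist, and expand-down segments are ignored. It mirrors the present-bit (0x8000) and granularity-bit (0x800000) tests in the excerpt above, and illustrates why offset 0x000D03E0 is rejected for selector 0x10.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one 8-byte GDT entry (hypothetical helper type). */
struct gdt_desc {
    uint32_t base;          /* 32-bit segment base */
    uint32_t limit;         /* effective limit in bytes */
    bool present;           /* P bit: 0x8000 in the high dword */
    bool page_granular;     /* G bit: 0x800000 in the high dword */
};

struct gdt_desc decode_gdt_entry(uint64_t entry)
{
    uint32_t low = (uint32_t)entry;
    uint32_t high = (uint32_t)(entry >> 32);
    /* Limit bits 0..15 live in the low dword, bits 16..19 in the high dword. */
    uint32_t raw_limit = (low & 0xFFFF) | (high & 0x000F0000);
    struct gdt_desc d;

    /* Base bits 0..15, 16..23 and 24..31 are scattered across both dwords. */
    d.base = (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000);
    d.present = (high & 0x8000) != 0;
    d.page_granular = (high & 0x800000) != 0;
    /* With G set, the limit counts 4 KiB pages rather than bytes. */
    d.limit = d.page_granular ? ((raw_limit << 12) | 0xFFF) : raw_limit;
    return d;
}

/* True when the segment is present and the offset lies within its limit. */
bool offset_in_segment(uint64_t entry, uint32_t off)
{
    struct gdt_desc d = decode_gdt_entry(entry);

    return d.present && off <= d.limit;
}
```

With the values from the dump, `offset_in_segment(0x00009B000000FFFFULL, 0x000D03E0)` is false because the byte-granular limit is 0xFFFF, whereas the same offset would pass against the flat entry 0x00CF9B000000FFFF at selector 0x20.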

After reverting the GDT with the patch I'll attach (just as a reference, not a proposal), I was able to boot further, but it still failed, apparently for other reasons.

Comment 4 Andrew Jones 2010-03-05 09:48:03 UTC
Created attachment 398008 [details]
Patched used to test booting with reverted GDT

Comment 5 H. Peter Anvin 2010-03-06 02:30:28 UTC
Created attachment 398173 [details]
Proposed patch

I would very much like it if you could try the attached patch.  I can't reproduce your problem very well, but I have a hunch that this might be the issue.

Comment 6 H. Peter Anvin 2010-03-06 04:34:39 UTC
Hm... I wrote a comment that seems to have disappeared.

If the patch doesn't work, please change the panic() in the address() function into a printf() so we can get a bit more information about what it does when it bails.

Comment 7 Andrew Jones 2010-03-08 10:56:32 UTC
(In reply to comment #5)
> I would very much like it if you could try the attached patch.  I can't
> reproduce your problem very well, but I have a hunch that this might be the
> issue.    

I tested with this patch and get the same result as in comment 3. How are you trying to reproduce? If you don't have a RHEL server handy, then I think CentOS would have the same issue.

(In reply to comment #6)
> If the patch doesn't work, please change the panic() in the address() function
> into a printf() so we can get a bit more information about what it does when it
> bails.

Switched the panic to a printf and now it looks like we loop for a while, retrying the same offset in selector 0x10 over and over, but then eventually halt. There's code in the emulate() function of vmxassist that checks whether we're making progress, and if not it panics with the message "Unknown opcode...", which is what it looks like we're getting.

Here's the last bit of the output showing an address translation try, then the halt.

(XEN) HVM1: 0x00000000: 0x10:0x000D03E0 (3) <VM86_PROTECTED>
(XEN) HVM1: 0x00009253: 0x10:0x00009253 (2) <VM86_PROTECTED_TO_REAL>
(XEN) HVM1: 0x00009253: 0x10:0x00009253 (2) jmpl 0x0:0x9258
(XEN) HVM1: 0x00009258: 0x0:0x9258 (0) <VM86_REAL>
(XEN) HVM1: 0x00009187: 0x0:0x9187 (0) lgdt 0xAC50 <47, 0xAC50>
(XEN) HVM1: 0x0000918C: 0x0:0x918C (0) lidt 0xAF96 <2048, 0x100000>
(XEN) HVM1: 0x00009191: 0x0:0x9191 (0) movl %cr0, %eax
(XEN) HVM1: 0x00009194: 0x0:0x9194 (0) opc 0xC
(XEN) HVM1: 0x00009196: 0x0:0x9196 (0) movl %eax, %cr0
(XEN) HVM1: 0x00009199: 0x0:0x9199 (1) <VM86_REAL_TO_PROTECTED>
(XEN) HVM1: 0x00009199: 0x0:0x9199 (1) jmpl 0x20:0x919E
(XEN) HVM1: should never reach here in function address():
(XEN) HVM1:     entry=0x00009B000000FFFF, mode=3, seg=0x00000010, offset=0x000D03E0
(XEN) HVM1: 0x00000000: 0x10:0x000D03E0 (3) <VM86_PROTECTED>
(XEN) HVM1: 0x00009253: 0x10:0x00009253 (2) <VM86_PROTECTED_TO_REAL>
(XEN) HVM1: 0x00009253: 0x10:0x00009253 (2) jmpl 0x0:0x9258
(XEN) HVM1: 0x00009258: 0x0:0x9258 (0) <VM86_REAL>
(XEN) HVM1: 0x0000A862: 0x0:0xA862 (0) opc 0xF4
(XEN) HVM1: 0x0000A862: 0x0:0xA862 (0) opc 0xF4
(XEN) HVM1: Unknown opcode at 0000:A862=0xA862
(XEN) HVM1: Halt called from %eip 0xD415A

I can try some more experiments and instrumentation of vmxassist to get more data for you, just let me know what you need.

Comment 8 Bill Burns 2010-03-09 20:15:26 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
(For beta 1 only)
RHEL 6 cannot not be used as a fully virtualized Xen guest at this time. Please use it only as a paravirt guest until the issues are resolved.

Comment 9 H. Peter Anvin 2010-03-10 00:17:27 UTC
Could you send me the exact isolinux.bin file from the CD?  Or better, post the entire .iso somewhere?

I don't see any of this on the CentOS 5.4 test system I set up.

Comment 10 Andrew Jones 2010-03-10 12:39:57 UTC
Hi Peter,

Sorry I didn't document how to reproduce and get the output from vmxassist better before. I've attempted to write it all up now and am attaching it to this bug. Please give it a go and let me know if you're able to see what I see. Maybe some of the instructions can also be used as a guide for another syslinux regression test, i.e. a boot test on RHEL/CentOS Xen hosts, to help with future development?

Thanks,
Andrew

Comment 11 Andrew Jones 2010-03-10 12:40:53 UTC
Created attachment 399065 [details]
Steps to reproduce and debug the issue on CentOS

Comment 12 H. Peter Anvin 2010-03-10 20:15:34 UTC
Thanks - I will try it later.  My guess is that there is an instruction and/or CPU state that vmxassist mishandles even worse than it does for other things (good God that code is wrong on so many levels.)

If we can figure out *what that is* it is probably easy enough.

This is a very strong hint at what might be wrong:
(XEN) HVM1: Unknown opcode at 0000:A862=0xA862

I am currently on a trip and can't test anything out until I get back, but if you can get me the *exact* isolinux.bin that ran when you did that test then I might be able to make a test patch while I'm still on the road.

Comment 13 Andrew Jones 2010-03-11 08:58:21 UTC
(In reply to comment #12)
> This is a very strong hint at what might be wrong:
> (XEN) HVM1: Unknown opcode at 0000:A862=0xA862
> 

Unfortunately it might not be. I thought the same when I first saw "unknown opcode", that vmxassist just doesn't know some op. However, looking at the vmxassist code I see that "unknown opcode" is a catch-all error message used when the emulate function notices we've been looping, or there was a bad address, or...

> I am currently on a trip and can't test anything out until I get back, but if
> you can get me the *exact* isolinux.bin that ran when you did that test then I
> might be able to make a test patch while I'm still on the road.    

The instructions I attached show where I got the isolinux.bin, so you can fetch the exact same one from the same place. Or, probably more importantly for you, it also shows where the rest of the files that get compiled into the boot.iso come from. You can spin your own test isolinux.bin file, and then create the boot.iso with the other files for testing. Instructions for the whole produce-boot.iso/test cycle are in the attached document.

I haven't had time to look at this bug too much this week, but I hope to dig back in to it soon.

Comment 14 H. Peter Anvin 2010-03-11 21:42:04 UTC
Unfortunately the isolinux.bin at:

http://mirror.us.as6453.net/fedora/linux/development/13/x86_64/os/isolinux/isolinux.bin

[redirected from the URL in your link]

doesn't match the addresses in your trace above.

As I mentioned, I'm travelling, so I can't actually set up the test environment.  Unfortunately I have a total of two (2) days in the office between now and the end of March.

Comment 15 Andrew Jones 2010-03-11 22:25:54 UTC
The traces above were actually made with the rhel6 iso. To get you more involved more easily I've switched this bug's debug focus to f13 on CentOS. I'm assuming if we solve the problem for the f13 iso that it will be the same for rhel6. I'll dig a bit to assure that assumption is true. Just to make sure you and I are both looking at the exact same isolinux I've also put the f13 one I'm currently looking at up here

http://people.redhat.com/drjones/isolinux.orig.tar.gz

The addresses from this one should match those in the attached document.

Comment 18 H. Peter Anvin 2010-03-12 23:34:07 UTC
I have root-caused this problem: vmxassist simply doesn't handle a HLT instruction in real mode (HLT causes an exit from V86 mode).  As such, "nohalt 1" is a valid workaround, *but* that will cause the boot loader to busy-spin with 100% CPU utilization until a selection is made.  This has, in the past, made some virtualization customers very unhappy.

This is reasonably easy to work around in the Syslinux 4 codebase (just do the HLT in protected mode) but in Syslinux 3 it is a fairly significant change, and I'm already in the process of winding down Syslinux 3 to maintenance-only.

I'm going to see if I can auto-detect the Xen environment and/or vmxassist, and automatically set nohalt on that platform.

Comment 19 Andrew Jones 2010-03-15 07:20:01 UTC
Thanks Peter! I've confirmed that adding 'nohalt 1' to the isolinux.cfg file allows us to boot rhel6 isos as xen hvm guests. I saw that we idle with the cpu at 100% while waiting for the menu selection, but f12 also had this issue, so we didn't regress there. An auto-detect patch for this environment would be excellent, in order to keep the same config file for all platforms. Thanks again.

Andrew

Comment 20 H. Peter Anvin 2010-03-15 19:39:08 UTC
I have filed a bug report with XenSource to get information on how to autodetect the presence of vmxassist, but I haven't gotten a response.  Anything you could do on your end to find out whether vmxassist is present would help.

Comment 21 Andrew Jones 2010-03-16 08:00:18 UTC
Hi Peter,

You can use cpuid to detect it.

If cpuid with input eax=0x40000000 returns the string XenVMMXenVMM (the bytes of ebx, ecx and edx, concatenated in that order), then run cpuid with input eax=0x40000001. That will return the major and minor number of the Xen revision in eax: the major in the upper 2 bytes and the minor in the lower 2. Anything less than 3.3 will be using vmxassist on Intel processors.
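
A minimal sketch of that check, assuming the register values have already been read from the two cpuid leaves (for example with the `__cpuid` macro from GCC's `<cpuid.h>`; that step is left out so the logic stays independent of the hardware). The function names are hypothetical, and the additional "Intel processor" condition is not tested here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Leaf 0x40000000: ebx, ecx and edx carry the 12-byte hypervisor
 * signature, least significant byte of each register first. */
bool is_xen_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    const uint32_t regs[3] = { ebx, ecx, edx };
    char sig[12];

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            sig[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xFF);

    return memcmp(sig, "XenVMMXenVMM", 12) == 0;
}

/* Leaf 0x40000001: eax holds the major version in its upper 16 bits
 * and the minor version in its lower 16 bits. */
bool xen_older_than_3_3(uint32_t version_eax)
{
    uint32_t major = version_eax >> 16;
    uint32_t minor = version_eax & 0xFFFF;

    return major < 3 || (major == 3 && minor < 3);
}
```

A boot loader could then restrict the nohalt behaviour to the case where both functions return true, so that other platforms keep using HLT while idle.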

Andrew

Comment 22 H. Peter Anvin 2010-03-30 22:41:56 UTC
A workaround is now included in Syslinux 3.86-pre2.  I will probably release Syslinux 3.86 *this week*, so it would be great if you could try this out before the release goes final.

Comment 23 Andrew Jones 2010-03-31 08:46:40 UTC
This works. Thanks!

I also reviewed the patch for this and the patch immediately following it. The latter has a copy+paste error: the register name never changes in this output.

    dump_reg("eax", eax);
    dump_reg("eax", ebx);
    dump_reg("eax", ecx);
    dump_reg("eax", edx);
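
One way to make this kind of slip harder, sketched under assumptions: the signature of the real dump_reg() is not shown in this report, so `format_reg` below is a hypothetical stand-in that writes into a caller-supplied buffer, and the `FORMAT_REG` macro derives the printed name from the variable itself through preprocessor stringification.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for dump_reg(): formats "name = 0xVALUE" into buf
 * and returns the number of characters snprintf() would have written. */
int format_reg(char *buf, size_t len, const char *name, uint32_t value)
{
    return snprintf(buf, len, "%s = 0x%08x", name, (unsigned int)value);
}

/* #reg turns the argument's spelling into a string literal, so the label
 * always matches the variable that is passed in. */
#define FORMAT_REG(buf, len, reg) format_reg((buf), (len), #reg, (reg))
```

With this shape, `FORMAT_REG(buf, sizeof buf, ebx)` labels the value "ebx" without the name being typed a second time.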

Comment 24 H. Peter Anvin 2010-03-31 15:55:33 UTC
Thanks!

Comment 25 H. Peter Anvin 2010-04-01 18:19:56 UTC
Syslinux 3.86 is now released, containing this workaround.

Comment 26 Andrew Jones 2010-04-06 08:30:54 UTC
*** Bug 578802 has been marked as a duplicate of this bug. ***

Comment 28 Paolo Bonzini 2010-04-12 09:47:02 UTC
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1,2 @@
 (For beta 1 only)
-RHEL 6 cannot not be used as a fully virtualized Xen guest at this time. Please use it only as a paravirt guest until the issues are resolved.
+RHEL 6 cannot be used as a fully virtualized Xen guest at this time. Please use it only as a paravirt guest until the issues are resolved.

Comment 29 Ryan Lerch 2010-04-15 01:06:23 UTC
added to the beta1 release notes.

Comment 30 Ryan Lerch 2010-04-15 01:06:23 UTC
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1 @@
-(For beta 1 only)
+Red Hat Enterprise Linux 6 Beta can not be installed as a fully virtualized Xen guest.
-RHEL 6 cannot be used as a fully virtualized Xen guest at this time. Please use it only as a paravirt guest until the issues are resolved.

Comment 31 Michal Novotny 2010-04-22 08:49:53 UTC
*** Bug 564365 has been marked as a duplicate of this bug. ***

Comment 33 Alexander Todorov 2010-04-23 15:48:32 UTC
With RHEL6.0-20100422.12/Server, syslinux-3.86-1.1 I was able to start a FV Xen guest on a RHEL 5.5 host. The guest booted fine and completed the install (minimal). The guest was able to boot after install. Moving to VERIFIED.

Comment 34 Alan Pevec 2010-06-11 11:25:02 UTC
Fedora bug 601814 "Update syslinux to 3.86"

Comment 35 releng-rhel@redhat.com 2010-07-02 20:54:03 UTC
Red Hat Enterprise Linux Beta 2 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.