Bug 570496
Summary: | can't boot rhel6 Xen FV guests from iso
---|---
Product: | Red Hat Enterprise Linux 6
Reporter: | Andrew Jones <drjones>
Component: | syslinux
Assignee: | Peter Jones <pjones>
Status: | CLOSED CURRENTRELEASE
QA Contact: | Release Test Team <release-test-team-automation>
Severity: | urgent
Priority: | low
Version: | 6.0
CC: | apevec, atodorov, borgan, drjones, hpa, jforbes, minovotn, pbonzini, rlerch, sprabhu, syeghiay, xen-maint
Target Milestone: | beta
Hardware: | All
OS: | Linux
Fixed In Version: | syslinux-3.86-1.1
Doc Type: | Bug Fix
Doc Text: | Red Hat Enterprise Linux 6 Beta can not be installed as a fully virtualized Xen guest.
: | 580945
Last Closed: | 2010-07-02 20:54:03 UTC
Bug Depends On: | 580945
Bug Blocks: | 563347
Description
Andrew Jones
2010-03-04 14:47:46 UTC
I would really like to know what the constraints that vmxassist expect look like. It's quite possible that it's easy enough to accommodate, and if so, I would like to work around it in the upstream Syslinux code.

Anyone who has a clue or know for sure?

Andrew should be able to provide the data when back online, but note that he is past end of day for today.

(In reply to comment #1)
> I would really like to know what the constraints that vmxassist expect look
> like. It's quite possible that it's easy enough to accommodate, and if so, I
> would like to work around it in the upstream Syslinux code.
>
> Anyone who has a clue or know for sure?

Unfortunately I don't know for sure, and it looks to be more complicated than I originally thought. However this is the output I got when turning debug on in vmxassist that led me down the GDT path

(XEN) HVM1: Booting from CD-Rom...
(XEN) HVM1: 0x000F2E83: 0xF000:0x2E83 (0) external interrupt 8
(XEN) HVM1: 0x000F9E18: 0xF000:0x9E18 (0) opc 0xC3
(XEN) HVM1: 0x000F2E83: 0xF000:0x2E83 (0) external interrupt 8
(XEN) HVM1: 0x000F9E18: 0xF000:0x9E18 (0) opc 0xC3
(XEN) HVM1: 0x0000A205: 0x0:0xA205 (0) %cs:
(XEN) HVM1: 0x0000A205: 0x0:0xA205 (0) data32
(XEN) HVM1: 0x0000A207: 0x0:0xA207 (0) lgdt 0xAC20 <47, 0xAC20>
(XEN) HVM1: 0x0000A20C: 0x0:0xA20C (0) movl %cr0, %eax
(XEN) HVM1: 0x0000A20F: 0x0:0xA20F (0) opc 0xC
(XEN) HVM1: 0x0000A211: 0x0:0xA211 (0) movl %eax, %cr0
(XEN) HVM1: 0x0000A214: 0x0:0xA214 (1) <VM86_REAL_TO_PROTECTED>
(XEN) HVM1: 0x0000A214: 0x0:0xA214 (1) jmpl 0x20:0xA219
(XEN) HVM1: should never reach here in function address():
(XEN) HVM1: entry=0x00009B000000FFFF, mode=3, seg=0x00000010, offset=0x000D03E0
(XEN) HVM1:
(XEN) HVM1: Halt called from %eip 0xD41DA

I also added a function to dump the GDT and got this

(XEN) HVM1: [0x0] = 0x00000000AC20002F, base 0xAC20, limit 0x2F
(XEN) HVM1: [0x8] = 0x0000890005800067, base 0x580, limit 0x67
(XEN) HVM1: [0x10] = 0x00009B000000FFFF, base 0x0, limit 0xFFFF
(XEN) HVM1: [0x18] = 0x000093000000FFFF, base 0x0, limit 0xFFFF
(XEN) HVM1: [0x20] = 0x00CF9B000000FFFF, base 0x0, limit 0xFFFFFFFF
(XEN) HVM1: [0x28] = 0x00CF93000000FFFF, base 0x0, limit 0xFFFFFFFF

So it looks like it's trying to use an offset greater than the limit for segment 0x10. That also corresponds with the "should never reach" message which comes from this code in the address translation part of vmxassist

    if (entry_high & 0x8000 &&
        ((entry_high & 0x800000 && off >> 12 <= seg_limit) ||
         (!(entry_high & 0x800000) && off <= seg_limit)))
            return seg_base + off;
    panic("should never reach here in function address():\n\t"
          "entry=0x%08x%08x, mode=%d, seg=0x%08x, offset=0x%08x\n",
          entry_high, entry_low, mode, seg, off);

After reverting the GDT with the patch I'll attach (just as a reference, not a proposal) I was able to boot further, but it still failed and appears to be for other reasons.

Created attachment 398008 [details]
Patched used to test booting with reverted GDT
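To make the limit argument above easier to check by hand, here is a minimal sketch in C that decodes the raw entries from the GDT dump and repeats the same kind of present/granularity/limit test. It is not the vmxassist source: the names `seg_desc`, `decode_desc`, and `offset_in_segment` are hypothetical, and the bit positions are the standard x86 segment descriptor layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the fields of an x86 segment descriptor
 * that matter for the limit check discussed in this bug. */
struct seg_desc {
    uint32_t base;       /* 32-bit segment base */
    uint32_t limit;      /* raw 20-bit limit field */
    bool present;        /* P bit (bit 47 of the entry) */
    bool page_granular;  /* G bit (bit 55): limit counts 4 KiB pages */
};

/* Split a raw 64-bit GDT entry, as printed in the dump, into fields. */
void decode_desc(uint64_t entry, struct seg_desc *out)
{
    assert(out != NULL);
    out->base = (uint32_t)((entry >> 16) & 0xFFFFFF) |
                ((uint32_t)((entry >> 56) & 0xFF) << 24);
    out->limit = (uint32_t)(entry & 0xFFFF) |
                 ((uint32_t)((entry >> 48) & 0xF) << 16);
    out->present = ((entry >> 47) & 1) != 0;
    out->page_granular = ((entry >> 55) & 1) != 0;
}

/* Same shape as the quoted check: the segment must be present and the
 * offset (shifted by 12 for page-granular segments) must not exceed
 * the raw limit field. */
bool offset_in_segment(const struct seg_desc *d, uint32_t off)
{
    assert(d != NULL);
    if (!d->present)
        return false;
    return (d->page_granular ? (off >> 12) : off) <= d->limit;
}
```

Under these assumptions, decoding the selector 0x10 entry `0x00009B000000FFFF` gives a byte-granular limit of 0xFFFF, so `offset_in_segment` rejects offset 0x000D03E0, while the flat selector 0x20 entry `0x00CF9B000000FFFF` accepts the same offset.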
Created attachment 398173 [details]
Proposed patch
I would very much like it if you could try the attached patch. I can't reproduce your problem very well, but I have a hunch that this might be the issue.
Hm... I wrote a comment that seems to have disappeared.

If the patch doesn't work, please change the panic() in the address() function into a printf() so we can get a bit more information about what it does when it bails.

(In reply to comment #5)
> I would very much like it if you could try the attached patch. I can't
> reproduce your problem very well, but I have a hunch that this might be the
> issue.

I tested with this patch and get the same result as in comment 3. How are you trying to reproduce? If you don't have a RHEL server handy, then I think CentOS would have the same issue.

(In reply to comment #6)
> If the patch doesn't work, please change the panic() in the address() function
> into a printf() so we can get a bit more information about what it does when it
> bails.

Switched the panic to a printf and now it looks like we loop for a while, trying over and over the same offset in sel 0x10, but then eventually Halt. There's code in the emulate() function of vmxassist that checks if we're not making progress, and if not it panics with the message "Unknown opcode...", which is what it looks like we're getting. Here's the last bit of the output showing an address translation try, then the halt.

(XEN) HVM1: 0x00000000: 0x10:0x000D03E0 (3) <VM86_PROTECTED>
(XEN) HVM1: 0x00009253: 0x10:0x00009253 (2) <VM86_PROTECTED_TO_REAL>
(XEN) HVM1: 0x00009253: 0x10:0x00009253 (2) jmpl 0x0:0x9258
(XEN) HVM1: 0x00009258: 0x0:0x9258 (0) <VM86_REAL>
(XEN) HVM1: 0x00009187: 0x0:0x9187 (0) lgdt 0xAC50 <47, 0xAC50>
(XEN) HVM1: 0x0000918C: 0x0:0x918C (0) lidt 0xAF96 <2048, 0x100000>
(XEN) HVM1: 0x00009191: 0x0:0x9191 (0) movl %cr0, %eax
(XEN) HVM1: 0x00009194: 0x0:0x9194 (0) opc 0xC
(XEN) HVM1: 0x00009196: 0x0:0x9196 (0) movl %eax, %cr0
(XEN) HVM1: 0x00009199: 0x0:0x9199 (1) <VM86_REAL_TO_PROTECTED>
(XEN) HVM1: 0x00009199: 0x0:0x9199 (1) jmpl 0x20:0x919E
(XEN) HVM1: should never reach here in function address():
(XEN) HVM1: entry=0x00009B000000FFFF, mode=3, seg=0x00000010, offset=0x000D03E0
(XEN) HVM1: 0x00000000: 0x10:0x000D03E0 (3) <VM86_PROTECTED>
(XEN) HVM1: 0x00009253: 0x10:0x00009253 (2) <VM86_PROTECTED_TO_REAL>
(XEN) HVM1: 0x00009253: 0x10:0x00009253 (2) jmpl 0x0:0x9258
(XEN) HVM1: 0x00009258: 0x0:0x9258 (0) <VM86_REAL>
(XEN) HVM1: 0x0000A862: 0x0:0xA862 (0) opc 0xF4
(XEN) HVM1: 0x0000A862: 0x0:0xA862 (0) opc 0xF4
(XEN) HVM1: Unknown opcode at 0000:A862=0xA862
(XEN) HVM1: Halt called from %eip 0xD415A

I can try some more experiments and instrumentation of vmxassist to get more data for you, just let me know what you need.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
(For beta 1 only)
RHEL 6 cannot not be used as a fully virtualized Xen guest at this time. Please use it only as a paravirt guest until the issues are resolved.

Could you send me the exact isolinux.bin file from the CD? Or better, post the entire .iso somewhere? I don't see any of this on the CentOS 5.4 test system I set up.

Hi Peter,

Sorry I didn't document how to reproduce and get the output from vmxassist better before. I've attempted to write it all up now and am attaching it to this bug. Please give it a go and let me know if you're able to see what I see. Maybe some of the instructions can also be used as a guide for another syslinux regression test, i.e. a boot test on RHEL/CentOS Xen hosts, to help with future development?

Thanks,
Andrew

Created attachment 399065 [details]
Steps to reproduce and debug the issue on CentOS
Thanks - I will try it later.

My guess is that there is an instruction and/or CPU state that vmxassist mishandles even worse than it does for other things (good God that code is wrong on so many levels.) If we can figure out *what that is* it is probably easy enough.

This is a very strong hint at what might be wrong:

(XEN) HVM1: Unknown opcode at 0000:A862=0xA862

I am currently on a trip and can't test anything out until I get back, but if you can get me the *exact* isolinux.bin that ran when you did that test then I might be able to make a test patch while I'm still on the road.

(In reply to comment #12)
> This is a very strong hint at what might be wrong:
> (XEN) HVM1: Unknown opcode at 0000:A862=0xA862
>

Unfortunately it might not be. I thought the same when I first saw "unknown opcode", that vmxassist just doesn't know some op. However, looking at the vmxassist code I see that "unknown opcode" is a catch-all error message used when the emulate function notices we've been looping, or there was a bad address, or...

> I am currently on a trip and can't test anything out until I get back, but if
> you can get me the *exact* isolinux.bin that ran when you did that test then I
> might be able to make a test patch while I'm still on the road.

The instructions I attached show where I got the isolinux.bin, so you can fetch the exact same one from the same place. Or, probably more importantly for you, it also shows where the rest of the files that get compiled into the boot.iso come from. You can spin your own test isolinux.bin file, and then create the boot.iso with the other files for testing. Instructions for the whole produce-boot.iso/test cycle are in the attached document.

I haven't had time to look at this bug too much this week, but I hope to dig back in to it soon.
Unfortunately the isolinux.bin at:

http://mirror.us.as6453.net/fedora/linux/development/13/x86_64/os/isolinux/isolinux.bin

[redirected from the URL in your link] doesn't match the addresses in your trace above.

As I mentioned, I'm travelling, so I can't actually set up the test environment. Unfortunately I have a total of two (2) days in the office between now and the end of March.

The traces above were actually made with the rhel6 iso. To get you more involved more easily I've switched this bug's debug focus to f13 on CentOS. I'm assuming if we solve the problem for the f13 iso that it will be the same for rhel6. I'll dig a bit to make sure that assumption is true. Just to make sure you and I are both looking at the exact same isolinux I've also put the f13 one I'm currently looking at up here

http://people.redhat.com/drjones/isolinux.orig.tar.gz

The addresses from this one should match those in the attached document.

I have root-caused this problem: vmxassist simply doesn't handle a HLT instruction in real mode (HLT causes an exit from V86 mode). As such, "nohalt 1" is a valid workaround, *but* that will cause the boot loader to busy-spin with 100% CPU utilization until a selection is made. This has, in the past, made some virtualization customers specifically very unhappy.

This is reasonably easy to work around in the Syslinux 4 codebase (just do the HLT in protected mode) but in Syslinux 3 it is a fairly significant change, and I'm already in the process of winding down Syslinux 3 to maintenance-only. I'm going to see if I can auto-detect the Xen environment and/or vmxassist, and automatically set nohalt on that platform.

Thanks Peter!

I've confirmed that adding 'nohalt 1' to the isolinux.cfg file allows us to boot rhel6 isos as xen hvm guests. I saw that we idle with the cpu at 100% while waiting for the menu selection, but f12 also had this issue, so we didn't regress there.
An auto-detect patch for this environment would be excellent, in order to keep the same config file for all platforms.

Thanks again.
Andrew

I have filed a bug report with XenSource to get information for how to autodetect the presence of vmxassist, but I haven't gotten a response. Anything you could do on your end for how to find out if vmxassist is present would help.

Hi Peter,

You can use cpuid to detect it. If cpuid input eax=0x40000000 returns the string XenVMMXenVMM, which is composed from all the ebx,ecx,edx bytes, then run cpuid input eax=0x40000001. That will return the major and minor number of the Xen revision in eax. The major will be in the upper 2 bytes and minor the lower 2. Anything less than 3.3 will be using vmxassist on Intel processors.

Andrew

A workaround is now included in Syslinux 3.86-pre2. I will probably release Syslinux 3.86 *this week*, so it would be great if you could try this out before the release goes final.

This works. Thanks! I also reviewed the patch for this and the patch immediately following it. The patch immediately following it has a copy+paste error. The register name never changes in this output.

54 dump_reg("eax", eax);
55 dump_reg("eax", ebx);
56 dump_reg("eax", ecx);
57 dump_reg("eax", edx);

Thanks! Syslinux 3.86 is now released, containing this workaround.

*** Bug 578802 has been marked as a duplicate of this bug. ***

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1,2 @@
 (For beta 1 only)
-RHEL 6 cannot not be used as a fully virtualized Xen guest at this time. Please use it only as a paravirt guest until the issues are resolved.
+RHEL 6 cannot be used as a fully virtualized Xen guest at this time. Please use it only as a paravirt guest until the issues are resolved.

added to the beta1 release notes.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1 @@
-(For beta 1 only)
+Red Hat Enterprise Linux 6 Beta can not be installed as a fully virtualized Xen guest.
-RHEL 6 cannot be used as a fully virtualized Xen guest at this time. Please use it only as a paravirt guest until the issues are resolved.

*** Bug 564365 has been marked as a duplicate of this bug. ***

With RHEL6.0-20100422.12/Server, syslinux-3.86-1.1 I was able to start a FV Xen guest on a RHEL 5.5 host. The guest booted fine and completed the install (minimal). The guest was able to boot after install. Moving to VERIFIED.

Fedora bug 601814 "Update syslinux to 3.86"

Red Hat Enterprise Linux Beta 2 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
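For reference, the CPUID-based detection Andrew describes in this report could be sketched roughly as follows. This is an illustration only, not the code that shipped in Syslinux 3.86: the function names are hypothetical, the caller is assumed to have already executed CPUID for leaves 0x40000000 and 0x40000001 and to pass in the resulting register values, and the "Intel processors" part of the condition is not checked here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebuild the 12-character hypervisor signature
 * from the ebx, ecx, edx values returned by CPUID leaf 0x40000000.
 * Each register contributes four bytes, lowest byte first. */
void cpuid_signature(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };
    assert(out != NULL);
    for (size_t i = 0; i < 12; i++)
        out[i] = (char)((regs[i / 4] >> (8 * (i % 4))) & 0xFF);
    out[12] = '\0';
}

/* True if the signature registers spell "XenVMMXenVMM". */
bool is_xen_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char sig[13];
    cpuid_signature(ebx, ecx, edx, sig);
    return strcmp(sig, "XenVMMXenVMM") == 0;
}

/* eax from CPUID leaf 0x40000001: major version in the upper 16 bits,
 * minor in the lower 16. Per the comment in this bug, versions below
 * 3.3 use vmxassist on Intel processors. */
bool xen_older_than_3_3(uint32_t eax)
{
    uint32_t major = eax >> 16;
    uint32_t minor = eax & 0xFFFF;
    return major < 3 || (major == 3 && minor < 3);
}
```

A boot loader following this approach would presumably enable the nohalt behaviour only when both `is_xen_signature(...)` and `xen_older_than_3_3(...)` return true, so that other platforms keep using HLT while waiting for a menu selection.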