Bug 967652
Summary: | rhel 5.9x64 fails to install via virt-manager/qemu | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Michele Baldessari <michele> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 19 | CC: | amit.shah, berrange, cbuissar, cdahlin, cfergeau, crobinso, david.brown, dwmw2, fschwarz, gansalmon, itamar, jcapik, jentrena, jonathan.barber, jonathan, jpopelka, kernel-maint, madhu.chinakonda, mattias.ellert, michele, pbonzini, pcfe, pep, pholica, plambri, ppisar, pzhukov, rcollet, rjones, scottt.tw, scui, ss, tg, tim |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | | |
Hardware: | All | | |
OS: | All | | |
Whiteboard: | | | |
Fixed In Version: | kernel-3.11.9-100.fc18 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 969244 | Environment: | |
Last Closed: | 2013-11-27 12:18:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |

Description
Michele Baldessari
2013-05-27 19:42:28 UTC
I can reproduce this on 3.10.0-0.rc3.git0.2.fc20.x86_64 and also with qemu-1.5.0-3 taken from rawhide and recompiled on current F19.

Thanks, I am reassigning to kernel since the QEMU version seems not to matter. Reproduced by Michele with:

$ qemu-system-x86_64 -kernel vmlinuz -initrd initrd.img --enable-kvm -monitor stdio -no-shutdown

where vmlinuz and initrd come from the RHEL 5.9 x64 PXE images.

(qemu) x/16i $pc
0x00000000006b9319: nopl 0x0(%rax)
0x00000000006b9320: movzbl (%rcx,%rsi,1),%eax
0x00000000006b9324: add $0x1,%r8d
0x00000000006b9328: mov %al,(%rcx,%rdi,1)

Created attachment 754434 [details]
patch under test

So the patch does remove the crash as reported in the initial report, but things seem to be stuck without any other messages. I tried current linux git + patch c#3 with the following command:

qemu-system-x86_64 -monitor stdio -no-shutdown -drive file=rhel57x64.img -drive file=/data/software/linux/rh/rhel/5.9/RHEL5.9-Server-20121129.0-x86_64-DVD.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0

The above works properly and installation continues. If I add '--enable-kvm' it hangs right after displaying vmlinuz... and Loading initramfs....

If it's relevant, when it is stuck:

QEMU 1.5.0 monitor - type 'help' for more information
(qemu) x/16 $pc
00000000006b9774: 0x0000feeb 0x00000000 0x00000000 0x343d8d48
00000000006b9784: 0xe8000023 0xffffffc4 0x00c3c031 0x56415741
00000000006b9794: 0x54415541 0x83485355 0x058b58ec 0x000067e4
00000000006b97a4: 0xcd058b4c 0x89000067 0x89042454 0x8b4c240c
(qemu) x/16i $pc
0x00000000006b9774: jmp 0x6b9774
0x00000000006b9776: add %al,(%rax)
0x00000000006b9778: add %al,(%rax)
0x00000000006b977a: add %al,(%rax)
0x00000000006b977c: add %al,(%rax)
0x00000000006b977e: add %al,(%rax)
0x00000000006b9780: lea 0x2334(%rip),%rdi # 0x6bbabb
0x00000000006b9787: callq 0x6b9750
0x00000000006b978c: xor %eax,%eax
0x00000000006b978e: retq
0x00000000006b978f: add %al,0x57(%rcx)
0x00000000006b9792: push %r14
0x00000000006b9794: push %r13
0x00000000006b9796: push %r12
0x00000000006b9798: push %rbp
0x00000000006b9799: push %rbx

It is really stuck, it is a jump to itself. :) The workaround is to use "modprobe kvm_intel emulate_invalid_guest_state=0". I can reproduce it too and will take a look tomorrow.

This could be the first mis-emulated instruction:

0x00000000006bade4: cmp $0x1f,%bpl
0x00000000006bade8: je 0x6bae11

At this point TCG goes to 0x6bae11, KVM falls through to 0x6badea.

*** Bug 968135 has been marked as a duplicate of this bug. ***

I'm concerned that this bug hasn't gotten any support for a while. Specifically, the newer kernels have hit Fedora 18 and this bug seems to be there as well. I don't get any emulation error in the log, RHEL 5 just hangs after loading the kernel and initramfs.

(In reply to David Brown from comment #9)
> I'm concerned that this bug hasn't gotten any support for a while.
> Specifically, the newer kernels have hit Fedora 18 and this bug seems to be
> there as well.

Bah, forgot which box I was looking at; that one was Fedora 19 as well.

> I don't get any emulation error in the log, RHEL 5 just hangs after loading
> the kernel and initramfs.

However, this still is the case, not sure why it's just hanging.

Sorry for not updating the BZ. The fix is in 3.9.5.

However, even with the fix RHEL5.9 will take a while to load and seem to hang (3-4 minutes is not unexpected). There is no fix yet and it's not easy, using emulate_invalid_guest_state=0 however works.

*** Bug 971379 has been marked as a duplicate of this bug. ***

*** Bug 965711 has been marked as a duplicate of this bug. ***

*** Bug 967773 has been marked as a duplicate of this bug. ***

As written in bug 965711 this also affects kernels in other distributions (specifically, Debian) and seems to indeed be a kernel bug. Mentioning here so we have a single place to track this. Debian 3.9.6-1 is affected, so “the fix is in 3.9.5” is untrue at least for vanilla kernels :( Nevertheless thank you for looking after this.

(In reply to Paolo Bonzini from comment #11)
> Sorry for not updating the BZ. The fix is in 3.9.5.
>
> However, even with the fix RHEL5.9 will take a while to load and seem to
> hang (3-4 minutes is not unexpected). There is no fix yet and it's not
> easy, using emulate_invalid_guest_state=0 however works.

I waited for about 25 minutes but to no avail, so the "fix" seems not to work for me. The workaround is fine though.

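The comments above tie the hang to hosts that go through KVM's software emulation of invalid guest state, and the workaround is to load kvm_intel with emulate_invalid_guest_state=0. The following is a minimal sketch, not a supported tool, for checking whether a host is in the affected configuration. It assumes that kvm_intel exposes its parameters as Y/N files under /sys/module/kvm_intel/parameters; the helper names `read_bool_param` and `emulation_path_active` are hypothetical.

```python
from pathlib import Path

# Assumption: kvm_intel publishes its module parameters read-only in sysfs.
PARAM_DIR = Path("/sys/module/kvm_intel/parameters")


def read_bool_param(name, param_dir=PARAM_DIR):
    """Return True/False for a Y/N (or 1/0) parameter, None if it cannot be read."""
    try:
        raw = (Path(param_dir) / name).read_text().strip()
    except OSError:
        return None
    if raw in ("Y", "y", "1"):
        return True
    if raw in ("N", "n", "0"):
        return False
    return None


def emulation_path_active(param_dir=PARAM_DIR):
    """Guess whether guests may hit the emulator for invalid guest state.

    Returns None when kvm_intel is not loaded or the parameter is unreadable.
    """
    emulate = read_bool_param("emulate_invalid_guest_state", param_dir)
    unrestricted = read_bool_param("unrestricted_guest", param_dir)
    if emulate is None:
        return None
    # Per the thread, hosts with unrestricted_guest produce correct results,
    # so only emulate=Y combined with unrestricted_guest != Y is flagged.
    return emulate and unrestricted is not True


if __name__ == "__main__":
    state = emulation_path_active()
    if state is None:
        print("kvm_intel parameters not readable (module not loaded?)")
    elif state:
        print("emulation path active: consider emulate_invalid_guest_state=0")
    else:
        print("emulation path not active on this host")
```

If the sketch reports the emulation path as active on an unpatched kernel, reloading the module with the option quoted in the thread is the workaround the commenters confirmed.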
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

This bug still seems to be present in 3.11.1. The simplest way to reproduce it seems to be to try and install RHEL 5.9 x64 from scratch. On 3.11.1-200.fc19.x86_64, just launching (where vmlinuz and initrd.img are taken from RHEL 5.9):

qemu-system-x86_64 -kernel vmlinuz -initrd initrd.img --enable-kvm -monitor stdio -no-shutdown

There seems to be no progress even after some time. On the qemu monitor I see:

(qemu) x/40i $pc
0x00000000006b9774: jmp 0x6b9774
0x00000000006b9776: add %al,(%rax)
0x00000000006b9778: add %al,(%rax)
0x00000000006b977a: add %al,(%rax)
0x00000000006b977c: add %al,(%rax)
0x00000000006b977e: add %al,(%rax)
0x00000000006b9780: lea 0x2334(%rip),%rdi # 0x6bbabb
0x00000000006b9787: callq 0x6b9750
0x00000000006b978c: xor %eax,%eax
0x00000000006b978e: retq
0x00000000006b978f: add %al,0x57(%rcx)
0x00000000006b9792: push %r14
0x00000000006b9794: push %r13
0x00000000006b9796: push %r12
0x00000000006b9798: push %rbp
0x00000000006b9799: push %rbx
0x00000000006b979a: sub $0x58,%rsp
0x00000000006b979e: mov 0x67e4(%rip),%eax # 0x6bff88
0x00000000006b97a4: mov 0x67cd(%rip),%r8 # 0x6bff78
0x00000000006b97ab: mov %edx,0x4(%rsp)
0x00000000006b97af: mov %ecx,(%rsp)
0x00000000006b97b2: mov 0x700f(%rip),%r12 # 0x6c07c8
0x00000000006b97b9: mov 0x7011(%rip),%ebx # 0x6c07d0
0x00000000006b97bf: mov %eax,0x3c(%rsp)
0x00000000006b97c3: movslq %edx,%rax
0x00000000006b97c6: lea 0x2193(%rip),%rdx # 0x6bb960
0x00000000006b97cd: mov 0x67b1(%rip),%ebp # 0x6bff84
0x00000000006b97d3: mov 0x67a6(%rip),%r14d # 0x6bff80
0x00000000006b97da: movzwl (%rdx,%rax,2),%eax
0x00000000006b97de: mov %rdi,0x10(%rsp)
0x00000000006b97e3: mov %rsi,0x8(%rsp)
0x00000000006b97e8: mov %r8,0x28(%rsp)
0x00000000006b97ed: mov %eax,0x48(%rsp)
0x00000000006b97f1: movslq %ecx,%rax
0x00000000006b97f4: mov 0x6775(%rip),%rcx # 0x6bff70
0x00000000006b97fb: movzwl (%rdx,%rax,2),%eax
0x00000000006b97ff: mov %rcx,0x18(%rsp)
0x00000000006b9804: mov %eax,0x4c(%rsp)
0x00000000006b9808: jmp 0x6b9830
0x00000000006b980a: nopw 0x0(%rax,%rax,1)

First instruction won't get us very far :) Looks like a third bug.

Created attachment 818019 [details]
execution log (on a good VM) include the point where breakage happens
I saved the VM (using "migrate") after it has finished computing the CRC of the image. The CRC matches one generated on an unrestricted_guest machine (Westmere or newer) and the one generated with emulate_invalid_guest_state=0.
I then modified the saved VM to fix the GS selector and base and restarted the VM. Linux boots successfully. Thus the mis-emulated instruction is contained in this relatively small log.
0x00000000006bb6bb: shl $0x8,%rdx
0x00000000006bb6bf: or %rax,%rdx
0x00000000006bb6c2: movzbl %bpl,%eax
0x00000000006bb6c6: shl $0x10,%rax
0x00000000006bb6ca: or %rax,%rdx
0x00000000006bb6cd: movzbl %r12b,%eax
0x00000000006bb6d1: shl $0x18,%rax
0x00000000006bb6d5: or %rax,%rdx
0x00000000006bb6d8: cmp %rdx,%rcx
0x00000000006bb6db: je 0x6bb705

The instruction at 0x6bb6c2 (40 0f b6 c5) is mis-emulated

(gdb) p/x $rcx
$16 = 0x69b87821
(gdb) p/x $rdx
$17 = 0x69787821

While it "looks like" it is being transformed into a no-op, a more likely possibility is movzbl %ch,%eax (0f b6 c5).

For the Fedora kernel folks and for people in CC: to this issue, Paolo has posted the patch fixing this on LKML and stable is in CC: so it will eventually trickle down to a newer kernel. Currently it is in the kvm tree in the next branch:

commit daf727225b8abfdfe424716abac3d15a3ac5626a
Author: Paolo Bonzini <pbonzini>
Date: Thu Oct 31 23:05:24 2013 +0100

KVM: x86: fix emulation of "movzbl %bpl, %eax"

When I was looking at RHEL5.9's failure to start with unrestricted_guest=0/emulate_invalid_guest_state=1, I got it working with a slightly older tree than kvm.git. I now debugged the remaining failure, which was introduced by commit 660696d1 (KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions, 2013-04-24) introduced a similar mis-emulation to the one in commit 8acb4207 (KVM: fix sil/dil/bpl/spl in the mod/rm fields, 2013-05-30). The incorrect decoding occurs in 8-bit movzx/movsx instructions whose 8-bit operand is sil/dil/bpl/spl.

Needless to say, "movzbl %bpl, %eax" does occur in RHEL5.9's decompression prolog, just a handful of instructions before finally giving control to the decompressed vmlinux and getting out of the invalid guest state.

Because OpMem8 bypasses decode_modrm, the same handling of the REX prefix must be applied to OpMem8.

Reported-by: Michele Baldessari <michele>
Cc: stable.org
Cc: Gleb Natapov <gleb>
Signed-off-by: Paolo Bonzini <pbonzini>
Signed-off-by: Gleb Natapov <gleb>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 16c037e..282d28c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4117,7 +4117,10 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 	case OpMem8:
 		ctxt->memop.bytes = 1;
 		if (ctxt->memop.type == OP_REG) {
-			ctxt->memop.addr.reg = decode_register(ctxt, ctxt->modrm_rm, 1);
+			int highbyte_regs = ctxt->rex_prefix == 0;
+
+			ctxt->memop.addr.reg = decode_register(ctxt, ctxt->modrm_rm,
+							       highbyte_regs);
 			fetch_register_operand(&ctxt->memop);
 		}
 		goto mem_common;

*** Bug 1012119 has been marked as a duplicate of this bug. ***

Added the patch to all branches on 3.12 or older. Thanks Paolo and Michele.

kernel-3.11.9-300.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.11.9-300.fc20

kernel-3.11.9-200.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/kernel-3.11.9-200.fc19

kernel-3.11.9-100.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/kernel-3.11.9-100.fc18

Package kernel-3.11.9-100.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.9-100.fc18'
as soon as you are able to, then reboot.
Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2013-21822/kernel-3.11.9-100.fc18 then log in and leave karma (feedback).

kernel-3.11.9-200.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.

kernel-3.11.9-300.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.

The problem still persists with 3.11.9-200.fc19.x86_64.

3.11.9-200.fc19.x86_64 fixed the problem for me.

With 3.11.9-200, boot will remain stuck for 2-3 minutes and then proceed at normal speed.

kernel-3.11.9-100.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

*** Bug 1013641 has been marked as a duplicate of this bug. ***
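
For readers following the analysis of 40 0f b6 c5 in the comments above, here is a small sketch of why ignoring the REX prefix turns %bpl into %ch and why that yields the rdx value seen in gdb. It is an illustrative model in Python, not the kernel's decode_register. It assumes REX.B is clear (so r8b-r15b are not modelled), assumes %bpl held 0xb8 (inferred from the expected value in rcx), and takes the low 16 bits 0x7821 from the observed rdx; the function names are hypothetical.

```python
# Byte-register names selected by the 3-bit ModRM r/m field.
# Assumption: REX.B is clear, so r8b-r15b are out of scope for this sketch.
LEGACY_BYTE_REGS = ("al", "cl", "dl", "bl", "ah", "ch", "dh", "bh")
REX_BYTE_REGS = ("al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil")

FULL_REG = {"al": "rax", "cl": "rcx", "dl": "rdx", "bl": "rbx",
            "spl": "rsp", "bpl": "rbp", "sil": "rsi", "dil": "rdi"}
HIGH_REG = {"ah": "rax", "ch": "rcx", "dh": "rdx", "bh": "rbx"}


def decode_byte_reg(modrm, rex_prefix):
    """Name the 8-bit register encoded in ModRM.rm, honouring the REX prefix."""
    rm = modrm & 0x7
    # Same condition as the patch: high-byte registers only without REX.
    highbyte_regs = rex_prefix == 0
    return (LEGACY_BYTE_REGS if highbyte_regs else REX_BYTE_REGS)[rm]


def read_byte_reg(name, regs):
    """Read an 8-bit register value out of a dict of 64-bit registers."""
    if name in HIGH_REG:
        return (regs[HIGH_REG[name]] >> 8) & 0xFF  # bits 8-15 of rax/rcx/rdx/rbx
    return regs[FULL_REG[name]] & 0xFF


def checksum_word(regs, low16, rex_prefix):
    """Replay the tail of the quoted sequence: OR bytes 2 and 3 into rdx."""
    src = decode_byte_reg(0xC5, rex_prefix)   # ModRM byte of 0f b6 c5
    rdx = low16
    rdx |= read_byte_reg(src, regs) << 16     # movzbl <src>,%eax; shl $0x10; or
    rdx |= (regs["r12"] & 0xFF) << 24         # movzbl %r12b,%eax; shl $0x18; or
    return rdx


if __name__ == "__main__":
    # rcx is the expected value from gdb; rbp and r12 are assumed inputs.
    regs = {"rcx": 0x69B87821, "rbp": 0xB8, "r12": 0x69}
    print(decode_byte_reg(0xC5, 0x40), hex(checksum_word(regs, 0x7821, 0x40)))
    print(decode_byte_reg(0xC5, 0x00), hex(checksum_word(regs, 0x7821, 0x00)))
```

Under these assumptions, decoding with the REX prefix selects bpl and reproduces the rcx value, while decoding as if no prefix were present selects ch, whose value is bits 8-15 of rcx (0x78), and reproduces the mismatching rdx value from the gdb session.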