Description of problem:
I'm on F19, up to date as of today (plus http://alt.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/$basearch for a nondebug kernel). After starting an installation in qemu, the VM pauses almost right after pressing enter on the syslinux screen of the DVD.

Version-Release number of selected component (if applicable):
kernel-3.10.0-0.rc1.git7.2.fc20.x86_64
qemu-1.4.2-1.fc19.x86_64

How reproducible:

Steps to Reproduce:
1. Create a VM with virt-manager and connect a RHEL 5.9 x64 ISO
2. Start the installation
3.

Actual results:
2013-05-27 19:33:03.483+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name rhel57x64 -S -machine pc-i440fx-1.4,accel=kvm,usb=off -m 8096 -smp 4,sockets=4,cores=1,threads=1 -uuid 48e2cc85-46ce-1b43-893b-1e8fa2200589 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel57x64.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=dc,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/rhel57x64.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/data/software/linux/rh/rhel/5.9/RHEL5.9-Server-20121129.0-x86_64-DVD.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:95:4c:54,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
char device redirected to /dev/pts/13 (label charserial0)
KVM internal error. Suberror: 1
emulation failure
RAX=000000000001f800 RBX=00000000004b7000 RCX=0000000000000000 RDX=0000000000000800
RSI=000000000001f800 RDI=00000000006bffa0 RBP=0000000000200000 RSP=00000000006bfee0
R8 =0000000000000000 R9 =00000000004b7000 R10=0000000000000000 R11=0000000000000000
R12=0000000000010000 R13=00000000006c7000 R14=0000000000202005 R15=0000000000000000
RIP=00000000006b9319 RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 00000000 00000000
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 0000ffff 00009300 DPL=0 DS16 [-WA]
GS =ffff 00000000000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0008 0000000000000580 00000067 00008b00 DPL=0 TSS64-busy
GDT= 0000000000304f38 00000020
IDT= 0000000000000000 00000000
CR0=80000011 CR2=0000000000000000 CR3=00000000006c1000 CR4=00000020
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=00 00 e9 50 ff ff ff 00 00 00 00 85 d2 74 20 45 31 c0 31 c9 <0f> 1f 80 00 00 00 00 0f b6 04 31 41 83 c0 01 88 04 39 48 83 c1 01 41 39 d0 75 ec 48 89 f8

Expected results:
No stopping ;)

Additional info:
Feel free to close this one out if the combo "current F19 + rawhide nodebug kernel" is uninteresting.
I can reproduce this on 3.10.0-0.rc3.git0.2.fc20.x86_64 and also with qemu-1.5.0-3 taken from rawhide and recompiled on current F19
Thanks, I am reassigning to kernel since the QEMU version seems not to matter.

Reproduced by Michele with:

$ qemu-system-x86_64 -kernel vmlinuz -initrd initrd.img --enable-kvm -monitor stdio -no-shutdown

where vmlinuz and initrd come from the RHEL 5.9 x64 PXE images.

(qemu) x/16i $pc
0x00000000006b9319: nopl 0x0(%rax)
0x00000000006b9320: movzbl (%rcx,%rsi,1),%eax
0x00000000006b9324: add $0x1,%r8d
0x00000000006b9328: mov %al,(%rcx,%rdi,1)
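As a cross-check, the Code= line of the register dump in the original report can be matched against this disassembly. The following is only a minimal Python sketch: it assumes the <..> marker on that line denotes the byte at RIP (which the x/16i output at 0x6b9319 seems to confirm), and the helper name bytes_at_rip is made up for illustration.

```python
def bytes_at_rip(code_line):
    """Return the bytes of a 'Code=' dump line starting at the <..> marker.

    Assumption: the bracketed byte is the one RIP points at.
    """
    tokens = code_line.split("=", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


# Code= line copied from the register dump in the original report.
CODE = (
    "Code=00 00 e9 50 ff ff ff 00 00 00 00 85 d2 74 20 45 31 c0 31 c9 "
    "<0f> 1f 80 00 00 00 00 0f b6 04 31 41 83 c0 01 88 04 39 48 83 c1 01 "
    "41 39 d0 75 ec 48 89 f8"
)

at_rip = bytes_at_rip(CODE)

# 0f 1f 80 + disp32 is the 7-byte form of "nopl 0x0(%rax)".
NOPL_RAX_DISP32 = [0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]
is_long_nop = at_rip[:7] == NOPL_RAX_DISP32

# The next instruction starts 7 bytes later, i.e. at 0x6b9319 + 7 = 0x6b9320.
next_insn = at_rip[7:11]  # 0f b6 04 31 = movzbl (%rcx,%rsi,1),%eax
```

Here is_long_nop comes out true, so the instruction KVM refused to emulate is the multi-byte NOP itself, not the byte-copy loop that follows it.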
Created attachment 754434 [details] patch under test
So the patch does remove the crash as reported in the initial report, but things seem to be stuck without any other messages. I tried current linux git + patch c#3 with the following command:

qemu-system-x86_64 -monitor stdio -no-shutdown -drive file=rhel57x64.img -drive file=/data/software/linux/rh/rhel/5.9/RHEL5.9-Server-20121129.0-x86_64-DVD.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0

The above works properly and installation continues. If I add '--enable-kvm' it hangs right after displaying vmlinuz... and Loading initramfs....

If it's relevant, when it is stuck:

QEMU 1.5.0 monitor - type 'help' for more information
(qemu) x/16 $pc
00000000006b9774: 0x0000feeb 0x00000000 0x00000000 0x343d8d48
00000000006b9784: 0xe8000023 0xffffffc4 0x00c3c031 0x56415741
00000000006b9794: 0x54415541 0x83485355 0x058b58ec 0x000067e4
00000000006b97a4: 0xcd058b4c 0x89000067 0x89042454 0x8b4c240c
(qemu) x/16i $pc
0x00000000006b9774: jmp 0x6b9774
0x00000000006b9776: add %al,(%rax)
0x00000000006b9778: add %al,(%rax)
0x00000000006b977a: add %al,(%rax)
0x00000000006b977c: add %al,(%rax)
0x00000000006b977e: add %al,(%rax)
0x00000000006b9780: lea 0x2334(%rip),%rdi # 0x6bbabb
0x00000000006b9787: callq 0x6b9750
0x00000000006b978c: xor %eax,%eax
0x00000000006b978e: retq
0x00000000006b978f: add %al,0x57(%rcx)
0x00000000006b9792: push %r14
0x00000000006b9794: push %r13
0x00000000006b9796: push %r12
0x00000000006b9798: push %rbp
0x00000000006b9799: push %rbx
It is really stuck, it is a jump to itself. :) The workaround is to use "modprobe kvm_intel emulate_invalid_guest_state=0". I can reproduce it too and will take a look tomorrow.
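The "jump to itself" can be read straight off the first word of the x/16 dump. Here is a small Python sketch, assuming the monitor prints 32-bit little-endian words and using a hypothetical helper name, rel8_jump_target:

```python
import struct


def rel8_jump_target(pc, insn):
    """Target of a short 'jmp rel8' (opcode 0xEB) located at address pc."""
    if insn[0] != 0xEB:
        raise ValueError("not a short jmp")
    rel = struct.unpack("b", insn[1:2])[0]  # signed 8-bit displacement
    return pc + 2 + rel  # relative to the end of the 2-byte instruction


PC = 0x6B9774
first_word = 0x0000FEEB               # first word printed by 'x/16 $pc'
insn = struct.pack("<I", first_word)  # in memory order: eb fe 00 00
target = rel8_jump_target(PC, insn)
```

The displacement byte 0xfe is -2, so target equals PC: the guest is spinning on a two-byte infinite loop and is not waiting on anything the host could deliver.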
This could be the first mis-emulated instruction:

0x00000000006bade4: cmp $0x1f,%bpl
0x00000000006bade8: je 0x6bae11

At this point TCG goes to 0x6bae11, KVM falls through to 0x6badea.
*** Bug 968135 has been marked as a duplicate of this bug. ***
I'm concerned that this bug hasn't gotten any support for a while. Specifically, the newer kernels have hit fedora 18 this bug seems to be there as well. I don't get any emulation error in the log, RHEL 5 just hangs after loading the kernel and initramfs.
(In reply to David Brown from comment #9)
> I'm concerned that this bug hasn't gotten any support for a while.
> Specifically, the newer kernels have hit fedora 18 this bug seems to be
> there as well.

Bah, forgot which box I was looking at; that one was Fedora 19 as well.

> I don't get any emulation error in the log, RHEL 5 just hangs after loading
> the kernel and initramfs.

However, this is still the case; not sure why it's just hanging.
Sorry for not updating the BZ. The fix is in 3.9.5. However, even with the fix RHEL5.9 will take a while to load and seem to hang (3-4 minutes is not unexpected). There is no fix yet and it's not easy, using emulate_invalid_guest_state=0 however works.
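To confirm whether the workaround is actually in effect on a host, the module parameter can be read back from sysfs. This is a rough Python sketch, not an official tool: it assumes the parameter is exposed at the usual /sys/module/<module>/parameters/ location and reads as Y/N (or 1/0), and the function names are invented for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the kvm_intel module parameter.
PARAM = Path("/sys/module/kvm_intel/parameters/emulate_invalid_guest_state")


def parse_bool_param(text):
    """Interpret a boolean module parameter as printed by sysfs."""
    value = text.strip()
    if value in ("Y", "y", "1"):
        return True
    if value in ("N", "n", "0"):
        return False
    raise ValueError("unexpected parameter value: %r" % value)


def emulation_enabled(param_path=PARAM):
    """Return True/False, or None if the parameter file does not exist."""
    try:
        return parse_bool_param(Path(param_path).read_text())
    except FileNotFoundError:
        return None  # kvm_intel not loaded (or not an Intel host)
```

If emulation_enabled() returns False, the workaround is active. To keep it across reboots, the option line "options kvm_intel emulate_invalid_guest_state=0" can go into a file under /etc/modprobe.d/ (the file name is up to you).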
*** Bug 971379 has been marked as a duplicate of this bug. ***
*** Bug 965711 has been marked as a duplicate of this bug. ***
*** Bug 967773 has been marked as a duplicate of this bug. ***
As written in bug 965711 this also affects kernels in other distributions (specifically, Debian) and seems to indeed be a kernel bug. Mentioning here so we have a single place to track this. Debian 3.9.6-1 is affected, so “the fix is in 3.9.5” is untrue at least for vanilla kernels :( Nevertheless thank you for looking after this.
(In reply to Paolo Bonzini from comment #11)
> Sorry for not updating the BZ. The fix is in 3.9.5.
>
> However, even with the fix RHEL5.9 will take a while to load and seem to
> hang (3-4 minutes is not unexpected). There is no fix yet and it's not
> easy, using emulate_invalid_guest_state=0 however works.

I waited for about 25 minutes but to no avail, so the "fix" seems not to work for me. The workaround is fine though.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.
This bug still seems to be present in 3.11.1. The simplest way to reproduce it seems to be to try to install RHEL 5.9 x64 from scratch.

On 3.11.1-200.fc19.x86_64, just launching (where vmlinuz and initrd.img are taken from RHEL 5.9):

qemu-system-x86_64 -kernel vmlinuz -initrd initrd.img --enable-kvm -monitor stdio -no-shutdown

There seems to be no progress even after some time. On the qemu monitor I see:

(qemu) x/40i $pc
0x00000000006b9774: jmp 0x6b9774
0x00000000006b9776: add %al,(%rax)
0x00000000006b9778: add %al,(%rax)
0x00000000006b977a: add %al,(%rax)
0x00000000006b977c: add %al,(%rax)
0x00000000006b977e: add %al,(%rax)
0x00000000006b9780: lea 0x2334(%rip),%rdi # 0x6bbabb
0x00000000006b9787: callq 0x6b9750
0x00000000006b978c: xor %eax,%eax
0x00000000006b978e: retq
0x00000000006b978f: add %al,0x57(%rcx)
0x00000000006b9792: push %r14
0x00000000006b9794: push %r13
0x00000000006b9796: push %r12
0x00000000006b9798: push %rbp
0x00000000006b9799: push %rbx
0x00000000006b979a: sub $0x58,%rsp
0x00000000006b979e: mov 0x67e4(%rip),%eax # 0x6bff88
0x00000000006b97a4: mov 0x67cd(%rip),%r8 # 0x6bff78
0x00000000006b97ab: mov %edx,0x4(%rsp)
0x00000000006b97af: mov %ecx,(%rsp)
0x00000000006b97b2: mov 0x700f(%rip),%r12 # 0x6c07c8
0x00000000006b97b9: mov 0x7011(%rip),%ebx # 0x6c07d0
0x00000000006b97bf: mov %eax,0x3c(%rsp)
0x00000000006b97c3: movslq %edx,%rax
0x00000000006b97c6: lea 0x2193(%rip),%rdx # 0x6bb960
0x00000000006b97cd: mov 0x67b1(%rip),%ebp # 0x6bff84
0x00000000006b97d3: mov 0x67a6(%rip),%r14d # 0x6bff80
0x00000000006b97da: movzwl (%rdx,%rax,2),%eax
0x00000000006b97de: mov %rdi,0x10(%rsp)
0x00000000006b97e3: mov %rsi,0x8(%rsp)
0x00000000006b97e8: mov %r8,0x28(%rsp)
0x00000000006b97ed: mov %eax,0x48(%rsp)
0x00000000006b97f1: movslq %ecx,%rax
0x00000000006b97f4: mov 0x6775(%rip),%rcx # 0x6bff70
0x00000000006b97fb: movzwl (%rdx,%rax,2),%eax
0x00000000006b97ff: mov %rcx,0x18(%rsp)
0x00000000006b9804: mov %eax,0x4c(%rsp)
0x00000000006b9808: jmp 0x6b9830
0x00000000006b980a: nopw 0x0(%rax,%rax,1)

First instruction won't get us very far :)
Looks like a third bug.
Created attachment 818019 [details]
execution log (on a good VM) including the point where breakage happens

I saved the VM (using "migrate") after it had finished computing the CRC of the image. The CRC matches the one generated on an unrestricted_guest machine (Westmere or newer) and the one generated with emulate_invalid_guest_state=0.

I then modified the saved VM to fix the GS selector and base and restarted the VM. Linux boots successfully. Thus the mis-emulated instruction is contained in this relatively small log.
0x00000000006bb6bb: shl $0x8,%rdx
0x00000000006bb6bf: or %rax,%rdx
0x00000000006bb6c2: movzbl %bpl,%eax
0x00000000006bb6c6: shl $0x10,%rax
0x00000000006bb6ca: or %rax,%rdx
0x00000000006bb6cd: movzbl %r12b,%eax
0x00000000006bb6d1: shl $0x18,%rax
0x00000000006bb6d5: or %rax,%rdx
0x00000000006bb6d8: cmp %rdx,%rcx
0x00000000006bb6db: je 0x6bb705

The instruction at 0x6bb6c2 (40 0f b6 c5) is mis-emulated:

(gdb) p/x $rcx
$16 = 0x69b87821
(gdb) p/x $rdx
$17 = 0x69787821

While it "looks like" it is being transformed into a no-op, a more likely possibility is movzbl %ch,%eax (0f b6 c5).
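The %ch hypothesis can be checked against the two gdb values with a little arithmetic. The Python sketch below assumes %rcx held the same value when the movzbl executed as it did at the gdb prompt (none of the instructions listed above write %rcx); the helper name byte_n is made up for illustration.

```python
def byte_n(value, n):
    """Return byte n (0 = least significant) of an integer."""
    return (value >> (8 * n)) & 0xFF


RCX = 0x69B87821  # expected value, as printed by gdb
RDX = 0x69787821  # value assembled by the shl/or sequence

# Which bytes of the assembled value are wrong?
differing = [n for n in range(4) if byte_n(RCX, n) != byte_n(RDX, n)]

# Byte 2 is the one produced by "movzbl %bpl,%eax; shl $0x10,%rax; or %rax,%rdx".
wrong_byte = byte_n(RDX, 2)

# %ch is bits 8..15 of %rcx.
ch = byte_n(RCX, 1)
```

Only byte 2 differs, and the wrong value found there (0x78) is exactly what %ch contains, which is what one would expect if the REX prefix 0x40 were ignored when decoding the source register.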
For the Fedora kernel folks and for people in CC: to this issue, Paolo has posted the patch fixing this on LKML and stable is in CC: so it will eventually trickle down to a newer kernel. Currently it is in the kvm tree in the next branch:

commit daf727225b8abfdfe424716abac3d15a3ac5626a
Author: Paolo Bonzini <pbonzini>
Date: Thu Oct 31 23:05:24 2013 +0100

    KVM: x86: fix emulation of "movzbl %bpl, %eax"

    When I was looking at RHEL5.9's failure to start with
    unrestricted_guest=0/emulate_invalid_guest_state=1, I got it working with a
    slightly older tree than kvm.git. I now debugged the remaining failure,
    which was introduced by commit 660696d1 (KVM: X86 emulator: fix
    source operand decoding for 8bit mov[zs]x instructions, 2013-04-24)
    introduced a similar mis-emulation to the one in commit 8acb4207 (KVM:
    fix sil/dil/bpl/spl in the mod/rm fields, 2013-05-30). The incorrect
    decoding occurs in 8-bit movzx/movsx instructions whose 8-bit operand
    is sil/dil/bpl/spl.

    Needless to say, "movzbl %bpl, %eax" does occur in RHEL5.9's decompression
    prolog, just a handful of instructions before finally giving control to
    the decompressed vmlinux and getting out of the invalid guest state.

    Because OpMem8 bypasses decode_modrm, the same handling of the REX prefix
    must be applied to OpMem8.

    Reported-by: Michele Baldessari <michele>
    Cc: stable.org
    Cc: Gleb Natapov <gleb>
    Signed-off-by: Paolo Bonzini <pbonzini>
    Signed-off-by: Gleb Natapov <gleb>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 16c037e..282d28c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4117,7 +4117,10 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
 	case OpMem8:
 		ctxt->memop.bytes = 1;
 		if (ctxt->memop.type == OP_REG) {
-			ctxt->memop.addr.reg = decode_register(ctxt, ctxt->modrm_rm, 1);
+			int highbyte_regs = ctxt->rex_prefix == 0;
+
+			ctxt->memop.addr.reg = decode_register(ctxt, ctxt->modrm_rm,
+					       highbyte_regs);
 			fetch_register_operand(&ctxt->memop);
 		}
 		goto mem_common;
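For readers less familiar with the x86 encoding rule behind the patch: in an 8-bit register operand, encodings 4 to 7 select ah/ch/dh/bh when no REX prefix is present and spl/bpl/sil/dil when one is. Below is a simplified Python model of that rule, not the kernel code; the function decode_byte_reg and its patched flag are invented for illustration, and the unpatched branch mirrors the old call that always passed 1 as the last argument.

```python
LOW_BYTE_REGS = [
    "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil",
    "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b",
]
HIGH_BYTE_REGS = ["ah", "ch", "dh", "bh"]


def decode_byte_reg(modrm_rm, rex_prefix, patched=True):
    """Name the 8-bit register selected by a ModRM r/m register field.

    modrm_rm   -- register number 0..15 (REX.B already folded in)
    rex_prefix -- the REX prefix byte, or 0 if the instruction has none
    patched    -- False models the old behaviour (high-byte regs always on)
    """
    highbyte_regs = (rex_prefix == 0) if patched else True
    if highbyte_regs and 4 <= modrm_rm < 8:
        return HIGH_BYTE_REGS[modrm_rm & 3]
    return LOW_BYTE_REGS[modrm_rm]


# 40 0f b6 c5: REX prefix 0x40, ModRM 0xc5 -> mod=3, reg=0 (eax), rm=5
modrm = 0xC5
rm = modrm & 7
before = decode_byte_reg(rm, rex_prefix=0x40, patched=False)
after = decode_byte_reg(rm, rex_prefix=0x40, patched=True)
no_rex = decode_byte_reg(rm, rex_prefix=0, patched=True)
```

In this model, before is "ch" and after is "bpl", while the same ModRM byte without a REX prefix still decodes to "ch" after the patch, so legitimate uses of the high-byte registers are unaffected.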
*** Bug 1012119 has been marked as a duplicate of this bug. ***
Added the patch to all branches on 3.12 or older. Thanks Paolo and Michele.
kernel-3.11.9-300.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.11.9-300.fc20
kernel-3.11.9-200.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/kernel-3.11.9-200.fc19
kernel-3.11.9-100.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/kernel-3.11.9-100.fc18
Package kernel-3.11.9-100.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.9-100.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-21822/kernel-3.11.9-100.fc18
then log in and leave karma (feedback).
kernel-3.11.9-200.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.11.9-300.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
The problem still persists with 3.11.9-200.fc19.x86_64.
3.11.9-200.fc19.x86_64 fixed the problem for me.
With 3.11.9-200, boot will remain stuck for 2-3 minutes and then proceed at normal speed.
kernel-3.11.9-100.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
*** Bug 1013641 has been marked as a duplicate of this bug. ***