Bug 967652 - rhel 5.9x64 fails to install via virt-manager/qemu
Summary: rhel 5.9x64 fails to install via virt-manager/qemu
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: All
OS: All
medium
medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 965711 967773 968135 971379 1012119 1013641
Depends On:
Blocks:
 
Reported: 2013-05-27 19:42 UTC by Michele Baldessari
Modified: 2013-11-29 11:24 UTC
CC List: 34 users

Fixed In Version: kernel-3.11.9-100.fc18
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 969244
Environment:
Last Closed: 2013-11-27 12:18:51 UTC
Type: Bug
Embargoed:


Attachments
patch under test (915 bytes, patch)
2013-05-29 15:09 UTC, Paolo Bonzini
execution log (on a good VM) include the point where breakage happens (7.64 KB, text/x-log)
2013-10-31 18:20 UTC, Paolo Bonzini

Description Michele Baldessari 2013-05-27 19:42:28 UTC
Description of problem:
I'm on F19, up to date as of today (plus http://alt.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/$basearch for a non-debug kernel).
After I start an installation in QEMU, the VM pauses almost immediately after I press Enter on the syslinux screen of the DVD.

Version-Release number of selected component (if applicable):
kernel-3.10.0-0.rc1.git7.2.fc20.x86_64
qemu-1.4.2-1.fc19.x86_64


How reproducible:


Steps to Reproduce:
1. Create a VM with virt-manager and attach a RHEL 5.9 x64 ISO
2. Start the installation

Actual results:
2013-05-27 19:33:03.483+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name rhel57x64 -S -machine pc-i440fx-1.4,accel=kvm,usb=off -m 8096 -smp 4,sockets=4,cores=1,threads=1 -uuid 48e2cc85-46ce-1b43-893b-1e8fa2200589 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel57x64.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=dc,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/rhel57x64.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/data/software/linux/rh/rhel/5.9/RHEL5.9-Server-20121129.0-x86_64-DVD.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:95:4c:54,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
char device redirected to /dev/pts/13 (label charserial0)
KVM internal error. Suberror: 1
emulation failure
RAX=000000000001f800 RBX=00000000004b7000 RCX=0000000000000000 RDX=0000000000000800
RSI=000000000001f800 RDI=00000000006bffa0 RBP=0000000000200000 RSP=00000000006bfee0
R8 =0000000000000000 R9 =00000000004b7000 R10=0000000000000000 R11=0000000000000000
R12=0000000000010000 R13=00000000006c7000 R14=0000000000202005 R15=0000000000000000
RIP=00000000006b9319 RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 00000000 00000000
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 0000ffff 00009300 DPL=0 DS16 [-WA]
GS =ffff 00000000000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0008 0000000000000580 00000067 00008b00 DPL=0 TSS64-busy
GDT=     0000000000304f38 00000020
IDT=     0000000000000000 00000000
CR0=80000011 CR2=0000000000000000 CR3=00000000006c1000 CR4=00000020
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=00 00 e9 50 ff ff ff 00 00 00 00 85 d2 74 20 45 31 c0 31 c9 <0f> 1f 80 00 00 00 00 0f b6 04 31 41 83 c0 01 88 04 39 48 83 c1 01 41 39 d0 75 ec 48 89 f8

Expected results:
No stopping ;)

Additional info:
Feel free to close this one out if the combo "current F19 + rawhide nodebug kernel" is uninteresting.

Comment 1 Michele Baldessari 2013-05-29 13:47:51 UTC
I can reproduce this on 3.10.0-0.rc3.git0.2.fc20.x86_64, and also with qemu-1.5.0-3 taken from rawhide and rebuilt on current F19.

Comment 2 Paolo Bonzini 2013-05-29 14:58:10 UTC
Thanks. I am reassigning to kernel, since the QEMU version does not seem to matter.

Reproduced by Michele with:
$ qemu-system-x86_64 -kernel vmlinuz -initrd initrd.img  --enable-kvm -monitor stdio -no-shutdown

Here vmlinuz and initrd.img come from the RHEL 5.9 x64 PXE images.

(qemu) x/16i $pc
0x00000000006b9319:  nopl   0x0(%rax)
0x00000000006b9320:  movzbl (%rcx,%rsi,1),%eax
0x00000000006b9324:  add    $0x1,%r8d
0x00000000006b9328:  mov    %al,(%rcx,%rdi,1)
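The `Code=` bytes in the original register dump decode to a complete small function: `test %edx,%edx; je`, two register clears, the `0f 1f 80 00 00 00 00` NOP on which the emulator failed, a byte-copy loop whose head is at 0x6b9320, and a final `mov %rdi,%rax` that returns the destination (the `je` skips exactly 0x20 bytes, landing on that `mov`). A rough C equivalent, reconstructed by hand from those bytes, is sketched below. The function and parameter names are hypothetical, and treating it as a plain `memcpy`-style helper is an inference, not something taken from the RHEL source. The NOP is most likely alignment padding that puts the loop head on a 16-byte boundary. With RCX and R8 both 0 and RDX=0x800 in the dump, the guest appears to have just entered a 2048-byte copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction of the code around the failing RIP.
 * Inferred register roles: rdi = dst, rsi = src, edx = length. */
void *bytewise_copy(void *dst, const void *src, unsigned int n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n != 0) {                   /* test %edx,%edx; je <return> */
        unsigned int count = 0;     /* xor %r8d,%r8d */
        size_t i = 0;               /* xor %ecx,%ecx */
        /* 0f 1f 80 00 00 00 00: multi-byte NOP padding before the loop head */
        do {
            d[i] = s[i];            /* movzbl (%rcx,%rsi,1),%eax; mov %al,(%rcx,%rdi,1) */
            count++;                /* add $0x1,%r8d */
            i++;                    /* add $0x1,%rcx */
        } while (count != n);       /* cmp %edx,%r8d; jne <loop head> */
    }
    return dst;                     /* mov %rdi,%rax */
}
```

So the instruction KVM could not emulate is not doing any work itself; it only pads the code in front of an ordinary copy loop.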

Comment 3 Paolo Bonzini 2013-05-29 15:09:31 UTC
Created attachment 754434 [details]
patch under test

Comment 4 Michele Baldessari 2013-05-29 20:08:32 UTC
The patch does remove the crash from the initial report, but the guest then seems to get stuck without printing any other messages.

I tried current Linux git plus the patch from comment 3, with the following command:
qemu-system-x86_64   -monitor stdio -no-shutdown -drive file=rhel57x64.img  -drive file=/data/software/linux/rh/rhel/5.9/RHEL5.9-Server-20121129.0-x86_64-DVD.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw   -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0

The above works properly and the installation continues. If I add '--enable-kvm', it hangs right after displaying "vmlinuz..." and "Loading initramfs....".

In case it's relevant, this is what the monitor shows when it is stuck:
QEMU 1.5.0 monitor - type 'help' for more information
(qemu) x/16 $pc
00000000006b9774: 0x0000feeb 0x00000000 0x00000000 0x343d8d48
00000000006b9784: 0xe8000023 0xffffffc4 0x00c3c031 0x56415741
00000000006b9794: 0x54415541 0x83485355 0x058b58ec 0x000067e4
00000000006b97a4: 0xcd058b4c 0x89000067 0x89042454 0x8b4c240c
(qemu) x/16i $pc
0x00000000006b9774:  jmp    0x6b9774
0x00000000006b9776:  add    %al,(%rax)
0x00000000006b9778:  add    %al,(%rax)
0x00000000006b977a:  add    %al,(%rax)
0x00000000006b977c:  add    %al,(%rax)
0x00000000006b977e:  add    %al,(%rax)
0x00000000006b9780:  lea    0x2334(%rip),%rdi        # 0x6bbabb
0x00000000006b9787:  callq  0x6b9750
0x00000000006b978c:  xor    %eax,%eax
0x00000000006b978e:  retq   
0x00000000006b978f:  add    %al,0x57(%rcx)
0x00000000006b9792:  push   %r14
0x00000000006b9794:  push   %r13
0x00000000006b9796:  push   %r12
0x00000000006b9798:  push   %rbp
0x00000000006b9799:  push   %rbx
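The first word of the raw dump, `0x0000feeb`, is little-endian, so the bytes at 0x6b9774 are `eb fe`. Opcode 0xEB is a short JMP: its 8-bit displacement is sign-extended and added to the address of the next instruction. A small C sketch of that arithmetic (the helper name is hypothetical) shows why the target is the instruction itself:

```c
#include <assert.h>
#include <stdint.h>

/* Target of a 2-byte short JMP (opcode 0xEB followed by rel8) located at addr. */
uint64_t short_jmp_target(uint64_t addr, uint8_t rel8)
{
    int8_t disp = (int8_t)rel8;        /* rel8 is sign-extended: 0xfe -> -2 */
    return addr + 2 + (int64_t)disp;   /* relative to the next instruction (addr + 2) */
}
```

With `rel8 = 0xfe` the result is `addr + 2 - 2 = addr`, i.e. an infinite loop at 0x6b9774.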

Comment 5 Paolo Bonzini 2013-05-29 21:09:48 UTC
It is really stuck: the instruction is a jump to itself. :)  The workaround is to use "modprobe kvm_intel emulate_invalid_guest_state=0".  I can reproduce it too and will take a look tomorrow.
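To confirm which mode the host is actually running with, the current values of `kvm_intel` module parameters can be read back from sysfs. The sketch below assumes the parameter is exposed read-only as `/sys/module/kvm_intel/parameters/emulate_invalid_guest_state` and printed as `Y`/`N`, as boolean module parameters usually are; the function names are hypothetical. Note that `modprobe` only applies the option when it loads the module, so if `kvm_intel` is already loaded it has to be unloaded first (with no VMs running).

```c
#include <assert.h>
#include <stdio.h>

/* Interpret a boolean module parameter as printed in sysfs ("Y"/"N", or "1"/"0").
 * Returns 1 or 0, or -1 if the text is not recognised. */
int parse_bool_param(const char *text)
{
    if (text == NULL || text[0] == '\0')
        return -1;
    switch (text[0]) {
    case 'Y': case 'y': case '1': return 1;
    case 'N': case 'n': case '0': return 0;
    default:                      return -1;
    }
}

/* Read a kvm_intel parameter (assumed sysfs path); -1 if it cannot be read. */
int read_kvm_intel_param(const char *name)
{
    char path[256];
    char buf[16];

    snprintf(path, sizeof path, "/sys/module/kvm_intel/parameters/%s", name);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;                /* module not loaded, or parameter not exposed */
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_bool_param(buf);
}
```

After reloading the module with the workaround, `read_kvm_intel_param("emulate_invalid_guest_state")` would be expected to return 0; -1 means the value could not be read at all.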

Comment 6 Paolo Bonzini 2013-05-30 14:07:29 UTC
This could be the first mis-emulated instruction:

0x00000000006bade4:  cmp    $0x1f,%bpl
0x00000000006bade8:  je     0x6bae11

At this point TCG takes the branch to 0x6bae11, while KVM falls through to 0x6badea.

Comment 7 Andrew Jones 2013-05-30 14:57:13 UTC
*** Bug 968135 has been marked as a duplicate of this bug. ***

Comment 8 Andrew Jones 2013-05-31 06:39:09 UTC
*** Bug 968135 has been marked as a duplicate of this bug. ***

Comment 9 David Brown 2013-06-21 16:02:46 UTC
I'm concerned that this bug hasn't gotten any attention for a while. Specifically, now that the newer kernels have hit Fedora 18, this bug seems to be there as well.

I don't get any emulation error in the log, RHEL 5 just hangs after loading the kernel and initramfs.

Comment 10 David Brown 2013-06-21 16:07:46 UTC
(In reply to David Brown from comment #9)
> I'm concerned that this bug hasn't gotten any support for a while.
> Specifically, the newer kernels have hit fedora 18 this bug seems to be
> there as well.

Bah, I forgot which box I was looking at; that one was Fedora 19 as well.

> I don't get any emulation error in the log, RHEL 5 just hangs after loading
> the kernel and initramfs.

However, this is still the case; I'm not sure why it's just hanging.

Comment 11 Paolo Bonzini 2013-06-26 10:07:44 UTC
Sorry for not updating the BZ.  The fix is in 3.9.5.

However, even with the fix, RHEL 5.9 will take a while to load and will seem to hang (3-4 minutes is not unexpected).  There is no fix for that yet and it is not easy; using emulate_invalid_guest_state=0 works, however.

Comment 12 Cole Robinson 2013-07-11 21:11:25 UTC
*** Bug 971379 has been marked as a duplicate of this bug. ***

Comment 13 Cole Robinson 2013-07-11 21:12:00 UTC
*** Bug 965711 has been marked as a duplicate of this bug. ***

Comment 14 Cole Robinson 2013-07-11 21:12:31 UTC
*** Bug 967773 has been marked as a duplicate of this bug. ***

Comment 15 Thorsten Glaser 2013-07-16 08:40:48 UTC
As written in bug 965711 this also affects kernels in other distributions (specifically, Debian) and seems to indeed be a kernel bug. Mentioning here so we have a single place to track this.

Debian 3.9.6-1 is affected, so “the fix is in 3.9.5” is untrue at least for vanilla kernels :(

Nevertheless thank you for looking after this.

Comment 16 Felix Schwarz 2013-07-24 20:32:38 UTC
(In reply to Paolo Bonzini from comment #11)
> Sorry for not updating the BZ.  The fix is in 3.9.5.
> 
> However, even with the fix RHEL5.9 will take a while to load and seem to
> hang (3-4 minutes is not unexpected).  There is no fix yet and it's not
> easy, using emulate_invalid_guest_state=0 however works.

I waited for about 25 minutes, but to no avail, so the "fix" does not seem to work for me. The workaround is fine though.

Comment 17 Josh Boyer 2013-09-18 20:24:42 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 18 Michele Baldessari 2013-09-20 21:25:22 UTC
This bug still seems to be present in 3.11.1. The simplest way to reproduce it seems to be
to install RHEL 5.9 x64 from scratch.

On 3.11.1-200.fc19.x86_64, just launching the following (where vmlinuz and initrd.img are taken from RHEL 5.9):
qemu-system-x86_64 -kernel vmlinuz -initrd initrd.img  --enable-kvm -monitor stdio -no-shutdown

There is no progress even after some time. On the QEMU monitor I see:
(qemu) x/40i $pc
0x00000000006b9774:  jmp    0x6b9774
0x00000000006b9776:  add    %al,(%rax)
0x00000000006b9778:  add    %al,(%rax)
0x00000000006b977a:  add    %al,(%rax)
0x00000000006b977c:  add    %al,(%rax)
0x00000000006b977e:  add    %al,(%rax)
0x00000000006b9780:  lea    0x2334(%rip),%rdi        # 0x6bbabb
0x00000000006b9787:  callq  0x6b9750
0x00000000006b978c:  xor    %eax,%eax
0x00000000006b978e:  retq   
0x00000000006b978f:  add    %al,0x57(%rcx)
0x00000000006b9792:  push   %r14
0x00000000006b9794:  push   %r13
0x00000000006b9796:  push   %r12
0x00000000006b9798:  push   %rbp
0x00000000006b9799:  push   %rbx
0x00000000006b979a:  sub    $0x58,%rsp
0x00000000006b979e:  mov    0x67e4(%rip),%eax        # 0x6bff88
0x00000000006b97a4:  mov    0x67cd(%rip),%r8        # 0x6bff78
0x00000000006b97ab:  mov    %edx,0x4(%rsp)
0x00000000006b97af:  mov    %ecx,(%rsp)
0x00000000006b97b2:  mov    0x700f(%rip),%r12        # 0x6c07c8
0x00000000006b97b9:  mov    0x7011(%rip),%ebx        # 0x6c07d0
0x00000000006b97bf:  mov    %eax,0x3c(%rsp)
0x00000000006b97c3:  movslq %edx,%rax
0x00000000006b97c6:  lea    0x2193(%rip),%rdx        # 0x6bb960
0x00000000006b97cd:  mov    0x67b1(%rip),%ebp        # 0x6bff84
0x00000000006b97d3:  mov    0x67a6(%rip),%r14d        # 0x6bff80
0x00000000006b97da:  movzwl (%rdx,%rax,2),%eax
0x00000000006b97de:  mov    %rdi,0x10(%rsp)
0x00000000006b97e3:  mov    %rsi,0x8(%rsp)
0x00000000006b97e8:  mov    %r8,0x28(%rsp)
0x00000000006b97ed:  mov    %eax,0x48(%rsp)
0x00000000006b97f1:  movslq %ecx,%rax
0x00000000006b97f4:  mov    0x6775(%rip),%rcx        # 0x6bff70
0x00000000006b97fb:  movzwl (%rdx,%rax,2),%eax
0x00000000006b97ff:  mov    %rcx,0x18(%rsp)
0x00000000006b9804:  mov    %eax,0x4c(%rsp)
0x00000000006b9808:  jmp    0x6b9830
0x00000000006b980a:  nopw   0x0(%rax,%rax,1)

First instruction won't get us very far :)

Comment 19 Paolo Bonzini 2013-10-31 17:15:32 UTC
Looks like a third bug.

Comment 20 Paolo Bonzini 2013-10-31 18:20:02 UTC
Created attachment 818019 [details]
execution log (on a good VM) include the point where breakage happens

I saved the VM (using "migrate") after it had finished computing the CRC of the image.  The CRC matches the one generated on an unrestricted_guest machine (Westmere or newer) and the one generated with emulate_invalid_guest_state=0.

I then modified the saved VM to fix the GS selector and base and restarted the VM.  Linux boots successfully.  Thus the mis-emulated instruction is contained in this relatively small log.

Comment 21 Paolo Bonzini 2013-10-31 19:52:30 UTC
0x00000000006bb6bb:  shl    $0x8,%rdx
0x00000000006bb6bf:  or     %rax,%rdx
0x00000000006bb6c2:  movzbl %bpl,%eax
0x00000000006bb6c6:  shl    $0x10,%rax
0x00000000006bb6ca:  or     %rax,%rdx
0x00000000006bb6cd:  movzbl %r12b,%eax
0x00000000006bb6d1:  shl    $0x18,%rax
0x00000000006bb6d5:  or     %rax,%rdx
0x00000000006bb6d8:  cmp    %rdx,%rcx
0x00000000006bb6db:  je     0x6bb705

The instruction at 0x6bb6c2 (40 0f b6 c5) is mis-emulated:

(gdb) p/x $rcx
$16 = 0x69b87821
(gdb) p/x $rdx
$17 = 0x69787821

While it "looks like" it is being transformed into a no-op, a more likely possibility is that it is being executed as movzbl %ch,%eax (0f b6 c5), i.e. as if the 0x40 REX prefix were not there.
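The register values support the `%ch` theory. None of the listed instructions write `%rcx`, so it still holds 0x69b87821 when the `movzbl` executes, and bits 8-15 of that value (the bits `%ch` names) are 0x78, exactly the byte that differs in `%rdx`. The C check below reproduces this arithmetic. It assumes, by inference from the two gdb values rather than from observation, that the low 16 bits are 0x7821, `%r12b` is 0x69 and the intended `%bpl` is 0xb8; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the shl/or sequence ending at 0x6bb6d5: the low 16 bits are already
 * assembled, then one byte is placed in bits 16-23 (from %bpl) and one in
 * bits 24-31 (from %r12b). */
uint32_t combine_bytes(uint16_t low16, uint8_t byte2, uint8_t byte3)
{
    return (uint32_t)low16 | ((uint32_t)byte2 << 16) | ((uint32_t)byte3 << 24);
}

/* Bits 8-15 of a register: what %ch denotes when the register is %rcx. */
uint8_t high_byte_of(uint64_t reg)
{
    return (uint8_t)(reg >> 8);
}
```

`combine_bytes(0x7821, 0xb8, 0x69)` gives 0x69b87821 (the expected `%rcx`), while substituting `high_byte_of(0x69b87821)` = 0x78 for the `%bpl` byte gives 0x69787821, the observed `%rdx`.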

Comment 22 Michele Baldessari 2013-11-11 21:28:32 UTC
For the Fedora kernel folks and for people in CC on this issue: Paolo has
posted the patch fixing this on LKML with stable in CC, so it will
eventually trickle down to a newer kernel.

Currently it is in the kvm tree in the next branch:
commit daf727225b8abfdfe424716abac3d15a3ac5626a
Author: Paolo Bonzini <pbonzini>
Date:   Thu Oct 31 23:05:24 2013 +0100

    KVM: x86: fix emulation of "movzbl %bpl, %eax"
    
    When I was looking at RHEL5.9's failure to start with
    unrestricted_guest=0/emulate_invalid_guest_state=1, I got it working with a
    slightly older tree than kvm.git.  I now debugged the remaining failure,
    which was introduced by commit 660696d1 (KVM: X86 emulator: fix
    source operand decoding for 8bit mov[zs]x instructions, 2013-04-24)
    introduced a similar mis-emulation to the one in commit 8acb4207 (KVM:
    fix sil/dil/bpl/spl in the mod/rm fields, 2013-05-30).  The incorrect
    decoding occurs in 8-bit movzx/movsx instructions whose 8-bit operand
    is sil/dil/bpl/spl.
    
    Needless to say, "movzbl %bpl, %eax" does occur in RHEL5.9's decompression
    prolog, just a handful of instructions before finally giving control to
    the decompressed vmlinux and getting out of the invalid guest state.
    
    Because OpMem8 bypasses decode_modrm, the same handling of the REX prefix
    must be applied to OpMem8.
    
    Reported-by: Michele Baldessari <michele>
    Cc: stable.org
    Cc: Gleb Natapov <gleb>
    Signed-off-by: Paolo Bonzini <pbonzini>
    Signed-off-by: Gleb Natapov <gleb>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 16c037e..282d28c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4117,7 +4117,10 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
        case OpMem8:
                ctxt->memop.bytes = 1;
                if (ctxt->memop.type == OP_REG) {
-                       ctxt->memop.addr.reg = decode_register(ctxt, ctxt->modrm_rm, 1);
+                       int highbyte_regs = ctxt->rex_prefix == 0;
+
+                       ctxt->memop.addr.reg = decode_register(ctxt, ctxt->modrm_rm,
+                                              highbyte_regs);
                        fetch_register_operand(&ctxt->memop);
                }
                goto mem_common;
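A simplified C model (not the kernel's code) of what the patch changes: with no REX prefix, 8-bit register numbers 4-7 select AH, CH, DH and BH (bits 8-15 of RAX, RCX, RDX and RBX); with any REX prefix, even a bare 0x40, they select SPL, BPL, SIL and DIL. Before the fix, the OpMem8 path always passed 1 for `highbyte_regs`, so `40 0f b6 c5` read CH instead of BPL. The register-array layout and function names are hypothetical; the example values come from comment 21, with BPL=0xb8 inferred rather than observed.

```c
#include <assert.h>
#include <stdint.h>

/* regs[] is indexed by x86 register number:
 * 0=rax 1=rcx 2=rdx 3=rbx 4=rsp 5=rbp 6=rsi 7=rdi, 8-15=r8-r15.
 * rm is assumed to already include REX.B in bit 3. */
uint8_t read_byte_reg(const uint64_t regs[16], unsigned int rm, int highbyte_regs)
{
    assert(rm < 16);
    if (highbyte_regs && rm >= 4 && rm < 8)
        return (uint8_t)(regs[rm & 3] >> 8);  /* legacy ah, ch, dh, bh */
    return (uint8_t)regs[rm];                 /* al..bl, spl..dil, r8b..r15b */
}

/* Fixed behaviour: high-byte registers only when there is no REX prefix,
 * matching "int highbyte_regs = ctxt->rex_prefix == 0;" in the patch. */
uint8_t read_rm8(const uint64_t regs[16], uint8_t rex_prefix, unsigned int rm)
{
    int highbyte_regs = rex_prefix == 0;
    return read_byte_reg(regs, rm, highbyte_regs);
}
```

The pre-fix OpMem8 path corresponds to calling `read_byte_reg(regs, rm, 1)` unconditionally, which for rm=5 returns CH (0x78) even when the instruction carries the 0x40 prefix.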

Comment 23 Cole Robinson 2013-11-17 19:18:36 UTC
*** Bug 1012119 has been marked as a duplicate of this bug. ***

Comment 24 Josh Boyer 2013-11-18 19:21:12 UTC
Added the patch to all branches on 3.12 or older.

Thanks Paolo and Michele.

Comment 25 Fedora Update System 2013-11-21 14:44:39 UTC
kernel-3.11.9-300.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.11.9-300.fc20

Comment 26 Fedora Update System 2013-11-21 14:47:49 UTC
kernel-3.11.9-200.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.11.9-200.fc19

Comment 27 Fedora Update System 2013-11-21 14:53:22 UTC
kernel-3.11.9-100.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.11.9-100.fc18

Comment 28 Fedora Update System 2013-11-23 19:40:31 UTC
Package kernel-3.11.9-100.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.9-100.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-21822/kernel-3.11.9-100.fc18
then log in and leave karma (feedback).

Comment 29 Fedora Update System 2013-11-24 03:47:35 UTC
kernel-3.11.9-200.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 30 Fedora Update System 2013-11-24 23:46:15 UTC
kernel-3.11.9-300.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 31 Jaromír Cápík 2013-11-26 15:23:46 UTC
The problem still persists with 3.11.9-200.fc19.x86_64.

Comment 32 Mattias Ellert 2013-11-26 18:04:05 UTC
3.11.9-200.fc19.x86_64 fixed the problem for me.

Comment 33 Paolo Bonzini 2013-11-27 12:18:51 UTC
With 3.11.9-200, boot will remain stuck for 2-3 minutes and then proceed at normal speed.

Comment 34 Fedora Update System 2013-11-29 06:54:13 UTC
kernel-3.11.9-100.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 35 Paolo Bonzini 2013-11-29 11:24:54 UTC
*** Bug 1013641 has been marked as a duplicate of this bug. ***

