Bug 650215 - Host F14, guest F14, KVM stuck
Summary: Host F14, guest F14, KVM stuck
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 14
Hardware: i386
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 649333 652373 653696
Depends On:
Blocks:
 
Reported: 2010-11-05 14:25 UTC by Catalin BOIE
Modified: 2013-01-09 11:42 UTC (History)
CC List: 27 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2010-12-08 14:38:42 UTC
Type: ---
Embargoed:


Attachments
qemu console output - fedora14 (12.65 KB, text/plain)
2010-11-09 05:56 UTC, Izhar Firdaus
qemu console output - ubuntu10.10 (18.03 KB, text/plain)
2010-11-09 05:57 UTC, Izhar Firdaus
fix (2.29 KB, patch)
2010-11-10 17:24 UTC, Avi Kivity

Description Catalin BOIE 2010-11-05 14:25:45 UTC
Description of problem:
KVM is stuck in kvm_run.

Version-Release number of selected component (if applicable):
qemu-kvm-0.13.0-1.fc14.i686
Linux cboie 2.6.35.6-48.fc14.i686.PAE #1 SMP Fri Oct 22 15:27:53 UTC 2010 i686 i686 i386 GNU/Linux

How reproducible:
Always.

Steps to Reproduce:
1. qemu-kvm -m 512 -hda f14.img
  
Actual results:
Machine does not start.

Expected results:
Machine should start.

Additional info:

First thread:
(gdb) bt full
#0  0x0096b424 in __kernel_vsyscall ()
No symbol table info available.
#1  0x002ce581 in select () at ../sysdeps/unix/syscall-template.S:82
No locals.
#2  0x0805f526 in main_loop_wait (nonblocking=0)
    at /usr/src/debug/qemu-kvm-0.13.0/vl.c:1291
        ioh = 0x0
        rfds = {fds_bits = {20768, 0 <repeats 31 times>}}
        wfds = {fds_bits = {0 <repeats 32 times>}}
        xfds = {fds_bits = {0 <repeats 32 times>}}
        ret = <value optimized out>
        nfds = 14
        tv = {tv_sec = 0, tv_usec = 985899}
        timeout = 1000
#3  0x0807303c in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.13.0/qemu-kvm.c:1710
        fds = {12, 13}
        mask = {__val = {268443712, 0 <repeats 31 times>}}
        sigfd = 14
#4  0x080604a9 in main_loop (argc=5, argv=0xbfec0c04, envp=0xbfec0c1c)
    at /usr/src/debug/qemu-kvm-0.13.0/vl.c:1343
        r = <value optimized out>
#5  main (argc=5, argv=0xbfec0c04, envp=0xbfec0c1c)
    at /usr/src/debug/qemu-kvm-0.13.0/vl.c:3097
        gdbstub_dev = 0x0
        i = <value optimized out>
        snapshot = 0
        linux_boot = 0
        icount_option = 0x0
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = 0x821eccc ""
        boot_devices = "cad", '\000' <repeats 29 times>
        ds = <value optimized out>
        dcl = <value optimized out>
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = 0x93bd498
        opts = <value optimized out>
        optind = 5
        optarg = 0xbfec1829 "f14.img"
        loadvm = 0x0
        machine = 0x82ae880
        cpu_model = 0x0
        tb_size = 0
        pid_file = 0x0
        incoming = 0x0
        show_vnc_port = 0
        defconfig = <value optimized out>


Second thread:
(gdb) bt full
#0  0x001d1424 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00998be9 in ioctl () at ../sysdeps/unix/syscall-template.S:82
No locals.
#2  0x080714aa in kvm_run (env=0x9ad3598)
    at /usr/src/debug/qemu-kvm-0.13.0/qemu-kvm.c:610
        r = 0
        kvm = 0x9a9abcc
        run = 0xb77ca000
        fd = 9
#3  0x080724c2 in kvm_cpu_exec (env=0x9ad3598)
    at /usr/src/debug/qemu-kvm-0.13.0/qemu-kvm.c:1238
        r = <value optimized out>
#4  0x08072753 in kvm_main_loop_cpu (_env=0x9ad3598)
    at /usr/src/debug/qemu-kvm-0.13.0/qemu-kvm.c:1495
        run_cpu = <value optimized out>
#5  ap_main_loop (_env=0x9ad3598)
    at /usr/src/debug/qemu-kvm-0.13.0/qemu-kvm.c:1541
        env = 0x9ad3598
        signals = {__val = {2147483647, 4294967294, 
            4294967295 <repeats 30 times>}}
        data = <value optimized out>
#6  0x00a8bf19 in start_thread (arg=0xb749bb70) at pthread_create.c:301
        pd = 0xb749bb70
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {11137012, 0, 4001536, 
                -1219906472, 1444683819, -1811475644}, mask_was_saved = 0}}, 
          priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, 
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#7  0x009a1a2e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
No locals.


pstack 24964
Thread 2 (Thread 0xb74e3b70 (LWP 24965)):
#0  0x0096b424 in __kernel_vsyscall ()
#1  0x002cdbe9 in ioctl () from /lib/libc.so.6
#2  0x080714aa in kvm_run ()
#3  0x080724c2 in kvm_cpu_exec ()
#4  0x08072753 in ap_main_loop ()
#5  0x00a8bf19 in start_thread () from /lib/libpthread.so.0
#6  0x002d6a2e in clone () from /lib/libc.so.6
Thread 1 (Thread 0xb77e6700 (LWP 24964)):
#0  0x0096b424 in __kernel_vsyscall ()
#1  0x002ce581 in select () from /lib/libc.so.6
#2  0x0805f526 in main_loop_wait ()
#3  0x0807303c in kvm_main_loop ()
#4  0x080604a9 in main ()

qemu-kvm --no-kvm -m 512 -hda f14.img - WORKS
qemu-kvm -m 512 -hda f14.img - DOES NOT WORK

Comment 1 Izhar Firdaus 2010-11-07 11:51:44 UTC
A "me too" post here.

I have the same issue: F14 does not boot under KVM, and I only get a white screen right after grub. However, it works on my T400 laptop, just not on my office server. Could it be an issue with the kvm_amd module?

T400:
model name	: Intel(R) Core(TM)2 Duo CPU     P8700  @ 2.53GHz

Office server:
model name	: AMD Phenom(tm) II X4 945 Processor

Comment 2 Catalin BOIE 2010-11-08 12:50:00 UTC
Strangely, the bug appears with qcow2. I tried with a raw partition and everything is OK.
Anyway, let me know if I can help to fix the root cause.

Comment 3 Izhar Firdaus 2010-11-08 12:54:25 UTC
I'm using raw by default, but it gets stuck anyway.

Comment 4 Avi Kivity 2010-11-08 14:47:55 UTC
AMD or Intel?

Guest type?

Where is the guest stuck exactly?

Comment 5 Catalin BOIE 2010-11-08 15:05:25 UTC
Intel.
Host is F14 i386.
Guest is F14 i386.
Please see the "Additional info:" section.
If you need more info, please let me know how to get it.

Thank you!

Comment 6 Avi Kivity 2010-11-08 16:42:29 UTC
Please remove 'rhgb quiet' from the guest command line using grub's built-in editor (assuming it gets past grub), to see what the guest does.

Also post the serial trace:

- start qemu with '-serial stdio'
- add 'console=ttyS0' to the guest grub command line

Interesting qemu commands to run:
- 'info registers'
- 'x/30i $eip - 20'

Comment 7 Izhar Firdaus 2010-11-09 05:55:35 UTC
Update on my issue:

Fedora 14 and Ubuntu 10.10 both boot if the guest is allocated only 1 processor. Both get stuck right after grub if more than 1 is allocated. I ran both through libvirt.

I'm attaching the console output for both Fedora 14 and Ubuntu 10.10. The qemu parameters were taken from the qemu-kvm command line executed by libvirt, with as many extra parameters removed as I could.

I couldn't reproduce this on my T400 laptop, however.

Comment 8 Izhar Firdaus 2010-11-09 05:56:37 UTC
Created attachment 458983 [details]
qemu console output - fedora14

Comment 9 Izhar Firdaus 2010-11-09 05:57:17 UTC
Created attachment 458984 [details]
qemu console output - ubuntu10.10

Comment 10 Catalin BOIE 2010-11-09 09:30:41 UTC
More info:

info registers:
(qemu) info registers
EAX=00000000 EBX=00ad3067 ECX=00000000 EDX=00000000
ESI=c0ac8fd0 EDI=c0ad4000 EBP=c0991f24 ESP=c0991f1c
EIP=c042803f EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =00d8 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =00e0 c0ac3d40 00000018 00409100 DPL=0 DS   [--A]
LDT=0000 00000000 ffffffff 00000000
TR =0020 00001000 00000067 00008b00 DPL=0 TSS32-busy
GDT=     c0abb000 000000ff
IDT=     c0997000 000007ff
CR0=80050033 CR2=00000000 CR3=00993000 CR4=00000020
DR0=00000000 DR1=00000000 DR2=00000000 DR3=c05867c3 
DR6=ffff0ff0 DR7=00000000
EFER=0000000000000800
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
It seems to be stuck at this point.

(qemu) x/30i $eip - 20
0x00000000c042802b:  nop    
0x00000000c042802c:  push   %ebp
0x00000000c042802d:  mov    %esp,%ebp
0x00000000c042802f:  push   %esi
0x00000000c0428030:  push   %ebx
0x00000000c0428031:  call   0xc0408ff0
0x00000000c0428036:  mov    %eax,%esi
0x00000000c0428038:  mov    %edx,%ebx
0x00000000c042803a:  mov    (%eax),%eax
0x00000000c042803c:  mov    0x4(%esi),%edx
0x00000000c042803f:  lock cmpxchg8b (%esi)
0x00000000c0428043:  jne    0xc042803f
0x00000000c0428045:  pop    %ebx
0x00000000c0428046:  pop    %esi
0x00000000c0428047:  pop    %ebp
0x00000000c0428048:  ret    
0x00000000c0428049:  push   %ebp
0x00000000c042804a:  mov    %esp,%ebp
0x00000000c042804c:  call   0xc0408ff0
0x00000000c0428051:  pushf  
0x00000000c0428052:  pop    %eax
0x00000000c0428053:  pop    %ebp
0x00000000c0428054:  ret    
0x00000000c0428055:  push   %ebp
0x00000000c0428056:  mov    %esp,%ebp
0x00000000c0428058:  call   0xc0408ff0
0x00000000c042805d:  push   %eax
0x00000000c042805e:  popf   
0x00000000c042805f:  pop    %ebp
0x00000000c0428060:  ret    

Seems that "lock cmpxchg8b (%esi)" is the problem.
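
For readers following the disassembly, the instructions from 0xc042803a to 0xc0428043 can be read roughly as the C sketch below. This is an illustration only, not the kernel source: the name set_64bit_sketch is made up here, and GCC's __sync_val_compare_and_swap builtin is an assumed stand-in for the inline assembly (on 32-bit x86 with a suitable -march it is typically compiled to "lock cmpxchg8b"). Note that the jne at 0xc0428043 jumps back to the cmpxchg8b itself rather than to the loads, because a failed cmpxchg8b reloads edx:eax with the current memory value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the disassembled loop; it is a sketch,
 * not a copy of the kernel's implementation.
 */
static void set_64bit_sketch(volatile uint64_t *ptr, uint64_t value)
{
    assert(ptr != 0);

    /* mov (%eax),%eax ; mov 0x4(%esi),%edx : read the current 64-bit value */
    uint64_t expected = *ptr;

    for (;;) {
        /* lock cmpxchg8b (%esi): store value only if *ptr == expected */
        uint64_t seen = __sync_val_compare_and_swap(ptr, expected, value);

        if (seen == expected)
            break;          /* ZF=1: the store happened, fall through to ret */

        /* ZF=0: the instruction left the current value in edx:eax; retry */
        expected = seen;
    }
}
```

On real hardware with no concurrent writer, this loop exits on the first or second pass, so a guest that spins here forever points at the instruction not behaving as specified.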

On the screen, only the EDD line appears. I tried with edd=off and made no progress.

Comment 11 Jan ONDREJ 2010-11-09 09:46:05 UTC
The output is very similar to bug #649333.

Curiously, I can run Fedora 14 under Fedora 14, but can't run Fedora 12 or 13.
CentOS works well too.

Also, after reinstalling a Fedora 13 guest I am unable to boot it. The installation itself has no problems; only afterwards I can't boot it.

Comment 12 Avi Kivity 2010-11-10 10:26:14 UTC
Please also dump the stack:

(qemu) x/100x $esp

Comment 13 Avi Kivity 2010-11-10 10:26:51 UTC
Note, please verify that $eip has not changed from the previous dump, so we're sure we're debugging the same problem.

Comment 15 Catalin BOIE 2010-11-10 10:38:21 UTC
(qemu) info registers
EAX=00000000 EBX=00ad3067 ECX=00000000 EDX=00000000
ESI=c0ac8fd0 EDI=c0ad4000 EBP=c0991f24 ESP=c0991f1c
EIP=c042803f EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =00d8 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =00e0 c0ac3d40 00000018 00409100 DPL=0 DS   [--A]
LDT=0000 00000000 ffffffff 00000000
TR =0020 00001000 00000067 00008b00 DPL=0 TSS32-busy
GDT=     c0abb000 000000ff
IDT=     c0997000 000007ff
CR0=80050033 CR2=00000000 CR3=00993000 CR4=00000020
DR0=00000000 DR1=00000000 DR2=00000000 DR3=c05867c3 
DR6=ffff0ff0 DR7=00000000
EFER=0000000000000800
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
(Same stuck address)

(qemu) x/100x $esp
00000000c0991f1c: 0x00ad3067 0x00000000 0xc0991f2c 0xc04282c5
00000000c0991f2c: 0xc0991f48 0xc0a4dc6a 0x00ad3067 0xc0ac8fd0
00000000c0991f3c: 0x00000000 0xc0a1c9c0 0xc0ad63d0 0xc0991fc0
00000000c0991f4c: 0xc0a3d85c 0x0000000f 0x00000046 0x00000035
00000000c0991f5c: 0xc0991f7d 0x00000000 0x00000000 0x00000000
00000000c0991f6c: 0x205b4aa0 0x30202020 0x3030302e 0xc0991fd8
00000000c0991f7c: 0xc0630020 0xc0991f9e 0x1eed093a 0xc09e8470
00000000c0991f8c: 0xc0428069 0xc0991f98 0xc0464dce 0xc0991fa4
00000000c0991f9c: 0x00000086 0x1eed093a 0x00000000 0x00099800
00000000c0991fac: 0xc0997000 0x1eed093a 0x00000000 0x00099800
00000000c0991fbc: 0xc0997000 0xc0991fe0 0xc0a39562 0xc08dc36b
00000000c0991fcc: 0xc07c0044 0x1eed093a 0xc4b459a5 0x00000000
00000000c0991fdc: 0x07fed000 0xc0991ff8 0xc0a390da 0x07560000
00000000c0991fec: 0x00000000 0x00000800 0x00099800 0x00f84003
00000000c0991ffc: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099200c: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099201c: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099202c: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099203c: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099204c: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099205c: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099206c: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099207c: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099208c: 0x00000000 0x00000000 0x00000000 0x00000000
00000000c099209c: 0x00000000 0x00000000 0x00000000 0x00000000

Comment 16 Avi Kivity 2010-11-10 15:15:04 UTC
This is set_64bit in native_set_pmd().

old value = edx:eax = 0
new value = ecx:ebx = 0x00ad3067
ZF = 0 = compare failed

Looks like cmpxchg8b is misemulated on i386.
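
To make the reasoning above concrete, here is a small software model of the architectural cmpxchg8b behaviour, fed with the register values from the dump. It is a sketch under assumptions: struct regs, cmpxchg8b_model and retry_loop_iterations are names invented for this illustration, and the contents of the memory slot at (%esi) are not shown in the dumps, so the values used for it are hypothetical. The point is that whatever the slot holds, a correct cmpxchg8b lets the "cmpxchg8b; jne" loop finish in at most two iterations when nothing else writes the slot, which is why a guest spinning at that address suggests misemulation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register state relevant to cmpxchg8b; the struct is illustrative only. */
struct regs {
    uint32_t eax, ebx, ecx, edx;
    bool zf;
};

/* Architectural behaviour of cmpxchg8b on a 64-bit memory operand. */
static void cmpxchg8b_model(struct regs *r, uint64_t *mem)
{
    uint64_t expected = ((uint64_t)r->edx << 32) | r->eax;  /* old value */
    uint64_t desired  = ((uint64_t)r->ecx << 32) | r->ebx;  /* new value */

    if (*mem == expected) {
        *mem = desired;                 /* compare succeeded: store ecx:ebx */
        r->zf = true;
    } else {
        r->eax = (uint32_t)*mem;        /* compare failed: reload edx:eax */
        r->edx = (uint32_t)(*mem >> 32);
        r->zf = false;
    }
}

/* Runs the "cmpxchg8b; jne" loop and returns how many iterations it took. */
static int retry_loop_iterations(struct regs *r, uint64_t *mem)
{
    int n = 0;

    do {
        assert(n < 2);  /* with no concurrent writer, two tries always suffice */
        cmpxchg8b_model(r, mem);
        n++;
    } while (!r->zf);

    return n;
}
```

With edx:eax = 0 and ecx:ebx = 0x00ad3067 as in the dump, a slot that holds 0 is updated on the first iteration; a slot holding any other value is updated on the second, after edx:eax has been reloaded.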

Comment 17 Avi Kivity 2010-11-10 15:29:52 UTC
Already fixed in mainline by commit 16518d5ada690643453eb0aef3cc7841d3623c2d; it just needs backporting to 2.6.35.

Justin, can you work with the kernel maintainers to cherry-pick this commit ASAP?  I'll get this into linux-stable, but that can take a while to percolate.

Comment 18 Avi Kivity 2010-11-10 17:24:07 UTC
Created attachment 459498 [details]
fix

Tested patch attached.

Comment 19 Catalin BOIE 2010-11-11 09:41:37 UTC
I have a small (stupid) question: why is set_64bit not inlined even though it is marked 'inline'?

Comment 20 Justin M. Forbes 2010-11-11 17:03:42 UTC
This patch has been added to the F14 kernel and should make the next update.

Comment 21 Justin M. Forbes 2010-11-23 15:14:00 UTC
*** Bug 652373 has been marked as a duplicate of this bug. ***

Comment 22 Justin M. Forbes 2010-11-23 15:16:06 UTC
*** Bug 649333 has been marked as a duplicate of this bug. ***

Comment 23 Justin M. Forbes 2010-11-30 15:53:04 UTC
*** Bug 653696 has been marked as a duplicate of this bug. ***

Comment 24 Jan ONDREJ 2010-12-07 15:46:01 UTC
I think this was fixed in 2.6.35.9-64.fc14.i686.PAE. I have no problems after my host was updated to this kernel. Should we close this bug?

Comment 25 Leif Gruenwoldt 2010-12-07 15:56:09 UTC
I just experienced this issue on an up to date F14 host with a F14 live cd. Both were 64bit if it matters. So I would say leave it open.

$ uname -a
Linux jug 2.6.35.9-64.fc14.x86_64 #1 SMP Fri Dec 3 12:19:41 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment 26 Avi Kivity 2010-12-08 14:38:42 UTC
(In reply to comment #25)
> I just experienced this issue on an up to date F14 host with a F14 live cd.
> Both were 64bit if it matters. So I would say leave it open.
> 
> $ uname -a
> Linux jug 2.6.35.9-64.fc14.x86_64 #1 SMP Fri Dec 3 12:19:41 UTC 2010 x86_64
> x86_64 x86_64 GNU/Linux

It's a different issue.  This is specifically an i386 problem.

