Bug 524508 - kvm regression between 2.6.29 and 2.6.30 causes guest to become unresponsive
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 11
Hardware: All
OS: Linux
Priority: high
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Blocks: F11VirtTarget

Reported: 2009-09-20 18:50 EDT by Kevin Fenzi
Modified: 2013-01-09 06:28 EST
CC List: 11 users

Doc Type: Bug Fix
Last Closed: 2010-06-28 10:45:15 EDT


Attachments
report of items from reporting virt bugs wiki page. (23.49 KB, text/plain)
2009-10-01 13:24 EDT, Kevin Fenzi

Description Kevin Fenzi 2009-09-20 18:50:14 EDT
I have an F11 host with an F11 guest running on it via KVM/libvirt.
In the past it's been very stable.

With the switch to the 2.6.30.x kernels in F11 it's been much less so.

Every few days the guest will go unresponsive and start taking up 100% CPU on the host.
It will answer pings, but nothing else. I have to 'virsh destroy' and 'virsh start' it to get it back up and working again.

The host is fine during this, except that it's seeing the heavy CPU load.
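A minimal sketch of automating the manual recovery described above, assuming Python 3 on the host and `virsh` on the PATH. Because the hung guest still answers pings, the probe tries a TCP connection instead; treating port 22 (sshd) as the "nothing else" check is an assumption, and the function names are hypothetical. `virsh destroy` is a hard power-off, so this mirrors the reporter's destroy/start cycle rather than a clean shutdown.

```python
import socket
import subprocess


def tcp_alive(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def hard_restart(domain, run=subprocess.run):
    """Hard power-off and boot a libvirt domain, like the manual recovery."""
    # 'virsh destroy' pulls the plug immediately; it does not shut the guest down cleanly
    run(["virsh", "destroy", domain], check=True)
    run(["virsh", "start", domain], check=True)


def check_and_recover(domain, host, port=22, run=subprocess.run):
    """Restart the domain only if the TCP probe fails; return True if it restarted."""
    if tcp_alive(host, port):
        return False
    hard_restart(domain, run=run)
    return True
```

The `run` parameter exists so the virsh calls can be swapped out in testing. A single failed probe can be transient, so a real watchdog would probably require several consecutive failures before restarting.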

I managed to get a partial trace from it the last time it happened, but I didn't have the right debuginfo installed, so I'm not sure how useful it will be:

(gdb) thread apply all bt full

Thread 5 (Thread 0x7f224f09d910 (LWP 2520)):
#0  0x00000034c16d6827 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000054bb7e in kvm_run (kvm=0x1453040, vcpu=<value optimized out>, env=0x146b570)
    at libkvm.c:908
        r = 0
        fd = 12
        run = 0x7f22dd118000
#2  0x000000000051f159 in kvm_cpu_exec (env=0x0) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:205
        r = <value optimized out>
#3  0x000000000051f440 in kvm_main_loop_cpu (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:414
No locals.
#4  ap_main_loop (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:451
        env = 0x146b570
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = 0x0
#5  0x00000034c220686a in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00000034c16de3bd in clone () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 4 (Thread 0x7f224e69c910 (LWP 2521)):
#0  0x00000034c16d6827 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000054bb7e in kvm_run (kvm=0x1453040, vcpu=<value optimized out>, env=0x1485010)
    at libkvm.c:908
        r = 0
        fd = 13
        run = 0x7f22dd115000
#2  0x000000000051f159 in kvm_cpu_exec (env=0x0) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:205
        r = <value optimized out>
#3  0x000000000051f440 in kvm_main_loop_cpu (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:414
No locals.
#4  ap_main_loop (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:451
        env = 0x1485010
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = 0x0
#5  0x00000034c220686a in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00000034c16de3bd in clone () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 3 (Thread 0x7f224dc9b910 (LWP 2522)):
#0  0x00000034c16d6827 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000054bb7e in kvm_run (kvm=0x1453040, vcpu=<value optimized out>, env=0x1492c10)
    at libkvm.c:908
        r = 0
        fd = 14
        run = 0x7f22dd112000
#2  0x000000000051f159 in kvm_cpu_exec (env=0x0) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:205
        r = <value optimized out>
#3  0x000000000051f440 in kvm_main_loop_cpu (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:414
No locals.
#4  ap_main_loop (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:451
        env = 0x1492c10
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = 0x0
#5  0x00000034c220686a in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00000034c16de3bd in clone () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 0x7f224d29a910 (LWP 2523)):
#0  0x00000034c16d6827 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000054bb7e in kvm_run (kvm=0x1453040, vcpu=<value optimized out>, env=0x14a0810)
    at libkvm.c:908
        r = 0
        fd = 15
        run = 0x7f22dd10f000
#2  0x000000000051f159 in kvm_cpu_exec (env=0x0) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:205
        r = <value optimized out>
#3  0x000000000051f440 in kvm_main_loop_cpu (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:414
No locals.
#4  ap_main_loop (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:451
        env = 0x14a0810
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = 0x0
#5  0x00000034c220686a in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00000034c16de3bd in clone () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 1 (Thread 0x7f22dca50740 (LWP 2501)):
#0  0x00000034c16d7102 in select () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000409b97 in qemu_select (tv=<value optimized out>, xfds=<value optimized out>, 
    wfds=<value optimized out>, rfds=<value optimized out>, max_fd=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.6/vl.c:3689
No locals.
#2  main_loop_wait (tv=<value optimized out>, xfds=<value optimized out>, 
    wfds=<value optimized out>, rfds=<value optimized out>, max_fd=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.10.6/vl.c:3788
        ioh = 0x0
        rfds = {fds_bits = {1508640, 0 <repeats 15 times>}}
        wfds = {fds_bits = {0 <repeats 16 times>}}
        xfds = {fds_bits = {0 <repeats 16 times>}}
        ret = <value optimized out>
        nfds = 20
        tv = {tv_sec = 0, tv_usec = 977996}
#3  0x000000000051ec0a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:596
        fds = {18, 19}
        mask = {__val = {268443648, 0 <repeats 15 times>}}
        sigfd = <value optimized out>
#4  0x000000000040e981 in main_loop () at /usr/src/debug/qemu-kvm-0.10.6/vl.c:3851
        ret = <value optimized out>
        timeout = <value optimized out>
        env = <value optimized out>
#5  main () at /usr/src/debug/qemu-kvm-0.10.6/vl.c:6140
        use_gdbstub = 0
        gdbstub_port = 0x551a58 "1234"
        boot_devices_bitmap = <value optimized out>
        i = <value optimized out>
        snapshot = <value optimized out>
        linux_boot = <value optimized out>
        net_boot = <value optimized out>
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = 0x58b4eb ""
        boot_devices = 0x7fff5d5c2eb5 "c"
        dcl = <value optimized out>
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        net_clients = {0x7fff5d5c2f38 "nic,macaddr=54:52:00:64:0b:3c,vlan=0,model=virtio", 
          0x7fff5d5c2f6f "tap,fd=17,vlan=0", 0x7f22dd11f658 "", 0x7f22dcef09f0 "", 
          0x7f22dd10c4d0 "", 0x7f22dd10c998 "", 0x7f22dcef0000 "", 0x0, 
          0xfffebab84b800000 <Address 0xfffebab84b800000 out of bounds>, 0x7f22dd11fb20 "", 0x0, 
          0x34c2601266 "libc.so.6", 0x7fff5d5c2030 "", 
          0xfffebab83e000000 <Address 0xfffebab83e000000 out of bounds>, 
          0x698241cdd60000 <Address 0x698241cdd60000 out of bounds>, 0x7fff5d5c2550 ".", 
          0x7fff5d5c25c0 "\20bg\301\64", 0x7f22dd11f658 "", 0x0, 0x34c1a03876 "libc.so.6", 
          0x7fff5d5c2110 "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377", 
          0x34c120c7ed "H\211C H\203\304\20[\303f\17\37\204", 0x0, 
          0x4242191848944058 <Address 0x4242191848944058 out of bounds>, 0x0, 
          0x34c120e706 "H\213\204$\30\1", 0x34c141faf8 "", 0x7fff5d5c2578 "", 0x7fff5d5c2580 "", 
          0x7fff5d5c258f "", 0x34c120c7b0 "SH\211\373\271\1", 0x7fff5d5c2550 "."}
        nb_net_clients = <value optimized out>
        bt_opts = {0x0, 0xfffebab842c00000 <Address 0xfffebab842c00000 out of bounds>, 
          0x698241cdd60000 <Address 0x698241cdd60000 out of bounds>, 0x7f22dd120000 "", 0x0, 
          0x403145 "libSDL-1.2.so.0", 0x1 <Address 0x1 out of bounds>, 0x7f22dcef04d0 "", 
          0x7fff5d5c1ef0 "\1", 0x0}
        nb_bt_opts = <value optimized out>
        hda_index = <value optimized out>
        optind = <value optimized out>
        r = <value optimized out>
        optarg = <value optimized out>
        monitor_hd = 0x1465f50
        monitor_device = <value optimized out>
        serial_devices = {0x7fff5d5c2f88 "pty", 0x0, 0x0, 0x0}
        serial_device_index = <value optimized out>
        parallel_devices = {0x7fff5d5c2f96 "none", 0x0, 0x0}
        parallel_device_index = <value optimized out>
        virtio_console_index = 0
        loadvm = 0x0
        machine = <value optimized out>
        cpu_model = 0x0
        usb_devices = {0x7da0e8 "\1", 0x34c120cee4 "H\213u\300H\205\366\17\205,\b", 
          0x1 <Address 0x1 out of bounds>, 0x7f22dcef0000 "", 0x7fff5d5c2200 "\1", 0x0, 
          0x7da0d8 "\1", 0x34c120cee4 "H\213u\300H\205\366\17\205,\b"}
        usb_devices_index = <value optimized out>
        fds = {8233144, 0}
        tb_size = 0
        pid_file = 0x7fff5d5c2e82 "/var/run/libvirt/qemu//dworkin.scrye.com.pid"
        incoming = 0x0
        fd = 0
        pwd = 0x0
        chroot_dir = 0x0
        run_as = 0x0
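A trace like the one above can be captured non-interactively, which also avoids the `---Type <return> to continue` pager prompts. This is a sketch, assuming gdb is installed and the caller is allowed to attach to the qemu-kvm process (typically root); the helper names are hypothetical. In `-batch` mode gdb runs the `-ex` commands with pagination disabled and then exits. The `No symbol table info available` frames are in libc and libpthread, so glibc debug symbols (on Fedora, `debuginfo-install glibc` from yum-utils) would be needed to fill those in.

```python
import subprocess


def gdb_backtrace_argv(pid):
    """Build a gdb command line that dumps full backtraces of every thread and exits."""
    # -batch exits after the -ex commands and turns off the pager, so no
    # "---Type <return> to continue" prompts end up in the output
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt full"]


def capture_backtrace(pid, out_path):
    """Attach to pid, write the backtrace text to out_path, return gdb's exit status."""
    result = subprocess.run(gdb_backtrace_argv(pid), capture_output=True, text=True)
    with open(out_path, "w") as f:
        f.write(result.stdout)
        f.write(result.stderr)  # keep gdb's warnings (e.g. missing debuginfo) too
    return result.returncode
```

Attaching briefly stops every thread of the process, so the guest pauses for the duration of the dump.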

Currently both host and guest are running 2.6.30.6-53.fc11.x86_64
(2.6.30.6 had some KVM fixes that I thought might help).

I'm not sure if this is a kernel issue (although it seems like it might be) or a libvirt one.

Happy to provide further info on the host, guest, or setup, or to gather more info the next time it happens. Nothing relevant shows up in dmesg or the libvirt logs on the host.
Comment 1 Mark McLoughlin 2009-09-21 13:25:34 EDT
Definitely sounds like a kernel issue.

Could you try running the 2.6.29 kernel in the guest for a while to see if that helps? If it doesn't, try the 2.6.29 kernel in the host and check whether that fixes it.

That way we'll at least know whether it's a guest or host issue.
Comment 2 Kevin Fenzi 2009-09-21 13:42:57 EDT
OK. I'll boot it into the last .29 kernel the next time it locks up, and/or tonight.
Comment 3 Kevin Fenzi 2009-09-23 00:57:35 EDT
ok. I booted the guest into the last .29 kernel the other day, and it just locked the same way. ;( 

You want me to try the .29 kernel on the host with the .30 kernel in the guest now?
Comment 4 Mark McLoughlin 2009-09-23 04:01:37 EDT
(In reply to comment #3)

> You want me to try the .29 kernel on the host with the .30 kernel in the guest
> now?  

Yes, please
Comment 5 Kevin Fenzi 2009-09-23 18:52:46 EDT
ok. 
Host: 2.6.29.6-217.2.8.fc11.x86_64
Guest: 2.6.30.6-53.fc11.x86_64

Will see how it does. ;) It usually only takes a day or less to lock. 

Some additional info which may or may not be of use: 

- The guest has 4 CPUs defined. When it locked up with the .30 kernel in the guest, it was showing 100% CPU on the host. When it was using the .29 kernel, it was showing 400% CPU.

- I also have a Rawhide KVM host here, running 2.6.31. It's showing no signs of problems with an F11 guest.
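The 100% versus 400% observation can be narrowed down per thread instead of per process. A sketch, assuming a Linux host with /proc; the function names are hypothetical. It samples `utime + stime` (fields 14 and 15 of `/proc/<pid>/task/<tid>/stat`, in clock ticks) for every thread of the qemu-kvm process twice and reports each thread's CPU use as a percentage of one CPU. One thread near 100 would fit a single spinning vCPU; four would fit the 400% case. Thread IDs can be matched to the `LWP` numbers gdb prints (LWP 2520 to 2523 were the vCPU threads in the trace in the description).

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (clock ticks) from one /proc/.../stat line."""
    # Field 2 (comm) is in parentheses and may contain spaces or ')',
    # so split only what follows the last ')'; rest[0] is then field 3.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return utime + stime


def thread_ticks(pid):
    """Map each thread ID of pid to its cumulative CPU ticks."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_cpu_ticks(f.read())
        except FileNotFoundError:
            pass  # thread exited between listdir() and open()
    return ticks


def busy_threads(pid, interval=5.0):
    """Return {tid: percent of one CPU} averaged over interval seconds."""
    hz = os.sysconf("SC_CLK_TCK")
    before = thread_ticks(pid)
    time.sleep(interval)
    after = thread_ticks(pid)
    # Threads that appeared during the interval have no baseline and report 0
    return {
        tid: 100.0 * (now - before.get(tid, now)) / hz / interval
        for tid, now in after.items()
    }
```

Sorting the result by value, e.g. `sorted(busy_threads(qemu_pid).items(), key=lambda kv: -kv[1])`, puts the spinning threads first.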

Will let you know how it goes...
Comment 6 Chuck Ebbert 2009-09-24 23:24:25 EDT
2.6.30.6 has 16 kvm patches that aren't in 2.6.30.5
Comment 7 Kevin Fenzi 2009-09-25 00:16:30 EDT
Yeah, I tried 2.6.30.6 after I got lockups with 2.6.30.5. ;( 

Anyhow, it's been a bit over a day with the config from comment #5 (i.e. the host running .29) with no lockups.
Comment 8 Kevin Fenzi 2009-09-28 11:44:38 EDT
ok. It's been 4.5 days now, with no problems. 

So, it appears the issue happens with a 2.6.30 host system. 

Let me know if there is anything further for me to try from here, or if you need more info.
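The isolation approach from comment 1 can be written as a small check: a side is implicated when the hang outcome depends only on that side's kernel and actually varies across trials. A sketch using the three configurations reported in this thread; the class and function names are hypothetical, the version labels are abbreviated, and counting 4.5 days without a hang as "no hang" is an assumption, since the fault is intermittent (it usually appeared within a day).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    host: str   # host kernel series
    guest: str  # guest kernel series
    hung: bool  # did the guest lock up during the trial?


def determined_by(trials, side):
    """True if the outcome depends only on this side's kernel and actually varies."""
    outcome_for = {}
    for t in trials:
        version = getattr(t, side)
        if outcome_for.setdefault(version, t.hung) != t.hung:
            return False  # same kernel on this side, different outcomes
    return len(set(outcome_for.values())) > 1


def suspect_side(trials):
    """Return 'host', 'guest', or 'inconclusive'."""
    host = determined_by(trials, "host")
    guest = determined_by(trials, "guest")
    if host and not guest:
        return "host"
    if guest and not host:
        return "guest"
    return "inconclusive"


# Configurations from the description and comments 3, 5, 7 and 8
TRIALS = [
    Trial(host="2.6.30", guest="2.6.30", hung=True),   # description
    Trial(host="2.6.30", guest="2.6.29", hung=True),   # comment 3
    Trial(host="2.6.29", guest="2.6.30", hung=False),  # comments 5 to 8
]
```

With all three trials the check points at the host; with only the first two (both hung) it cannot decide, which is why the host-side swap in comment 4 was needed.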
Comment 9 Mark McLoughlin 2009-10-01 11:53:43 EDT
Kevin: thanks for confirming that, helps a lot

could you include /var/log/libvirt/qemu/$guest.log so we see how the guest is launched?

see also https://fedoraproject.org/wiki/Reporting_virtualization_bugs

avi: any ideas for debugging this when it happens?
Comment 10 Kevin Fenzi 2009-10-01 13:20:12 EDT
Sure. Will attach the log. 

I will also attach the info requested from the above link.
Comment 11 Kevin Fenzi 2009-10-01 13:24:59 EDT
Created attachment 363373 [details]
report of items from reporting virt bugs wiki page.
Comment 12 Kevin Fenzi 2009-10-01 13:28:30 EDT
I can't seem to attach the guest.log. ;( 

Logrotate seems to rotate those daily, and keep only 1 week, so it's already rotated off. ;(
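When the libvirt log has already rotated away, the launch command line of a guest that is still running can be read from the process itself. This is an alternative not suggested in the thread, and it only describes the currently running instance. A sketch, assuming Linux and read access to the pid file libvirt writes (the trace in the description shows `pid_file = "/var/run/libvirt/qemu//dworkin.scrye.com.pid"`); the function name is hypothetical.

```python
import os
from pathlib import Path


def qemu_cmdline(pid_file):
    """Return the argument list of the process whose PID is stored in pid_file."""
    pid = int(Path(pid_file).read_text().strip())
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    # Each argument is NUL-terminated, so the last split piece is the empty
    # remainder after the final NUL; empty arguments in between are kept.
    return [os.fsdecode(arg) for arg in raw.split(b"\0")[:-1]]
```

For example, `print(shlex.join(qemu_cmdline("/var/run/libvirt/qemu/dworkin.scrye.com.pid")))` would print a shell-quoted command line suitable for pasting into a bug report.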
Comment 13 Kevin Fenzi 2010-02-13 17:02:25 EST
It's worth noting that I have moved both guest and host to f12, and they were happy for about 30 days or so, but now I see instability again. ;( 

We can probably close this and move it to bug 562699 unless you guys think they are the same bug.
Comment 14 Bug Zapper 2010-04-28 06:30:16 EDT
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 15 Bug Zapper 2010-06-28 10:45:15 EDT
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
