I have an F11 host with an F11 guest running on it via kvm/libvirt. In the past it's been very stable. With the switch to the 2.6.30.x kernels in F11 it's been much less so. Every few days the guest goes unresponsive and starts taking up 100% CPU on the host. It will answer pings, but nothing else. I have to 'virsh destroy' and 'virsh start' it to get it back up and working again. The host is fine during this, except that it's seeing the heavy CPU load. I managed to get somewhat of a trace from it the last time it happened, but I didn't have the right debuginfo installed, so I'm not sure how useful it will be:

(gdb) thread apply all bt full

Thread 5 (Thread 0x7f224f09d910 (LWP 2520)):
#0  0x00000034c16d6827 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000054bb7e in kvm_run (kvm=0x1453040, vcpu=<value optimized out>, env=0x146b570) at libkvm.c:908
        r = 0
        fd = 12
        run = 0x7f22dd118000
#2  0x000000000051f159 in kvm_cpu_exec (env=0x0) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:205
        r = <value optimized out>
#3  0x000000000051f440 in kvm_main_loop_cpu (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:414
No locals.
#4  ap_main_loop (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:451
        env = 0x146b570
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = 0x0
#5  0x00000034c220686a in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00000034c16de3bd in clone () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 4 (Thread 0x7f224e69c910 (LWP 2521)):
#0  0x00000034c16d6827 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000054bb7e in kvm_run (kvm=0x1453040, vcpu=<value optimized out>, env=0x1485010) at libkvm.c:908
        r = 0
        fd = 13
        run = 0x7f22dd115000
#2  0x000000000051f159 in kvm_cpu_exec (env=0x0) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:205
        r = <value optimized out>
#3  0x000000000051f440 in kvm_main_loop_cpu (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:414
No locals.
#4  ap_main_loop (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:451
        env = 0x1485010
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = 0x0
#5  0x00000034c220686a in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00000034c16de3bd in clone () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 3 (Thread 0x7f224dc9b910 (LWP 2522)):
#0  0x00000034c16d6827 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000054bb7e in kvm_run (kvm=0x1453040, vcpu=<value optimized out>, env=0x1492c10) at libkvm.c:908
        r = 0
        fd = 14
        run = 0x7f22dd112000
#2  0x000000000051f159 in kvm_cpu_exec (env=0x0) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:205
        r = <value optimized out>
#3  0x000000000051f440 in kvm_main_loop_cpu (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:414
No locals.
#4  ap_main_loop (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:451
        env = 0x1492c10
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = 0x0
#5  0x00000034c220686a in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00000034c16de3bd in clone () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 0x7f224d29a910 (LWP 2523)):
#0  0x00000034c16d6827 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000000054bb7e in kvm_run (kvm=0x1453040, vcpu=<value optimized out>, env=0x14a0810) at libkvm.c:908
        r = 0
        fd = 15
        run = 0x7f22dd10f000
#2  0x000000000051f159 in kvm_cpu_exec (env=0x0) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:205
        r = <value optimized out>
#3  0x000000000051f440 in kvm_main_loop_cpu (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:414
No locals.
#4  ap_main_loop (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:451
        env = 0x14a0810
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = 0x0
#5  0x00000034c220686a in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00000034c16de3bd in clone () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 1 (Thread 0x7f22dca50740 (LWP 2501)):
#0  0x00000034c16d7102 in select () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000409b97 in qemu_select (tv=<value optimized out>, xfds=<value optimized out>, wfds=<value optimized out>, rfds=<value optimized out>, max_fd=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/vl.c:3689
No locals.
#2  main_loop_wait (tv=<value optimized out>, xfds=<value optimized out>, wfds=<value optimized out>, rfds=<value optimized out>, max_fd=<value optimized out>) at /usr/src/debug/qemu-kvm-0.10.6/vl.c:3788
        ioh = 0x0
        rfds = {fds_bits = {1508640, 0 <repeats 15 times>}}
        wfds = {fds_bits = {0 <repeats 16 times>}}
        xfds = {fds_bits = {0 <repeats 16 times>}}
        ret = <value optimized out>
        nfds = 20
        tv = {tv_sec = 0, tv_usec = 977996}
#3  0x000000000051ec0a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.10.6/qemu-kvm.c:596
        fds = {18, 19}
        mask = {__val = {268443648, 0 <repeats 15 times>}}
        sigfd = <value optimized out>
#4  0x000000000040e981 in main_loop () at /usr/src/debug/qemu-kvm-0.10.6/vl.c:3851
        ret = <value optimized out>
        timeout = <value optimized out>
        env = <value optimized out>
#5  main () at /usr/src/debug/qemu-kvm-0.10.6/vl.c:6140
        use_gdbstub = 0
        gdbstub_port = 0x551a58 "1234"
        boot_devices_bitmap = <value optimized out>
        i = <value optimized out>
        snapshot = <value optimized out>
        linux_boot = <value optimized out>
        net_boot = <value optimized out>
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = 0x58b4eb ""
        boot_devices = 0x7fff5d5c2eb5 "c"
        dcl = <value optimized out>
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        net_clients = {0x7fff5d5c2f38 "nic,macaddr=54:52:00:64:0b:3c,vlan=0,model=virtio",
          0x7fff5d5c2f6f "tap,fd=17,vlan=0", 0x7f22dd11f658 "", 0x7f22dcef09f0 "", 0x7f22dd10c4d0 "",
          0x7f22dd10c998 "", 0x7f22dcef0000 "", 0x0,
          0xfffebab84b800000 <Address 0xfffebab84b800000 out of bounds>, 0x7f22dd11fb20 "", 0x0,
          0x34c2601266 "libc.so.6", 0x7fff5d5c2030 "",
          0xfffebab83e000000 <Address 0xfffebab83e000000 out of bounds>,
          0x698241cdd60000 <Address 0x698241cdd60000 out of bounds>, 0x7fff5d5c2550 ".",
          0x7fff5d5c25c0 "\20bg\301\64", 0x7f22dd11f658 "", 0x0, 0x34c1a03876 "libc.so.6",
          0x7fff5d5c2110 "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377",
          0x34c120c7ed "H\211C H\203\304\20[\303f\17\37\204", 0x0,
          0x4242191848944058 <Address 0x4242191848944058 out of bounds>, 0x0,
          0x34c120e706 "H\213\204$\30\1", 0x34c141faf8 "", 0x7fff5d5c2578 "", 0x7fff5d5c2580 "",
          0x7fff5d5c258f "", 0x34c120c7b0 "SH\211\373\271\1", 0x7fff5d5c2550 "."}
        nb_net_clients = <value optimized out>
        bt_opts = {0x0, 0xfffebab842c00000 <Address 0xfffebab842c00000 out of bounds>,
          0x698241cdd60000 <Address 0x698241cdd60000 out of bounds>, 0x7f22dd120000 "", 0x0,
          0x403145 "libSDL-1.2.so.0", 0x1 <Address 0x1 out of bounds>, 0x7f22dcef04d0 "",
          0x7fff5d5c1ef0 "\1", 0x0}
        nb_bt_opts = <value optimized out>
        hda_index = <value optimized out>
        optind = <value optimized out>
        r = <value optimized out>
        optarg = <value optimized out>
        monitor_hd = 0x1465f50
        monitor_device = <value optimized out>
        serial_devices = {0x7fff5d5c2f88 "pty", 0x0, 0x0, 0x0}
        serial_device_index = <value optimized out>
        parallel_devices = {0x7fff5d5c2f96 "none", 0x0, 0x0}
        parallel_device_index = <value optimized out>
        virtio_console_index = 0
        loadvm = 0x0
        machine = <value optimized out>
        cpu_model = 0x0
        usb_devices = {0x7da0e8 "\1", 0x34c120cee4 "H\213u\300H\205\366\17\205,\b",
          0x1 <Address 0x1 out of bounds>, 0x7f22dcef0000 "", 0x7fff5d5c2200 "\1", 0x0,
          0x7da0d8 "\1", 0x34c120cee4 "H\213u\300H\205\366\17\205,\b"}
        usb_devices_index = <value optimized out>
        fds = {8233144, 0}
        tb_size = 0
        pid_file = 0x7fff5d5c2e82 "/var/run/libvirt/qemu//dworkin.scrye.com.pid"
        incoming = 0x0
        fd = 0
        pwd = 0x0
        chroot_dir = 0x0
        run_as = 0x0

Currently both host and guest are running 2.6.30.6-53.fc11.x86_64 (.6 had some kvm fixes that I thought might help). I'm not sure whether this is a kernel issue (although it seems like it might be) or a libvirt one.
Happy to provide further info on the host/guest/setup, or to gather more info the next time it happens, etc. There's nothing relevant in dmesg or the libvirt logs on the host.
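For what it's worth, a fully symbolized trace of the next hang could be captured along these lines. This is only a sketch: it assumes the matching qemu-kvm debuginfo package has been installed first (e.g. with debuginfo-install), and the pid-file path is taken from the backtrace above.

```shell
# Sketch: build the gdb invocation for grabbing a full backtrace from the
# wedged qemu-kvm process. Assumes qemu-kvm debuginfo is installed so the
# symbols resolve; pid-file path is from the trace in this report.
pid_file=/var/run/libvirt/qemu/dworkin.scrye.com.pid
gdb_cmd="gdb --batch -ex 'thread apply all bt full' -p \$(cat $pid_file)"
# Print the command rather than running it, since the guest is not hung now.
echo "$gdb_cmd"
```

Running the printed command while the guest is wedged, with output redirected to a file, should give a trace with locals resolved instead of `<value optimized out>` placeholders everywhere.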
Definitely sounds like a kernel issue. Could you try running the 2.6.29 kernel in the guest for a while to see if that helps? If it doesn't, try the 2.6.29 kernel on the host and check whether that fixes it. That way we'll at least know whether it's a guest or host issue.
ok. Will boot it to the last .29 kernel the next time it locks up, and/or tonight.
ok. I booted the guest into the last .29 kernel the other day, and it locked up the same way. ;( Do you want me to try the .29 kernel on the host with the .30 kernel in the guest now?
(In reply to comment #3)
> You want me to try the .29 kernel on the host with the .30 kernel in the guest
> now?

Yes, please.
ok.

Host:  2.6.29.6-217.2.8.fc11.x86_64
Guest: 2.6.30.6-53.fc11.x86_64

Will see how it does. ;) It usually only takes a day or less to lock up.

Some additional info which may or may not be of use:

- The guest has 4 CPUs defined. When it locked up with the .30 kernel in the guest, it was showing 100% CPU on the host; with the .29 kernel in the guest, it was showing 400% CPU.
- I also have a rawhide kvm host here running 2.6.31. It's showing no signs of problems with an F11 guest.

Will let you know how it goes...
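When it next wedges, the 100% vs. 400% difference could be pinned down per VCPU from the host. A sketch of the commands that might be worth recording (the domain name is an example taken from the pid file in this report; virsh ships with libvirt):

```shell
# Sketch: host-side commands to record per-VCPU state when the guest hangs.
# Domain name below is an example from this bug report.
dom=dworkin.scrye.com
vcpu_cmd="virsh vcpuinfo $dom"   # per-VCPU state and accumulated CPU time
info_cmd="virsh dominfo $dom"    # total CPU time and state for the domain
# Print the commands; they would be run against the live hung domain.
echo "$vcpu_cmd"
echo "$info_cmd"
```

Comparing which VCPUs are spinning under each kernel could help distinguish one runaway VCPU from all four spinning.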
2.6.30.6 has 16 kvm patches that aren't in 2.6.30.5
Yeah, I tried 2.6.30.6 after I got lockups with 2.6.30.5. ;( Anyhow, it's been a bit over a day with the config from comment #5 (i.e. the host running .29) with no lockups.
ok. It's been 4.5 days now, with no problems. So, it appears the issue happens with a 2.6.30 host system. Let me know if there is anything further for me to try from here, or if you need more info.
Kevin: thanks for confirming that, it helps a lot. Could you include /var/log/libvirt/qemu/$guest.log so we can see how the guest is launched? See also https://fedoraproject.org/wiki/Reporting_virtualization_bugs

Avi: any ideas for debugging this when it happens?
Sure. Will attach the log. I will also attach the info requested from the above link.
Created attachment 363373 [details] report of items from reporting virt bugs wiki page.
I can't seem to attach the guest.log. ;( Logrotate seems to rotate those daily and keep only one week, so it's already rotated away. ;(
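For next time, the rotation window could be widened so the log survives long enough to attach. This is only a sketch: the config path /etc/logrotate.d/libvirtd.qemu and the option values below are assumptions, not the stock Fedora contents.

```
# /etc/logrotate.d/libvirtd.qemu (sketch, assumed path) --
# rotate weekly instead of daily and keep four rotations (~a month)
/var/log/libvirt/qemu/*.log {
        weekly
        missingok
        rotate 4
        compress
}
```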
It's worth noting that I have moved both guest and host to F12, and they were happy for about 30 days or so, but now I see instability again. ;( We can probably close this and move it to bug 562699, unless you guys think they are the same bug.
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.