Description of problem:
Frequently Fedora 19 virtual machines do not fully power off on my F19 host system. The virtual CPU is pegged at 100% and the virtual machine's console is just blank and I have to use Force Off from virt-manager to kill it. I don't know if this is a kernel problem, systemd problem, libvirt, qemu, or what.

Version-Release number of selected component (if applicable):
kernel-3.9.6-301.fc19.x86_64
libvirt-1.0.5.2-1.fc19.x86_64
qemu-1.4.2-4.fc19.x86_64
virt-manager-0.10.0-1.fc19.noarch

How reproducible:
very frequently

Steps to Reproduce:
1. on an F19 host, start an F19 virtual machine
2. log in to Gnome
3. Gnome username menu -> Power Off

Actual results:
virt machine console is blank and virt-manager shows the CPU is 100% busy

Expected results:
virt machine fully powers off

Additional info:
I'm not sure if this is related, but 3 times in the past week my laptop has completely frozen while running a virtual machine. It doesn't respond to keyboard or mouse events, or to ping.
Bug 974383 sounds similar to this one, except I'm experiencing this on regular boots, not anaconda related. I'll try some of the debugging techniques mentioned in the other bug.
I tried enabling the debug-shell.service for systemd, but when the VM hangs, sending a CTRL-ALT-F9 does nothing. All I see on the screen is a blinking _ for all TTYs, and virt-manager still shows the VM using 100% CPU.
I tried adding a serial console to the VM to maybe get some output, and I could no longer reproduce the problem. So I removed the serial console, and the problem returned. Ugh.
Try booting the guest without "quiet" on the kernel command line and with "plymouth.enable=0" instead.
Ok, I did that and it's in a loop printing one _ per line now as fast as possible. The shutdown messages quickly scrolled off the screen and now I just have a screen full of _ _ _ _ _ _ _ _ and the CPU is pegged at 100%.
What is the kernel command line in the virtual machine?

# cat /proc/cmdline
Currently it's this:

~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.9.6-301.fc19.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.md=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 rd.lvm.lv=fedora/root rhgb plymouth.enable=0 nomodeset

But I've been experimenting with the options at the end and I've seen failures with:

... rhgb quiet
... rhgb plymouth.enable=0
... rhgb plymouth.enable=0 nomodeset
... rhgb drm_kms_helper.edid_firmware=edid/1024x768.bin

And I did not see the problem with

... rhgb quiet console=tty0 console=ttyS0,9600n8
... rhgb quiet console=ttyS0,9600n8 console=tty0
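The command lines above differ only in a few trailing parameters, so it can help to compare them mechanically. The following is a minimal sketch in Python, not something used in this report: the function name parse_cmdline is hypothetical, and the parser assumes no quoted values containing spaces (none of the command lines here use them).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into bare flags and key=value parameters.

    Repeated keys (such as console=) are kept, in order, in a list.
    Quoted values containing spaces are NOT handled (assumption).
    """
    flags = []
    params = {}
    for token in cmdline.split():
        if "=" in token:
            key, value = token.split("=", 1)
            params.setdefault(key, []).append(value)
        else:
            flags.append(token)
    return flags, params


if __name__ == "__main__":
    # Trailing options from one of the variations listed in this comment.
    sample = "ro rhgb quiet console=tty0 console=ttyS0,9600n8"
    flags, params = parse_cmdline(sample)
    print("flags:   ", flags)
    print("consoles:", params.get("console", []))
```

Inside the guest, the same function could be fed the contents of /proc/cmdline instead of the sample string.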
Created attachment 767805 [details]
kernel bug during shutdown

(In reply to Jeff Bastian from comment #8)
> And I did not see the problem with
>
> ... rhgb quiet console=tty0 console=ttyS0,9600n8
> ... rhgb quiet console=ttyS0,9600n8 console=tty0

I take it back: I just reproduced it with the serial console. It seems less frequent with a serial console enabled (maybe it's just my perception), but I just kept booting and powering off the VM until I saw a problem and eventually I got this:

[ 1055.769984] BUG: scheduling while atomic: swapper/1/0/0x10010000
[ 1055.770022] Modules linked in: ebtable_nat nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables bnep bluetooth rfkill ip6table_filter ip6_tables joydev microcode virtio_net virtio_balloon i2c_piix4 uinput qxl drm_kms_helper ttm drm virtio_blk i2c_core
[ 1055.770024] Pid: 0, comm: swapper/1 Not tainted 3.9.6-301.fc19.x86_64 #1
[ 1055.770025] Call Trace:
[ 1055.770035] <IRQ> [<ffffffff8163c725>] __schedule_bug+0x4d/0x5b
[ 1055.770039] [<ffffffff81644866>] __schedule+0x6c6/0x7c0
[ 1055.770043] [<ffffffff8108ef16>] __cond_resched+0x26/0x30
[ 1055.770045] [<ffffffff81644d5a>] _cond_resched+0x3a/0x50
...

The call trace just keeps going on and on and on for 1000s of lines. I virsh destroyed the box after a few seconds so I could copy-and-paste the output. See attached for the full output.
Looks like qxl is to blame. See if configuring the guest to emulate a different video hardware works around the bug.
Created attachment 767809 [details]
longer transcript of bad shutdown

I tried again and used the 'script' command to get a full transcript (so I wouldn't have to interrupt it with virsh destroy to copy-and-paste before it scrolled off my terminal buffer).

The output stopped at 49 seconds (according to the VM's kernel), but the machine kept running at 100% CPU for a few more minutes until I finally killed it.
(In reply to Michal Schmidt from comment #10)
> Looks like qxl is to blame. See if configuring the guest to emulate a
> different video hardware works around the bug.

I think you're right: I switched from Spice/QXL graphics to VNC/Cirrus and I was able to cleanly boot and shut down 20 times in a row. I then switched back to Spice/QXL and it froze on the 3rd attempt.
I think I found another clue: my VM was configured for 2 virtual CPUs. I switched it to a single CPU and I was able to cleanly shut down 20 times in a row with Spice/QXL graphics. I went back to a dual-CPU setup and it hit the bug on the first shutdown. So it seems to be a combo of multi-CPU + QXL graphics behind this problem.
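The boot-and-power-off cycle used in the last few comments could be scripted rather than done by hand. The following is a rough sketch in Python under several assumptions: the helper names (virsh, wait_for_state, count_clean_shutdowns) are hypothetical, the fixed boot_wait delay is a crude stand-in for waiting until the guest is really up, and virsh shutdown sends an ACPI power-button request, which is not the same path as the Gnome Power Off menu used in the original steps. It relies only on the virsh subcommands start, shutdown, domstate and destroy.

```python
import subprocess
import time


def virsh(*args):
    """Run a virsh subcommand and return its stripped stdout."""
    result = subprocess.run(["virsh", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def wait_for_state(domain, wanted, timeout, run=virsh, sleep=time.sleep, poll=2):
    """Poll `virsh domstate` until it reports `wanted`; return False on timeout."""
    waited = 0
    while waited <= timeout:
        if run("domstate", domain) == wanted:
            return True
        sleep(poll)
        waited += poll
    return False


def count_clean_shutdowns(domain, attempts=20, boot_wait=60, timeout=120,
                          run=virsh, sleep=time.sleep):
    """Boot and shut down `domain` repeatedly, stopping at the first hang.

    Returns the number of clean shutdowns seen before a hang
    (equal to `attempts` if no hang occurred).
    """
    for i in range(attempts):
        run("start", domain)
        sleep(boot_wait)           # assumption: guest is fully booted by now
        run("shutdown", domain)    # ACPI power-button request to the guest
        if not wait_for_state(domain, "shut off", timeout, run, sleep):
            run("destroy", domain)  # hard stop, like Force Off in virt-manager
            return i
    return attempts
```

A call such as count_clean_shutdowns("f19-guest"), where the domain name is a placeholder, would return 20 for a run like the clean single-CPU case and a smaller number when a hang is hit. The run and sleep parameters exist only so the loop can be exercised without a real hypervisor.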
I upgraded my VM to kernel-3.9.8-300.fc19 and it doesn't want to crash anymore. I tried going back to 3.9.6-301.fc19 and easily reproduced the problem.

Maybe I'm just getting lucky with 3.9.8-300 because I don't see anything in the kernel rpm changelog to indicate any modifications to the qxl driver:

* Thu Jun 27 2013 Josh Boyer <jwboyer> - 3.9.8-300
- Linux v3.9.8

* Thu Jun 27 2013 Josh Boyer <jwboyer>
- Fix stack memory usage for DMA in ath3k (rhbz 977558)

* Wed Jun 26 2013 Josh Boyer <jwboyer>
- Add two patches to fix bridge networking issues (rhbz 880035)

* Tue Jun 25 2013 Kyle McMartin <kyle>
- Cherry pick fix out of rawhide for %{with_*} tests in module signing from Jan Stancek.

* Mon Jun 24 2013 Josh Boyer <jwboyer>
- Fix battery issue with bluetooth keyboards (rhbz 903741)

* Fri Jun 21 2013 Josh Boyer <jwboyer>
- Add two patches to fix iwlwifi issues in unmapping
- Add patch to fix carl9170 oops (rhbz 967271)

* Thu Jun 20 2013 Justin M. Forbes <jforbes>
- Linux v3.9.7

* Tue Jun 18 2013 Neil Horman <nhorman>
- Fix dma debug error in tulip driver (rhbz 956732)

* Tue Jun 18 2013 Dave Jones <davej>
- Disable MTRR sanitizer by default.

* Mon Jun 17 2013 Josh Boyer <jwboyer> - 3.9.6-301
- Add patch to fix radeon issues on powerpc
I just saw another bug when booting my VM with 2 CPUs and the 3.9.8-300 kernel, and it seems to involve the qxl driver again:

[ 19.006361] BUG: scheduling while atomic: systemd-udevd/346/0x10010000
[ 19.006368] Modules linked in: microcode(+) i2c_piix4 uinput qxl drm_kms_helper virtio_blk ttm drm i2c_core
[ 19.006371] Pid: 346, comm: systemd-udevd Tainted: G W 3.9.8-300.fc19.x86_64 #1
[ 19.006371] Call Trace:
[ 19.006379] <IRQ> [<ffffffff8163cc45>] __schedule_bug+0x4d/0x5b
[ 19.006382] [<ffffffff81644d86>] __schedule+0x6c6/0x7c0
[ 19.006386] [<ffffffff8108efb6>] __cond_resched+0x26/0x30
[ 19.006388] [<ffffffff8164527a>] _cond_resched+0x3a/0x50
[ 19.006393] [<ffffffff81138bf5>] __alloc_pages_nodemask+0x2a5/0xa30
[ 19.006396] [<ffffffff81041e4f>] ? kvm_clock_read+0x1f/0x30
[ 19.006400] [<ffffffff8101a8a9>] ? sched_clock+0x9/0x10
[ 19.006403] [<ffffffff8109366d>] ? sched_clock_local+0x1d/0x80
[ 19.006405] [<ffffffff810937f8>] ? sched_clock_cpu+0xa8/0x100
[ 19.006408] [<ffffffff81065904>] ? irq_exit+0x84/0xb0
[ 19.006410] [<ffffffff81650c16>] ? do_IRQ+0x56/0xc0
[ 19.006413] [<ffffffff81646e6d>] ? common_interrupt+0x6d/0x6d
[ 19.006416] [<ffffffff81176479>] alloc_pages_current+0xa9/0x170
[ 19.006419] [<ffffffff8117ef6a>] new_slab+0x2fa/0x3e0
[ 19.006420] [<ffffffff8163efba>] __slab_alloc+0x309/0x4cd
[ 19.006423] [<ffffffff8118207b>] ? kmem_cache_alloc+0x1bb/0x200
[ 19.006426] [<ffffffff81306e9d>] ? list_del+0xd/0x30
[ 19.006430] [<ffffffff8119b48c>] ? get_empty_filp+0x5c/0x1b0
[ 19.006432] [<ffffffff81182054>] kmem_cache_alloc+0x194/0x200
[ 19.006437] [<ffffffff812971c7>] ? inode_doinit_with_dentry+0x157/0x660
[ 19.006439] [<ffffffff8119b48c>] ? get_empty_filp+0x5c/0x1b0
[ 19.006441] [<ffffffff8119b48c>] get_empty_filp+0x5c/0x1b0
[ 19.006442] [<ffffffff8119b5fe>] alloc_file+0x1e/0xc0
[ 19.006445] [<ffffffff811475b8>] shmem_file_setup+0xf8/0x1d0
[ 19.006454] [<ffffffffa0015832>] drm_gem_object_init+0x32/0x60 [drm]
[ 19.006458] [<ffffffffa008ee92>] qxl_bo_create+0x92/0x1f0 [qxl]
[ 19.006463] [<ffffffffa00938b8>] qxl_alloc_release_reserved+0x168/0x2b0 [qxl]
[ 19.006467] [<ffffffffa009170b>] make_drawable.constprop.3+0x2b/0xd0 [qxl]
[ 19.006470] [<ffffffffa0091f13>] qxl_draw_copyarea+0x43/0xc0 [qxl]
[ 19.006473] [<ffffffffa008e30b>] qxl_fb_copyarea+0x3b/0x40 [qxl]
[ 19.006477] [<ffffffff81349db7>] bit_bmove+0x57/0x60
[ 19.006480] [<ffffffff81344e6f>] fbcon_redraw_blit.isra.22+0x14f/0x1e0
[ 19.006482] [<ffffffff81348aa4>] fbcon_scroll+0x9b4/0xd30
[ 19.006485] [<ffffffff812fc5ac>] ? vsnprintf+0x20c/0x670
[ 19.006489] [<ffffffff813b7f2c>] scrup+0xfc/0x110
[ 19.006490] [<ffffffff813b7fc0>] lf+0x80/0x90
[ 19.006493] [<ffffffff813b9152>] vt_console_print+0x2a2/0x3f0
[ 19.006497] [<ffffffff8105df71>] call_console_drivers.constprop.15+0x91/0x100
[ 19.006499] [<ffffffff8105ee6b>] console_unlock+0x3ab/0x3f0
[ 19.006501] [<ffffffff8105f115>] vprintk_emit+0x265/0x520
[ 19.006503] [<ffffffff8163c832>] printk+0x67/0x69
[ 19.006507] [<ffffffffa0060a7f>] collect_cpu_info+0xbf/0xe0 [microcode]
[ 19.006510] [<ffffffffa0060022>] collect_cpu_info_local+0x22/0x30 [microcode]
[ 19.006512] [<ffffffff810bc804>] generic_smp_call_function_single_interrupt+0x94/0x100
[ 19.006516] [<ffffffff81036a87>] smp_call_function_single_interrupt+0x27/0x40
[ 19.006518] [<ffffffff8164fddd>] call_function_single_interrupt+0x6d/0x80
[ 19.008513] <EOI>

This is repeated over and over again on the serial console.
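With thousands of lines of repeated traces, one way to see which modules are involved is to tally the stack frames by the trailing [module] tag. The following is a minimal sketch in Python, not a tool used in this report: the name frames_by_module and the regular expression are assumptions based only on the frame format visible in the trace above. Frames marked with "?" are skipped by default because the kernel marks them as unreliable.

```python
import re
from collections import Counter

# Matches frames such as "[<ffffffffa008ee92>] qxl_bo_create+0x92/0x1f0 [qxl]".
# The trailing "[module]" tag is absent for symbols in the core kernel image.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def frames_by_module(log_text, include_unreliable=False):
    """Count call-trace frames per kernel module in `log_text`.

    Frames without a module tag are counted under "(kernel)".
    """
    counts = Counter()
    for match in FRAME.finditer(log_text):
        unreliable, _symbol, module = match.groups()
        if unreliable and not include_unreliable:
            continue
        counts[module or "(kernel)"] += 1
    return counts


if __name__ == "__main__":
    # A few frames copied from the trace in this comment.
    sample = (
        "[ 19.006445] [<ffffffff811475b8>] shmem_file_setup+0xf8/0x1d0\n"
        "[ 19.006454] [<ffffffffa0015832>] drm_gem_object_init+0x32/0x60 [drm]\n"
        "[ 19.006458] [<ffffffffa008ee92>] qxl_bo_create+0x92/0x1f0 [qxl]\n"
        "[ 19.006485] [<ffffffff812fc5ac>] ? vsnprintf+0x20c/0x670\n"
    )
    for module, count in frames_by_module(sample).most_common():
        print(module, count)
```

Run against a saved serial console transcript, a tally like this would show how often qxl, drm and microcode frames appear relative to core kernel frames. It only counts frames; it does not by itself identify the faulty code path.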
This message is a notice that Fedora 19 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 19. It is Fedora's policy to close all bug reports from releases that are no longer maintained. Approximately 4 (four) weeks from now this bug will be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 19 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.