Bug 977475

Summary: virtual machines with 2 CPUs and qxl graphics do not power off
Product: [Fedora] Fedora
Component: xorg-x11-drv-qxl
Version: 19
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: medium
Priority: medium
Reporter: Jeff Bastian <jbastian>
Assignee: David Blechter <dblechte>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: collura, dblechte, gansalmon, harald, hdegoede, itamar, johannbg, jonathan, kem, kernel-maint, lnykryn, madhu.chinakonda, marcandre.lureau, michele, mschmidt, msekleta, plautrba, rvokal, systemd-maint, vpavlin, xgl-maint, zbyszek
Type: Bug
Doc Type: Bug Fix
Last Closed: 2015-02-17 15:40:43 UTC
Attachments:
  Description                          Flags
  kernel bug during shutdown           none
  longer transcript of bad shutdown    none

Description Jeff Bastian 2013-06-24 16:03:21 UTC
Description of problem:
Fedora 19 virtual machines frequently fail to power off fully on my F19 host system.  The virtual CPU is pegged at 100%, the virtual machine's console is blank, and I have to use Force Off from virt-manager to kill it.

I don't know if this is a kernel problem, systemd problem, libvirt, qemu, or what.

Version-Release number of selected component (if applicable):
kernel-3.9.6-301.fc19.x86_64
libvirt-1.0.5.2-1.fc19.x86_64
qemu-1.4.2-4.fc19.x86_64
virt-manager-0.10.0-1.fc19.noarch

How reproducible:
very frequently

Steps to Reproduce:
1. on an F19 host, start an F19 virtual machine
2. log in to Gnome
3. Gnome username menu -> Power Off

Actual results:
virt machine console is blank and virt-manager shows the CPU is 100% busy

Expected results:
virt machine fully powers off

Additional info:

Comment 1 Jeff Bastian 2013-06-24 16:27:34 UTC
I'm not sure if this is related, but 3 times in the past week my laptop has completely frozen while running a virtual machine.  It doesn't respond to keyboard or mouse events, or to ping.

Comment 2 Jeff Bastian 2013-07-01 16:24:29 UTC
Bug 974383 sounds similar to this one, except I'm experiencing this on regular boots, not anaconda related.  I'll try some of the debugging techniques mentioned in the other bug.

Comment 3 Jeff Bastian 2013-07-01 16:56:42 UTC
I tried enabling the debug-shell.service for systemd, but when the VM hangs, sending a CTRL-ALT-F9 does nothing.  All I see on the screen is a blinking _ for all TTYs, and virt-manager still shows the VM using 100% CPU.

Comment 4 Jeff Bastian 2013-07-01 17:27:05 UTC
I tried adding a serial console to the VM to maybe get some output, and I could no longer reproduce the problem.

So I removed the serial console, and the problem returned.

Ugh.

Comment 5 Michal Schmidt 2013-07-01 17:34:03 UTC
Try booting the guest without "quiet" on the kernel command line and with "plymouth.enable=0" instead.

Comment 6 Jeff Bastian 2013-07-01 18:00:22 UTC
Ok, I did that and it's in a loop printing one _ per line now as fast as possible.  The shutdown messages quickly scrolled off the screen and now I just have a screen full of
_
_
_
_
_
_
_
_
and the CPU is pegged at 100%.

Comment 7 Harald Hoyer 2013-07-02 10:30:05 UTC
What is the kernel command line in the virtual machine?

# cat /proc/cmdline

Comment 8 Jeff Bastian 2013-07-02 14:19:29 UTC
Currently it's this:

~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.9.6-301.fc19.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.md=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 rd.lvm.lv=fedora/root rhgb plymouth.enable=0 nomodeset


But I've been experimenting with the options at the end and I've seen failures with:

... rhgb quiet
... rhgb plymouth.enable=0
... rhgb plymouth.enable=0 nomodeset
... rhgb drm_kms_helper.edid_firmware=edid/1024x768.bin


And I did not see the problem with

... rhgb quiet console=tty0 console=ttyS0,9600n8
... rhgb quiet console=ttyS0,9600n8 console=tty0
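
When comparing boot configurations like the ones listed above, it can help to diff the kernel command lines mechanically rather than by eye. The following is a minimal sketch, not part of the original report; the helper names `parse_cmdline`, `consoles`, and `diff_options` are hypothetical, and the parser assumes options are separated by plain whitespace (no quoted values).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: [values]}.

    An option may repeat (rd.lvm.lv appears twice in the command line
    quoted in this report), so values are collected in a list. Flags
    without '=' (such as rhgb or quiet) get the value None.
    Assumption: no quoted values containing spaces.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options.setdefault(key, []).append(value if sep else None)
    return options


def consoles(cmdline):
    """Return the console= devices in the order they were given."""
    return [v for v in parse_cmdline(cmdline).get("console", []) if v]


def diff_options(a, b):
    """Return (tokens only in a, tokens only in b), preserving order."""
    tokens_a, tokens_b = a.split(), b.split()
    only_a = [t for t in tokens_a if t not in tokens_b]
    only_b = [t for t in tokens_b if t not in tokens_a]
    return only_a, only_b
```

Feeding it the contents of /proc/cmdline from two boots shows which options differ; it says nothing about which difference, if any, is responsible for a hang.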

Comment 9 Jeff Bastian 2013-07-02 14:58:10 UTC
Created attachment 767805
kernel bug during shutdown

(In reply to Jeff Bastian from comment #8)
> And I did not see the problem with
> 
> ... rhgb quiet console=tty0 console=ttyS0,9600n8
> ... rhgb quiet console=ttyS0,9600n8 console=tty0


I take it back: I just reproduced it with the serial console.  It seems less frequent with a serial console enabled (maybe it's just my perception), but I just kept booting and powering off the VM until I saw a problem and eventually I got this:

[ 1055.769984] BUG: scheduling while atomic: swapper/1/0/0x10010000
[ 1055.770022] Modules linked in: ebtable_nat nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables bnep bluetooth rfkill ip6table_filter ip6_tables joydev microcode virtio_net virtio_balloon i2c_piix4 uinput qxl drm_kms_helper ttm drm virtio_blk i2c_core
[ 1055.770024] Pid: 0, comm: swapper/1 Not tainted 3.9.6-301.fc19.x86_64 #1
[ 1055.770025] Call Trace:
[ 1055.770035]  <IRQ>  [<ffffffff8163c725>] __schedule_bug+0x4d/0x5b
[ 1055.770039]  [<ffffffff81644866>] __schedule+0x6c6/0x7c0
[ 1055.770043]  [<ffffffff8108ef16>] __cond_resched+0x26/0x30
[ 1055.770045]  [<ffffffff81644d5a>] _cond_resched+0x3a/0x50
...

The call trace just keeps going on and on and on for 1000s of lines.

I virsh destroyed the box after a few seconds so I could copy-and-paste the output.  See attached for the full output.
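
Because the trace repeats for thousands of lines, it may be easier to pull just the "scheduling while atomic" headers out of a saved console log. This is a rough sketch under assumptions: the function name `find_atomic_bugs` is hypothetical, and the trailing field is assumed to have the form comm/pid/preempt_count, as the two reports in this bug suggest. The task name can itself contain a slash (swapper/1), so the field is split from the right.

```python
import re

# Matches lines such as:
# [ 1055.769984] BUG: scheduling while atomic: swapper/1/0/0x10010000
BUG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: scheduling while atomic: (?P<rest>\S+)"
)


def find_atomic_bugs(log_text):
    """Return one dict per 'scheduling while atomic' header in the log."""
    hits = []
    for line in log_text.splitlines():
        match = BUG_RE.match(line.strip())
        if not match:
            continue
        # Split from the right because comm may contain '/'.
        parts = match.group("rest").rsplit("/", 2)
        if len(parts) != 3:
            continue  # unexpected layout, skip rather than guess
        comm, pid, preempt = parts
        hits.append({
            "time": float(match.group("ts")),
            "comm": comm,
            "pid": int(pid),
            "preempt_count": int(preempt, 16),
        })
    return hits
```

Running it over a transcript gives a quick count of how often the BUG fired and in which task.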

Comment 10 Michal Schmidt 2013-07-02 15:02:16 UTC
Looks like qxl is to blame. See if configuring the guest to emulate a different video hardware works around the bug.

Comment 11 Jeff Bastian 2013-07-02 15:19:44 UTC
Created attachment 767809
longer transcript of bad shutdown

I tried again and used the 'script' command to get a full transcript (so I wouldn't have to interrupt it with virsh destroy to copy-and-paste before it scrolled off my terminal buffer).

The output stopped at 49 seconds (according to the VM's kernel), but the machine kept running at 100% CPU for a few more minutes until I finally killed it.

Comment 12 Jeff Bastian 2013-07-02 15:45:49 UTC
(In reply to Michal Schmidt from comment #10)
> Looks like qxl is to blame. See if configuring the guest to emulate a
> different video hardware works around the bug.


I think you're right: I switched from Spice/QXL graphics to VNC/Cirrus and I was able to cleanly boot and shut down 20 times in a row.

I then switched back to Spice/QXL and it froze on the 3rd attempt.

Comment 13 Jeff Bastian 2013-07-02 16:23:51 UTC
I think I found another clue: my VM was configured for 2 virtual CPUs.

I switched it to a single CPU and I was able to cleanly shut down 20 times in a row with Spice/QXL graphics.

I went back to a dual-CPU setup and it hit the bug on the first shutdown.


So it seems to be a combo of multi-CPU + QXL graphics behind this problem.
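
To check whether other guests share this combination, one option is to inspect the libvirt domain XML (for example, the output of `virsh dumpxml` for the guest). The sketch below is illustrative only: the function name `has_multicpu_qxl` is hypothetical, and the check encodes nothing more than the correlation observed in this report, not a confirmed root cause.

```python
import xml.etree.ElementTree as ET


def has_multicpu_qxl(domain_xml):
    """Return True if the domain has more than one vCPU and a qxl video model.

    domain_xml is the text of a libvirt domain definition. A missing
    <vcpu> element is treated as a single CPU (an assumption made for
    this sketch).
    """
    root = ET.fromstring(domain_xml)
    vcpus = int(root.findtext("vcpu", default="1").strip())
    video_models = [
        model.get("type") for model in root.findall("./devices/video/model")
    ]
    return vcpus > 1 and "qxl" in video_models
```

A guest that returns True here matches the configuration that hung in the tests above; one that returns False matches the configurations that shut down cleanly.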

Comment 14 Jeff Bastian 2013-07-02 20:43:44 UTC
I upgraded my VM to kernel-3.9.8-300.fc19 and it doesn't want to crash anymore.

I tried going back to 3.9.6-301.fc19 and easily reproduced the problem.

Maybe I'm just getting lucky with 3.9.8-300 because I don't see anything in the kernel rpm changelog to indicate any modifications to the qxl driver:

* Thu Jun 27 2013 Josh Boyer <jwboyer> - 3.9.8-300
- Linux v3.9.8

* Thu Jun 27 2013 Josh Boyer <jwboyer>
- Fix stack memory usage for DMA in ath3k (rhbz 977558)

* Wed Jun 26 2013 Josh Boyer <jwboyer>
- Add two patches to fix bridge networking issues (rhbz 880035)

* Tue Jun 25 2013 Kyle McMartin <kyle>
- Cherry pick fix out of rawhide for %{with_*} tests in module
  signing from Jan Stancek.

* Mon Jun 24 2013 Josh Boyer <jwboyer>
- Fix battery issue with bluetooth keyboards (rhbz 903741)

* Fri Jun 21 2013 Josh Boyer <jwboyer>
- Add two patches to fix iwlwifi issues in unmapping
- Add patch to fix carl9170 oops (rhbz 967271)

* Thu Jun 20 2013 Justin M. Forbes <jforbes>
- Linux v3.9.7

* Tue Jun 18 2013 Neil Horman <nhorman>
- Fix dma debug error in tulip driver (rhbz 956732)

* Tue Jun 18 2013 Dave Jones <davej>
- Disable MTRR sanitizer by default.

* Mon Jun 17 2013 Josh Boyer <jwboyer> - 3.9.6-301
- Add patch to fix radeon issues on powerpc
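
Scanning a changelog like the one above for driver names can be automated. This is a small sketch assuming the usual RPM changelog layout, where each entry starts with a line beginning "* " and is followed by item lines; the names `split_entries` and `entries_mentioning` are hypothetical. Note that a rebase entry such as "Linux v3.9.7" may bundle upstream changes that are not itemized, so an empty result does not prove the driver was untouched.

```python
def split_entries(changelog):
    """Split RPM changelog text into entries with a header and item lines."""
    entries = []
    for line in changelog.splitlines():
        if line.startswith("* "):
            entries.append({"header": line[2:].strip(), "lines": []})
        elif line.strip() and entries:
            entries[-1]["lines"].append(line.strip())
    return entries


def entries_mentioning(changelog, keywords):
    """Return entries whose item lines contain any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    matches = []
    for entry in split_entries(changelog):
        body = " ".join(entry["lines"]).lower()
        if any(k in body for k in wanted):
            matches.append(entry)
    return matches
```

For the entries quoted in this comment, searching for "qxl" finds nothing, which matches what was observed by reading the changelog manually.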

Comment 15 Jeff Bastian 2013-07-03 14:07:23 UTC
I just saw another bug when booting my VM with 2 CPUs and kernel 3.9.8-300, and it seems to involve the qxl driver again:

[   19.006361] BUG: scheduling while atomic: systemd-udevd/346/0x10010000
[   19.006368] Modules linked in: microcode(+) i2c_piix4 uinput qxl drm_kms_helper virtio_blk ttm drm i2c_core
[   19.006371] Pid: 346, comm: systemd-udevd Tainted: G        W    3.9.8-300.fc19.x86_64 #1
[   19.006371] Call Trace:
[   19.006379]  <IRQ>  [<ffffffff8163cc45>] __schedule_bug+0x4d/0x5b
[   19.006382]  [<ffffffff81644d86>] __schedule+0x6c6/0x7c0
[   19.006386]  [<ffffffff8108efb6>] __cond_resched+0x26/0x30
[   19.006388]  [<ffffffff8164527a>] _cond_resched+0x3a/0x50
[   19.006393]  [<ffffffff81138bf5>] __alloc_pages_nodemask+0x2a5/0xa30
[   19.006396]  [<ffffffff81041e4f>] ? kvm_clock_read+0x1f/0x30
[   19.006400]  [<ffffffff8101a8a9>] ? sched_clock+0x9/0x10
[   19.006403]  [<ffffffff8109366d>] ? sched_clock_local+0x1d/0x80
[   19.006405]  [<ffffffff810937f8>] ? sched_clock_cpu+0xa8/0x100
[   19.006408]  [<ffffffff81065904>] ? irq_exit+0x84/0xb0
[   19.006410]  [<ffffffff81650c16>] ? do_IRQ+0x56/0xc0
[   19.006413]  [<ffffffff81646e6d>] ? common_interrupt+0x6d/0x6d
[   19.006416]  [<ffffffff81176479>] alloc_pages_current+0xa9/0x170
[   19.006419]  [<ffffffff8117ef6a>] new_slab+0x2fa/0x3e0
[   19.006420]  [<ffffffff8163efba>] __slab_alloc+0x309/0x4cd
[   19.006423]  [<ffffffff8118207b>] ? kmem_cache_alloc+0x1bb/0x200
[   19.006426]  [<ffffffff81306e9d>] ? list_del+0xd/0x30
[   19.006430]  [<ffffffff8119b48c>] ? get_empty_filp+0x5c/0x1b0
[   19.006432]  [<ffffffff81182054>] kmem_cache_alloc+0x194/0x200
[   19.006437]  [<ffffffff812971c7>] ? inode_doinit_with_dentry+0x157/0x660
[   19.006439]  [<ffffffff8119b48c>] ? get_empty_filp+0x5c/0x1b0
[   19.006441]  [<ffffffff8119b48c>] get_empty_filp+0x5c/0x1b0
[   19.006442]  [<ffffffff8119b5fe>] alloc_file+0x1e/0xc0
[   19.006445]  [<ffffffff811475b8>] shmem_file_setup+0xf8/0x1d0
[   19.006454]  [<ffffffffa0015832>] drm_gem_object_init+0x32/0x60 [drm]
[   19.006458]  [<ffffffffa008ee92>] qxl_bo_create+0x92/0x1f0 [qxl]
[   19.006463]  [<ffffffffa00938b8>] qxl_alloc_release_reserved+0x168/0x2b0 [qxl]
[   19.006467]  [<ffffffffa009170b>] make_drawable.constprop.3+0x2b/0xd0 [qxl]
[   19.006470]  [<ffffffffa0091f13>] qxl_draw_copyarea+0x43/0xc0 [qxl]
[   19.006473]  [<ffffffffa008e30b>] qxl_fb_copyarea+0x3b/0x40 [qxl]
[   19.006477]  [<ffffffff81349db7>] bit_bmove+0x57/0x60
[   19.006480]  [<ffffffff81344e6f>] fbcon_redraw_blit.isra.22+0x14f/0x1e0
[   19.006482]  [<ffffffff81348aa4>] fbcon_scroll+0x9b4/0xd30
[   19.006485]  [<ffffffff812fc5ac>] ? vsnprintf+0x20c/0x670
[   19.006489]  [<ffffffff813b7f2c>] scrup+0xfc/0x110
[   19.006490]  [<ffffffff813b7fc0>] lf+0x80/0x90
[   19.006493]  [<ffffffff813b9152>] vt_console_print+0x2a2/0x3f0
[   19.006497]  [<ffffffff8105df71>] call_console_drivers.constprop.15+0x91/0x100
[   19.006499]  [<ffffffff8105ee6b>] console_unlock+0x3ab/0x3f0
[   19.006501]  [<ffffffff8105f115>] vprintk_emit+0x265/0x520
[   19.006503]  [<ffffffff8163c832>] printk+0x67/0x69
[   19.006507]  [<ffffffffa0060a7f>] collect_cpu_info+0xbf/0xe0 [microcode]
[   19.006510]  [<ffffffffa0060022>] collect_cpu_info_local+0x22/0x30 [microcode]
[   19.006512]  [<ffffffff810bc804>] generic_smp_call_function_single_interrupt+0x94/0x100
[   19.006516]  [<ffffffff81036a87>] smp_call_function_single_interrupt+0x27/0x40
[   19.006518]  [<ffffffff8164fddd>] call_function_single_interrupt+0x6d/0x80
[   19.008513]  <EOI> 



This is repeated over and over again on the serial console.
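
Read from the bottom up, the trace above shows a printk issued from an interrupt handler reaching the framebuffer console, which calls into qxl, which in turn allocates memory on a path that ends in _cond_resched. To see at a glance which modules appear in a trace, the frames can be tallied per module. This is a rough sketch; the function name `frames_by_module` is hypothetical, and "vmlinux" is just a label chosen here for frames with no module suffix. Frames marked with "?" are unreliable stack guesses and are skipped by default.

```python
import re
from collections import Counter

# Matches frames such as:
#  [<ffffffffa008ee92>] qxl_bo_create+0x92/0x1f0 [qxl]
#  [<ffffffff812fc5ac>] ? vsnprintf+0x20c/0x670
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def frames_by_module(trace, include_unreliable=False):
    """Count call-trace frames per kernel module."""
    counts = Counter()
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            continue
        if match.group("guess") and not include_unreliable:
            continue  # '?' frames may be stale stack contents
        counts[match.group("mod") or "vmlinux"] += 1
    return counts
```

A tally like this only shows which modules are on the stack; deciding which one is at fault still requires reading the frames in order.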

Comment 17 Fedora End Of Life 2015-01-09 18:31:17 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 18 Fedora End Of Life 2015-02-17 15:40:43 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.