Bug 1003819

Summary: System-reset makes qemu core dump after migrating an "s3-state" guest w/ spice & qxl.
Product: Red Hat Enterprise Linux 7 Reporter: Qian Guo <qiguo>
Component: qemu-kvm Assignee: Gerd Hoffmann <kraxel>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: medium    
Version: 7.0 CC: hhuang, juzhang, mazhang, qiguo, qzhang, rbalakri, rhod, rmainz, virt-bugs, virt-maint, xutian
Target Milestone: rc Keywords: TestOnly
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-30 08:46:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1054077    
Bug Blocks: 923626    

Description Qian Guo 2013-09-03 09:45:38 UTC
Description of problem:
A RHEL7 guest with spice and a qxl device is put into S3 state and then migrated. After migration the guest cannot be resumed (existing issue), and when system_reset is issued via HMP, qemu core dumps.

Version-Release number of selected component (if applicable):
kernel:
# uname -r
3.10.0-15.el7.x86_64
# rpm -q qemu-kvm
qemu-kvm-1.5.3-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest on the src host:
#/usr/libexec/qemu-kvm -cpu Penryn -enable-kvm -m 4096 -smp 4,sockets=1,cores=4,threads=1 -name rhel7base  -drive file=/mnt/rhel7cp1.qcow2_v3,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -boot menu=on -monitor stdio -netdev tap,id=hostnet0,ifname=guest1,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown,vhost=on,queues=4 -device virtio-net,netdev=hostnet0,mac=54:52:1b:35:3c:16,id=test,mq=on,vectors=9 -nodefaults -nodefconfig -spice disable-ticketing,port=5930,seamless-migration=on -vga qxl -global qxl-vga.vram_size=67108864   -device virtio-balloon-pci,id=balloon1 -qmp tcp:0:4446,server,nowait -device intel-hda,id=hda1 -device hda-duplex -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -serial unix:/tmp/qiguo,server,nowait

2. Suspend the guest to S3:
# pm-suspend

3. Migrate the guest to the dst host. The guest then stalls and cannot be resumed.

4. After migration, try to reset the guest via HMP:
(qemu) system_reset

Actual results:
qemu core dumps with the following qxl/spice assertion failure:

(qemu) qemu-kvm: /builddir/build/BUILD/qemu-1.5.3/hw/display/qxl.c:1114: qxl_check_state: Assertion `!spice_display_running || ((&ram->cmd_ring)->cons == (&ram->cmd_ring)->prod)' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff32e4999 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install alsa-lib-1.0.27.2-1.el7.x86_64 celt051-0.5.1.3-6.el7.x86_64 cyrus-sasl-lib-2.1.26-9.el7.x86_64 cyrus-sasl-md5-2.1.26-9.el7.x86_64 cyrus-sasl-plain-2.1.26-9.el7.x86_64 cyrus-sasl-scram-2.1.26-9.el7.x86_64 dbus-libs-1.6.12-4.el7.x86_64 flac-libs-1.3.0-2.el7.x86_64 glib2-2.36.3-2.el7.x86_64 glibc-2.17-21.el7.x86_64 glusterfs-3.4.0.15rhs-1.el7.x86_64 gmp-5.1.1-2.el7.x86_64 gnutls-3.1.13-1.el7.x86_64 gsm-1.0.13-9.el7.x86_64 json-c-0.11-1.el7.x86_64 keyutils-libs-1.5.5-4.el7.x86_64 krb5-libs-1.11.3-8.el7.x86_64 libICE-1.0.8-5.el7.x86_64 libSM-1.2.1-5.el7.x86_64 libX11-1.6.0-1.el7.x86_64 libXau-1.0.8-1.el7.x86_64 libXext-1.3.2-1.el7.x86_64 libXi-1.7.2-1.el7.x86_64 libXtst-1.2.2-1.el7.x86_64 libaio-0.3.109-9.el7.x86_64 libasyncns-0.8-5.el7.x86_64 libattr-2.4.46-10.el7.x86_64 libcap-2.22-6.el7.x86_64 libcom_err-1.42.8-2.el7.x86_64 libdb-5.3.21-11.el7.x86_64 libgcc-4.8.1-6.el7.x86_64 libgcrypt-1.5.3-1.el7.x86_64 libgpg-error-1.11-1.el7.x86_64 libiscsi-1.7.0-6.el7.x86_64 libjpeg-turbo-1.2.90-2.el7.x86_64 libogg-1.3.0-5.el7.x86_64 libpng-1.5.13-2.el7.x86_64 libseccomp-2.1.0-0.el7.x86_64 libselinux-2.1.13-16.el7.x86_64 libsndfile-1.0.25-7.el7.x86_64 libtasn1-3.3-1.el7.x86_64 libusbx-1.0.15-2.el7.x86_64 libuuid-2.23.2-2.el7.x86_64 libvorbis-1.3.3-4.el7.x86_64 libxcb-1.9-3.el7.x86_64 nettle-2.6-2.el7.x86_64 nspr-4.10-3.el7.x86_64 nss-3.15.1-2.el7.x86_64 nss-softokn-freebl-3.15.1-2.el7.x86_64 nss-util-3.15.1-2.el7.x86_64 openssl-libs-1.0.1e-15.el7.x86_64 p11-kit-0.18.5-1.el7.x86_64 pcre-8.32-7.el7.x86_64 pixman-0.30.0-1.el7.x86_64 pulseaudio-libs-3.0-10.el7.x86_64 spice-server-0.12.4-1.el7.x86_64 tcp_wrappers-libs-7.6-75.el7.x86_64 usbredir-0.6-3.el7.x86_64 zlib-1.2.7-10.el7.x86_64
(gdb) bt
#0  0x00007ffff32e4999 in raise () from /lib64/libc.so.6
#1  0x00007ffff32e60a8 in abort () from /lib64/libc.so.6
#2  0x00007ffff32dd906 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff32dd9b2 in __assert_fail () from /lib64/libc.so.6
#4  0x000055555575971d in qxl_check_state (d=<optimized out>) at /usr/src/debug/qemu-1.5.3/hw/display/qxl.c:1114
#5  0x000055555575a025 in qxl_reset_state (d=d@entry=0x555556713e20) at /usr/src/debug/qemu-1.5.3/hw/display/qxl.c:1122
#6  0x000055555575b35b in qxl_hard_reset (d=0x555556713e20, loadvm=0) at /usr/src/debug/qemu-1.5.3/hw/display/qxl.c:1159
#7  0x000055555563ecd9 in qdev_reset_one (dev=dev@entry=0x555556713e20, opaque=opaque@entry=0x0) at hw/core/qdev.c:227
#8  0x000055555563e3d0 in qdev_walk_children (dev=dev@entry=0x555556713e20, devfn=devfn@entry=0x55555563ecc0 <qdev_reset_one>, 
    busfn=busfn@entry=0x55555563ccc0 <qbus_reset_one>, opaque=opaque@entry=0x0) at hw/core/qdev.c:376
#9  0x000055555563e46d in qdev_reset_all (dev=dev@entry=0x555556713e20) at hw/core/qdev.c:243
#10 0x0000555555682c3d in pci_device_reset (dev=0x555556713e20) at hw/pci/pci.c:180
#11 0x0000555555682df2 in pci_bus_reset (bus=0x5555566aed70) at hw/pci/pci.c:226
#12 0x0000555555682e39 in pcibus_reset (qbus=<optimized out>) at hw/pci/pci.c:233
#13 0x000055555563e4b0 in qbus_walk_children (bus=bus@entry=0x5555566aed70, devfn=devfn@entry=0x55555563ecc0 <qdev_reset_one>, 
    busfn=busfn@entry=0x55555563ccc0 <qbus_reset_one>, opaque=opaque@entry=0x0) at hw/core/qdev.c:353
#14 0x000055555563e3fa in qdev_walk_children (dev=<optimized out>, devfn=devfn@entry=0x55555563ecc0 <qdev_reset_one>, busfn=busfn@entry=0x55555563ccc0 <qbus_reset_one>, 
    opaque=opaque@entry=0x0) at hw/core/qdev.c:383
#15 0x000055555563e4da in qbus_walk_children (bus=<optimized out>, devfn=0x55555563ecc0 <qdev_reset_one>, busfn=0x55555563ccc0 <qbus_reset_one>, opaque=0x0)
    at hw/core/qdev.c:360
#16 0x000055555573143d in qemu_devices_reset () at vl.c:1883
#17 qemu_system_reset (report=report@entry=true) at vl.c:1892
#18 0x00005555555c5474 in main_loop_should_exit () at vl.c:2026
#19 main_loop () at vl.c:2064
#20 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4451
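
The condition quoted in the failure message can be read as: whenever the spice display is running, the command ring must be empty (consumer index equal to producer index). A minimal Python model of just that condition is sketched below. The names Ring and check_state are illustrative only and are not QEMU's data structures; the model covers only the expression printed in the assertion message, not the rest of qxl_check_state.

```python
from dataclasses import dataclass


@dataclass
class Ring:
    """Illustrative stand-in for a producer/consumer ring (not QEMU's type)."""
    prod: int = 0
    cons: int = 0

    def is_empty(self) -> bool:
        # The ring is empty when the consumer has caught up with the producer.
        return self.cons == self.prod


def check_state(spice_display_running: bool, cmd_ring: Ring) -> bool:
    """Mirror the asserted expression: !running || cons == prod."""
    return (not spice_display_running) or cmd_ring.is_empty()
```

Under this model, a reset that arrives while the display counts as running and commands are still pending in the ring evaluates to False, which corresponds to the abort seen in frame #4.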


Expected results:
No core dump, and the guest reboots successfully.

Additional info:
When testing with std vga and spice, the issue does not occur.
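
Step 4 issues system_reset at the HMP prompt. Because the command line above also opens a QMP socket (-qmp tcp:0:4446,server,nowait), the reset could alternatively be scripted over QMP. The sketch below assumes the socket is reachable on 127.0.0.1:4446; the helper names qmp_messages and qmp_system_reset are hypothetical and were not part of the original test.

```python
import json
import socket


def qmp_messages(command):
    """Build the two QMP lines: capability negotiation, then the command."""
    return [
        json.dumps({"execute": "qmp_capabilities"}).encode() + b"\n",
        json.dumps({"execute": command}).encode() + b"\n",
    ]


def qmp_system_reset(host="127.0.0.1", port=4446):
    """Connect to the QMP socket, negotiate, and issue system_reset.

    Returns the raw lines read back. Asynchronous events may be
    interleaved with command replies, so callers should not assume
    each entry is a direct reply.
    """
    received = []
    with socket.create_connection((host, port), timeout=10) as sock:
        stream = sock.makefile("rwb")
        received.append(json.loads(stream.readline()))  # greeting banner
        for msg in qmp_messages("system_reset"):
            stream.write(msg)
            stream.flush()
            received.append(json.loads(stream.readline()))
    return received
```

On an affected build the connection is expected to drop once qemu aborts, so a scripted run would need to treat a closed socket as a possible outcome.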

Comment 1 Qian Guo 2013-09-03 09:49:27 UTC
There is a kernel call trace on the dst host after qemu-kvm core dumps:
[98393.475354] WARNING: at net/core/dev.c:5011 rollback_registered_many+0x1e2/0x210()
[98393.475356] Modules linked in: tcp_lp rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd auth_rpcgss nfs_acl lockd sunrpc vhost_net macvtap macvlan tun bnep bluetooth fuse xt_CHECKSUM bridge stp llc ebtable_nat nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables openvswitch vxlan ip_tunnel gre sg snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device iTCO_wdt igb coretemp kvm_intel iTCO_vendor_support kvm e1000e i2c_i801 snd_pcm snd_page_alloc snd_timer snd hp_wmi sparse_keymap
[98393.475421]  rfkill crc32_pclmul lpc_ich dca crc32c_intel soundcore ghash_clmulni_intel mfd_core ptp pps_core wmi shpchp serio_raw microcode mperf pcspkr uinput xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif i915 video i2c_algo_bit drm_kms_helper ahci drm libahci libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
[98393.475454] CPU: 4 PID: 14411 Comm: qemu-kvm Tainted: G        W   --------------   3.10.0-15.el7.x86_64 #1
[98393.475457] Hardware name: Hewlett-Packard HP Compaq 8200 Elite MT PC/1495, BIOS J01 v02.15 11/10/2011
[98393.475459]  0000000000000009 ffff8801ee419b30 ffffffff815fa8cc ffff8801ee419b68
[98393.475464]  ffffffff81060711 ffff8801249e0000 ffff8801ee419bb0 ffff8801ee419bb0
[98393.475468]  ffff88021bc800c0 ffff8801e4974400 ffff8801ee419b78 ffffffff810607ea
[98393.475473] Call Trace:
[98393.475480]  [<ffffffff815fa8cc>] dump_stack+0x19/0x1b
[98393.475487]  [<ffffffff81060711>] warn_slowpath_common+0x61/0x80
[98393.475489]  [<ffffffff810607ea>] warn_slowpath_null+0x1a/0x20
[98393.475492]  [<ffffffff814ee8e2>] rollback_registered_many+0x1e2/0x210
[98393.475494]  [<ffffffff814ee941>] rollback_registered+0x31/0x40
[98393.475497]  [<ffffffff814ef9f8>] unregister_netdevice_queue+0x48/0x90
[98393.475509]  [<ffffffffa0677312>] __tun_detach+0x112/0x2b0 [tun]
[98393.475513]  [<ffffffffa06774dd>] tun_chr_close+0x2d/0x50 [tun]
[98393.475517]  [<ffffffff8119e6a9>] __fput+0xe9/0x270
[98393.475520]  [<ffffffff8119e8ee>] ____fput+0xe/0x10
[98393.475524]  [<ffffffff810820a4>] task_work_run+0xc4/0xe0
[98393.475527]  [<ffffffff81066025>] do_exit+0x2b5/0xa20
[98393.475530]  [<ffffffff8106680f>] do_group_exit+0x3f/0xa0
[98393.475535]  [<ffffffff81074eeb>] get_signal_to_deliver+0x1cb/0x5d0
[98393.475539]  [<ffffffff81011408>] do_signal+0x48/0x5a0
[98393.475543]  [<ffffffff810119d0>] do_notify_resume+0x70/0xa0
[98393.475547]  [<ffffffff81609292>] int_signal+0x12/0x17
[98393.475549] ---[ end trace 8b1af66abfed498d ]---

Comment 3 Gerd Hoffmann 2013-11-05 12:53:34 UTC
Looks similar to bug 1021324.
Can you retest with qemu-kvm-1.5.3-12.el7.x86_64 (or newer) please?

Comment 4 Qian Guo 2013-11-06 06:46:30 UTC
Hi, Gerd

Reproduced with 
# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-1.5.3-13.el7.x86_64

host/guest kernel : kernel-3.10.0-42.el7.x86_64

After migration and system_reset, qemu-kvm core dumps:
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-1.5.3/hw/display/qxl.c:1114: qxl_check_state: Assertion `!spice_display_running || ((&ram->cmd_ring)->cons == (&ram->cmd_ring)->prod)' failed.
Aborted


The dst host also hits a call trace; the core dump and call trace messages are the same as in comment #0.

Compared with bug #1021324, the core dump messages are similar, so this seems to be the same bug.


Thanks,

Qian Guo

Comment 14 Gerd Hoffmann 2014-05-23 08:57:12 UTC
upstream commits:
7cc6a25fe94b430cb5a041bcb19d7d854b4e99a7
b50f3e42b9438e033074222671c0502ecfeba82c
75c70e37bc4a6bdc394b4d1b163fe730abb82c72

Comment 15 Gerd Hoffmann 2014-09-02 11:06:05 UTC
Most likely same as bug 1054077.

Comment 16 Gerd Hoffmann 2014-10-27 09:47:50 UTC
bug 1054077 was fixed in qemu-kvm-1.5.3-71.el7, please retest with that build (or newer).
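
A quick way to decide whether an installed build is at or past the fixed one is sketched below. This is a simplified comparison that looks only at the numeric version and the leading release number; it is not RPM's real version-comparison algorithm, and it is only meaningful within the same package stream (qemu-kvm and qemu-kvm-rhev are versioned separately). The helper names parse_build and has_fix are hypothetical.

```python
import re


def parse_build(nvr):
    """Split e.g. 'qemu-kvm-1.5.3-77.el7.x86_64' into ((1, 5, 3), 77)."""
    m = re.search(r"-(\d+(?:\.\d+)*)-(\d+)\.el\d+", nvr)
    if m is None:
        raise ValueError(f"unrecognized package string: {nvr!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2))


def has_fix(nvr, fixed="qemu-kvm-1.5.3-71.el7"):
    """True if nvr sorts at or after the fixed build (simplified ordering)."""
    return parse_build(nvr) >= parse_build(fixed)
```

For example, has_fix("qemu-kvm-1.5.3-2.el7.x86_64") is False for the build from the original report, while a string taken from rpm -q qemu-kvm on a newer host can be passed in the same way.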

Comment 17 juzhang 2014-10-28 00:48:52 UTC
Hi Qian,

Could you re-test this issue?

Best Regards,
Junyi

Comment 18 Qian Guo 2014-10-30 08:22:44 UTC
(In reply to Gerd Hoffmann from comment #16)
> bug 1054077 was fixed in qemu-kvm-1.5.3-71.el7, please retest with that
> build (or newer).

Tested this scenario with qemu-kvm-rhev-2.1.2-5.el7.x86_64 and qemu-kvm-1.5.3-77.el7.x86_64; both work well.

qemu cli:
# /usr/libexec/qemu-kvm -cpu Penryn -enable-kvm -m 4096 -smp 4,sockets=1,cores=4,threads=1 -name rhel7base  -drive file=/mnt/rhel7u1/rhel7u1cp1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -boot menu=on -monitor stdio -netdev tap,id=hostnet0,ifname=guest1,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net,netdev=hostnet0,mac=54:52:1b:35:3c:16,id=test,mq=on,vectors=9 -nodefaults -nodefconfig -spice disable-ticketing,port=5930,seamless-migration=on -vga qxl -global qxl-vga.vram_size=67108864   -device virtio-balloon-pci,id=balloon1 -qmp tcp:0:4446,server,nowait -device intel-hda,id=hda1 -device hda-duplex -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -serial unix:/tmp/qiguo,server,nowait

So the latest build has fixed this bug.

Comment 19 Gerd Hoffmann 2014-10-30 08:46:26 UTC
> So the latest build has fixed this bug.

Good, closing as 1054077 dup then.

*** This bug has been marked as a duplicate of bug 1054077 ***