Bug 1290743
Summary: qemu-kvm core dumped when repeat system_reset 20 times during guest boot
Product: Red Hat Enterprise Linux 6
Component: qemu-kvm
Version: 6.8
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Qianqian Zhu <qizhu>
Assignee: Gerd Hoffmann <kraxel>
QA Contact: Virtualization Bugs <virt-bugs>
CC: ailan, chayang, jen, juzhang, kraxel, mkenneth, qizhu, rbalakri, virt-bugs, virt-maint, yfu
Target Milestone: rc
Target Release: ---
Fixed In Version: qemu-kvm-0.12.1.2-2.489.el6
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2016-05-10 21:02:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Description
Qianqian Zhu
2015-12-11 10:12:21 UTC
This bug was exposed on an AMD system while performing system_reset in a loop. The same test was run on Intel, but with a different CPU model, smp and memory size, and there it was not reproducible.

QE will try the same qemu-kvm command line again on Intel and then update here with the test result.

(In reply to Chao Yang from comment #3)
> This bug was exposed on AMD system while performing system_reset in a loop.
> Same test was done on Intel but with different cpu model, smp as well as
> memory size, unfortunately, it is not reproducible.
>
> QE will try same qemu-kvm cmd again on Intel then update here with test
> result.

This bug can be reproduced on an Intel host with the same memory size and vCPU count (mem=16384, vcpu=8):
-----------------------------------------
-m 16384 \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu 'Westmere' \

(CLI for autotest: python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=5 --mem=16384 --vcpu=8)
job link: http://10.66.4.244/kvm_autotest_job_log/?jobid=1171125

How reproducible: 2/5

The reason seems to be the same as on the AMD host (.DEBUG file in autotest log):
12/14 13:47:02 INFO | aexpect:0965| [qemu output] qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/hw/qxl.c:1775: qxl_send_events: Assertion `qemu_spice_display_is_running(&d->ssd)' failed.
12/14 13:47:16 WARNI|env_proces:1296| virt-tests-vm1 is not alive. Can't query the register status
12/14 13:47:16 INFO | aexpect:0965| [qemu output] /tmp/aexpect/8eQT9SkV/aexpect-0YTy8k.sh: line 1: 9186 Aborted (core dumped) ...
12/14 13:47:16 INFO | aexpect:0965| [qemu output] (Process terminated with status 134)

Check the report file (debug/crash.qemu-kvm.9186/report):
#0  0x00007f5d8d2c3625 in raise () from /lib64/libc.so.6
#0  0x00007f5d8d2c3625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f5d8d2c4e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f5d8d2bc74e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007f5d8d2bc810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f5d909ffbed in qxl_send_events (d=0x7f5d96572840, events=1)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1775
        old_pending = <value optimized out>
        le_events = 1
        __PRETTY_FUNCTION__ = "qxl_send_events"
        __FUNCTION__ = "qxl_send_events"
#5  0x00007f5d90a021a5 in qxl_push_free_res (sin=0x7f5d96572ad8, ext=...)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:708
        ring = 0x7f5957bff434
        notify = <value optimized out>
#6  interface_release_resource (sin=0x7f5d96572ad8, ext=...)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:760
        qxl = 0x7f5d96572840
        ring = 0x7f5957bff434
        id = <value optimized out>
#7  0x00007f5d8daef2cf in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#8  0x00007f5d8dafe1a4 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#9  0x00007f5d8dae1087 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#10 0x00007f5d8dafcd06 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#11 0x00007f5d90381a51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007f5d8d37996d in clone () from /lib64/libc.so.6
No symbol table info available.

Sorry, I made a mistake before:
-------------------
How reproducible: 2/5 ====> 2/100
-------------------
I ran this case with autotest 5 times, and 2 runs failed. In each run, autotest does "system_reset" 20 times.

Tested on the same Intel host with "--mem=16384 --vcpu=8", repeated 5 times, which means "system_reset" was done 5*20=100 times; all passed.

Tested with the latest RHEL6.7.z and hit the same issue, so this is not a regression.
(intel, --mem=16384 --vcpu=8)
kernel: kernel-2.6.32-573.13.1.el6.x86_64
qemu-kvm: qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64
spice: spice-server-debuginfo-0.12.4-12.el6.x86_64

bt:
#0  0x00007fab07c02625 in raise () from /lib64/libc.so.6
#0  0x00007fab07c02625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fab07c03e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fab07bfb74e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007fab07bfb810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fab0b33ec0d in qxl_send_events (d=0x7fab1238b840, events=1)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1775
        old_pending = <value optimized out>
        le_events = 1
        __PRETTY_FUNCTION__ = "qxl_send_events"
        __FUNCTION__ = "qxl_send_events"
#5  0x00007fab0b3411c5 in qxl_push_free_res (sin=0x7fab1238bad8, ext=...)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:708
        ring = 0x7fa6cfbff434
        notify = <value optimized out>
#6  interface_release_resource (sin=0x7fab1238bad8, ext=...)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:760
        qxl = 0x7fab1238b840
        ring = 0x7fa6cfbff434
        id = <value optimized out>
#7  0x00007fab0842e2cf in red_destroy_surface (worker=0x7faad40008c0, surface_id=1)
    at red_worker.c:1763
        surface = 0x7faad4000c38
        dcc = 0x1
        link = <value optimized out>
        next = <value optimized out>
#8  0x00007fab0843d1a4 in dev_destroy_surfaces (opaque=0x7faad40008c0, payload=<value optimized out>)
    at red_worker.c:11223
        i = <value optimized out>
#9  handle_dev_destroy_surfaces (opaque=0x7faad40008c0, payload=<value optimized out>)
    at red_worker.c:11246
        worker = 0x7faad40008c0
#10 0x00007fab08420087 in dispatcher_handle_single_read (dispatcher=0x7fab0d3951d8)
    at dispatcher.c:139
        ret = <value optimized out>
        type = <value optimized out>
        msg = 0x7fab0d3954b8
        ack = 4294967295
        payload = 0x7faad41da300 "@\t"
#11 dispatcher_handle_recv_read (dispatcher=0x7fab0d3951d8)
    at dispatcher.c:162
No locals.
#12 0x00007fab0843bd06 in red_worker_main (arg=<value optimized out>)
    at red_worker.c:12231
        events = <value optimized out>
        i = <value optimized out>
        num_events = 1
        timers_queue_timeout = <value optimized out>
        worker = 0x7faad40008c0
        __FUNCTION__ = "red_worker_main"
#13 0x00007fab0acc0a51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#14 0x00007fab07cb896d in clone () from /lib64/libc.so.6
No symbol table info available.

Candidate:

commit 4a46c99c8118586f19894fe66fc6e353f159d4d9
Author: Gerd Hoffmann <kraxel>
Date:   Tue Oct 29 13:29:43 2013 +0100

    qxl: replace pipe signaling with bottom half

    qxl creates a pipe, then writes something to it to wake up the iothread
    from the spice server thread to raise an irq. These days qemu bottom
    halves can be scheduled from threads and signals, so there is no reason
    to do this any more. Time to clean it up.
    Signed-off-by: Gerd Hoffmann <kraxel>

Testbuild:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=10232971

Retest with:
qemu-kvm-0.12.1.2-2.481.el6.bz1290743.1.x86_64
kernel-2.6.32-590.el6.x86_64

Steps to Reproduce:
1. Launch guest with:
   -vga qxl \
   -m 16384 \
   -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
   -cpu 'Westmere' \
2. (qemu) system_reset
3. Repeat step 2 20 times

Results:
Repeated 10 times, all END GOOD, without core dump. Can't be reproduced.

Additional Info:
(CLI for autotest: python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=5 --mem=16384 --vcpu=8
job link: http://10.66.4.244/kvm_autotest_for_auto_job_detail/?jobid=1173313)

Patch sent.

Fix included in qemu-kvm-0.12.1.2-2.485.el6

Test failed, so reassigning.

Package version:
qemu-kvm-0.12.1.2-2.485.el6
kernel-2.6.32-604.el6.x86_64

Steps:
1. Launch guest with:
   -vga qxl \
   -m 2048 \
   -smp 1,maxcpus=1,cores=1,threads=1,sockets=1 \
   -cpu 'Opteron_G2' \
2. (qemu) system_reset
3. Repeat step 2 20 times
(CLI for autotest: python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=5 )

Result:
Hit one core dump in 5 repeats.

Core file:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/rhel6/bug1290743/qemu-485/

#bt full
Program: /usr/libexec/qemu-kvm
PID: 5121
Signal: 6
Hostname: amd-5400b-4-3.englab.nay.redhat.com
Time of the crash (according to kernel): Tue Jan 26 15:40:38 2016

Program backtrace:
Missing separate debuginfo for
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/10/0ef2d7b308bb30d5d714867feda1e49712632d
[New Thread 5130]
[New Thread 5129]
[New Thread 5121]
[New Thread 10426]
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/libexec/qemu-kvm -S -name virt-tests-vm1 -machine rhel6.6.0 -nodefaults -v'.
Program terminated with signal 6, Aborted.
#0  0x00007f6b6ec7c625 in raise () from /lib64/libc.so.6
#0  0x00007f6b6ec7c625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f6b6ec7de05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f6b6ec7574e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007f6b6ec75810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f6b723b8c1e in qxl_send_events (d=0x7f6b744b0320, events=1)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1769
        old_pending = <value optimized out>
        le_events = 1
        __PRETTY_FUNCTION__ = "qxl_send_events"
#5  0x00007f6b723bb115 in qxl_push_free_res (sin=0x7f6b744b05b8, ext=...)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:708
        ring = 0x7f6adbbff434
        notify = <value optimized out>
#6  interface_release_resource (sin=0x7f6b744b05b8, ext=...)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:760
        qxl = 0x7f6b744b0320
        ring = 0x7f6adbbff434
        id = <value optimized out>
#7  0x00007f6b6f4a82cf in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#8  0x00007f6b6f4b71a4 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#9  0x00007f6b6f49a087 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#10 0x00007f6b6f4b5d06 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#11 0x00007f6b71d3aa51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007f6b6ed3296d in clone () from /lib64/libc.so.6
No symbol table info available.

Gerd, please have a look at Comment 12.

Seems there is a spice-server bug we have to work around:

commit 511aefb0c60e3063ead76d4ba6aabf619eed18ef
Author: Alon Levy <alevy>
Date:   Thu Nov 1 14:56:00 2012 +0200

    hw/qxl: qxl_send_events: nop if stopped

    Added a trace point for easy logging.
    RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=870972

    Signed-off-by: Alon Levy <alevy>
    Signed-off-by: Gerd Hoffmann <kraxel>

Please test:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=10397861

Retest with:
qemu-kvm-0.12.1.2-2.486.el6.bz1290743.1.x86_64
kernel-2.6.32-604.el6.x86_64

Steps:
1. Launch guest with:
   -vga qxl \
   -m 2048 \
   -smp 1,maxcpus=1,cores=1,threads=1,sockets=1 \
   -cpu 'Opteron_G2' \
2. (qemu) system_reset
3. Repeat step 2 20 times

Results:
Repeated 30 times, with --nrepeat=10 and --nrepeat=20 separately, all END GOOD, without core dump. Can't be reproduced.

Additional Info:
CLI for autotest: python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=20
job link:
http://10.66.4.244/kvm_autotest_job_log/?jobid=1206293
http://10.66.4.244/kvm_autotest_job_log/?jobid=1206568

Fix included in qemu-kvm-0.12.1.2-2.489.el6

Verified with:
qemu-kvm-rhev-0.12.1.2-2.489.el6.x86_64
kernel-2.6.32-615.el6.x86_64

Steps:
1. Launch guest with:
   -vga qxl \
   -m 2048 \
   -smp 1,maxcpus=1,cores=1,threads=1,sockets=1 \
   -cpu 'Opteron_G2' \
2. (qemu) system_reset
3. Repeat step 2 20 times

Results:
Repeated the above steps twice, and all PASS, no core dump.

Additional Info:
CLI for autotest: python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=20
job link:
http://10.66.4.244/kvm_autotest_for_auto_job_detail/?jobid=1239747
http://10.66.4.244/kvm_autotest_for_auto_job_detail/?jobid=1243292

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0815.html
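
For readers following the thread, the workaround commit quoted above ("hw/qxl: qxl_send_events: nop if stopped") changes what happens when the spice worker thread calls qxl_send_events while the display is stopped, as the backtraces show happening around system_reset. The following is only a toy Python model of that behaviour change, not the QEMU C code: the class QxlModel, its running and pending fields, and both method names are hypothetical, and only the asserted expression is taken from the log in this report.

```python
class QxlModel:
    """Toy stand-in for the QXL device state (hypothetical, not QEMU code)."""

    def __init__(self):
        self.running = True   # models qemu_spice_display_is_running(&d->ssd)
        self.pending = 0      # models the pending-event bits for the guest

    def send_events_asserting(self, events):
        """Behaviour seen in the crash: a stopped display trips the assertion."""
        if not self.running:
            raise AssertionError("qemu_spice_display_is_running(&d->ssd)")
        self.pending |= events

    def send_events_nop_if_stopped(self, events):
        """Behaviour named by the workaround: a stopped display is a no-op."""
        if not self.running:
            return False      # event is dropped, nothing else happens
        self.pending |= events
        return True


def release_during_reset(send):
    """Call `send` as in frame #4 of the backtraces: events=1."""
    try:
        send(1)
        return "survived"
    except AssertionError:
        return "aborted"


if __name__ == "__main__":
    dev = QxlModel()
    dev.running = False       # display stopped, as during a reset
    print(release_during_reset(dev.send_events_asserting))
    print(release_during_reset(dev.send_events_nop_if_stopped))
```

With the display marked as stopped, the asserting variant reports "aborted" and the nop-if-stopped variant reports "survived", which matches the move from a core dump to a clean run in the retests above.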