Bug 1290743
| Summary: | qemu-kvm core dumped when repeat system_reset 20 times during guest boot | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Qianqian Zhu <qizhu> |
| Component: | qemu-kvm | Assignee: | Gerd Hoffmann <kraxel> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.8 | CC: | ailan, chayang, jen, juzhang, kraxel, mkenneth, qizhu, rbalakri, virt-bugs, virt-maint, yfu |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-0.12.1.2-2.489.el6 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-05-10 21:02:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This bug was exposed on an AMD system while performing system_reset in a loop. The same test was run on Intel, but with a different CPU model, smp and memory size, and unfortunately it did not reproduce there.
QE will retry the same qemu-kvm command line on Intel and update here with the test result.
(In reply to Chao Yang from comment #3)
> This bug was exposed on AMD system while performing system_reset in a loop.
> Same test was done on Intel but with different cpu model, smp as well as
> memory size, unfortunately, it is not reproducible.
>
> QE will try same qemu-kvm cmd again on Intel then update here with test
> result.
This bug can be reproduced on an Intel host with the same memory and vCPU settings (mem=16384, vcpu=8):
-----------------------------------------
-m 16384 \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu 'Westmere' \
(CLI for autotest:
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=5 --mem=16384 --vcpu=8)
job link: http://10.66.4.244/kvm_autotest_job_log/?jobid=1171125
How reproducible: 2/5
The cause seems to be the same as on the AMD host (.DEBUG file in the autotest log):
12/14 13:47:02 INFO | aexpect:0965| [qemu output] qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/hw/qxl.c:1775: qxl_send_events: Assertion `qemu_spice_display_is_running(&d->ssd)' failed.
12/14 13:47:16 WARNI|env_proces:1296| virt-tests-vm1 is not alive. Can't query the register status
12/14 13:47:16 INFO | aexpect:0965| [qemu output] /tmp/aexpect/8eQT9SkV/aexpect-0YTy8k.sh: line 1: 9186 Aborted (core dumped) ...
12/14 13:47:16 INFO | aexpect:0965| [qemu output] (Process terminated with status 134)
Check the report file (debug/crash.qemu-kvm.9186/report):
#0 0x00007f5d8d2c3625 in raise () from /lib64/libc.so.6
#0 0x00007f5d8d2c3625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f5d8d2c4e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007f5d8d2bc74e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3 0x00007f5d8d2bc810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4 0x00007f5d909ffbed in qxl_send_events (d=0x7f5d96572840, events=1) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1775
old_pending = <value optimized out>
le_events = 1
__PRETTY_FUNCTION__ = "qxl_send_events"
__FUNCTION__ = "qxl_send_events"
#5 0x00007f5d90a021a5 in qxl_push_free_res (sin=0x7f5d96572ad8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:708
ring = 0x7f5957bff434
notify = <value optimized out>
#6 interface_release_resource (sin=0x7f5d96572ad8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:760
qxl = 0x7f5d96572840
ring = 0x7f5957bff434
id = <value optimized out>
#7 0x00007f5d8daef2cf in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#8 0x00007f5d8dafe1a4 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#9 0x00007f5d8dae1087 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#10 0x00007f5d8dafcd06 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#11 0x00007f5d90381a51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007f5d8d37996d in clone () from /lib64/libc.so.6
No symbol table info available.
Sorry, I made a mistake earlier:
-------------------
How reproducible: 2/5 ====> 2/100
-------------------
I ran this case with autotest 5 times and 2 runs failed; in each run autotest issues "system_reset" 20 times.
Tested on the same Intel host with "--mem=16384 --vcpu=8": repeated 5 times, i.e. "system_reset" 5*20=100 times, and all passed.
Tested with the latest RHEL6.7.z and hit the same issue, so this is not a regression.
(intel, --mem=16384 --vcpu=8)
kernel:kernel-2.6.32-573.13.1.el6.x86_64
qemu-kvm:qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64
spice: spice-server-debuginfo-0.12.4-12.el6.x86_64
bt:
#0 0x00007fab07c02625 in raise () from /lib64/libc.so.6
#0 0x00007fab07c02625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fab07c03e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007fab07bfb74e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3 0x00007fab07bfb810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4 0x00007fab0b33ec0d in qxl_send_events (d=0x7fab1238b840, events=1) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1775
old_pending = <value optimized out>
le_events = 1
__PRETTY_FUNCTION__ = "qxl_send_events"
__FUNCTION__ = "qxl_send_events"
#5 0x00007fab0b3411c5 in qxl_push_free_res (sin=0x7fab1238bad8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:708
ring = 0x7fa6cfbff434
notify = <value optimized out>
#6 interface_release_resource (sin=0x7fab1238bad8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:760
qxl = 0x7fab1238b840
ring = 0x7fa6cfbff434
id = <value optimized out>
#7 0x00007fab0842e2cf in red_destroy_surface (worker=0x7faad40008c0, surface_id=1) at red_worker.c:1763
surface = 0x7faad4000c38
dcc = 0x1
link = <value optimized out>
next = <value optimized out>
#8 0x00007fab0843d1a4 in dev_destroy_surfaces (opaque=0x7faad40008c0, payload=<value optimized out>) at red_worker.c:11223
i = <value optimized out>
#9 handle_dev_destroy_surfaces (opaque=0x7faad40008c0, payload=<value optimized out>) at red_worker.c:11246
worker = 0x7faad40008c0
#10 0x00007fab08420087 in dispatcher_handle_single_read (dispatcher=0x7fab0d3951d8) at dispatcher.c:139
ret = <value optimized out>
type = <value optimized out>
msg = 0x7fab0d3954b8
ack = 4294967295
payload = 0x7faad41da300 "@\t"
#11 dispatcher_handle_recv_read (dispatcher=0x7fab0d3951d8) at dispatcher.c:162
No locals.
#12 0x00007fab0843bd06 in red_worker_main (arg=<value optimized out>) at red_worker.c:12231
events = <value optimized out>
i = <value optimized out>
num_events = 1
timers_queue_timeout = <value optimized out>
worker = 0x7faad40008c0
__FUNCTION__ = "red_worker_main"
#13 0x00007fab0acc0a51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#14 0x00007fab07cb896d in clone () from /lib64/libc.so.6
No symbol table info available.
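For triage, the backtraces above can be reduced to the first frame that carries source information, which in each of them is qxl_send_events in hw/qxl.c. The following is a minimal sketch, assuming the report text is in the "bt full" layout pasted in this bug; the names Frame, parse_frames and first_source_frame are made up for this illustration and are not part of gdb or autotest.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    func: str
    file: Optional[str]  # source file, present only when debuginfo was installed
    line: Optional[int]
    lib: Optional[str]   # shared object, present when gdb prints "from <lib>"


# Matches gdb frame lines such as:
#   "#4 0x... in qxl_send_events (d=..., events=1) at .../hw/qxl.c:1775"
#   "#6 interface_release_resource (sin=..., ext=...) at .../hw/qxl.c:760"
#   "#0 0x... in raise () from /lib64/libc.so.6"
FRAME_RE = re.compile(
    r"^#(?P<index>\d+)\s+"
    r"(?:0x[0-9a-fA-F]+ in )?"
    r"(?P<func>\S+) \(.*\)"
    r"(?: at (?P<file>\S+):(?P<line>\d+)| from (?P<lib>\S+))?\s*$"
)


def parse_frames(text: str) -> List[Frame]:
    """Return every frame line found in a 'bt full' dump, in order."""
    frames = []
    for raw in text.splitlines():
        m = FRAME_RE.match(raw.strip())
        if not m:
            continue  # locals, "No symbol table info available.", and so on
        line = m.group("line")
        frames.append(Frame(int(m.group("index")), m.group("func"),
                            m.group("file"), int(line) if line else None,
                            m.group("lib")))
    return frames


def first_source_frame(frames: List[Frame]) -> Optional[Frame]:
    """Return the innermost frame that has a file:line location, if any."""
    for frame in frames:
        if frame.file is not None:
            return frame
    return None
```

Calling first_source_frame(parse_frames(report_text)) on a pasted report gives a quick way to compare crashes from different hosts or builds, since the libc frames and the unresolved libspice-server frames are skipped.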
Candidate:
commit 4a46c99c8118586f19894fe66fc6e353f159d4d9
Author: Gerd Hoffmann <kraxel>
Date: Tue Oct 29 13:29:43 2013 +0100
qxl: replace pipe signaling with bottom half
qxl creates a pipe, then writes something to it to wake up the iothread
from the spice server thread to raise an irq. These days qemu bottom
halves can be scheduled from threads and signals, so there is no reason
to do this any more. Time to clean it up.
Signed-off-by: Gerd Hoffmann <kraxel>
Testbuild:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=10232971
Retest with:
qemu-kvm-0.12.1.2-2.481.el6.bz1290743.1.x86_64
kernel-2.6.32-590.el6.x86_64
Steps to Reproduce:
1.Launch guest with:
-vga qxl \
-m 16384 \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu 'Westmere' \
2.(qemu) system_reset
3. Repeat step2 20 times
Results:
Repeat 10 times, all END GOOD, without core dump. Can't be reproduced.
Additional Info:
(CLI for autotest:
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=5 --mem=16384 --vcpu=8
job link: http://10.66.4.244/kvm_autotest_for_auto_job_detail/?jobid=1173313)
patch sent.
Fix included in qemu-kvm-0.12.1.2-2.485.el6
Test failed, so reassign.
Package version:
qemu-kvm-0.12.1.2-2.485.el6
kernel-2.6.32-604.el6.x86_64
Steps:
1.Launch guest with:
-vga qxl \
-m 2048 \
-smp 1,maxcpus=1,cores=1,threads=1,sockets=1 \
-cpu 'Opteron_G2' \
2.(qemu) system_reset
3. Repeat step2 20 times
(CLI for autotest:
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=5 )
Result:
Hit one core dump in 5 repeats.
Core file: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/rhel6/bug1290743/qemu-485/
#bt full
Program: /usr/libexec/qemu-kvm
PID: 5121
Signal: 6
Hostname: amd-5400b-4-3.englab.nay.redhat.com
Time of the crash (according to kernel): Tue Jan 26 15:40:38 2016
Program backtrace:
Missing separate debuginfo for
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/10/0ef2d7b308bb30d5d714867feda1e49712632d
[New Thread 5130]
[New Thread 5129]
[New Thread 5121]
[New Thread 10426]
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/libexec/qemu-kvm -S -name virt-tests-vm1 -machine rhel6.6.0 -nodefaults -v'.
Program terminated with signal 6, Aborted.
#0 0x00007f6b6ec7c625 in raise () from /lib64/libc.so.6
#0 0x00007f6b6ec7c625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f6b6ec7de05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007f6b6ec7574e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3 0x00007f6b6ec75810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4 0x00007f6b723b8c1e in qxl_send_events (d=0x7f6b744b0320, events=1) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1769
old_pending = <value optimized out>
le_events = 1
__PRETTY_FUNCTION__ = "qxl_send_events"
#5 0x00007f6b723bb115 in qxl_push_free_res (sin=0x7f6b744b05b8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:708
ring = 0x7f6adbbff434
notify = <value optimized out>
#6 interface_release_resource (sin=0x7f6b744b05b8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:760
qxl = 0x7f6b744b0320
ring = 0x7f6adbbff434
id = <value optimized out>
#7 0x00007f6b6f4a82cf in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#8 0x00007f6b6f4b71a4 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#9 0x00007f6b6f49a087 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#10 0x00007f6b6f4b5d06 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#11 0x00007f6b71d3aa51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007f6b6ed3296d in clone () from /lib64/libc.so.6
No symbol table info available.
Gerd, please have a look at Comment 12.
Seems there is a spice-server bug we have to work around:
commit 511aefb0c60e3063ead76d4ba6aabf619eed18ef
Author: Alon Levy <alevy>
Date: Thu Nov 1 14:56:00 2012 +0200
hw/qxl: qxl_send_events: nop if stopped
Added a trace point for easy logging.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=870972
Signed-off-by: Alon Levy <alevy>
Signed-off-by: Gerd Hoffmann <kraxel>
Please test:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=10397861
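To illustrate what the commit title "qxl_send_events: nop if stopped" implies, here is a toy model of the two behaviours. It is a sketch under stated assumptions, not the QEMU C code: the class QxlModel, its fields and both method names are invented for this example, and the only things taken from this report are the assertion text from the backtrace and the commit title.

```python
class QxlModel:
    """Toy stand-in for the QXL device state; not the real C structure."""

    def __init__(self) -> None:
        # Models the result of qemu_spice_display_is_running() (assumption).
        self.display_running = True
        # Models a pending-event bitmask (assumption).
        self.int_pending = 0

    def send_events_assert(self, events: int) -> None:
        # Behaviour seen in the backtrace: abort when the display is stopped.
        if not self.display_running:
            raise AssertionError("qemu_spice_display_is_running(&d->ssd)")
        self.int_pending |= events

    def send_events_nop_if_stopped(self, events: int) -> bool:
        # Behaviour suggested by the commit title: drop events that arrive
        # while the display is stopped instead of aborting the process.
        if not self.display_running:
            return False
        self.int_pending |= events
        return True
```

In this model, an event delivered by the spice worker thread after a reset has stopped the display raises in the first variant and is silently ignored in the second, which matches the difference between the crashing and non-crashing test builds reported here.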
Retest with:
qemu-kvm-0.12.1.2-2.486.el6.bz1290743.1.x86_64
kernel-2.6.32-604.el6.x86_64
Steps:
1.Launch guest with:
-vga qxl \
-m 2048 \
-smp 1,maxcpus=1,cores=1,threads=1,sockets=1 \
-cpu 'Opteron_G2' \
2.(qemu) system_reset
3. Repeat step2 20 times
Results:
Repeat 30 times, with --nrepeat=10 and --nrepeat=20 separately, all END GOOD, without core dump. Can't be reproduced.
Additional Info:
CLI for autotest:
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=20
job link:
http://10.66.4.244/kvm_autotest_job_log/?jobid=1206293
http://10.66.4.244/kvm_autotest_job_log/?jobid=1206568
Fix included in qemu-kvm-0.12.1.2-2.489.el6
Verified with:
qemu-kvm-rhev-0.12.1.2-2.489.el6.x86_64
kernel-2.6.32-615.el6.x86_64
Steps:
1.Launch guest with:
-vga qxl \
-m 2048 \
-smp 1,maxcpus=1,cores=1,threads=1,sockets=1 \
-cpu 'Opteron_G2' \
2.(qemu) system_reset
3. Repeat step2 20 times
Results:
Repeat the above steps twice, and all PASS, no core dump.
Additional Info:
CLI for autotest:
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=20
job link:
http://10.66.4.244/kvm_autotest_for_auto_job_detail/?jobid=1239747
http://10.66.4.244/kvm_autotest_for_auto_job_detail/?jobid=1243292
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-0815.html
Description of problem:
qemu-kvm core dumped when repeat system_reset 20 times during guest boot
Version-Release number of selected component (if applicable):
kernel-2.6.32-590.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.481.el6.x86_64
seabios-0.6.1.2-30.el6.x86_64
How reproducible:
1/1
Steps to Reproduce:
1.launch guest with:
/usr/libexec/qemu-kvm \
-S \
-name 'virt-tests-vm1' \
-machine rhel6.6.0 \
-nodefaults \
-vga qxl \
-device intel-hda,bus=pci.0,addr=03 \
-device hda-duplex \
-chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20151209-142858-hiRm1ljA,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20151209-142858-hiRm1ljA,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idEJRiq4 \
-chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20151209-142858-hiRm1ljA,server,nowait \
-device isa-serial,chardev=serial_id_serial0 \
-device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
-chardev socket,id=devvs,path=/tmp/virtio_port-vs-20151209-142858-hiRm1ljA,server,nowait \
-device virtserialport,chardev=devvs,name=vs,id=vs,bus=virtio_serial_pci0.0 \
-chardev socket,id=seabioslog_id_20151209-142858-hiRm1ljA,path=/tmp/seabios-20151209-142858-hiRm1ljA,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20151209-142858-hiRm1ljA,iobase=0x402 \
-device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
-device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
-device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
-device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
-drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/home/autotest/client/tests/virt/shared/data/images/rhel71-64-virtio.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=05 \
-device virtio-net-pci,mac=9a:b2:b3:b4:b5:b6,id=idL2KZkA,vectors=4,netdev=idm0QPZl,bus=pci.0,addr=06 \
-netdev tap,id=idm0QPZl,vhost=on,vhostfd=23,fd=22 \
-m 16384 \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu 'Opteron_G3' \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-spice port=3000,password=123456,addr=0,tls-port=3200,x509-dir=/tmp/spice_x509d,tls-channel=main,tls-channel=inputs,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4 \
-rtc base=utc,clock=host,driftfix=slew \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm
2.(qemu) system_reset
3. Repeat step2 20 times
Actual results:
qemu core dump
Expected results:
guest reboot successfully and works well
Additional info:
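The command line above exposes QMP monitors on UNIX sockets (the -mon ... mode=control options), so steps 2 and 3 can also be driven without the autotest harness. The following is a minimal sketch, assuming the guest is already running (the command line starts it paused with -S) and that one second between resets keeps the guest in its boot phase; the function names and the delay value are hypothetical choices for this example, while qmp_capabilities and system_reset are standard QMP commands.

```python
import json
import socket
import time
from typing import Optional


def qmp_command(name: str, arguments: Optional[dict] = None) -> bytes:
    """Encode one QMP command as a JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\r\n").encode("utf-8")


def read_reply(reader) -> dict:
    """Read messages until a command reply arrives, skipping async events."""
    while True:
        line = reader.readline()
        if not line:
            # A closed socket during the loop is how a qemu-kvm abort shows up.
            raise ConnectionError("QMP socket closed; qemu-kvm may have exited")
        msg = json.loads(line)
        if "event" not in msg:
            return msg


def reset_loop(sock: socket.socket, count: int = 20, delay: float = 1.0) -> int:
    """Negotiate QMP on a connected socket, then send system_reset `count` times."""
    reader = sock.makefile("rb")
    greeting = json.loads(reader.readline())
    if "QMP" not in greeting:
        raise RuntimeError("unexpected QMP greeting: %r" % (greeting,))
    sock.sendall(qmp_command("qmp_capabilities"))
    read_reply(reader)
    done = 0
    for _ in range(count):
        sock.sendall(qmp_command("system_reset"))
        reply = read_reply(reader)
        if "error" in reply:
            raise RuntimeError("system_reset failed: %r" % (reply["error"],))
        done += 1
        time.sleep(delay)  # hypothetical pacing; tune so resets land during boot
    return done


def reset_loop_unix(path: str, count: int = 20, delay: float = 1.0) -> int:
    """Connect to a QMP UNIX socket at `path` and run the reset loop."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return reset_loop(sock, count, delay)
```

For example, reset_loop_unix("/tmp/monitor-qmpmonitor1-20151209-142858-hiRm1ljA") would target the first monitor socket from the command line above. A ConnectionError raised part way through the loop indicates that qemu-kvm went away, which is the failure this bug describes.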