Bug 1290743

Summary: qemu-kvm core dumped when repeat system_reset 20 times during guest boot
Product: Red Hat Enterprise Linux 6
Component: qemu-kvm
Version: 6.8
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Reporter: Qianqian Zhu <qizhu>
Assignee: Gerd Hoffmann <kraxel>
QA Contact: Virtualization Bugs <virt-bugs>
CC: ailan, chayang, jen, juzhang, kraxel, mkenneth, qizhu, rbalakri, virt-bugs, virt-maint, yfu
Fixed In Version: qemu-kvm-0.12.1.2-2.489.el6
Doc Type: Bug Fix
Last Closed: 2016-05-10 21:02:12 UTC
Type: Bug

Description Qianqian Zhu 2015-12-11 10:12:21 UTC
Description of problem:
qemu-kvm dumps core when system_reset is issued 20 times in a row during guest boot.

Version-Release number of selected component (if applicable):

kernel-2.6.32-590.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.481.el6.x86_64
seabios-0.6.1.2-30.el6.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Launch the guest with:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'virt-tests-vm1' \
    -machine rhel6.6.0  \
    -nodefaults  \
    -vga qxl \
    -device intel-hda,bus=pci.0,addr=03 \
    -device hda-duplex  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20151209-142858-hiRm1ljA,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20151209-142858-hiRm1ljA,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idEJRiq4  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20151209-142858-hiRm1ljA,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04  \
    -chardev socket,id=devvs,path=/tmp/virtio_port-vs-20151209-142858-hiRm1ljA,server,nowait \
    -device virtserialport,chardev=devvs,name=vs,id=vs,bus=virtio_serial_pci0.0  \
    -chardev socket,id=seabioslog_id_20151209-142858-hiRm1ljA,path=/tmp/seabios-20151209-142858-hiRm1ljA,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20151209-142858-hiRm1ljA,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/home/autotest/client/tests/virt/shared/data/images/rhel71-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=05 \
    -device virtio-net-pci,mac=9a:b2:b3:b4:b5:b6,id=idL2KZkA,vectors=4,netdev=idm0QPZl,bus=pci.0,addr=06  \
    -netdev tap,id=idm0QPZl,vhost=on,vhostfd=23,fd=22  \
    -m 16384  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Opteron_G3' \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=3000,password=123456,addr=0,tls-port=3200,x509-dir=/tmp/spice_x509d,tls-channel=main,tls-channel=inputs,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm

2. (qemu) system_reset
3. Repeat step 2 20 times.

Actual results:
qemu-kvm aborts and dumps core.

Expected results:
The guest reboots successfully and keeps working normally.

Additional info:
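Steps 2 and 3 can be scripted against the QMP socket that the command line already exposes (`-mon chardev=qmp_id_qmpmonitor1,mode=control`). This is a hypothetical sketch, not the autotest system_reset_during_boot case: the helper names are made up, it assumes the guest was started with -S and therefore sends `cont` first, and the `delay` between resets is an arbitrary choice. A socket that closes mid-loop is reported as a probable crash, since that is how a qemu-kvm abort looks from the client side.

```python
import json
import socket
import time


def qmp_messages(count, start=True):
    """Build the QMP command lines: capability negotiation, optional 'cont', then `count` resets."""
    msgs = [{"execute": "qmp_capabilities"}]
    if start:
        # The command line uses -S, so the guest stays paused until 'cont' is sent.
        msgs.append({"execute": "cont"})
    msgs += [{"execute": "system_reset"} for _ in range(count)]
    return [json.dumps(m) + "\n" for m in msgs]


def read_reply(f):
    """Return the next command reply, skipping asynchronous events (e.g. RESET)."""
    while True:
        raw = f.readline()
        if not raw:
            # EOF: qemu-kvm closed the socket, which is what an abort() looks like here.
            raise RuntimeError("QMP connection closed; qemu-kvm may have crashed")
        reply = json.loads(raw)
        if "return" in reply or "error" in reply:
            return reply


def reset_loop(sock_path, count=20, delay=1.0, start=True):
    """Send `count` system_reset commands over QMP, sleeping `delay` seconds after each."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        with s.makefile("rw") as f:
            f.readline()  # greeting: {"QMP": {...}}
            for line in qmp_messages(count, start):
                f.write(line)
                f.flush()
                reply = read_reply(f)
                if "error" in reply:
                    raise RuntimeError(f"QMP error: {reply['error']}")
                if "system_reset" in line:
                    # Hypothetical pacing so each reset lands while the guest is booting.
                    time.sleep(delay)
```

For example, `reset_loop("/tmp/monitor-qmpmonitor1-20151209-142858-hiRm1ljA", count=20)`. Asynchronous events arrive on the same socket as command replies, which is why `read_reply` skips anything that is not a `return` or `error` message.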

Comment 3 Chao Yang 2015-12-14 04:11:30 UTC
This bug was exposed on an AMD host while running system_reset in a loop. The same test on an Intel host, but with a different CPU model, SMP topology and memory size, did not reproduce it.

QE will retry the same qemu-kvm command line on Intel and update this bug with the result.

Comment 4 Yanan Fu 2015-12-14 08:21:16 UTC
(In reply to Chao Yang from comment #3)
> This bug was exposed on AMD system while performing system_reset in a loop.
> Same test was done on Intel but with different cpu model, smp as well as
> memory size, unfortunately, it is not reproducible.
> 
> QE will try same qemu-kvm cmd again on Intel then update here with test
> result.

This bug can be reproduced on an Intel host with the same memory size and vCPU count (mem=16384, vcpus=8):
-----------------------------------------
    -m 16384  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Westmere' \

(CLI for autotest: 
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=5 --mem=16384 --vcpu=8)

job link:
http://10.66.4.244/kvm_autotest_job_log/?jobid=1171125

How reproducible:
2/5

The cause appears to be the same as on the AMD host (from the .DEBUG file in the autotest log):
12/14 13:47:02 INFO |   aexpect:0965| [qemu output] qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/hw/qxl.c:1775: qxl_send_events: Assertion `qemu_spice_display_is_running(&d->ssd)' failed.
12/14 13:47:16 WARNI|env_proces:1296| virt-tests-vm1 is not alive. Can't query the register status
12/14 13:47:16 INFO |   aexpect:0965| [qemu output] /tmp/aexpect/8eQT9SkV/aexpect-0YTy8k.sh: line 1:  9186 Aborted                 (core dumped)
... 
12/14 13:47:16 INFO |   aexpect:0965| [qemu output] (Process terminated with status 134)
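Two facts in this log are worth extracting mechanically when comparing runs: the failed assertion (file, line, function, expression) and the exit status. A shell-style status above 128 means the process was killed by signal (status - 128), so 134 is signal 6, SIGABRT, which glibc's assert() raises through abort(). A minimal sketch, assuming the glibc assertion message format shown above; the helper names are hypothetical.

```python
import re
import signal

# glibc assert() message as printed on qemu-kvm's stderr.
ASSERT_RE = re.compile(
    r"qemu-kvm: (?P<file>[^:]+):(?P<line>\d+): (?P<func>\w+): "
    r"Assertion `(?P<expr>.*)' failed\."
)


def parse_assertion(text):
    """Return (file, line, function, expression) of the first assertion in text, or None."""
    m = ASSERT_RE.search(text)
    if m is None:
        return None
    return m["file"], int(m["line"]), m["func"], m["expr"]


def decode_exit_status(status):
    """Return the signal name if a shell-style exit status means 'killed by a signal'."""
    if status > 128:
        return signal.Signals(status - 128).name
    return None
```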


Backtrace from the report file (debug/crash.qemu-kvm.9186/report):
#0  0x00007f5d8d2c3625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f5d8d2c4e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f5d8d2bc74e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007f5d8d2bc810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f5d909ffbed in qxl_send_events (d=0x7f5d96572840, events=1) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1775
        old_pending = <value optimized out>
        le_events = 1
        __PRETTY_FUNCTION__ = "qxl_send_events"
        __FUNCTION__ = "qxl_send_events"
#5  0x00007f5d90a021a5 in qxl_push_free_res (sin=0x7f5d96572ad8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:708
        ring = 0x7f5957bff434
        notify = <value optimized out>
#6  interface_release_resource (sin=0x7f5d96572ad8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:760
        qxl = 0x7f5d96572840
        ring = 0x7f5957bff434
        id = <value optimized out>
#7  0x00007f5d8daef2cf in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#8  0x00007f5d8dafe1a4 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#9  0x00007f5d8dae1087 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#10 0x00007f5d8dafcd06 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#11 0x00007f5d90381a51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007f5d8d37996d in clone () from /lib64/libc.so.6
No symbol table info available.
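Line numbers and addresses change between builds (this assertion is reported at qxl.c:1775 here and at qxl.c:1769 in the qemu-kvm-0.12.1.2-2.485.el6 build), so crashes are easier to compare by the first few function names after the libc abort frames. The hypothetical helper below does that; for this backtrace it yields qxl_send_events, qxl_push_free_res, interface_release_resource. The bottom frames (start_thread, clone) also show that the assertion fired on a thread whose start routine lives in libspice-server, not on qemu's main thread.

```python
import re

# "#4  0x... in func (args) at file:line", or an inlined "#6  func (args) at ..." frame.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")

# libc frames shared by every assert() failure; they say nothing about the bug itself.
ABORT_FRAMES = {"raise", "abort", "__assert_fail_base", "__assert_fail"}


def frame_names(backtrace):
    """Return function names in frame order, skipping repeated frame numbers (gdb prints #0 twice)."""
    names, seen = [], set()
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m and m.group(1) not in seen:
            seen.add(m.group(1))
            names.append(m.group(2))
    return names


def crash_signature(backtrace, depth=3):
    """Top `depth` frames after the abort machinery, without addresses or line numbers."""
    names = [n for n in frame_names(backtrace) if n not in ABORT_FRAMES]
    return tuple(names[:depth])
```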

Comment 5 Yanan Fu 2015-12-15 01:49:26 UTC
Correction to my previous comment:
-------------------
How reproducible:
2/5  ==>  2/100
-------------------
I ran this case with autotest 5 times and it failed twice. Each repeat performs system_reset 20 times, so the rate is 2 failures in 100 resets.

In a separate run on the same Intel host with "--mem=16384 --vcpu=8", repeated 5 times (5*20 = 100 system_reset operations), all resets were OK.
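A rough way to read these numbers, assuming every reset is an independent trial with the same crash probability p (a simplification; comment 3 suggests the rate depends on host and guest configuration): with p = 2/100, a 100-reset run hits the crash with probability of about 87%, so a single clean 100-reset run is weak evidence on its own. The sketch computes that probability and the number of consecutive clean resets needed to rule out a rate of p at a given confidence; the function names are hypothetical.

```python
import math


def p_at_least_one(p, n):
    """Chance that n independent resets hit the crash at least once, given per-reset rate p."""
    return 1.0 - (1.0 - p) ** n


def resets_for_confidence(p, confidence=0.95):
    """Smallest n for which n clean resets would have caught a rate-p bug with this confidence."""
    # Solve (1 - p) ** n <= 1 - confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
```

With p = 0.02 and 95% confidence, `resets_for_confidence` gives 149 resets, that is 8 autotest repeats of 20 resets each.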

Comment 6 Yanan Fu 2015-12-15 05:56:53 UTC
Tested with the latest RHEL 6.7.z build and hit the same issue, so this is not a regression.
(intel, --mem=16384 --vcpu=8)

kernel:kernel-2.6.32-573.13.1.el6.x86_64
qemu-kvm:qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64
spice: spice-server-debuginfo-0.12.4-12.el6.x86_64

bt:
#0  0x00007fab07c02625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fab07c03e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fab07bfb74e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007fab07bfb810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fab0b33ec0d in qxl_send_events (d=0x7fab1238b840, events=1) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1775
        old_pending = <value optimized out>
        le_events = 1
        __PRETTY_FUNCTION__ = "qxl_send_events"
        __FUNCTION__ = "qxl_send_events"
#5  0x00007fab0b3411c5 in qxl_push_free_res (sin=0x7fab1238bad8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:708
        ring = 0x7fa6cfbff434
        notify = <value optimized out>
#6  interface_release_resource (sin=0x7fab1238bad8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:760
        qxl = 0x7fab1238b840
        ring = 0x7fa6cfbff434
        id = <value optimized out>
#7  0x00007fab0842e2cf in red_destroy_surface (worker=0x7faad40008c0, surface_id=1) at red_worker.c:1763
        surface = 0x7faad4000c38
        dcc = 0x1
        link = <value optimized out>
        next = <value optimized out>
#8  0x00007fab0843d1a4 in dev_destroy_surfaces (opaque=0x7faad40008c0, payload=<value optimized out>) at red_worker.c:11223
        i = <value optimized out>
#9  handle_dev_destroy_surfaces (opaque=0x7faad40008c0, payload=<value optimized out>) at red_worker.c:11246
        worker = 0x7faad40008c0
#10 0x00007fab08420087 in dispatcher_handle_single_read (dispatcher=0x7fab0d3951d8) at dispatcher.c:139
        ret = <value optimized out>
        type = <value optimized out>
        msg = 0x7fab0d3954b8
        ack = 4294967295
        payload = 0x7faad41da300 "@\t"
#11 dispatcher_handle_recv_read (dispatcher=0x7fab0d3951d8) at dispatcher.c:162
No locals.
#12 0x00007fab0843bd06 in red_worker_main (arg=<value optimized out>) at red_worker.c:12231
        events = <value optimized out>
        i = <value optimized out>
        num_events = 1
        timers_queue_timeout = <value optimized out>
        worker = 0x7faad40008c0
        __FUNCTION__ = "red_worker_main"
#13 0x00007fab0acc0a51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#14 0x00007fab07cb896d in clone () from /lib64/libc.so.6
No symbol table info available.

Comment 7 Gerd Hoffmann 2015-12-15 12:09:25 UTC
Candidate:

commit 4a46c99c8118586f19894fe66fc6e353f159d4d9
Author: Gerd Hoffmann <kraxel>
Date:   Tue Oct 29 13:29:43 2013 +0100

    qxl: replace pipe signaling with bottom half
    
    qxl creates a pipe, then writes something to it to wake up the iothread
    from the spice server thread to raise an irq.  These days qemu bottom
    halves can be scheduled from threads and signals, so there is no reason
    to do this any more.  Time to clean it up.
    
    Signed-off-by: Gerd Hoffmann <kraxel>

Test build:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=10232971
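The change described in the commit message can be illustrated outside QEMU. In the sketch below, which is an analogy written with Python's asyncio and not QEMU code, `loop.add_reader` on a private pipe plays the role of the old qxl pipe handler, and `loop.call_soon_threadsafe` plays the role of scheduling a bottom half from the spice server thread. Both deliver every wakeup from the worker thread to the main loop; the second simply removes the device-owned pipe, its file descriptors and the read handler. It is Unix only, because `add_reader` needs a selector-based event loop, and the function names are hypothetical.

```python
import asyncio
import os
import threading


def _pipe_worker(wfd, n):
    # Old pattern: the worker thread requests an irq by writing one byte per request.
    for _ in range(n):
        os.write(wfd, b"x")


def _bh_worker(loop, callback, n):
    # New pattern: the worker thread schedules the callback on the main loop directly.
    for _ in range(n):
        loop.call_soon_threadsafe(callback)


async def pipe_signaling(n):
    """Count irqs raised through a device-owned pipe watched by the main loop (n >= 1)."""
    loop = asyncio.get_running_loop()
    rfd, wfd = os.pipe()
    done = loop.create_future()
    raised = 0

    def on_readable():
        nonlocal raised
        raised += len(os.read(rfd, 4096))  # each byte is one wakeup request
        if raised >= n and not done.done():
            done.set_result(raised)

    loop.add_reader(rfd, on_readable)
    worker = threading.Thread(target=_pipe_worker, args=(wfd, n))
    worker.start()
    try:
        return await done
    finally:
        worker.join()
        loop.remove_reader(rfd)
        os.close(rfd)
        os.close(wfd)


async def bh_signaling(n):
    """Count irqs raised by callbacks scheduled from another thread (n >= 1)."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    raised = 0

    def raise_irq():
        nonlocal raised
        raised += 1
        if raised >= n and not done.done():
            done.set_result(raised)

    worker = threading.Thread(target=_bh_worker, args=(loop, raise_irq, n))
    worker.start()
    try:
        return await done
    finally:
        worker.join()
```

Both `asyncio.run(pipe_signaling(5))` and `asyncio.run(bh_signaling(5))` return 5: the number of delivered wakeups is the same, only the plumbing differs.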

Comment 8 Qianqian Zhu 2015-12-16 08:07:45 UTC
Retest with:
qemu-kvm-0.12.1.2-2.481.el6.bz1290743.1.x86_64
kernel-2.6.32-590.el6.x86_64

Steps to Reproduce:
1. Launch the guest with:

    -vga qxl \
    -m 16384  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Westmere' \

2. (qemu) system_reset
3. Repeat step 2 20 times.

Results:
Repeated 10 times: all runs END GOOD, with no core dump. Not reproducible.


Additional Info:
(CLI for autotest: 
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=5 --mem=16384 --vcpu=8
job link: http://10.66.4.244/kvm_autotest_for_auto_job_detail/?jobid=1173313)

Comment 9 Gerd Hoffmann 2016-01-15 14:06:55 UTC
Patch sent.

Comment 10 Jeff Nelson 2016-01-22 16:37:40 UTC
Fix included in qemu-kvm-0.12.1.2-2.485.el6

Comment 12 Qianqian Zhu 2016-01-26 09:08:28 UTC
Verification failed, so reassigning.

Package version:
qemu-kvm-0.12.1.2-2.485.el6
kernel-2.6.32-604.el6.x86_64

Steps:
1. Launch the guest with:
    -vga qxl \
    -m 2048  \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=1  \
    -cpu 'Opteron_G2' \
2. (qemu) system_reset
3. Repeat step 2 20 times.

(CLI for autotest: 
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=5 )

Result:
Hit the core dump once in 5 repeats.
Core file: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/rhel6/bug1290743/qemu-485/

#bt full
Program: /usr/libexec/qemu-kvm
PID: 5121
Signal: 6
Hostname: amd-5400b-4-3.englab.nay.redhat.com
Time of the crash (according to kernel): Tue Jan 26 15:40:38 2016
Program backtrace:
Missing separate debuginfo for 
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/10/0ef2d7b308bb30d5d714867feda1e49712632d
[New Thread 5130]
[New Thread 5129]
[New Thread 5121]
[New Thread 10426]
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/libexec/qemu-kvm -S -name virt-tests-vm1 -machine rhel6.6.0 -nodefaults -v'.
Program terminated with signal 6, Aborted.
#0  0x00007f6b6ec7c625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f6b6ec7de05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f6b6ec7574e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007f6b6ec75810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f6b723b8c1e in qxl_send_events (d=0x7f6b744b0320, events=1) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1769
        old_pending = <value optimized out>
        le_events = 1
        __PRETTY_FUNCTION__ = "qxl_send_events"
#5  0x00007f6b723bb115 in qxl_push_free_res (sin=0x7f6b744b05b8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:708
        ring = 0x7f6adbbff434
        notify = <value optimized out>
#6  interface_release_resource (sin=0x7f6b744b05b8, ext=...) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:760
        qxl = 0x7f6b744b0320
        ring = 0x7f6adbbff434
        id = <value optimized out>
#7  0x00007f6b6f4a82cf in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#8  0x00007f6b6f4b71a4 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#9  0x00007f6b6f49a087 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#10 0x00007f6b6f4b5d06 in ?? () from /usr/lib64/libspice-server.so.1
No symbol table info available.
#11 0x00007f6b71d3aa51 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007f6b6ed3296d in clone () from /lib64/libc.so.6
No symbol table info available.

Comment 13 Chao Yang 2016-01-27 03:28:00 UTC
Gerd,

Please have a look at Comment 12

Comment 14 Gerd Hoffmann 2016-01-27 07:25:38 UTC
It seems there is a spice-server bug we have to work around:

commit 511aefb0c60e3063ead76d4ba6aabf619eed18ef
Author: Alon Levy <alevy>
Date:   Thu Nov 1 14:56:00 2012 +0200

    hw/qxl: qxl_send_events: nop if stopped
    
    Added a trace point for easy logging.
    
    RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=870972
    
    Signed-off-by: Alon Levy <alevy>
    Signed-off-by: Gerd Hoffmann <kraxel>

Please test:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=10397861
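As its title says, the commit makes qxl_send_events a no-op when the display is stopped instead of asserting. The toy model below (hypothetical class and field names, not QEMU code) contrasts the two behaviours for the call path in comment 6, where the spice worker's red_destroy_surface releases a resource through interface_release_resource and qxl_push_free_res at a moment when qemu_spice_display_is_running() is false. In this model the event is dropped rather than delivered; the `dropped` list stands in for the trace point the commit adds.

```python
class QxlSketch:
    """Toy model of the qxl event path; not QEMU code."""

    def __init__(self):
        self.display_running = True  # stands in for qemu_spice_display_is_running(&d->ssd)
        self.pending = 0             # pending interrupt bits
        self.dropped = []            # stands in for the trace point added by the fix

    def send_events_old(self, events):
        # Before: a stopped display is treated as impossible, so the process aborts.
        if not self.display_running:
            raise AssertionError("qemu_spice_display_is_running(&d->ssd)")
        self.pending |= events

    def send_events_new(self, events):
        # After ("nop if stopped"): record and ignore events raised while stopped.
        if not self.display_running:
            self.dropped.append(events)
            return
        self.pending |= events
```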

Comment 15 Qianqian Zhu 2016-01-29 09:10:22 UTC
Retest with:
qemu-kvm-0.12.1.2-2.486.el6.bz1290743.1.x86_64
kernel-2.6.32-604.el6.x86_64

Steps:
1. Launch the guest with:
    -vga qxl \
    -m 2048  \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=1  \
    -cpu 'Opteron_G2' \
2. (qemu) system_reset
3. Repeat step 2 20 times.

Results:
Repeated 30 times (--nrepeat=10 and --nrepeat=20 in separate runs): all END GOOD, with no core dump. Not reproducible.


Additional Info:
CLI for autotest: 
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=20
job link:
http://10.66.4.244/kvm_autotest_job_log/?jobid=1206293
http://10.66.4.244/kvm_autotest_job_log/?jobid=1206568

Comment 16 Jeff Nelson 2016-02-19 00:10:55 UTC
Fix included in qemu-kvm-0.12.1.2-2.489.el6

Comment 17 Qianqian Zhu 2016-02-29 08:11:32 UTC
Verified with:
qemu-kvm-rhev-0.12.1.2-2.489.el6.x86_64
kernel-2.6.32-615.el6.x86_64

Steps:
1. Launch the guest with:
    -vga qxl \
    -m 2048  \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=1  \
    -cpu 'Opteron_G2' \
2. (qemu) system_reset
3. Repeat step 2 20 times.

Results:
Repeated the above steps twice: all PASS, with no core dump.

Additional Info:
CLI for autotest: 
python ConfigTest.py --guestname=RHEL.7.1 --imageformat=qcow2 --platform=x86_64 --driveformat=virtio_blk --display=spice --testcase=system_reset_during_boot --nrepeat=20
job link:
http://10.66.4.244/kvm_autotest_for_auto_job_detail/?jobid=1239747
http://10.66.4.244/kvm_autotest_for_auto_job_detail/?jobid=1243292

Comment 20 errata-xmlrpc 2016-05-10 21:02:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0815.html