Bug 869586
Summary:          core dump happens when quitting qemu via monitor
Product:          Red Hat Enterprise Linux 6
Reporter:         Xiaoqing Wei <xwei>
Component:        qemu-kvm
Assignee:         Laszlo Ersek <lersek>
Status:           CLOSED ERRATA
QA Contact:       Virtualization Bugs <virt-bugs>
Severity:         medium
Priority:         medium
Version:          6.4
CC:               acathrow, areis, bsarathy, dron, flang, juzhang, kwolf, lnovich, michen, mkenneth, pbonzini, qzhang, shuang, stefanha, virt-maint
Target Milestone: rc
Hardware:         Unspecified
OS:               Unspecified
Fixed In Version: qemu-kvm-0.12.1.2-2.370.el6
Doc Type:         Bug Fix
Cloned To:        959102 (view as bug list)
Last Closed:      2013-11-21 05:54:48 UTC
Type:             Bug
Bug Blocks:       959102
Description
Xiaoqing Wei
2012-10-24 10:43:59 UTC
Created attachment 632688 [details]
thread apply all bt full
FWIW, this is unlikely to be a monitor/qmp bug. My impression is that either something is corrupting memory and the BH list got corrupted, or a bottom half got destroyed but was never removed from the BH list. Kevin, I remember seeing something similar involving quit in a block layer bug; can you take a quick look at this one? Xiaoqing, do you remember what you were doing when the bug happened?

Doesn't really ring a bell. I once had problems with nested event loops, where the inner qemu_bh_poll() would free the BH while the outer one was still working on it, but this should be fixed with 648fb0ea (which RHEL 6 has had since February). Paolo fixed a similar bug in aio.c, which RHEL doesn't have, but that one would crash in a different place. I would agree, though, that it's most likely not a monitor bug.

(In reply to comment #2)
Hi Luiz,

> Xiaoqing, do you remember what you were doing when the bug happened?

The only operation in the guest was shutting down (the guest was running without any workload); since the shutdown process was too slow, autotest tried to kill it via the monitor. Nothing else.

Xiaoqing

So, you do a shutdown in the guest and run quit in the monitor in parallel. I'll try that.

https://bugzilla.redhat.com/show_bug.cgi?id=952687#c4
Hi Luiz, we have another similar bz; could you please check whether it is a duplicate? Thanks.

*** Bug 952687 has been marked as a duplicate of this bug. ***

(In reply to comment #1)
> Created attachment 632688 [details]
> thread apply all bt full

Thread 6: kvm VCPU thread, running guest code
Thread 5: QXL thread (red_worker_main)
Thread 4: kvm VCPU thread, having just exited guest code and waiting for the BQL
Thread 3: POSIX AIO thread, waiting for a new AIO request
Thread 2: POSIX AIO thread, waiting for a new AIO request
Thread 1: qemu IO thread, finishing off an iteration of the main loop, crashing in qemu_bh_poll()

qemu_bh_poll() doesn't seem to be reentrant by different threads.
I can see many qemu_aio_flush() / qemu_aio_wait() calls around the source, both of which call qemu_bh_poll(). Are we sure no thread other than the IO thread massages the BH list? For example, in the above set of threads, could Thread 3 or Thread 2 (the AIO threads) interfere with the BH list?

qemu_bh_poll() is actually reentrant; see the "nesting" variable. It is also safe to call qemu_bh_schedule/cancel/delete while another thread is calling qemu_bh_poll(), though for cancel the results are of course unpredictable.

(In reply to comment #13)
> qemu_bh_poll() is reentrant actually, see the "nesting" variable.

It is reentrant from within the same thread; it's not reentrant for multiple threads without synchronization. (The "nesting" variable itself is not synchronized.) From comment #10:

> qemu_bh_poll() doesn't seem to be reentrant by different threads.

Then,

> It is also safe to call qemu_bh_schedule/cancel/delete while another thread
> is calling qemu_bh_poll(),

How so? The first three functions modify fields ("scheduled" / "deleted") of the referenced QEMUBH object, while the other thread calling qemu_bh_poll() reads those fields. I can't see any synchronization. Without synchronization, changes to these fields made by thread T1 are not required to be visible to thread T2 for any length of time. If T1 destroys some "business logic" object and deletes a pending, associated BH with qemu_bh_delete(), T2 may not see the "deleted" field as nonzero in qemu_bh_poll(), and may call the callback that supposedly operates on the (already nonexistent) "business logic" object.

Created attachment 739640 [details]
core dump file - 0
Since the file share can only keep the file for a limited period of time, I am attaching the core dump file here.
Created attachment 739641 [details]
core dump file - 1
Created attachment 739643 [details]
core dump file - 2
(In reply to comment #10)
> Thread 6: kvm VCPU thread, running guest code
> Thread 5: QXL thread (red_worker_main)
> Thread 4: kvm VCPU thread, having just exited guest code & waiting for the BQL
> Thread 3: Posix AIO thread, waiting for a new AIO request
> Thread 2: Posix AIO thread, waiting for a new AIO request
> Thread 1: qemu IO thread, finishing off an iteration of the main loop,
> crashing in qemu_bh_poll()
>
> qemu_bh_poll() doesn't seem to be reentrant by different threads.

Agreed. It must be called with the global mutex held.

> I can see many qemu_aio_flush() / qemu_aio_wait() calls around the source,
> both of which functions call qemu_bh_poll(). Are we sure no other thread
> than the IO thread massages the BH list? For example, in the above set of
> threads, could Thread 3 or Thread 2 (AIO threads) interfere with the BH list?

The chance is low. The POSIX AIO workers really are just preadv()/pwritev()/fdatasync()/ioctl() worker functions; they don't call back into QEMU code.

What I find interesting is:

#0 qemu_bh_poll () at async.c:65
bh = 0xff575700ffff57

The address looks very suspicious, since a heap-allocated address should be aligned to sizeof(void*). This is probably memory corruption.

> > qemu_bh_poll() is reentrant actually, see the "nesting" variable.
>
> It is reentrant from within the same thread; it's not reentrant for multiple
> threads without synchronization. (The "nesting" variable itself is not
> synchronized.) From comment #10:

Right, you're supposed to call it within the "big QEMU lock", which is almost always held.

> > It is also safe to call qemu_bh_schedule/cancel/delete while another thread
> > is calling qemu_bh_poll(),
>
> How so? The first three functions modify fields ("scheduled" / "deleted") of
> the referenced QEMUBH object, while the other thread calling qemu_bh_poll()
> reads those fields. I can't see any synchronization.
> Without synchronization, changes to these fields made by thread T1 are not
> required to be visible to thread T2 for any length of time.

The qemu_notify_event() ensures that qemu_bh_poll() is executed from the beginning once more after qemu_bh_schedule(). (It is reasonable to assume that a system call includes a memory barrier.)

> If T1 destroys some "business logic" object and deletes a pending, associated
> BH with qemu_bh_delete(), T2 may not see the "deleted" field as nonzero in
> qemu_bh_poll(), and call the callback that supposedly operates on the
> (already nonexistent) "business logic" object.

This is beyond the scope of the BH machinery. It should be handled with some kind of reference counting, or by deferring the actual deletion of the object to a(nother) bottom half or another callback that is guaranteed to be called under the big QEMU lock. Anyway, this doesn't happen in RHEL 6, which is almost exclusively single-threaded.

Upstream commits related to the bug that Kevin mentioned in comment 3:

- commit 4231c88d27d9e46e6ad6e6b7bbb6e442bcf9cd05
- commit 2db2bfc0ccac5fd68dbf0ceb70fbc372c5d8a8c7
- commit 00f78533326c5ba2e62fafada16655aa558a5520

but this is unlikely to be the same corruption.

Created attachment 744712 [details]
BH list from coredump in comments 17-19
The coredump attached in comments 17-19 captured a slightly different crash.
There are 5 threads; 4 VCPU threads are waiting for the qemu mutex. The IO
thread crashes as follows:
#0 qemu_bh_delete (bh=0x2) at /usr/src/debug/qemu-kvm-0.12.1.2/async.c:118
#1 0x00007f81bfe3adbf in virtio_blk_dma_restart_bh (opaque=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:465
#2 0x00007f81bfe5d031 in qemu_bh_poll () at /usr/src/debug/qemu-kvm-0.12.1.2/async.c:70
#3 0x00007f81bfe26419 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4017
#4 0x00007f81bfe4898a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#5 0x00007f81bfe29018 in main_loop (argc=50, argv=<value optimized out>, envp=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4187
#6 main (argc=50, argv=<value optimized out>, envp=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6526
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f81bfd9d980 (LWP 15318))]#0 qemu_bh_delete (bh=0x2)
at /usr/src/debug/qemu-kvm-0.12.1.2/async.c:118
118 bh->scheduled = 0;
(gdb) frame 1
#1 0x00007f81bfe3adbf in virtio_blk_dma_restart_bh (opaque=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:465
465 qemu_bh_delete(s->bh);
(gdb) print s
$1 = (VirtIOBlock *) 0x7f81c0c28270
The BH chain seems more or less sound, but either the VirtIOBlock objects
linked via bh->opaque are garbled, or the opaque link itself is messed up.
We crash under the 5th BH (opaque == 0x7f81c0c28270).
See the attached graph.
Are so many virtio_blk_dma_restart_bh() BHs okay for a virtual machine that
has only two disks?
Could upstream commit 69b302b2044a9a0f6d157d25b39a91ff7124c61f be related?

The virtlab job linked in comment 16 says "block_hotplug" many times. Bug 822386, also linked from the virtlab job, has the same stackdump as comment 22 here. Luiz reproduced the problem with the upstream tree on 2012-Aug-16 (see bug 822386 comment 23), but upstream commit 69b302b2044a9a0f6d157d25b39a91ff7124c61f is from Feb 2013.

My reason for suspecting 69b302b2 is this: when we unplug the virtio-blk disk, the vmstate change handler it registered in virtio_blk_init() is not torn down in RHEL-6. So when we issue a shutdown request (do_quit() --> qemu_system_shutdown_request()), the vmstate change handler, virtio_blk_dma_restart_cb(),

- accesses invalid memory directly,
- registers a BH (= virtio_blk_dma_restart_bh) for the nonexistent device.

This seems consistent with the DDD data chart in comment 22, with several pointer chains leading to "storage0.qcow2", which I suppose is the disk that's repeatedly (un)plugged. (I would prefer to confirm this against "block_hotplug.one_pci.fmt_qcow2.block_virtio" but I don't know how to download the steps for that test case.)

I found a crazy long list of virtio_blk_dma_restart_cb() handlers in the core dump, hanging off "vm_change_state_head".

*** Bug 822386 has been marked as a duplicate of this bug. ***

Created attachment 745608 [details]
virtio-blk: fix unplug + virsh reboot (RHEL-6 backport of 69b302b2)
Reproduced this bug with the following versions:

Host:
# uname -r
2.6.32-358.6.1.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.356.el6.x86_64

Guest: RHEL 5.8-64, 2.6.18-348.el5

Because this bug's scenario is very difficult to reproduce, the scenario from bug 822386 was used instead. Steps are the same as in the duplicate bug 822386 (see its description):

1. Boot the guest:

/usr/libexec/qemu-kvm -monitor stdio -S -chardev socket,id=serial_id_20120515-041452-KkUY,path=/tmp/serial-20120515-041452-KkUY,server,nowait -device isa-serial,chardev=serial_id_20120515-041452-KkUY -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 -drive file=/home/RHEL-Server-5.8-64.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=on,snapshot=off,readonly=off,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idLYjg29,mac=9a:6e:47:a6:d8:f9,id=ndev00idLYjg29,bus=pci.0,addr=0x3 -netdev tap,id=idLYjg29,vhost=on -m 2048 -smp 4,cores=2,threads=1,sockets=2 -cpu 'Opteron_G4' -drive index=1,if=none,id=drive-ide0-0-0,media=cdrom,boot=off,snapshot=off,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device usb-tablet,id=usb-tablet1,bus=usb1.0 -spice port=8000,disable-ticketing -vga qxl -rtc base=utc,clock=host,driftfix=slew -M rhel6.3.0 -boot order=cdn,once=c,menu=off -no-kvm-pit-reinjection -enable-kvm

2. In the guest:
# modprobe acpiphp

3. Hot-plug a virtio-blk device:
(qemu) __com.redhat_drive_add id=hot1,file=/root/hot1.raw,format=raw,media=disk
(qemu) device_add virtio-blk-pci,id=hot_virtio,drive=hot1

4. Delete the virtio-blk device:
(qemu) device_del hot_virtio

5. Stop/cont the guest:
(qemu) stop

Results:
...
(qemu) __com.redhat_drive_add id=hot1,file=/root/hot1.raw,format=raw,media=disk
(qemu) device_add virtio-blk-pci,id=hot_virtio,drive=hot1
[New Thread 0x7fffe53fb700 (LWP 8481)]
(qemu) device_del hot_virtio
(qemu) stop
(qemu) con[Thread 0x7fffe53fb700 (LWP 8481) exited]
cont
(qemu)
Program received signal SIGSEGV, Segmentation fault.
qemu_bh_delete (bh=0x390) at /usr/src/debug/qemu-kvm-0.12.1.2/async.c:118
118 bh->scheduled = 0;
(gdb) bt
#0 qemu_bh_delete (bh=0x390) at /usr/src/debug/qemu-kvm-0.12.1.2/async.c:118
#1 0x00007ffff7df3daf in virtio_blk_dma_restart_bh (opaque=<value optimized out>)
   at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:465
#2 0x00007ffff7e15ff1 in qemu_bh_poll () at /usr/src/debug/qemu-kvm-0.12.1.2/async.c:70
#3 0x00007ffff7ddf419 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4017
#4 0x00007ffff7e0197a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#5 0x00007ffff7de2008 in main_loop (argc=42, argv=<value optimized out>, envp=<value optimized out>)
   at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4187

Verified this bug with the following versions:

Host:
# uname -r
2.6.32-392.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.376.el6.x86_64

Guest: 2.6.18-348.el5

Steps are the same as in the reproduction. Because this bug is difficult to reproduce, verification was done with a script:

# cat bug869586-loop.sh
#!/bin/bash
while true
do
    echo "info pci" | nc -U /tmp/monitor-unix
    echo "__com.redhat_drive_add id=hot1,file=/root/hot1.raw,format=raw,media=disk" | nc -U /tmp/monitor-unix
    echo "device_add virtio-blk-pci,id=hot_virtio,drive=hot1" | nc -U /tmp/monitor-unix; sleep 3
    echo "info pci" | nc -U /tmp/monitor-unix
    echo "device_del hot_virtio" | nc -U /tmp/monitor-unix
    echo "info pci" | nc -U /tmp/monitor-unix
    echo "stop" | nc -U /tmp/monitor-unix
    echo "cont" | nc -U /tmp/monitor-unix
    sleep 3
done

Results: Tried more than 400 iterations without hitting any problem; the guest works well. So this bug is fixed.
*** Bug 918798 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1553.html