Bug 1339498

Summary: Core dump when do 'block-job-complete' after 'drive-mirror'
Product: Red Hat Enterprise Linux 7 Reporter: yduan
Component: qemu-kvm-rhevAssignee: John Snow <jsnow>
Status: CLOSED ERRATA QA Contact: Qianqian Zhu <qizhu>
Severity: high Docs Contact:
Priority: high    
Version: 7.3CC: chayang, famz, juzhang, knoel, mrezanin, virt-maint, xfu
Target Milestone: rcKeywords: Regression, TestOnly
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.6.0-10.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-07 21:12:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1336310    
Bug Blocks:    

Description yduan 2016-05-25 08:11:30 UTC
Description of problem:
Core dump when do 'block-job-complete' after 'drive-mirror'.

Version-Release number of selected component (if applicable):
Host:
  kernel-3.10.0-408.el7.x86_64
  qemu-kvm-rhev-2.6.0-3.el7.x86_64
Guest:
  kernel-3.10.0-408.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1.Start a VM using following commands:
 ...
 -object iothread,id=iothread0 \
 -drive file=sys.qcow2,format=qcow2,if=none,cache=none,id=drive_sysdisk,werror=stop,rerror=stop \
 -device virtio-blk-pci,drive=drive_sysdisk,id=device_sysdisk,iothread=iothread0,bootindex=0 \
 -drive file=datadisk.qcow2,format=qcow2,if=none,cache=none,id=drive_datadisk1,werror=stop,rerror=stop \
 -device virtio-blk-pci,drive=drive_datadisk1,id=device_datadisk1,iothread=iothread0 \
 ...

2.Block mirror
{"execute":"drive-mirror","arguments":{"device":"drive_datadisk1","target":"dest_datadisk1.qcow2","format":"qcow2","mode":"absolute-paths","sync":"full","speed":1000000000,"on-source-error": "stop", "on-target-error": "stop" } }
{"return": {}}

{"timestamp": {"seconds": 1464157486, "microseconds": 965574}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_datadisk1", "len": 536870912, "offset": 536870912, "speed": 1000000000, "type": "mirror"}}

(qemu) info block-jobs 
Type mirror, device drive_datadisk1: Completed 536870912 of 536870912 bytes, speed limit 1000000000 bytes/s

3.Block job complete
{ "execute": "block-job-complete", "arguments": { "device": "drive_datadisk1"}}
{"return": {}}
{"timestamp": {"seconds": 1464157596, "microseconds": 264218}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_datadisk1", "len": 536870912, "offset": 536870912, "speed": 1000000000, "type": "mirror"}}


(qemu) qemu-kvm: block.c:2338: bdrv_delete: Assertion `bdrv_op_blocker_is_empty(bs)' failed.
0boot.sh: line 38: 26338 Aborted                 (core dumped)

Actual results:
Core dumped.

Expected results:
Block mirror should work well with data-plane.

Additional info:
1.Not reproducible with qemu-kvm-rhev-2.3.0-31.el7_2.12.x86_64.
2.Backtrace
(gdb) bt
#0  0x00007f10d44915f7 in raise () from /lib64/libc.so.6
#1  0x00007f10d4492ce8 in abort () from /lib64/libc.so.6
#2  0x00007f10d448a566 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f10d448a612 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f10dc9f856f in bdrv_delete (bs=0x7f10de16e800) at block.c:2338
#5  bdrv_unref (bs=0x7f10de16e800) at block.c:3376
#6  0x00007f10dc9fb353 in block_job_defer_to_main_loop_bh (opaque=0x7f10de038310) at blockjob.c:476
#7  0x00007f10dc9f3fcd in aio_bh_call (bh=<optimized out>) at async.c:66
#8  aio_bh_poll (ctx=ctx@entry=0x7f10de109980) at async.c:94
#9  0x00007f10dc9fdb70 in aio_dispatch (ctx=0x7f10de109980) at aio-posix.c:308
#10 0x00007f10dc9f3dbe in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
    user_data=<optimized out>) at async.c:233
---Type <return> to continue, or q <return> to quit---
#11 0x00007f10d528f79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#12 0x00007f10dc9fc460 in glib_pollfds_poll () at main-loop.c:213
#13 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:258
#14 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:506
#15 0x00007f10dc7c655f in main_loop () at vl.c:1934
#16 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4667

Comment 4 John Snow 2016-06-23 21:13:49 UTC
Present upstream in 2.6.0 but absent from today's HEAD. Bisecting reveals the fix to have been:

commit 348295838384941d1e5420d10e57366c4e303d45
Author: Fam Zheng <famz>
Date:   Mon May 23 10:19:37 2016 +0800

    virtio-blk: Remove op blocker for dataplane

contained in this series:

https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg03713.html

Comment 5 Miroslav Rezanina 2016-06-28 13:00:26 UTC
Fix included in qemu-kvm-rhev-2.6.0-10.el7

Comment 6 Qianqian Zhu 2016-08-24 03:55:54 UTC
Reproduced with:
qemu-kvm-rhev-2.6.0-3.el7.x86_64
qemu-img-rhev-2.6.0-3.el7.x86_64
kernel-3.10.0-491.el7.x86_64

Steps:
1. Launch guest with data-plane:
/usr/libexec/qemu-kvm -name linux -cpu SandyBridge -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 7bef3814-631a-48bb-bae8-2b1de75f7a13 -nodefaults -monitor stdio -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot order=c,menu=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/mntnfs/RHEL-Server-7.3-64-virtio.qcow2,if=none,cache=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0 -object iothread,id=iothread0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on -spice port=5901,disable-ticketing -vga qxl -global qxl-vga.revision=3 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x3 -qmp tcp::5555,server,nowait
2. Do block mirror:
{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"dest_datadisk1.qcow2","format":"qcow2","mode":"absolute-paths","sync":"full","speed":1000000000,"on-source-error": "stop", "on-target-error": "stop" } }
3. { "execute": "block-job-complete", "arguments": { "device": "drive-virtio-disk0"}}

Result:
(qemu) qemu-kvm: block.c:2338: bdrv_delete: Assertion `bdrv_op_blocker_is_empty(bs)' failed.
Aborted (core dumped)

Verified with:
qemu-kvm-rhev-2.6.0-22.el7.x86_64
qemu-img-rhev-2.6.0-22.el7.x86_64
kernel-3.10.0-491.el7.x86_64

Same steps as above.

"block-job-complete" executed successfully, and block reopened without any errors.

(qemu) info block
drive-virtio-disk0 (#block1574): dest_datadisk1.qcow2 (qcow2)
    Cache mode:       writeback, direct
(qemu) info blockstats 
drive-virtio-disk0: rd_bytes=444006400 wr_bytes=7834624 rd_operations=12589 wr_operations=345 flush_operations=91 wr_total_time_ns=172532273888 rd_total_time_ns=118076094831 flush_total_time_ns=5312139277 rd_merged=350 wr_merged=6 idle_time_ns=7861863339

# qemu-img check dest_datadisk1.qcow2
No errors were found on the image.
80664/327680 = 24.62% allocated, 5.51% fragmented, 0.00% compressed clusters
Image end offset: 5287903232

Moving to verified.

Comment 8 errata-xmlrpc 2016-11-07 21:12:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html