Bug 1522752 - [data-plane]qemu core dump when live snapshots two disks together within a transaction, with one "snapshot-file" specified to a non-existed path
Summary: [data-plane]qemu core dump when live snapshots two disks together within a tr...
Status: CLOSED DUPLICATE of bug 1519721
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: ---
Assignee: Stefan Hajnoczi
QA Contact: Qianqian Zhu
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-12-06 11:29 UTC by Qianqian Zhu
Modified: 2018-01-04 10:12 UTC (History)
6 users (show)

(edit)
Clone Of:
(edit)
Last Closed: 2018-01-04 10:12:50 UTC


Attachments (Terms of Use)

Description Qianqian Zhu 2017-12-06 11:29:02 UTC
Description of problem:
Launch guest with iothreads, then create live snapshots simultaneously for two disks with transaction, make the second snapshot fail, qemu will core dump.

Without iothreads, it works as expected: qemu will give an error: "Could not create file: Not a directory", and all jobs in that transaction will be cancelled.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-11.el7.x86_64
kernel-3.10.0-808.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Launch guest:
/usr/libexec/qemu-kvm -M q35,accel=kvm,kernel-irqchip=split \
-m 4G \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,iothread=iothread1 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=writethrough,format=qcow2,file=gluster://10.66.8.133/gv2/rhel69-64-virtio-scsi.qcow2 \
 -device scsi-hd,id=image1,drive=drive_image1 \
-device virtio-net-pci,mac=9a:89:8a:8b:8c:8d,id=idTqeiKU,netdev=idU17IIx,bus=pcie.0 \
-object iothread,id=iothread1  \
-netdev tap,id=idU17IIx  \
-cpu 'SandyBridge',+kvm_pv_unhalt  \
-vnc :0   \
-enable-kvm  \
-qmp stdio \
-object iothread,id=iothread2 \
-object iothread,id=iothread3 \
-device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pcie.0,iothread=iothread2 -drive id=drive_image2,if=none,snapshot=off,aio=threads,cache=writethrough,format=qcow2,file=gluster://10.66.8.133/gv2/data1.qcow2 \
-device scsi-hd,id=image2,drive=drive_image2

2. live snapshot transaction, the second job has a invalid path: ["snapshot-file": "/home/test/test/test/test"]

{ "execute": "transaction", "arguments": { "actions": [ { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive_image2","snapshot-file": "gluster://10.66.8.119/gv0/sndd","mode": "absolute-paths", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive_image1", "snapshot-file": "/home/test/test/test/test","mode": "absolute-paths", "format": "qcow2" } }]}}


Actual results:
qemu core dump:
{ "execute": "transaction", "arguments": { "actions": [ { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive_image2","snapshot-file": "gluster://10.66.8.119/gv0/sndd","mode": "absolute-paths", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive_image1", "snapshot-file": "/home/test/test/test/test","mode": "absolute-paths", "format": "qcow2" } }]}}
Formatting 'gluster://10.66.8.119/gv0/sndd', fmt=qcow2 size=2147483648 backing_file=gluster://10.66.8.133/gv2/data1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/test/test/test/test', fmt=qcow2 size=21474836480 backing_file=gluster://10.66.8.133/gv2/rhel69-64-virtio-scsi.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu: qemu_mutex_unlock: Operation not permitted
Aborted (core dumped)


Expected results:
All jobs in that transaction should be cancelled, guest still operate on initial images.

Additional info:

Comment 2 Qianqian Zhu 2017-12-07 05:28:28 UTC
Formatting '/home/test/test/test/test', fmt=qcow2 size=21474836480 backing_file=gluster://10.66.8.133/gv2/rhel69-64-virtio-scsi.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu: qemu_mutex_unlock: Operation not permitted

Program received signal SIGABRT, Aborted.
0x00007fffed9f01f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install elfutils-libelf-0.168-5.el7.x86_64 elfutils-libs-0.168-5.el7.x86_64 glibc-2.17-194.el7.x86_64 glusterfs-3.12.2-1.el7rhgs.x86_64 glusterfs-api-3.12.2-1.el7rhgs.x86_64 glusterfs-client-xlators-3.12.2-1.el7rhgs.x86_64 glusterfs-libs-3.12.2-1.el7rhgs.x86_64 libcurl-7.29.0-35.el7.x86_64 libgcc-4.8.5-14.el7.x86_64 libibverbs-13-5.el7.x86_64 librdmacm-13-5.el7.x86_64 libstdc++-4.8.5-14.el7.x86_64 nss-softokn-freebl-3.28.3-4.el7.x86_64 nss-util-3.28.4-2.el7.x86_64 spice-server-0.12.8-1.el7.x86_64 systemd-libs-219-38.el7.x86_64
(gdb) 
(gdb) bt
#0  0x00007fffed9f01f7 in raise () at /lib64/libc.so.6
#1  0x00007fffed9f18e8 in abort () at /lib64/libc.so.6
#2  0x00005555557989eb in error_exit (err=<optimized out>, msg=msg@entry=0x555555baae90 <__func__.23806> "qemu_mutex_unlock") at util/qemu-thread-posix.c:36
#3  0x0000555555abe816 in qemu_mutex_unlock (mutex=mutex@entry=0x555556d35c60) at util/qemu-thread-posix.c:96
#4  0x0000555555aba3c9 in aio_context_release (ctx=ctx@entry=0x555556d35c00) at util/async.c:494
#5  0x0000555555a45958 in bdrv_flush (bs=bs@entry=0x55555783d400) at block/io.c:2288
#6  0x00005555559fbb0b in bdrv_unref (bs=0x55555783d400) at block.c:3157
#7  0x00005555559fbb0b in bdrv_unref (bs=0x55555783d400) at block.c:3347
#8  0x00005555559fbb0b in bdrv_unref (bs=0x55555783d400) at block.c:4332
#9  0x00005555558acbc9 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=<optimized out>, errp=errp@entry=0x7fffffffc5a8) at blockdev.c:2272
#10 0x00005555558c40fd in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7fffffffc618) at qmp-marshal.c:6118
#11 0x0000555555ab0a69 in qmp_dispatch (errp=0x7fffffffc610, request=0x555558086800, cmds=0x555556081c40 <qmp_commands>) at qapi/qmp-dispatch.c:104
#12 0x0000555555ab0a69 in qmp_dispatch (cmds=0x555556081c40 <qmp_commands>, request=request@entry=0x555558086800) at qapi/qmp-dispatch.c:131
#13 0x00005555557dafe5 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3881
#14 0x0000555555ab60b8 in json_message_process_token (lexer=0x555556dc27f8, input=0x555556cd8720, type=JSON_RCURLY, x=395, y=1) at qobject/json-streamer.c:105
#15 0x0000555555ad319b in json_lexer_feed_char (lexer=lexer@entry=0x555556dc27f8, ch=125 '}', flush=flush@entry=false) at qobject/json-lexer.c:323
#16 0x0000555555ad325e in json_lexer_feed (lexer=0x555556dc27f8, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:373
#17 0x0000555555ab6179 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124
#18 0x00005555557d99de in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3923
#19 0x0000555555a65c4f in fd_chr_read (chan=0x555556cefb00, cond=<optimized out>, opaque=0x555556d02820) at chardev/char-fd.c:66
#20 0x00007fffef4a4d7a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#21 0x0000555555abbd5c in main_loop_wait () at util/main-loop.c:213
#22 0x0000555555abbd5c in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#23 0x0000555555abbd5c in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
#24 0x000055555579d86a in main () at vl.c:1917
#25 0x000055555579d86a in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4805

Comment 3 Ademar Reis 2017-12-29 20:05:50 UTC
Maybe a duplicate or sharing the same root cause of Bug 1519721

Comment 4 Stefan Hajnoczi 2018-01-03 11:13:21 UTC
Please retest with qemu-kvm-rhev-2.10.0-14.el7.  It contains fixes for the QMP 'transaction' command with dataplane.

Thanks,
Stefan

Comment 5 Qianqian Zhu 2018-01-04 02:28:41 UTC
Test with qemu-kvm-rhev-2.10.0-14.el7.x86_64, the issue has been fixed.

Steps same as comment 0.

Result:
Qmp prompts correct warning, and all the block jobs within the transaction are cancelled. Both qemu and geust works well after then.

{ "execute": "transaction", "arguments": { "actions": [ { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive_image2","snapshot-file": "gluster://10.66.8.119/gv0/sndd","mode": "absolute-paths", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive_image1", "snapshot-file": "/home/test/test/test/test","mode": "absolute-paths", "format": "qcow2" } }]}}
Formatting 'gluster://10.66.8.119/gv0/sndd', fmt=qcow2 size=2147483648 backing_file=gluster://10.66.8.133/gv2/data1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/test/test/test/test', fmt=qcow2 size=21474836480 backing_file=gluster://10.66.8.133/gv2/rhel69-64-virtio-scsi.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"error": {"class": "GenericError", "desc": "Could not create file: No such file or directory"}}

{ "execute": "query-block"}
{"return": [{"io-status": "ok", "device": "drive_image1", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 21474836480, "filename": "gluster://10.66.8.133/gv2/rhel69-64-virtio-scsi.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 4457431040, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block117", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": false}, "file": "gluster://10.66.8.133/gv2/rhel69-64-virtio-scsi.qcow2", "encryption_key_missing": false}, "qdev": "image1", "type": "unknown"}, {"io-status": "ok", "device": "drive_image2", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 2147483648, "filename": "gluster://10.66.8.133/gv2/data1.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 2094534656, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block308", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": false}, "file": "gluster://10.66.8.133/gv2/data1.qcow2", "encryption_key_missing": false}, "qdev": "image2", "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "type": "unknown"}]}

Comment 6 Stefan Hajnoczi 2018-01-04 10:12:50 UTC

*** This bug has been marked as a duplicate of bug 1519721 ***


Note You need to log in before you can comment on or make changes to this bug.