Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
Possibly a duplicate of bug 1493890 - at any rate, killing the NBD server while the VM is trying to access it should gracefully surface EIO errors to the guest and keep the QMP monitor responsive. I'll try to reproduce this to see whether the fixes being made for other NBD bugs also cover this issue.
(In reply to Longxiang Lyu from comment #0)
> Description of problem:
> fail to quit qemu after stopping NBD service during block mirror
>
> Version-Release number of selected component (if applicable):
> kernel-3.10.0-709.el7.x86_64
> qemu-kvm-rhev-2.9.0-16.el7_4.8
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> 1. use qemu to export a disk as an NBD drive
> # qemu-kvm -drive file=test.qcow2,format=raw,id=img0 -qmp
> tcp:0:5555,server,nowait -monitor stdio -incoming tcp:0:6666
> qmp:
> { "execute": "qmp_capabilities" }
> { "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet",
> "data": { "host": "10.66.11.1", "port": "9000" } } } }
> { "execute": "nbd-server-add", "arguments": { "device": "img0", "writable":
> true } }
Presumably, this has to be a big enough disk, with non-zero contents, that...
> { "execute": "block-stream", "arguments": { "device": "img1", "on-error":
> "report" } }
>
> 4. stop (shut down) the NBD server
> { "execute" : "nbd-server-stop", "arguments" : {} }
you have enough time to kill the NBD server before the block-stream has a chance to complete (if the block-stream runs to completion, because the disk being streamed is too trivial, then I can't reproduce the hang).
At any rate, am I correct that it is the client that is hanging, and not the server? Can you use qemu-nbd instead of qemu-kvm as the server (to make it less confusing WHICH process is hanging)?
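For scripting the reproducer's QMP steps (and for making it unambiguous which process you are talking to), the commands can be driven from a short Python helper rather than typed into the monitor. This is only a sketch; the host/port values are the ones from the reproducer and the helper names are hypothetical:

```python
import json
import socket

def qmp_cmd(execute, **arguments):
    """Render one QMP command as a JSON line, as typed in the reproducer."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)

# The command sequence from step 1 of the reproducer (example values from the bug):
SEQUENCE = [
    qmp_cmd("qmp_capabilities"),
    qmp_cmd("nbd-server-start",
            addr={"type": "inet",
                  "data": {"host": "10.66.11.1", "port": "9000"}}),
    qmp_cmd("nbd-server-add", device="img0", writable=True),
]

def send_all(host, port, lines, timeout=5.0):
    """Connect to a QMP TCP socket and send each command, collecting replies."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        f = s.makefile("rw", buffering=1)
        f.readline()                      # consume the QMP greeting banner
        replies = []
        for line in lines:
            f.write(line + "\n")
            replies.append(json.loads(f.readline()))
        return replies
```

Something like `send_all("127.0.0.1", 5555, SEQUENCE)` would then target the `-qmp tcp:0:5555,server,nowait` socket from step 1, so there is no doubt which qemu process received the commands.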
Description of problem:
fail to quit qemu after stopping NBD service during block mirror

Version-Release number of selected component (if applicable):
kernel-3.10.0-709.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.8

How reproducible:
100%

Steps to Reproduce:
1. use qemu to export a disk as an NBD drive
# qemu-kvm -drive file=test.qcow2,format=raw,id=img0 -qmp tcp:0:5555,server,nowait -monitor stdio -incoming tcp:0:6666
qmp:
{ "execute": "qmp_capabilities" }
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet", "data": { "host": "10.66.11.1", "port": "9000" } } } }
{ "execute": "nbd-server-add", "arguments": { "device": "img0", "writable": true } }

2. boot up a VM with the NBD image as the second drive
... -drive file=/home/test/streamnbd/test.raw,format=raw,if=none,cache=none,snapshot=off,rerror=stop,werror=stop,id=img0 \
-device ide-hd,bus=ide.0,unit=0,drive=img0,id=ide-disk0,bootindex=0 \
-drive file=nbd://10.66.11.1:9000/img0,format=qcow2,if=none,cache=none,snapshot=off,rerror=stop,werror=stop,id=img1 \
-device ide-hd,bus=ide.0,unit=1,drive=img1,id=ide-disk1 \
…

3. in qmp, block-stream the second block device
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "img1", "snapshot-file": "/home/test/streamnbd/sn1.qcow2", "format": "qcow2", "mode": "absolute-paths" } }
{ "execute": "block-stream", "arguments": { "device": "img1", "on-error": "report" } }

4. stop (shut down) the NBD server
{ "execute" : "nbd-server-stop", "arguments" : {} }

5. quit qemu

Actual results:
qmp output:
{"timestamp": {"seconds": 1505976850, "microseconds": 793574}, "event": "BLOCK_JOB_ERROR", "data": {"device": "img1", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1505976850, "microseconds": 793693}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "img1", "len": 21474836480, "offset": 2714238976, "speed": 0, "type": "stream", "error": "Input/output error"}}
{"timestamp": {"seconds": 1505976861, "microseconds": 106791}, "event": "SHUTDOWN", "data": {"guest": false}}
qemu fails to quit.

Expected results:
qemu quits.

Additional info:
# gdb -batch -ex bt -p 28291
[New LWP 28323]
[New LWP 28318]
[New LWP 28307]
[New LWP 28306]
[New LWP 28305]
[New LWP 28304]
[New LWP 28292]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fe33e9ddaff in ppoll () from /lib64/libc.so.6
#0  0x00007fe33e9ddaff in ppoll () from /lib64/libc.so.6
#1  0x0000558dafe2efbb in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x0000558dafe30c75 in aio_poll (ctx=ctx@entry=0x558db12f3980, blocking=<optimized out>) at util/aio-posix.c:622
#4  0x0000558dafdbf4a4 in bdrv_flush (bs=bs@entry=0x558db145c800) at block/io.c:2418
#5  0x0000558dafd7b84b in bdrv_close (bs=0x558db145c800) at block.c:2949
#6  bdrv_delete (bs=0x558db145c800) at block.c:3139
#7  bdrv_unref (bs=0x558db145c800) at block.c:4116
#8  0x0000558dafd7b65d in bdrv_set_backing_hd (bs=bs@entry=0x558db147a800, backing_hd=backing_hd@entry=0x0, errp=0x558db08466f8 <error_abort>) at block.c:1988
#9  0x0000558dafd7b895 in bdrv_close (bs=0x558db147a800) at block.c:2961
#10 bdrv_delete (bs=0x558db147a800) at block.c:3139
#11 bdrv_unref (bs=0x558db147a800) at block.c:4116
#12 0x0000558dafdb3634 in blk_remove_bs (blk=blk@entry=0x558db12d85a0) at block/block-backend.c:552
#13 0x0000558dafdb367b in blk_remove_all_bs () at block/block-backend.c:306
#14 0x0000558dafd78968 in bdrv_close_all () at block.c:3009
#15 0x0000558dafb2122b in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4737
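The backtrace shows the main loop blocked in bdrv_flush() during bdrv_close_all(), so the monitor can no longer service its socket. One way to detect that state programmatically (rather than eyeballing a hung `quit`) is to probe the QMP port with a timeout: a healthy qemu sends its greeting immediately, while a qemu stuck in this flush will typically accept the TCP connection but never write the greeting. A minimal sketch, assuming the QMP server from step 1 on port 5555; the function name is made up for this example:

```python
import socket

def qmp_greets(host, port, timeout=3.0):
    """Return True if the QMP monitor sends its greeting within `timeout`.

    A qemu whose main loop is stuck (as in the bdrv_flush backtrace above)
    may still accept the connection but never send the '{"QMP": ...}'
    greeting, so a recv timeout here is a hint of the reported hang.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            return b"QMP" in s.recv(4096)
    except OSError:          # connection refused, reset, or recv timeout
        return False
```

Calling `qmp_greets("127.0.0.1", 5555)` before and after the `nbd-server-stop` step would then distinguish a responsive monitor from the hang in the "Actual results" above.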