Bug 1493901 - fail to quit qemu after stopping NBD service during block stream
Summary: fail to quit qemu after stopping NBD service during block stream
Keywords:
Status: CLOSED DUPLICATE of bug 1482478
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: Longxiang Lyu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-21 07:09 UTC by Longxiang Lyu
Modified: 2017-10-06 18:03 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-06 18:03:58 UTC
Target Upstream Version:
Embargoed:


Attachments: none

Description Longxiang Lyu 2017-09-21 07:09:33 UTC
Description of problem:
qemu fails to quit after stopping the NBD service during block stream

Version-Release number of selected component (if applicable):
kernel-3.10.0-709.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.8

How reproducible:
100%

Steps to Reproduce:
1. use qemu to export a disk as an NBD device (a scripted version of this QMP session is sketched after step 5)
# qemu-kvm -drive file=test.qcow2,format=raw,id=img0 -qmp tcp:0:5555,server,nowait -monitor stdio -incoming tcp:0:6666
qmp:
{ "execute": "qmp_capabilities" }
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet", "data": { "host": "10.66.11.1", "port": "9000" } } } }
{ "execute": "nbd-server-add", "arguments": { "device": "img0", "writable": true } }

2. boot up a VM with the NBD export as its second drive
...
-drive file=/home/test/streamnbd/test.raw,format=raw,if=none,cache=none,snapshot=off,rerror=stop,werror=stop,id=img0 \
-device ide-hd,bus=ide.0,unit=0,drive=img0,id=ide-disk0,bootindex=0 \
-drive file=nbd://10.66.11.1:9000/img0,format=qcow2,if=none,cache=none,snapshot=off,rerror=stop,werror=stop,id=img1 \
-device ide-hd,bus=ide.0,unit=1,drive=img1,id=ide-disk1 \
…

3. in QMP, snapshot the second block device, then start a block-stream job on it
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "img1", "snapshot-file": "/home/test/streamnbd/sn1.qcow2", "format": "qcow2", "mode": "absolute-paths" } }
{ "execute": "block-stream", "arguments": { "device": "img1", "on-error": "report" } }

4. shut down the NBD server
{ "execute" : "nbd-server-stop", "arguments" : {} }

5. quit qemu
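
A scripted version of the step 1 QMP session, as a minimal sketch: it assumes the exporting qemu's QMP socket on 127.0.0.1:5555 from the command line above, and simply sleeps between commands instead of parsing the replies (interrupt nc once the export is up):

#!/bin/bash
# drive the step 1 QMP commands over the tcp:0:5555 QMP socket
{
echo '{ "execute": "qmp_capabilities" }'; sleep 1
echo '{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet", "data": { "host": "10.66.11.1", "port": "9000" } } } }'; sleep 1
echo '{ "execute": "nbd-server-add", "arguments": { "device": "img0", "writable": true } }'; sleep 1
} | nc 127.0.0.1 5555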

Actual results:
qmp output:
{"timestamp": {"seconds": 1505976850, "microseconds": 793574}, "event": "BLOCK_JOB_ERROR", "data": {"device": "img1", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1505976850, "microseconds": 793693}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "img1", "len": 21474836480, "offset": 2714238976, "speed": 0, "type": "stream", "error": "Input/output error"}}
{"timestamp": {"seconds": 1505976861, "microseconds": 106791}, "event": "SHUTDOWN", "data": {"guest": false}}

qemu fails to quit.
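
(With "on-error": "report", the first failed read terminates the stream job, which is why BLOCK_JOB_ERROR is immediately followed by BLOCK_JOB_COMPLETED carrying "Input/output error" at an offset well short of len; the SHUTDOWN event for the quit is emitted, but the process never actually exits.)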

Expected results:
qemu should quit cleanly.

Additional info:
# gdb -batch -ex bt -p 28291
[New LWP 28323]
[New LWP 28318]
[New LWP 28307]
[New LWP 28306]
[New LWP 28305]
[New LWP 28304]
[New LWP 28292]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fe33e9ddaff in ppoll () from /lib64/libc.so.6
#0  0x00007fe33e9ddaff in ppoll () from /lib64/libc.so.6
#1  0x0000558dafe2efbb in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x0000558dafe30c75 in aio_poll (ctx=ctx@entry=0x558db12f3980, blocking=<optimized out>) at util/aio-posix.c:622
#4  0x0000558dafdbf4a4 in bdrv_flush (bs=bs@entry=0x558db145c800) at block/io.c:2418
#5  0x0000558dafd7b84b in bdrv_close (bs=0x558db145c800) at block.c:2949
#6  bdrv_delete (bs=0x558db145c800) at block.c:3139
#7  bdrv_unref (bs=0x558db145c800) at block.c:4116
#8  0x0000558dafd7b65d in bdrv_set_backing_hd (bs=bs@entry=0x558db147a800, backing_hd=backing_hd@entry=0x0, errp=0x558db08466f8 <error_abort>) at block.c:1988
#9  0x0000558dafd7b895 in bdrv_close (bs=0x558db147a800) at block.c:2961
#10 bdrv_delete (bs=0x558db147a800) at block.c:3139
#11 bdrv_unref (bs=0x558db147a800) at block.c:4116
#12 0x0000558dafdb3634 in blk_remove_bs (blk=blk@entry=0x558db12d85a0) at block/block-backend.c:552
#13 0x0000558dafdb367b in blk_remove_all_bs () at block/block-backend.c:306
#14 0x0000558dafd78968 in bdrv_close_all () at block.c:3009
#15 0x0000558dafb2122b in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4737
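
Reading the backtrace: on quit, the main thread tears down all block nodes (bdrv_close_all), and while detaching the backing file of the streamed device it ends up in bdrv_flush() on the NBD-backed node; aio_poll() then seemingly waits forever for a flush that can no longer complete because the server side of the NBD connection is gone.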

Comment 2 Eric Blake 2017-09-26 18:48:34 UTC
Possibly a duplicate of bug 1493890 - at any rate, killing the NBD server while the VM is trying to access it should gracefully trigger EIO errors to the guest and keep the QMP monitor responsive. I'll try to reproduce this to see whether the fixes being made for other NBD bugs also cover this issue.

Comment 3 Eric Blake 2017-09-26 21:37:55 UTC
(In reply to Longxiang Lyu from comment #0)
> Description of problem:
> qemu fails to quit after stopping the NBD service during block stream
> 
> Version-Release number of selected component (if applicable):
> kernel-3.10.0-709.el7.x86_64
> qemu-kvm-rhev-2.9.0-16.el7_4.8
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. use qemu to export a disk as an NBD device
> # qemu-kvm -drive file=test.qcow2,format=raw,id=img0 -qmp
> tcp:0:5555,server,nowait -monitor stdio -incoming tcp:0:6666
> qmp:
> { "execute": "qmp_capabilities" }
> { "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet",
> "data": { "host": "10.66.11.1", "port": "9000" } } } }
> { "execute": "nbd-server-add", "arguments": { "device": "img0", "writable":
> true } }

Presumably, this has to be a big enough disk, with non-zero contents, that...


> { "execute": "block-stream", "arguments": { "device": "img1", "on-error":
> "report" } }
> 
> 4. shut down the NBD server
> { "execute" : "nbd-server-stop", "arguments" : {} }

you have enough time to kill the NBD server before the block-stream has a chance to complete (if the block-stream runs to completion, because the disk being streamed is too trivial, then I can't reproduce the hang).

At any rate, am I correct that it is the client that is hanging, and not the server?  Can you use qemu-nbd instead of qemu-kvm as the server (to make it less confusing WHICH process is hanging)?

Comment 5 Longxiang Lyu 2017-09-28 08:05:26 UTC
Steps to reproduce:
1. export the second disk, addition.qcow2:
# qemu-img info addition.qcow2 
image: addition.qcow2
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 5.0G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
# qemu-nbd -f raw addition.qcow2 -p 9000 -t -x addition.qcow2


2. boot up a VM using addition.qcow2 as the second disk
#!/bin/bash
/usr/libexec/qemu-kvm \
-name guest=test-virt \
-machine pc-i440fx-rhel7.4.0,accel=kvm \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=4,cores=1,threads=1 \
-boot strict=on \
-drive file=/home/test/streamnbd/test.raw,format=raw,if=none,cache=none,snapshot=off,rerror=stop,werror=stop,id=img0 \
-device virtio-blk-pci,drive=img0,id=disk0,bootindex=0 \
-drive file=nbd://10.66.11.1:9000/addition.qcow2,format=qcow2,if=none,cache=none,snapshot=off,rerror=stop,werror=stop,id=img1 \
-device virtio-blk-pci,drive=img1,id=disk1 \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:12:b3:20:61,bus=pci.0 \
-device qxl-vga \
-usbdevice tablet \
-vnc :2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 \
-monitor stdio \
-qmp tcp:0:4444,server,nowait

3. in QMP, snapshot the second block device, then start a block-stream job on it
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "img1", "snapshot-file": "/home/test/streamnbd/sn1.qcow2", "format": "qcow2", "mode": "absolute-paths" } }
{ "execute": "block-stream", "arguments": { "device": "img1", "on-error": "report" } }

4. kill the qemu-nbd process from step 1 (see the note after step 5)
# kill -9 $(ps aux | grep qemu-nbd | head -n1 | awk '{ print $2 }')

5. quit qemu; the qemu process hangs.
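
As an aside on step 4: the ps | grep pipeline can match the grep process itself rather than qemu-nbd; pkill matches on the exact process name and avoids that:
# pkill -9 -x qemu-nbd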

output from qmp:
{"timestamp": {"seconds": 1506585789, "microseconds": 786009}, "event": "BLOCK_JOB_ERROR", "data": {"device": "img1", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1506585789, "microseconds": 786170}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "img1", "len": 5368709120, "offset": 1425539072, "speed": 0, "type": "stream", "error": "Input/output error"}}

The stream job is forcibly completed with an error; qemu then fails to quit.

Comment 6 Eric Blake 2017-10-06 18:03:58 UTC

*** This bug has been marked as a duplicate of bug 1482478 ***

