Bug 1539530
| Summary: | Guest hung after block-job-complete for a drive-mirror image with NBD backend and iothread. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Yongxue Hong <yhong> |
| Component: | qemu-kvm | Assignee: | Sergio Lopez <slopezpa> |
| Status: | CLOSED DUPLICATE | QA Contact: | aihua liang <aliang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | --- | CC: | aliang, areis, chayang, coli, juzhang, knoel, qzhang, rbalakri, virt-maint, xuwei, yhong |
| Target Milestone: | rc | ||
| Target Release: | 8.0 | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-09-02 07:58:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1649160 | ||
It is also reproduced with blockdev-snapshot-sync and block-stream.
e.g:
[root@c155f3-u23 images]# cat ../commands/guest-85851.sh
/usr/libexec/qemu-kvm \
-name 'guest' \
-machine pseries-rhel7.5.0 \
-m 16G \
-nodefaults \
-smp 8,cores=4,threads=2,sockets=1 \
-boot order=cdn,once=d,menu=off,strict=off \
-device nec-usb-xhci,id=xhci0 \
-device usb-tablet,id=usb-tablet0 \
-device usb-kbd,id=usb-kbd0 \
-chardev socket,id=qmp_id_qmpmonitor,path=/var/tmp/qmp-cmd-monitor-yhong,server,nowait \
-mon chardev=qmp_id_qmpmonitor,mode=control \
-chardev socket,id=serial_id_serial,path=/var/tmp/serial-yhong,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial \
-enable-kvm \
-object iothread,id=iothread0 \
-object iothread,id=iothread1 \
-object iothread,id=iothread2 \
-device virtio-scsi-pci,id=scsi0,iothread=iothread0 \
-device virtio-scsi-pci,id=scsi1,iothread=iothread1 \
-device virtio-scsi-pci,id=scsi2,iothread=iothread2 \
-drive file=nbd:10.0.1.8:10086,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_system \
-device scsi-hd,bus=scsi0.0,drive=drive_system,id=system,bootindex=0 \
-drive file=/home/yhong/images/data1-15G.qcow2,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data1 \
-device scsi-hd,bus=scsi1.0,drive=drive_data1,id=data1 \
-drive file=nbd:10.0.1.8:20001,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data2 \
-device scsi-hd,bus=scsi2.0,drive=drive_data2,id=data2 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=40:f2:e9:5d:9c:03 \
-monitor stdio \
[root@c155f3-u23 images]# nc -U /var/tmp/qmp-cmd-monitor-yhong
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": "(qemu-kvm-rhev-2.10.0-17.el7)"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{ 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'drive_data2','snapshot-file': '/home/yhong/images/sn2', 'format': 'qcow2', 'mode': 'absolute-paths' } }
{"return": {}}
{ 'execute': 'block-stream', 'arguments': { 'device': 'drive_data2', 'speed':1000000000, 'on-error': 'report' } }
{"return": {}}
{"timestamp": {"seconds": 1516539970, "microseconds": 953874}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_data2", "len": 5368709120, "offset": 5368709120, "speed": 1000000000, "type": "stream"}}
guest hung
The operation of NBD server: qemu-img create -f qcow2 -o preallocation=falloc rhel-alt-75-ppc64le-virtio-scsi-30G.qcow2 30G qemu-nbd -f raw rhel-alt-75-ppc64le-virtio-scsi-30G.qcow2 -p 10086 -t -e 10 & for i in `seq 0 10`; do qemu-img create -f qcow2 -o preallocation=falloc disk-test$i.5G.qcow2 5G; done for i in `seq 0 10`; do qemu-nbd -f raw disk-test$i.5G.qcow2 -p 2000$i -t -e 10 & done Yongxue, Does this bug also happen on Power8 and x86? Is this a regression? Thanks, Qunfang It is also reproduced on x86_64.
[root@ibm-x3850x5-07 yhong]# cat guest.sh
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pc \
-nodefaults \
-vga cirrus \
-object iothread,id=iothread0 \
-object iothread,id=iothread1 \
-device virtio-scsi-pci,id=scsi0,iothread=iothread0 \
-device virtio-scsi-pci,id=scsi1,iothread=iothread1 \
-device virtio-scsi-pci,id=scsi2 \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-device pvpanic,ioport=0x505,id=iduHVO78 \
-chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0,server,nowait \
-device isa-serial,chardev=serial_id_serial0 \
-chardev socket,id=seabioslog_id_20180129-013808-Mj5o0JZF,path=/var/tmp/seabios,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20180129-013808-Mj5o0JZF,iobase=0x402 \
-drive file=nbd:10.66.9.109:10086,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_system \
-device scsi-hd,bus=scsi0.0,drive=drive_system,id=system,bootindex=0 \
-drive file=nbd:10.66.9.109:20001,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data1 \
-device scsi-hd,bus=scsi1.0,drive=drive_data1,id=data1 \
-drive file=nbd:10.66.9.109:20002,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data2 \
-device scsi-hd,bus=scsi2.0,drive=drive_data2,id=data2 \
-device virtio-net-pci,mac=9a:1a:1b:1c:1d:1e,id=idHuzuRL,vectors=4,netdev=idYunEv6,bus=pci.0 \
-netdev tap,id=idYunEv6,vhost=on \
-device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-m 8192 \
-smp 16,maxcpus=16,cores=8,threads=1,sockets=2 \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,strict=off,order=cdn,once=d \
-enable-kvm \
-monitor stdio
[root@ibm-x3850x5-07 ~]# nc -U /var/tmp/monitor-qmpmonitor1
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": "(qemu-kvm-rhev-2.10.0-18.el7)"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{ 'execute': 'drive-mirror', 'arguments': { 'device': 'drive_data1', 'target': '/home/yhong/images/destination1.image', 'sync': 'full'} }
{"return": {}}
{"timestamp": {"seconds": 1517215362, "microseconds": 511234}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_data1", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}}
{ "execute": "block-job-complete", "arguments": { "device": "drive_data1"} }
{"return": {}}
{"timestamp": {"seconds": 1517215375, "microseconds": 955169}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_data1", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}}
Guest hung.
host : 3.10.0-837.el7.x86_64
guest : 3.10.0-837.el7.x86_64
qemu : qemu-kvm-rhev-2.10.0-18.el7.x86_64
Maybe related to Bug 1525827 GDB trace:
(gdb) bt
#0 0x00007f97c4a70f0f in ppoll () at /lib64/libc.so.6
#1 0x000055cc9204476b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 0x000055cc9204476b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x000055cc92046435 in aio_poll (ctx=0x55cc950eba40, blocking=blocking@entry=true)
at util/aio-posix.c:629
#4 0x000055cc91fd39fa in nbd_client_close (bs=0x55cc954b8000) at block/nbd-client.c:62
#5 0x000055cc91fd39fa in nbd_client_close (bs=0x55cc954b8000) at block/nbd-client.c:383
#6 0x000055cc91fd237a in nbd_close (bs=<optimized out>) at block/nbd.c:492
#7 0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:3170
#8 0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:3354
#9 0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:4347
#10 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:3176
#11 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:3354
#12 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:4347
#13 0x000055cc91fc8fbe in mirror_exit (job=0x55cc9508b0e0, opaque=0x55cc971418d0) at block/mirror.c:602
#14 0x000055cc91f87769 in block_job_defer_to_main_loop_bh (opaque=0x55cc966938c0) at blockjob.c:965
#15 0x000055cc92043581 in aio_bh_poll (bh=0x55cc9706fc80) at util/async.c:90
#16 0x000055cc92043581 in aio_bh_poll (ctx=ctx@entry=0x55cc950eba40) at util/async.c:118
#17 0x000055cc920462f0 in aio_dispatch (ctx=0x55cc950eba40) at util/aio-posix.c:436
#18 0x000055cc9204345e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
---Type <return> to continue, or q <return> to quit---
#19 0x00007f97c65f38f9 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#20 0x000055cc9204558c in main_loop_wait () at util/main-loop.c:213
#21 0x000055cc9204558c in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#22 0x000055cc9204558c in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
#23 0x000055cc91d258da in main () at vl.c:1937
#24 0x000055cc91d258da in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
at vl.c:4828
(gdb) bt full
#0 0x00007f97c4a70f0f in ppoll () at /lib64/libc.so.6
#1 0x000055cc9204476b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 0x000055cc9204476b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x000055cc92046435 in aio_poll (ctx=0x55cc950eba40, blocking=blocking@entry=true)
at util/aio-posix.c:629
node = <optimized out>
i = <optimized out>
ret = 0
progress = <optimized out>
timeout = -1
start = 0
__PRETTY_FUNCTION__ = "aio_poll"
#4 0x000055cc91fd39fa in nbd_client_close (bs=0x55cc954b8000) at block/nbd-client.c:62
waited_ = true
busy_ = true
bs_ = 0x55cc954b8000
ctx_ = 0x55cc950ebcc0
client = 0x55cc9514c1c0
client = <optimized out>
request = {handle = 0, from = 0, len = 0, flags = 0, type = 2}
#5 0x000055cc91fd39fa in nbd_client_close (bs=0x55cc954b8000) at block/nbd-client.c:383
---Type <return> to continue, or q <return> to quit---
client = <optimized out>
request = {handle = 0, from = 0, len = 0, flags = 0, type = 2}
#6 0x000055cc91fd237a in nbd_close (bs=<optimized out>) at block/nbd.c:492
s = 0x55cc9514c1c0
#7 0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:3170
child = <optimized out>
next = <optimized out>
ban = <optimized out>
ban_next = <optimized out>
#8 0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:3354
#9 0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:4347
#10 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:3176
child = <optimized out>
next = <optimized out>
ban = <optimized out>
ban_next = <optimized out>
#11 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:3354
#12 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:4347
#13 0x000055cc91fc8fbe in mirror_exit (job=0x55cc9508b0e0, opaque=0x55cc971418d0) at block/mirror.c:602
data = 0x55cc971418d0
replace_aio_context = 0x0
src = 0x55cc9513a800
target_bs = 0x55cc96e00000
---Type <return> to continue, or q <return> to quit---
mirror_top_bs = 0x55cc96e06800
local_err = 0x0
#14 0x000055cc91f87769 in block_job_defer_to_main_loop_bh (opaque=0x55cc966938c0) at blockjob.c:965
data = 0x55cc966938c0
aio_context = 0x55cc950ebcc0
#15 0x000055cc92043581 in aio_bh_poll (bh=0x55cc9706fc80) at util/async.c:90
bh = <optimized out>
bhp = <optimized out>
next = 0x55cc9706f290
ret = 1
deleted = false
#16 0x000055cc92043581 in aio_bh_poll (ctx=ctx@entry=0x55cc950eba40) at util/async.c:118
bh = <optimized out>
bhp = <optimized out>
next = 0x55cc9706f290
ret = 1
deleted = false
#17 0x000055cc920462f0 in aio_dispatch (ctx=0x55cc950eba40) at util/aio-posix.c:436
#18 0x000055cc9204345e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
ctx = <optimized out>
#19 0x00007f97c65f38f9 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#20 0x000055cc9204558c in main_loop_wait () at util/main-loop.c:213
---Type <return> to continue, or q <return> to quit---
context = 0x55cc950baa50
pfds = <optimized out>
context = 0x55cc950baa50
ret = 1
spin_counter = 0
ret = 1
timeout = 4294967295
timeout_ns = <optimized out>
#21 0x000055cc9204558c in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
context = 0x55cc950baa50
ret = 1
spin_counter = 0
ret = 1
timeout = 4294967295
timeout_ns = <optimized out>
#22 0x000055cc9204558c in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
ret = 1
timeout = 4294967295
timeout_ns = <optimized out>
#23 0x000055cc91d258da in main () at vl.c:1937
i = <optimized out>
snapshot = <optimized out>
linux_boot = <optimized out>
---Type <return> to continue, or q <return> to quit---
initrd_filename = <optimized out>
kernel_filename = <optimized out>
kernel_cmdline = <optimized out>
boot_order = <optimized out>
boot_once = 0x55cc95026aa8 "d"
cyls = <optimized out>
heads = <optimized out>
secs = <optimized out>
translation = <optimized out>
opts = <optimized out>
machine_opts = <optimized out>
hda_opts = <optimized out>
icount_opts = <optimized out>
accel_opts = <optimized out>
olist = <optimized out>
optind = 67
optarg = 0x7ffc4d54c4d5 "stdio"
loadvm = <optimized out>
machine_class = 0x0
cpu_model = <optimized out>
vga_model = 0x7ffc4d54be62 "cirrus"
qtest_chrdev = <optimized out>
qtest_log = <optimized out>
---Type <return> to continue, or q <return> to quit---
pid_file = <optimized out>
incoming = <optimized out>
defconfig = <optimized out>
userconfig = <optimized out>
nographic = <optimized out>
display_type = <optimized out>
display_remote = <optimized out>
log_mask = <optimized out>
log_file = <optimized out>
trace_file = <optimized out>
maxram_size = <optimized out>
ram_slots = <optimized out>
vmstate_dump_file = <optimized out>
main_loop_err = 0x0
err = 0x0
list_data_dirs = <optimized out>
bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffc4d54b050}
__func__ = "main"
__FUNCTION__ = "main"
#24 0x000055cc91d258da in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
at vl.c:4828
i = <optimized out>
snapshot = <optimized out>
---Type <return> to continue, or q <return> to quit---
linux_boot = <optimized out>
initrd_filename = <optimized out>
kernel_filename = <optimized out>
kernel_cmdline = <optimized out>
boot_order = <optimized out>
boot_once = 0x55cc95026aa8 "d"
cyls = <optimized out>
heads = <optimized out>
secs = <optimized out>
translation = <optimized out>
opts = <optimized out>
machine_opts = <optimized out>
hda_opts = <optimized out>
icount_opts = <optimized out>
accel_opts = <optimized out>
olist = <optimized out>
optind = 67
optarg = 0x7ffc4d54c4d5 "stdio"
loadvm = <optimized out>
machine_class = 0x0
cpu_model = <optimized out>
vga_model = 0x7ffc4d54be62 "cirrus"
qtest_chrdev = <optimized out>
---Type <return> to continue, or q <return> to quit---
qtest_log = <optimized out>
pid_file = <optimized out>
incoming = <optimized out>
defconfig = <optimized out>
userconfig = <optimized out>
nographic = <optimized out>
display_type = <optimized out>
display_remote = <optimized out>
log_mask = <optimized out>
log_file = <optimized out>
trace_file = <optimized out>
maxram_size = <optimized out>
ram_slots = <optimized out>
vmstate_dump_file = <optimized out>
main_loop_err = 0x0
err = 0x0
list_data_dirs = <optimized out>
bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffc4d54b050}
__func__ = "main"
__FUNCTION__ = "main"
Test on both qemu-kvm-rhev-2.12.0-20.el7.x86_64 and qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64, also hit this issue.
Test env:
kernel version:3.10.0-957.1.2.el7.x86_64
qemu-kvm-rhev version:qemu-kvm-rhev-2.12.0-20.el7.x86_64
/qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
Test steps:
1. Host1: Start ndb server:
qemu-nbd rhel76-64-virtio-scsi.qcow2 -f raw -p 9000 -t
2. Host2: Start guest with data plane enabled and nbd backend:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pc \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20181107-005924-PkIxnG9p,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20181107-005924-PkIxnG9p,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idkp9HYI \
-chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20181107-005924-PkIxnG9p,server,nowait \
-device isa-serial,chardev=serial_id_serial0 \
-chardev socket,id=seabioslog_id_20181107-005924-PkIxnG9p,path=/var/tmp/seabios-20181107-005924-PkIxnG9p,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20181107-005924-PkIxnG9p,iobase=0x402 \
-device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
-device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
-device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
-device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
-device virtio-net-pci,mac=9a:44:45:46:47:48,id=iddDGLIi,vectors=4,netdev=idDdrbRp,bus=pci.0,addr=0x7 \
-netdev tap,id=idDdrbRp,vhost=on \
-m 2048 \
-smp 10,maxcpus=10,cores=5,threads=1,sockets=2 \
-cpu host \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=localtime,clock=host,driftfix=slew \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-monitor stdio \
-object iothread,id=iothread0 \
-object iothread,id=iothread1 \
-device virtio-scsi-pci,id=scsi0,iothread=iothread0 \
-device virtio-scsi-pci,id=scsi1,iothread=iothread1 \
-device virtio-scsi-pci,id=scsi2 \
-drive if=none,id=drive_image1,aio=threads,cache=none,format=qcow2,werror=stop,rerror=stop,file=nbd:10.66.144.42:9000 \
-device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
3. Host2: Do block stream
{ 'execute': 'drive-mirror', 'arguments': { 'device': 'drive_image1', 'target': '/home/destination1.image', 'sync': 'full'} }
--->{"timestamp": {"seconds": 1545120961, "microseconds": 574056}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_image1", "len": 4769185792, "offset": 4769185792, "speed": 0, "type": "mirror"}}
4. Host2: After block job reach "ready" status, complete the block job.
{ "execute": "block-job-complete", "arguments": { "device": "drive_image1"} }
Actual Result:
After step4, guest hang.
Pstack info:
# pstack 11057
Thread 16 (Thread 0x7f8e4cb50700 (LWP 11058)):
#0 0x00007f8e5431b1c9 in syscall () at /lib64/libc.so.6
#1 0x0000559155d62c10 in qemu_event_wait ()
#2 0x0000559155d7313e in call_rcu_thread ()
#3 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#4 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 15 (Thread 0x7f8e4c34f700 (LWP 11059)):
#0 0x00007f8e543162cf in ppoll () at /lib64/libc.so.6
#1 0x0000559155d5ec0b in qemu_poll_ns ()
#2 0x0000559155d60987 in aio_poll ()
#3 0x0000559155b3049e in iothread_run ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 14 (Thread 0x7f8e4bb4e700 (LWP 11060)):
#0 0x00007f8e543162cf in ppoll () at /lib64/libc.so.6
#1 0x0000559155d5ec0b in qemu_poll_ns ()
#2 0x0000559155d60987 in aio_poll ()
#3 0x0000559155b3049e in iothread_run ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 13 (Thread 0x7f8e4b24c700 (LWP 11077)):
#0 0x00007f8e543162cf in ppoll () at /lib64/libc.so.6
#1 0x0000559155d5ec0b in qemu_poll_ns ()
#2 0x0000559155d60987 in aio_poll ()
#3 0x0000559155b3049e in iothread_run ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 12 (Thread 0x7f8e4aa4b700 (LWP 11078)):
#0 0x00007f8e545fe4ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1 0x00007f8e545f9dcb in _L_lock_883 () at /lib64/libpthread.so.0
#2 0x00007f8e545f9c98 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3 0x0000559155d62539 in qemu_mutex_lock_impl ()
#4 0x0000559155a455f8 in qemu_mutex_lock_iothread ()
#5 0x0000559155a0866d in prepare_mmio_access.isra.24 ()
#6 0x0000559155a0cd08 in flatview_read_continue ()
#7 0x0000559155a0ceec in flatview_read ()
#8 0x0000559155a0d00f in address_space_read_full ()
#9 0x0000559155a683aa in kvm_cpu_exec ()
#10 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#11 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#12 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 11 (Thread 0x7f8e4a24a700 (LWP 11079)):
#0 0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1 0x0000559155a68165 in kvm_vcpu_ioctl ()
#2 0x0000559155a68233 in kvm_cpu_exec ()
#3 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 10 (Thread 0x7f8e49a49700 (LWP 11080)):
#0 0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1 0x0000559155a68165 in kvm_vcpu_ioctl ()
#2 0x0000559155a68233 in kvm_cpu_exec ()
#3 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 9 (Thread 0x7f8e49248700 (LWP 11081)):
#0 0x00007f8e545fe4ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1 0x00007f8e545f9dcb in _L_lock_883 () at /lib64/libpthread.so.0
#2 0x00007f8e545f9c98 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3 0x0000559155d62539 in qemu_mutex_lock_impl ()
#4 0x0000559155a455f8 in qemu_mutex_lock_iothread ()
#5 0x0000559155a0866d in prepare_mmio_access.isra.24 ()
#6 0x0000559155a0cd08 in flatview_read_continue ()
#7 0x0000559155a0ceec in flatview_read ()
#8 0x0000559155a0d00f in address_space_read_full ()
#9 0x0000559155a683aa in kvm_cpu_exec ()
#10 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#11 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#12 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 8 (Thread 0x7f8e48a47700 (LWP 11082)):
#0 0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1 0x0000559155a68165 in kvm_vcpu_ioctl ()
#2 0x0000559155a68233 in kvm_cpu_exec ()
#3 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 7 (Thread 0x7f8e48246700 (LWP 11083)):
#0 0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1 0x0000559155a68165 in kvm_vcpu_ioctl ()
#2 0x0000559155a68233 in kvm_cpu_exec ()
#3 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 6 (Thread 0x7f8e47a45700 (LWP 11084)):
#0 0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1 0x0000559155a68165 in kvm_vcpu_ioctl ()
#2 0x0000559155a68233 in kvm_cpu_exec ()
#3 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 5 (Thread 0x7f8e47244700 (LWP 11085)):
#0 0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1 0x0000559155a68165 in kvm_vcpu_ioctl ()
#2 0x0000559155a68233 in kvm_cpu_exec ()
#3 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 4 (Thread 0x7f8e46a43700 (LWP 11086)):
#0 0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1 0x0000559155a68165 in kvm_vcpu_ioctl ()
#2 0x0000559155a68233 in kvm_cpu_exec ()
#3 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 3 (Thread 0x7f8e46242700 (LWP 11087)):
#0 0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1 0x0000559155a68165 in kvm_vcpu_ioctl ()
#2 0x0000559155a68233 in kvm_cpu_exec ()
#3 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 2 (Thread 0x7f8dc3fff700 (LWP 11091)):
#0 0x00007f8e545fb965 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1 0x0000559155d627e9 in qemu_cond_wait_impl ()
#2 0x0000559155c7d28f in vnc_worker_thread_loop ()
#3 0x0000559155c7d858 in vnc_worker_thread ()
#4 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7f8e6da08dc0 (LWP 11057)):
#0 0x00007f8e543162cf in ppoll () at /lib64/libc.so.6
#1 0x0000559155d5ec0b in qemu_poll_ns ()
#2 0x0000559155d60987 in aio_poll ()
#3 0x0000559155ce3a6a in nbd_client_close ()
#4 0x0000559155ce139a in nbd_close ()
#5 0x0000559155c8a3f2 in bdrv_unref ()
#6 0x0000559155c8a41f in bdrv_unref ()
#7 0x0000559155cd3f1d in mirror_exit_common ()
#8 0x0000559155c8f952 in job_do_finalize ()
#9 0x0000559155c8fc10 in job_exit ()
#10 0x0000559155d5d771 in aio_bh_poll ()
#11 0x0000559155d60820 in aio_dispatch ()
#12 0x0000559155d5d64e in aio_ctx_dispatch ()
#13 0x00007f8e6d10d049 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#14 0x0000559155d5fb17 in main_loop_wait ()
#15 0x0000559155a01de7 in main ()
*** This bug has been marked as a duplicate of bug 1634219 *** |
Description of problem: Guest hung after block-job-complete for a drive-mirror image with NBD backend and iothread. Version-Release number of selected component (if applicable): host: 4.14.0-33.el7a.ppc64le guest: 4.14.0-33.el7a.ppc64le qemu: qemu-kvm-rhev-2.10.0-18.el7.ppc64le How reproducible: 100% Steps to Reproduce: 1.Boot a guest with NBD backend and iothread. e.g: [root@c155f3-u23 commands]# cat guest-9332.sh /usr/libexec/qemu-kvm \ -name 'guest' \ -machine pseries-rhel7.5.0 \ -m 16G \ -nodefaults \ -smp 8,cores=4,threads=2,sockets=1 \ -boot order=cdn,once=d,menu=off,strict=off \ -device nec-usb-xhci,id=xhci0 \ -device usb-tablet,id=usb-tablet0 \ -device usb-kbd,id=usb-kbd0 \ -chardev socket,id=qmp_id_qmpmonitor,path=/var/tmp/qmp-cmd-monitor-yhong,server,nowait \ -mon chardev=qmp_id_qmpmonitor,mode=control \ -chardev socket,id=serial_id_serial,path=/var/tmp/serial-yhong,server,nowait \ -device spapr-vty,reg=0x30000000,chardev=serial_id_serial \ -enable-kvm \ -object iothread,id=iothread0 \ -object iothread,id=iothread1 \ -device virtio-scsi-pci,id=scsi0,iothread=iothread0 \ -device virtio-scsi-pci,id=scsi1,iothread=iothread1 \ -device virtio-scsi-pci,id=scsi2 \ -drive file=nbd:10.0.1.8:10086,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_system \ -device scsi-hd,bus=scsi0.0,drive=drive_system,id=system,bootindex=0 \ -drive file=nbd:10.0.1.8:20001,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data1 \ -device scsi-hd,bus=scsi1.0,drive=drive_data1,id=data1 \ -drive file=nbd:10.0.1.8:20002,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data2 \ -device scsi-hd,bus=scsi2.0,drive=drive_data2,id=data2 \ -netdev tap,id=hostnet0,script=/etc/qemu-ifup \ -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=40:f2:e9:5d:9c:03 \ -monitor stdio \ 2.Do drive-mirror with drive_data1. e.g: { 'execute': 'drive-mirror', 'arguments': { 'device': 'drive_data1', 'target': '/home/yhong/images/destination1.image', 'sync': 'full'} } {"return": {}} {"timestamp": {"seconds": 1517138778, "microseconds": 444429}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_data1", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}} 3.Do block-job-complete with drive_data1. e.g: { "execute": "block-job-complete", "arguments": { "device": "drive_data1"} } {"return": {}} {"timestamp": {"seconds": 1517138811, "microseconds": 523421}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_data1", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}} Actual results: Guest hung. Expected results: Guest could run normally. Additional info: It is not reproduced with drive_data2 without iothread. [root@c155f3-u23 ~]# nc -U /var/tmp/qmp-cmd-monitor-yhong {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": "(qemu-kvm-rhev-2.10.0-18.el7)"}, "capabilities": []}} {'execute':'qmp_capabilities'} {"return": {}} { 'execute': 'drive-mirror', 'arguments': { 'device': 'drive_data2', 'target': '/home/yhong/images/destination2.image', 'sync': 'full'} } {"return": {}} {"timestamp": {"seconds": 1517139279, "microseconds": 147175}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_data2", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}} { "execute": "block-job-complete", "arguments": { "device": "drive_data2"} } {"return": {}} {"timestamp": {"seconds": 1517139287, "microseconds": 306040}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_data2", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}} [root@c155f3-u23 commands]# sh guest-9332.sh QEMU 2.10.0 monitor - type 'help' for more information (qemu) VNC server running on ::1:5900 (qemu) (qemu) Formatting '/home/yhong/images/destination2.image', fmt=qcow2 size=5368709120 cluster_size=65536 lazy_refcounts=off refcount_bits=16 (qemu) info block drive_system (#block117): nbd://10.0.1.8:10086 (qcow2) Attached to: system Cache mode: writeback, direct drive_data1 (#block306): nbd://10.0.1.8:20001 (qcow2) Attached to: data1 Cache mode: writeback, direct drive_data2 (#block1229): /home/yhong/images/destination2.image (qcow2) Attached to: data2 Cache mode: writeback, direct (qemu) Guest run normally.