Bug 1741186

Summary: [Upstream]Incremental backup: Qemu core dump when full backup at NBD Client via pull mode(data plane enable)
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: aihua liang <aliang>
Component: qemu-kvmAssignee: Sergio Lopez <slopezpa>
Status: CLOSED CURRENTRELEASE QA Contact: aihua liang <aliang>
Severity: high Docs Contact:
Priority: high    
Version: 8.1CC: coli, jferlan, jinzhao, juzhang, ngu, qzhang, slopezpa, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-09-30 06:34:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1741094, 1748253    
Bug Blocks:    

Description aihua liang 2019-08-14 13:14:37 UTC
Description of problem:
 [Upstream]Incremental backup: Qemu core dump when full backup at NBD Client via pull mode(data plane enable)

Version-Release number of selected component (if applicable):
 kernel version:4.18.0-122.el8.x86_64
 qemu-kvm version:v4.1.0-rc5

How reproducible:
 100%

Steps to Reproduce:
1.Start guest with qemu cmds:
   /usr/local/bin/qemu-system-x86_64 \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idn20piu  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190602-221944-MrlxVzia,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190602-221944-MrlxVzia,path=/var/tmp/seabios-20190602-221944-MrlxVzia,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190602-221944-MrlxVzia,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -drive driver=qcow2,id=drive_image1,file=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2,if=none,cache=none \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pcie.0,iothread=iothread0,bus=pcie.0-root-port-3,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -blockdev driver=file,filename=/mnt/data.qcow2,node-name=file_node \
    -blockdev driver=qcow2,file=file_node,node-name=drive_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,bus=pcie.0-root-port-5,addr=0x0,iothread=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -drive format=qcow2,file=/mnt/data2.qcow2,id=drive_data2,cache=none,if=none \
    -device virtio-blk-pci,id=data2,drive=drive_data2,bus=pcie.0-root-port-6,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:33:34:35:36:37,id=idj01pFr,vectors=4,netdev=idMgbx8B,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idMgbx8B,vhost=on \
    -m 4096  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

2. Start NBD Server
    {"execute":"nbd-server-start","arguments":{"addr":{"type":"inet","data":{"host":"10.73.224.68","port":"10809"}}}}

3. Create scratch.img and add it.
    #qemu-img create -f qcow2 -b /mnt/data.qcow2 -F qcow2 scratch.img
    {"execute":"blockdev-add","arguments":{"driver":"qcow2","node-name":"tmp","file":{"driver":"file","filename":"scratch.img"},"backing":"drive_data1"}}

4. Do full backup and add bitmap
    { "execute": "transaction", "arguments": { "actions": [ {"type": "blockdev-backup", "data": { "device": "drive_data1", "target": "tmp", "sync": "none", "job-id":"j1" } }, {"type": "block-dirty-bitmap-add", "data": { "node": "drive_data1", "name": "bitmap0" } } ] }}

5. Expose the backup image
    {"execute":"nbd-server-add","arguments":{"device":"tmp"}}

6. NBD Client: create full backup image.
    #qemu-img create -f qcow2 back1.img 2G

7. NBD Client: Do full backup via pull mode
    #./copyif3.sh nbd://10.73.224.68:10809/tmp back1.img
image: nbd://10.73.224.68:10809/tmp
file format: raw
virtual size: 2 GiB (2147483648 bytes)
disk size: unavailable
image: back1.img
file format: qcow2
virtual size: 2 GiB (2147483648 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
qemu-img: Could not open 'nbd://10.73.224.68:10809/tmp': Failed to read initial magic: Unable to read from socket: Connection reset by peer

Actual results:
 Qemu coredump with info:
   (qemu) qemu-system-x86_64: io/channel.c:411: qio_channel_restart_read: Assertion `qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)' failed.
blk_debug.txt: line 42:  6026 Aborted                 (core dumped) /usr/local/bin/qemu-system-x86_64 -name 'avocado-vt-vm1' -machine q35 -nodefaults -device VGA,bus=pcie.0,addr=0x1 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190602-221944-MrlxVzia,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190602-221944-MrlxVzia,server,nowait -mon chardev=qmp_id_catch_monitor,mode=control -device pvpanic,ioport=0x505,id=idn20piu -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190602-221944-MrlxVzia,server,nowait ...


Expected results:
  Full backup via pull mode can be executed successfully.

Additional info:
  Both virtio_blk and virtio_scsi have this issue.
  It works ok when data plane disable.
  Test on qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.x86_64, it works ok.

Comment 1 aihua liang 2019-08-14 13:17:25 UTC
Coredump file located at:
  10.73.194.27:/vol/s2coredump/bz1741186//var/lib/systemd/coredump/core.qemu-system-x86.0.07561b954d40494698825a82752fab61.5218.1565786874000000.lz4

Comment 5 aihua liang 2019-08-27 06:09:11 UTC
Test on qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64, also hit this issue.

Comment 6 Sergio Lopez 2019-09-12 11:17:07 UTC
While the context is slightly different, the underlying issue is the same as BZ#1748253.

Comment 8 aihua liang 2019-09-29 05:57:36 UTC
As bz1748253 has been verified, check this bug on qemu-img-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64. 
It not exist any more, but when cancel to expose tmp, qemu crash, details as bellow:

Test steps:
  1.Start guest with qemu cmds:
   /usr/local/bin/qemu-system-x86_64 \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idn20piu  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190602-221944-MrlxVzia,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190602-221944-MrlxVzia,path=/var/tmp/seabios-20190602-221944-MrlxVzia,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190602-221944-MrlxVzia,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -drive driver=qcow2,id=drive_image1,file=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2,if=none,cache=none \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pcie.0,iothread=iothread0,bus=pcie.0-root-port-3,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -blockdev driver=file,filename=/home/data.qcow2,node-name=file_node \
    -blockdev driver=qcow2,file=file_node,node-name=drive_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,bus=pcie.0-root-port-5,addr=0x0,iothread=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -drive format=qcow2,file=/home/data2.qcow2,id=drive_data2,cache=none,if=none \
    -device virtio-blk-pci,id=data2,drive=drive_data2,bus=pcie.0-root-port-6,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:33:34:35:36:37,id=idj01pFr,vectors=4,netdev=idMgbx8B,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idMgbx8B,vhost=on \
    -m 4096  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

2. Start NBD Server
    {"execute":"nbd-server-start","arguments":{"addr":{"type":"inet","data":{"host":"10.73.224.68","port":"10809"}}}}

3. Create scratch.img and add it.
    #qemu-img create -f qcow2 -b /home/data.qcow2 -F qcow2 scratch.img
    {"execute":"blockdev-add","arguments":{"driver":"qcow2","node-name":"tmp","file":{"driver":"file","filename":"scratch.img"},"backing":"drive_data1"}}

4. Do full backup and add bitmap
    { "execute": "transaction", "arguments": { "actions": [ {"type": "blockdev-backup", "data": { "device": "drive_data1", "target": "tmp", "sync": "none", "job-id":"j1" } }, {"type": "block-dirty-bitmap-add", "data": { "node": "drive_data1", "name": "bitmap0" } } ] }}

5. Expose the backup image
    {"execute":"nbd-server-add","arguments":{"device":"tmp"}}

6. NBD Client: create full backup image.
    #qemu-img create -f qcow2 back1.img 2G

7. NBD Client: Do full backup via pull mode
    #./copyif3.sh nbd://10.73.224.68:10809/tmp back1.img
    image: nbd://10.73.224.68:10809/tmp
file format: raw
virtual size: 2 GiB (2147483648 bytes)
disk size: unavailable
image: back1.img
file format: qcow2
virtual size: 2 GiB (2147483648 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

 0 1056768:
read 1056768/1056768 bytes at offset 0
1.008 MiB, 1 ops; 00.11 sec (9.050 MiB/sec and 8.9795 ops/sec)

 1056768 28672:
read 28672/28672 bytes at offset 1056768
28 KiB, 1 ops; 00.00 sec (60.792 MiB/sec and 2223.2350 ops/sec)

 1085440 4096:
read 4096/4096 bytes at offset 1085440
4 KiB, 1 ops; 00.00 sec (8.926 MiB/sec and 2285.1606 ops/sec)

 1089536 24576:
read 24576/24576 bytes at offset 1089536
24 KiB, 1 ops; 00.00 sec (54.109 MiB/sec and 2308.6531 ops/sec)

 1114112 8192:
read 8192/8192 bytes at offset 1114112
8 KiB, 1 ops; 00.00 sec (5.832 MiB/sec and 746.4837 ops/sec)

 1122304 61440:
read 61440/61440 bytes at offset 1122304
60 KiB, 1 ops; 00.00 sec (34.604 MiB/sec and 590.5708 ops/sec)

 1183744 61440:
read 61440/61440 bytes at offset 1183744
60 KiB, 1 ops; 00.00 sec (134.640 MiB/sec and 2297.8575 ops/sec)

 1245184 33488896:
read 33488896/33488896 bytes at offset 1245184
31.938 MiB, 1 ops; 00.01 sec (3.350 GiB/sec and 107.4013 ops/sec)

 34734080 28672:
read 28672/28672 bytes at offset 34734080
28 KiB, 1 ops; 00.00 sec (18.843 MiB/sec and 689.1086 ops/sec)

 34762752 36864:
read 36864/36864 bytes at offset 34762752
36 KiB, 1 ops; 00.00 sec (78.143 MiB/sec and 2222.7359 ops/sec)

 134217728 65536:
read 65536/65536 bytes at offset 134217728
64 KiB, 1 ops; 00.00 sec (75.364 MiB/sec and 1205.8319 ops/sec)

 135266304 65536:
read 65536/65536 bytes at offset 135266304
64 KiB, 1 ops; 00.00 sec (112.867 MiB/sec and 1805.8691 ops/sec)

 142606336 260112384:
read 260112384/260112384 bytes at offset 142606336
248.062 MiB, 1 ops; 00.21 sec (1.176 GiB/sec and 4.8560 ops/sec)

 411041792 260112384:
read 260112384/260112384 bytes at offset 411041792
248.062 MiB, 1 ops; 0:00:01.18 (210.899 MiB/sec and 0.8502 ops/sec)

 679477248 260112384:
read 260112384/260112384 bytes at offset 679477248
248.062 MiB, 1 ops; 00.21 sec (1.175 GiB/sec and 4.8517 ops/sec)

 947912704 260112384:
read 260112384/260112384 bytes at offset 947912704
248.062 MiB, 1 ops; 0:00:01.20 (207.260 MiB/sec and 0.8355 ops/sec)

 1216348160 75497472:
read 75497472/75497472 bytes at offset 1216348160
72 MiB, 1 ops; 00.06 sec (1.184 GiB/sec and 16.8449 ops/sec)

 2147418112 65536:
read 65536/65536 bytes at offset 2147418112
64 KiB, 1 ops; 00.07 sec (941.938 KiB/sec and 14.7178 ops/sec)
Success!

8. Clear test env after full backup via pull mode.
   {"execute":"nbd-server-remove","arguments":{"name":"tmp"}}
Ncat: Connection reset by peer.

   (qemu)qemu: qemu_mutex_unlock_impl: Operation not permitted
debug.txt: line 42:  2804 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -machine q35 -nodefaults -device VGA,bus=pcie.0,addr=0x1 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190602-221944-MrlxVzia,server,nowait...

(gdb) bt
#0  0x00007f29edd6c8df in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f29edd56cf5 in __GI_abort () at abort.c:79
#2  0x000055d2761eb58e in error_exit
    (err=<optimized out>, msg=msg@entry=0x55d27638f210 <__func__.19897> "qemu_mutex_unlock_impl")
    at util/qemu-thread-posix.c:36
#3  0x000055d2761eb7ca in qemu_mutex_unlock_impl
    (mutex=mutex@entry=0x55d2777e4250, file=file@entry=0x55d27638e5ff "util/async.c", line=line@entry=515) at util/qemu-thread-posix.c:96
#4  0x000055d2761e6759 in aio_context_release (ctx=ctx@entry=0x55d2777e41f0) at util/async.c:515
#5  0x000055d2761675c1 in bdrv_do_drained_begin
    (poll=<optimized out>, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55d277a0de00)
    at block/io.c:426
#6  0x000055d2761675c1 in bdrv_do_drained_begin
    (bs=0x55d277a0de00, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:392
#7  0x000055d276158f6b in blk_drain (blk=blk@entry=0x55d278374760) at block/block-backend.c:1575
#8  0x000055d2761599cd in blk_remove_bs (blk=blk@entry=0x55d278374760) at block/block-backend.c:794
#9  0x000055d276159c47 in blk_delete (blk=0x55d278374760) at block/block-backend.c:414
#10 0x000055d276159c47 in blk_unref (blk=0x55d278374760) at block/block-backend.c:469
#11 0x000055d27617afef in nbd_export_put (exp=0x55d278ae9e70) at nbd/server.c:1658
#12 0x000055d27617afef in nbd_export_put (exp=0x55d278ae9e70) at nbd/server.c:1630
#13 0x000055d2760e2e4c in qmp_marshal_nbd_server_remove
    (args=<optimized out>, ret=<optimized out>, errp=0x7fff29654bc8) at qapi/qapi-commands-block.c:285
#14 0x000055d27619f23c in do_qmp_dispatch
    (errp=0x7fff29654bc0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55d276a7b880 <qmp_commands>) at qapi/qmp-dispatch.c:131
--Type <RET> for more, q to quit, c to continue without paging--
#15 0x000055d27619f23c in qmp_dispatch
    (cmds=0x55d276a7b880 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>)
    at qapi/qmp-dispatch.c:174
#16 0x000055d2760c1741 in monitor_qmp_dispatch (mon=0x55d27782c280, req=<optimized out>)
    at monitor/qmp.c:120
#17 0x000055d2760c1d8a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:209
#18 0x000055d2761e5fb6 in aio_bh_call (bh=0x55d27774ab20) at util/async.c:117
#19 0x000055d2761e5fb6 in aio_bh_poll (ctx=ctx@entry=0x55d2777496d0) at util/async.c:117
#20 0x000055d2761e93f4 in aio_dispatch (ctx=0x55d2777496d0) at util/aio-posix.c:459
#21 0x000055d2761e5e92 in aio_ctx_dispatch
    (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
#22 0x00007f29f2bea67d in g_main_dispatch (context=0x55d2777d3c50) at gmain.c:3176
#23 0x00007f29f2bea67d in g_main_context_dispatch (context=context@entry=0x55d2777d3c50) at gmain.c:3829
#24 0x000055d2761e84a8 in glib_pollfds_poll () at util/main-loop.c:218
#25 0x000055d2761e84a8 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:241
#26 0x000055d2761e84a8 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
#27 0x000055d275fd1419 in main_loop () at vl.c:1809
#28 0x000055d275e7f993 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4506



Hi, Sergio

  Please help to check if any illegal operation exist? If not, is it suitable to close this bug as current release and file a new one to track this issue? Or it's ok to track the new issue here?

BR,
aliang

Comment 10 aihua liang 2019-09-29 06:16:46 UTC
No need to file a new bug, bz1741094 has covered it. So suggest to close this bug as current release and track the new issue by bz1741094.

Comment 11 Sergio Lopez 2019-09-30 05:50:21 UTC
Hi,

Yes, this is another issue which is known and fixed upstream, described in bz1741094. So I agree in that we should close this one.

Thanks,
Sergio.

Comment 12 aihua liang 2019-09-30 06:34:55 UTC
As comment10 and comment11, close this bug as current release.