Bug 1733022 - copy-on-read block driver makes migration fail/crash
Summary: copy-on-read block driver makes migration fail/crash
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.1
Assignee: Kevin Wolf
QA Contact: CongLi
URL:
Whiteboard:
Depends On:
Blocks: 1738377
Reported: 2019-07-25 03:04 UTC by Han Han
Modified: 2019-11-06 07:18 UTC

Fixed In Version: qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1738377
Environment:
Last Closed: 2019-11-06 07:17:49 UTC
Type: Bug
Target Upstream Version:


Attachments
domain xml, qemu cmdline, backtraces, qmps (9.58 KB, application/gzip)
2019-07-25 03:04 UTC, Han Han


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:3723 | None | None | None | 2019-11-06 07:18:16 UTC

Description Han Han 2019-07-25 03:04:53 UTC
Created attachment 1593280
domain xml, qemu cmdline, backtraces, qmps

Description of problem:
As in the subject: with the copy-on-read block driver in use, migration fails and qemu crashes.

Version-Release number of selected component (if applicable):
libvirt-5.6.0-1.el8.x86_64
qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.x86_64

How reproducible:
100%

Steps to Reproduce:
The following steps reproduce the bug through libvirt:
1. Start a VM with -blockdev and copy-on-read=on enabled.

2. Run virsh save:
# virsh save copy /tmp/1.sav
error: Failed to save domain copy to /tmp/1.sav
error: operation failed: domain is not running

The core dump list then shows that qemu was killed by SIGABRT:
# coredumpctl|tail -n1
Thu 2019-07-25 10:51:12 CST    1577   107   107   6 none      /usr/libexec/qemu-kvm

Backtrace:
Thread 9 "live_migration" received signal SIGABRT, Aborted.
[Switching to Thread 0x7f1bb989f700 (LWP 7431)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f1bc2a65cc5 in __GI_abort () at abort.c:79
#2  0x0000559845130af1 in error_handle_fatal (errp=<optimized out>, err=0x7f1ba80537c0) at util/error.c:38
#3  0x0000559845130bcd in error_setv (errp=0x5598459b1e58 <error_abort>, src=0x55984520c770 "block.c", line=1779, func=0x5598452a8bc0 <__func__.32665> "bdrv_check_perm",
    err_class=ERROR_CLASS_GENERIC_ERROR, fmt=<optimized out>, ap=0x7f1bb989e480, suffix=0x0) at util/error.c:71
#4  0x0000559845130d54 in error_setg_internal (errp=errp@entry=0x5598459b1e58 <error_abort>, src=src@entry=0x55984520c770 "block.c", line=line@entry=1779,
    func=func@entry=0x5598452a8bc0 <__func__.32665> "bdrv_check_perm", fmt=fmt@entry=0x5598452a6efd "Block node is read-only") at util/error.c:95
#5  0x0000559845063d35 in bdrv_check_perm (bs=bs@entry=0x55984772c250, q=q@entry=0x0, cumulative_perms=cumulative_perms@entry=4,
    cumulative_shared_perms=cumulative_shared_perms@entry=31, ignore_children=ignore_children@entry=0x0, errp=0x5598459b1e58 <error_abort>) at block.c:1779
#6  0x00005598450640cb in bdrv_inactivate_recurse (bs=0x55984772c250) at block.c:5287
#7  0x00005598450640ed in bdrv_inactivate_recurse (bs=0x559847746450) at block.c:5293
#8  0x000055984506599f in bdrv_inactivate_all () at block.c:5325
#9  0x000055984500194c in qemu_savevm_state_complete_precopy (f=0x559847718f00, iterable_only=<optimized out>, inactivate_disks=<optimized out>) at migration/savevm.c:1334
#10 0x0000559844ffe176 in migration_thread (opaque=0x5598476cec50) at migration/migration.c:2756
#11 0x000055984512cff4 in qemu_thread_start (args=0x559848a32080) at util/qemu-thread-posix.c:502
#12 0x00007f1bc2e0f2de in start_thread (arg=<optimized out>) at pthread_create.c:486
#13 0x00007f1bc2b40463 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Actual results:
As above: virsh save fails and qemu aborts with SIGABRT.

Expected results:
No SIGABRT

Additional info:
Both -blockdev and copy-on-read=on are required to reproduce the bug.
To reproduce the bug with native qemu, refer to the qemu command line and the QMP commands issued by 'virsh save' in the attachment.

Comment 2 Kevin Wolf 2019-07-29 11:15:08 UTC
The problem is that the copy-on-read driver requests write permissions even on inactive nodes, which cannot be provided.

I posted a fix upstream: https://lists.gnu.org/archive/html/qemu-block/2019-07/msg01129.html
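
To illustrate the failure mode, here is a simplified Python model (not QEMU's C code) of how a copy-on-read filter might compute the permissions it requests on its child node. The constant names echo QEMU's BLK_PERM_* flags, but their numeric values, the function names cor_child_perm_model and check_perm, and the `fixed` switch are all assumptions made for illustration; the upstream patch may differ in detail.

```python
# Simplified model; names echo QEMU's flags, numeric values are assumptions.
BLK_PERM_CONSISTENT_READ = 0x01
BLK_PERM_WRITE = 0x02
BLK_PERM_WRITE_UNCHANGED = 0x04
BLK_PERM_RESIZE = 0x08

# Permissions the filter simply forwards from its parent to its child.
PERM_PASSTHROUGH = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE | BLK_PERM_RESIZE


def cor_child_perm_model(parent_perm, inactive, fixed=True):
    """Return the permissions a copy-on-read filter requests on its child.

    fixed=False models the buggy behaviour: write-unchanged access is
    requested unconditionally, even when the node is inactive.
    """
    nperm = parent_perm & PERM_PASSTHROUGH
    # Copy-on-read writes back data it has just read, so it wants
    # write-unchanged access, but an inactive node cannot grant it.
    if not (fixed and inactive):
        nperm |= BLK_PERM_WRITE_UNCHANGED
    return nperm


def check_perm(requested, inactive):
    """Reject any kind of write access on an inactive (read-only) node."""
    write_bits = BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED
    if inactive and requested & write_bits:
        raise PermissionError("Block node is read-only")
```

With fixed=False and inactive=True, the model requests write-unchanged access and check_perm raises, which corresponds to the "Block node is read-only" error in frame #5 of the backtrace. With fixed=True, no write access is requested on an inactive node and the check passes.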

Comment 4 CongLi 2019-08-23 07:19:03 UTC
Verified this bug on qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64.

src:
    -blockdev driver=file,node-name=file_image1,filename=/root/migration/rhel810-64-virtio.qcow2,cache.direct=on,cache.no-flush=on,aio=native \
    -blockdev driver=qcow2,node-name=drive_image1,file=file_image1 \
    -blockdev driver=copy-on-read,node-name=node1,file=drive_image1 \
    -device virtio-blk-pci,id=image1,drive=node1,bootindex=0,write-cache=on,bus=pci.0,addr=0x4 \
dst:
    -blockdev driver=file,node-name=file_image1,filename=/mnt/rhel810-64-virtio.qcow2,cache.direct=on,cache.no-flush=on,aio=native \
    -blockdev driver=qcow2,node-name=drive_image1,file=file_image1 \
    -blockdev driver=copy-on-read,node-name=node1,file=drive_image1 \
    -device virtio-blk-pci,id=image1,drive=node1,bootindex=0,write-cache=on,bus=pci.0,addr=0x4 \
    -incoming tcp:0:8888 \

Migration works well.
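
For reference, a migration from the source to a destination started with -incoming, as above, can be driven over QMP. The Python sketch below only builds the JSON messages: `qmp_capabilities`, `migrate`, and `query-migrate` are standard QMP commands, while the helper names and the destination host "dst-host" are hypothetical placeholders. How the messages are sent to the QMP socket is left out, and this is not necessarily how the verification was run.

```python
import json


def qmp_command(name, **arguments):
    """Serialize one QMP command as a JSON string."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


def migration_commands(dst_host, port=8888):
    """Build the QMP messages for a TCP migration (dst_host is a placeholder)."""
    return [
        # Leave capabilities negotiation mode before issuing other commands.
        qmp_command("qmp_capabilities"),
        # Start migrating to the destination that listens via -incoming.
        qmp_command("migrate", uri=f"tcp:{dst_host}:{port}"),
        # Poll the migration status.
        qmp_command("query-migrate"),
    ]
```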

Thanks.

Comment 6 errata-xmlrpc 2019-11-06 07:17:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723

