
Bug 1733022

Summary: copy-on-read block driver makes migration fail/crash
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Han Han <hhan>
Component: qemu-kvm
Assignee: Kevin Wolf <kwolf>
Status: CLOSED ERRATA
QA Contact: CongLi <coli>
Severity: high
Priority: high
Version: 8.1
CC: aliang, chayang, coli, ddepaula, dyuan, hhuang, jgao, jinzhao, juzhang, pkrempa, rbalakri, virt-maint, xiaohli, xuwei
Target Milestone: rc
Flags: knoel: mirror+
Target Release: 8.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1
Clone Of:
: 1738377 (view as bug list) Environment:
Last Closed: 2019-11-06 07:17:49 UTC
Type: Bug
Bug Blocks: 1738377
Attachments: domain xml, qemu cmdline, backtraces, qmps (flags: none)

Description Han Han 2019-07-25 03:04:53 UTC
Created attachment 1593280
domain xml, qemu cmdline, backtraces, qmps

Description of problem:
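As in the summary: with the copy-on-read block driver in use, migration (including saving the guest to a file) fails and qemu-kvm aborts.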

Version-Release number of selected component (if applicable):
libvirt-5.6.0-1.el8.x86_64
qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.x86_64

How reproducible:
100%

Steps to Reproduce:
The following steps reproduce the bug through libvirt:
1. Start a VM whose disk is configured with -blockdev and has copy-on-read=on enabled.

2. Run virsh save:
# virsh save copy /tmp/1.sav
error: Failed to save domain copy to /tmp/1.sav
error: operation failed: domain is not running

Then check for the qemu-kvm core dump caused by SIGABRT:
# coredumpctl|tail -n1
Thu 2019-07-25 10:51:12 CST    1577   107   107   6 none      /usr/libexec/qemu-kvm

Backtrace:
Thread 9 "live_migration" received signal SIGABRT, Aborted.
[Switching to Thread 0x7f1bb989f700 (LWP 7431)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f1bc2a65cc5 in __GI_abort () at abort.c:79
#2  0x0000559845130af1 in error_handle_fatal (errp=<optimized out>, err=0x7f1ba80537c0) at util/error.c:38
#3  0x0000559845130bcd in error_setv (errp=0x5598459b1e58 <error_abort>, src=0x55984520c770 "block.c", line=1779, func=0x5598452a8bc0 <__func__.32665> "bdrv_check_perm",
    err_class=ERROR_CLASS_GENERIC_ERROR, fmt=<optimized out>, ap=0x7f1bb989e480, suffix=0x0) at util/error.c:71
#4  0x0000559845130d54 in error_setg_internal (errp=errp@entry=0x5598459b1e58 <error_abort>, src=src@entry=0x55984520c770 "block.c", line=line@entry=1779,
    func=func@entry=0x5598452a8bc0 <__func__.32665> "bdrv_check_perm", fmt=fmt@entry=0x5598452a6efd "Block node is read-only") at util/error.c:95
#5  0x0000559845063d35 in bdrv_check_perm (bs=bs@entry=0x55984772c250, q=q@entry=0x0, cumulative_perms=cumulative_perms@entry=4,
    cumulative_shared_perms=cumulative_shared_perms@entry=31, ignore_children=ignore_children@entry=0x0, errp=0x5598459b1e58 <error_abort>) at block.c:1779
#6  0x00005598450640cb in bdrv_inactivate_recurse (bs=0x55984772c250) at block.c:5287
#7  0x00005598450640ed in bdrv_inactivate_recurse (bs=0x559847746450) at block.c:5293
#8  0x000055984506599f in bdrv_inactivate_all () at block.c:5325
#9  0x000055984500194c in qemu_savevm_state_complete_precopy (f=0x559847718f00, iterable_only=<optimized out>, inactivate_disks=<optimized out>) at migration/savevm.c:1334
#10 0x0000559844ffe176 in migration_thread (opaque=0x5598476cec50) at migration/migration.c:2756
#11 0x000055984512cff4 in qemu_thread_start (args=0x559848a32080) at util/qemu-thread-posix.c:502
#12 0x00007f1bc2e0f2de in start_thread (arg=<optimized out>) at pthread_create.c:486
#13 0x00007f1bc2b40463 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
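
For readers decoding frame #5: the arguments cumulative_perms=4 and cumulative_shared_perms=31 are permission bitmasks. The following is a minimal Python sketch for decoding them, assuming the BLK_PERM_* bit values from QEMU 4.0's include/block/block.h. The table and helper names are made up for illustration and are not part of QEMU.

```python
# Assumed BLK_PERM_* bit values (QEMU 4.0, include/block/block.h).
# The names below drop the BLK_PERM_ prefix for brevity.
BLK_PERM_BITS = (
    ("CONSISTENT_READ", 0x01),
    ("WRITE", 0x02),
    ("WRITE_UNCHANGED", 0x04),
    ("RESIZE", 0x08),
    ("GRAPH_MOD", 0x10),
)


def decode_perms(mask):
    """Return the names of all permission bits set in mask (hypothetical helper)."""
    return [name for name, bit in BLK_PERM_BITS if mask & bit]


def wants_write_access(mask):
    """True if mask contains a write-type permission (WRITE or WRITE_UNCHANGED)."""
    return bool(mask & (0x02 | 0x04))


if __name__ == "__main__":
    # Values taken from frame #5 of the backtrace above.
    print("cumulative_perms        =", decode_perms(4))
    print("cumulative_shared_perms =", decode_perms(31))
    print("write access requested  =", wants_write_access(4))
```

Under that assumption, a parent still requests WRITE_UNCHANGED from the node being inactivated, which is consistent with the "Block node is read-only" error raised in bdrv_check_perm.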


Actual results:
virsh save fails and qemu-kvm aborts with SIGABRT, as shown above.

Expected results:
virsh save succeeds and qemu-kvm does not abort.

Additional info:
Both -blockdev and copy-on-read=on are required to reproduce the bug.
To reproduce the bug with plain QEMU (without libvirt), see the QEMU command line and the QMP commands issued by 'virsh save' in the attachment.
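
As a rough illustration of driving the same operation without libvirt, the sketch below builds QMP messages that ask QEMU to migrate the guest state to a file. It is an assumption-laden stand-in, not a copy of the attachment: the exec: migration URI and the helper name are chosen here for illustration, and libvirt may hand QEMU the save file in a different way. The commands recorded in the attachment remain the authoritative reproducer.

```python
import json
import shlex


def qmp_save_messages(save_path):
    """Build QMP messages that migrate the guest state to save_path.

    Hypothetical helper: uses QEMU's exec: migration URI to pipe the
    migration stream into a file.
    """
    uri = "exec:cat > " + shlex.quote(save_path)
    return [
        {"execute": "qmp_capabilities"},  # leave QMP negotiation mode
        {"execute": "migrate", "arguments": {"uri": uri}},
    ]


if __name__ == "__main__":
    # One JSON message per line, ready to be written to a QMP socket.
    for message in qmp_save_messages("/tmp/1.sav"):
        print(json.dumps(message))
```

The messages would be sent over a QMP monitor socket of a guest started with the -blockdev and copy-on-read configuration described above.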

Comment 2 Kevin Wolf 2019-07-29 11:15:08 UTC
The problem is that the copy-on-read driver requests write permissions from its child even when the node is inactive, and an inactive node cannot provide them.

I posted a fix upstream: https://lists.gnu.org/archive/html/qemu-block/2019-07/msg01129.html
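
The idea can be shown with a toy Python model. This is a simplified sketch, not the upstream patch: the function names are hypothetical, the bit values are assumed to match QEMU 4.0's BLK_PERM_* constants, and real QEMU also tracks shared permissions, which are left out here.

```python
# Assumed permission bits (QEMU 4.0, include/block/block.h).
BLK_PERM_CONSISTENT_READ = 0x01
BLK_PERM_WRITE = 0x02
BLK_PERM_WRITE_UNCHANGED = 0x04
BLK_PERM_RESIZE = 0x08

# Permissions a filter node forwards from its parents to its child.
PERM_PASSTHROUGH = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE | BLK_PERM_RESIZE


def child_perm_always_write(parent_perm, inactive):
    """Problematic behaviour: WRITE_UNCHANGED is requested unconditionally."""
    return (parent_perm & PERM_PASSTHROUGH) | BLK_PERM_WRITE_UNCHANGED


def child_perm_respecting_inactive(parent_perm, inactive):
    """Behaviour the fix aims for: no write-type request while inactive."""
    perm = parent_perm & PERM_PASSTHROUGH
    if not inactive:
        perm |= BLK_PERM_WRITE_UNCHANGED
    return perm


def inactive_node_can_grant(perm):
    """An inactive node cannot grant WRITE or WRITE_UNCHANGED."""
    return not (perm & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED))


if __name__ == "__main__":
    # While the node is inactive, its parents need no permissions at all.
    before = child_perm_always_write(0, inactive=True)
    after = child_perm_respecting_inactive(0, inactive=True)
    print("before:", before, "grantable:", inactive_node_can_grant(before))
    print("after: ", after, "grantable:", inactive_node_can_grant(after))
```

In this model the unconditional request yields the mask 4 seen in the backtrace, which an inactive node has to refuse, while the conditional variant requests nothing while the node is inactive.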

Comment 4 CongLi 2019-08-23 07:19:03 UTC
Verified this bug on qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64.

src:
    -blockdev driver=file,node-name=file_image1,filename=/root/migration/rhel810-64-virtio.qcow2,cache.direct=on,cache.no-flush=on,aio=native \
    -blockdev driver=qcow2,node-name=drive_image1,file=file_image1 \
    -blockdev driver=copy-on-read,node-name=node1,file=drive_image1 \
    -device virtio-blk-pci,id=image1,drive=node1,bootindex=0,write-cache=on,bus=pci.0,addr=0x4 \
dst:
    -blockdev driver=file,node-name=file_image1,filename=/mnt/rhel810-64-virtio.qcow2,cache.direct=on,cache.no-flush=on,aio=native \
    -blockdev driver=qcow2,node-name=drive_image1,file=file_image1 \
    -blockdev driver=copy-on-read,node-name=node1,file=drive_image1 \
    -device virtio-blk-pci,id=image1,drive=node1,bootindex=0,write-cache=on,bus=pci.0,addr=0x4 \
    -incoming tcp:0:8888 \

Migration works well.

Thanks.

Comment 6 errata-xmlrpc 2019-11-06 07:17:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723