Bug 1703907 - [upstream]QEMU coredump when converting to qcow2: external data file images on block devices with copy_offloading
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Kevin Wolf
QA Contact: CongLi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-29 05:33 UTC by Tingting Mao
Modified: 2020-05-05 09:46 UTC (History)

Fixed In Version: qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:45:18 UTC
Type: Bug
Target Upstream Version:
Embargoed:

Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:2017  0        None      None    None     2020-05-05 09:46:07 UTC

Description Tingting Mao 2019-04-29 05:33:09 UTC
Description of problem:
qemu-img dumps core when converting to a qcow2 image with an external data file on block devices over iSCSI with copy offloading (-C).


Version-Release number of selected component (if applicable):
Qemu: 
# qemu-img --version
qemu-img version 4.0.50 (v4.0.0-89-gdb7f1c3faf)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

Kernel:
# uname -r
4.18.0-80.16.el8.x86_64


How reproducible:
3/3


Steps to Reproduce:

1. Log in to the iSCSI target
# iscsiadm --mode discoverydb --type sendtargets --portal 10.66.10.26 --discover
10.66.10.26:3260,1 iqn.2019-04.com.ecample:t1

# iscsiadm --mode node --targetname iqn.2019-04.com.ecample:t1 --portal 10.66.10.26:3260 --login
Logging in to [iface: default, target: iqn.2019-04.com.ecample:t1, portal: 10.66.10.26,3260] (multiple)
Login to [iface: default, target: iqn.2019-04.com.ecample:t1, portal: 10.66.10.26,3260] successful.

2. Create LVs on the iSCSI target
2.1 PV creation
# pvcreate /dev/sdb 
  Physical volume "/dev/sdb" successfully created.

2.2 VG creation
# vgcreate vg /dev/sdb
  Volume group "vg" successfully created

2.3 LVs creation
# lvcreate -L 1G -n tgt.img vg
  Logical volume "tgt.img" created.
# lvcreate -L 1G -n tgt.qcow2 vg
  Logical volume "tgt.qcow2" created.

3. Convert a local image to the LVs as qcow2 with an external data file, with copy offloading (-C) enabled

# qemu-img info src.img 
image: src.img
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 512M

# qemu-img convert -f raw -O qcow2 src.img -o data_file=/dev/vg/tgt.img,data_file_raw=on /dev/vg/tgt.qcow2 -p -C
qemu-img: block/qcow2-cache.c:58: qcow2_cache_get_table_idx: Assertion `idx >= 0 && idx < c->size && table_offset % c->table_size == 0' failed.
Aborted (core dumped)




Actual results:
qemu-img aborts with the assertion failure shown above.


Expected results:
No core dump; the conversion succeeds.


Additional info:
1. Converting to a plain qcow2 image works well
# qemu-img convert -f raw -O qcow2 src.img  /dev/vg/tgt.qcow2 -p 
    (100.00/100%)

2. Converting to a plain qcow2 image with copy offloading (-C) works well
# qemu-img convert -f raw -O qcow2 src.img  /dev/vg/tgt.qcow2 -p -C
    (100.00/100%)

3. Converting to a qcow2 image with an external data file works well (without copy offloading)
# qemu-img convert -f raw -O qcow2 src.img -o data_file=/dev/vg/tgt.img,data_file_raw=on /dev/vg/tgt.qcow2 -p 
    (100.00/100%)
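
The failing command and the three working variants above differ only in two switches: the external data file options and -C. As a rough sketch, the four combinations can be generated with a small dry-run helper. The function names convert_cmd and print_matrix are made up for illustration, the paths are the ones from this report, and the helper only prints command lines, it does not run qemu-img.

```sh
# Hypothetical dry-run helper: prints a qemu-img convert command line,
# it does not execute anything. Paths are the ones from this report.
convert_cmd() {
    use_data_file=$1   # 1 = add the external data file options
    use_offload=$2     # 1 = add -C (copy offloading)
    cmd="qemu-img convert -p -f raw -O qcow2"
    if [ "$use_data_file" = 1 ]; then
        cmd="$cmd -o data_file=/dev/vg/tgt.img,data_file_raw=on"
    fi
    if [ "$use_offload" = 1 ]; then
        cmd="$cmd -C"
    fi
    echo "$cmd src.img /dev/vg/tgt.qcow2"
}

# Print all four combinations; only "data_file=1 offload=1" crashed
# in this report.
print_matrix() {
    for d in 0 1; do
        for c in 0 1; do
            echo "data_file=$d offload=$c: $(convert_cmd "$d" "$c")"
        done
    done
}
```

Calling print_matrix lists the four command lines so they can be reviewed before being run by hand as root.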

Comment 2 Tingting Mao 2019-10-23 11:23:44 UTC
Still hit this issue in 'qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61'.

Comment 3 Ademar Reis 2020-02-05 22:56:44 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 4 Kevin Wolf 2020-02-07 10:12:40 UTC
This doesn't require iSCSI, but a block device might be needed. I couldn't reproduce it with local files, but loop devices are enough to reproduce it.

The crash happens while aborting the copy offloading operation (it probably failed because block devices don't support copy offloading).

(gdb) bt
#0  0x00007ffff765157f in raise () from /lib64/libc.so.6
#1  0x00007ffff763b895 in abort () from /lib64/libc.so.6
#2  0x00007ffff763b769 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007ffff7649a26 in __assert_fail () from /lib64/libc.so.6
#4  0x00005555555ca6d1 in qcow2_cache_get_table_idx (c=<optimized out>, table=<optimized out>) at block/qcow2-cache.c:56
#5  qcow2_cache_get_table_idx (table=<optimized out>, c=<optimized out>) at block/qcow2-cache.c:52
#6  qcow2_cache_entry_mark_dirty (c=<optimized out>, table=<optimized out>) at block/qcow2-cache.c:430
#7  0x00005555555c0c6d in update_refcount (bs=bs@entry=0x55555573c210, offset=offset@entry=0, length=length@entry=2097152, addend=addend@entry=1, decrease=decrease@entry=true, 
    type=type@entry=QCOW2_DISCARD_NEVER) at block/qcow2-refcount.c:862
#8  0x00005555555c036f in qcow2_free_clusters (bs=0x55555573c210, offset=0, size=2097152, type=QCOW2_DISCARD_NEVER) at block/qcow2-refcount.c:1147
#9  0x00005555555c747d in qcow2_alloc_cluster_abort (bs=<optimized out>, m=<optimized out>) at block/qcow2-cluster.c:1014
#10 0x00005555555b76e9 in qcow2_handle_l2meta (bs=bs@entry=0x55555573c210, pl2meta=pl2meta@entry=0x7ffff5adcb00, link_l2=link_l2@entry=false) at block/qcow2.c:1950
#11 0x00005555555b9351 in qcow2_co_copy_range_to (bs=0x55555573c210, src=0x55555571b160, src_offset=0, dst=<optimized out>, dst_offset=0, bytes=<optimized out>, read_flags=0, 
    write_flags=0) at block/qcow2.c:3729
#12 0x00005555555ec01b in bdrv_co_copy_range_internal (src=0x55555571b160, src_offset=0, dst=0x55555573aea0, dst_offset=0, bytes=2097152, read_flags=0, write_flags=0, recurse_src=false)
    at block/io.c:3064
#13 0x00005555555ec5d5 in bdrv_co_copy_range_to (src=<optimized out>, src_offset=<optimized out>, dst=<optimized out>, dst_offset=<optimized out>, bytes=<optimized out>, 
    read_flags=<optimized out>, write_flags=0) at block/io.c:3106
#14 0x00005555555ebf3f in bdrv_co_copy_range_internal (src=0x55555571b160, src_offset=0, dst=0x55555573aea0, dst_offset=0, bytes=2097152, read_flags=0, write_flags=0, recurse_src=true)
    at block/io.c:3049
#15 0x00005555555ec5b5 in bdrv_co_copy_range_from (src=<optimized out>, src_offset=<optimized out>, dst=<optimized out>, dst_offset=<optimized out>, bytes=<optimized out>, 
    read_flags=<optimized out>, write_flags=0) at block/io.c:3090
#16 0x00005555555ebf3f in bdrv_co_copy_range_internal (src=0x555555710380, src_offset=0, dst=0x55555573aea0, dst_offset=0, bytes=2097152, read_flags=0, write_flags=0, recurse_src=true)
    at block/io.c:3049
#17 0x00005555555ec5b5 in bdrv_co_copy_range_from (src=<optimized out>, src_offset=<optimized out>, dst=<optimized out>, dst_offset=<optimized out>, bytes=<optimized out>, 
    read_flags=<optimized out>, write_flags=0) at block/io.c:3090
#18 0x0000555555594f87 in convert_co_copy_range (nb_sectors=4096, sector_num=0, s=<optimized out>) at qemu-img.c:1840
#19 convert_co_do_copy (opaque=<optimized out>) at qemu-img.c:1930
#20 0x000055555565b8a3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#21 0x00007ffff7667250 in ?? () from /lib64/libc.so.6
#22 0x00007fffffffc6b0 in ?? ()
#23 0x0000000000000000 in ?? ()

In this stack trace, there is at least one entry that is certainly wrong: It would try to free the image header.

    qcow2_free_clusters (bs=0x55555573c210, offset=0, size=2097152, type=QCOW2_DISCARD_NEVER)

This is because offset 0 into s->data_file is misinterpreted as offset 0 into bs->file.
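
To make the mix-up concrete, here is a toy model of the arithmetic only. It is not QEMU code: the function names are invented, and CLUSTER_SIZE assumes the qcow2 default of 64 KiB, which the report does not state. It shows that the offset and length from frame #8, if read as offsets into bs->file, cover the cluster that holds the image header.

```sh
# Toy model only; not QEMU code.
# Assumption: the image uses the qcow2 default cluster size of 64 KiB.
CLUSTER_SIZE=65536

# Print the first and last cluster index covered by [offset, offset+length).
cluster_range() {
    offset=$1
    length=$2
    first=$(( offset / CLUSTER_SIZE ))
    last=$(( (offset + length - 1) / CLUSTER_SIZE ))
    echo "$first $last"
}

# Succeed if the range includes cluster 0, which holds the qcow2 header
# when the offsets are interpreted relative to the metadata file (bs->file).
range_covers_header() {
    set -- $(cluster_range "$1" "$2")
    [ "$1" -eq 0 ]
}
```

With the values from the backtrace, cluster_range 0 2097152 prints "0 31". The same numbers are harmless as offsets into the external data file, where offset 0 is simply the start of the guest data.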

Comment 5 Kevin Wolf 2020-02-24 15:41:31 UTC
This crash is fixed in upstream commit c3b6658c1a.

Comment 10 Tingting Mao 2020-03-02 06:24:25 UTC
Tried to reproduce and verify this bug over loop devices as below:


Prepare a source image.
# qemu-img create -f raw src.img 2G
# qemu-io -c 'write -P 1 0 1G' src.img -f raw
# qemu-img info src.img 
image: src.img
file format: raw
virtual size: 2 GiB (2147483648 bytes)
disk size: 1 GiB

Prepare loop devices as target images:
# qemu-img create -f raw loop.img 10G
# losetup -f loop.img
# qemu-img create -f raw loop1.img 10G
# losetup -f loop1.img
# losetup -l
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE            DIO LOG-SEC
/dev/loop1         0      0         0  0 /home/test/loop1.img   0     512
/dev/loop0         0      0         0  0 /home/test/loop.img    0     512
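
The loop device preparation above could be wrapped in a pair of helpers along these lines. This is a sketch, not part of the original test procedure: run, attach_loop, detach_loop and the DRY_RUN switch are hypothetical names, and attaching loop devices for real needs root.

```sh
# Sketch with a hypothetical DRY_RUN switch: when DRY_RUN=1 the commands
# are printed instead of executed (attaching loop devices needs root).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a raw backing file and attach it to the first free loop device.
# When really executed, losetup --show prints the device it picked.
attach_loop() {
    backing=$1
    size=$2
    run qemu-img create -f raw "$backing" "$size" &&
    run losetup --find --show "$backing"
}

# Detach a loop device again, e.g. detach_loop /dev/loop0.
detach_loop() {
    run losetup -d "$1"
}
```

Running attach_loop loop.img 10G and attach_loop loop1.img 10G with DRY_RUN=1 prints the commands first; detach_loop cleans up after the test.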



Reproduced with 'qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e':
# qemu-img convert -f raw src.img -O qcow2 -o data_file=/dev/loop0,data_file_raw=on /dev/loop1 -p -C  
qemu-img: block/qcow2-cache.c:56: qcow2_cache_get_table_idx: Assertion `idx >= 0 && idx < c->size && table_offset % c->table_size == 0' failed.
Aborted (core dumped)



Verified with 'qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae':
# qemu-img convert -f raw src.img -O qcow2 -o data_file=/dev/loop0,data_file_raw=on /dev/loop1 -p -C 
    (100.00/100%)
# echo $?
0
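
The verification above relies on the exit status printed by echo $?. A small helper like the following could label that status when the check is scripted. The function name is made up; the value 134 is 128 plus signal number 6 (SIGABRT on Linux), which is how the shell reports a process killed by abort(), as in the "Aborted (core dumped)" runs.

```sh
# Hypothetical helper: turn the exit status of a qemu-img convert run
# into a short verdict. 134 = 128 + 6 (SIGABRT), i.e. an assertion abort.
classify_convert_status() {
    case "$1" in
        0)   echo "ok" ;;
        134) echo "aborted (SIGABRT)" ;;
        *)   echo "failed with status $1" ;;
    esac
}
```

After running the convert command, classify_convert_status $? prints "ok" on the fixed build and "aborted (SIGABRT)" where the assertion still fires.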

Comment 13 errata-xmlrpc 2020-05-05 09:45:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017