Bug 2036193

Summary: qemu abort() when creating overlays on top of a RBD disk
Product: Red Hat Enterprise Linux 9
Reporter: Meina Li <meili>
Component: qemu-kvm
Assignee: Stefano Garzarella <sgarzare>
qemu-kvm sub component: Ceph
QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: high
CC: coli, hhan, kkiwi, lmen, virt-maint, xuzhang, yicui
Version: 9.0
Keywords: TestOnly, Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-01-04 01:34:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2034791
Bug Blocks:

Description Meina Li 2021-12-30 10:57:36 UTC
Description of problem:
The guest shuts off a while after disk-only snapshots are created on top of an RBD disk.

Version-Release number of selected component (if applicable):
libvirt-7.10.0-1.el9.x86_64
qemu-kvm-6.2.0-1.el9.x86_64
kernel-5.14.0-39.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an RBD disk image.
# qemu-img convert -f qcow2 -O raw /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2 rbd:blockpull-pool/rbd_blockpull_ktd5.img:mon_host=**IP**

2. Start a guest with the rbd disk.
# virsh start avocado-vt-vm1
Domain 'avocado-vt-vm1' started
# virsh dumpxml avocado-vt-vm1 | grep /disk -B8
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='rbd' name='blockpull-pool/rbd_blockpull_ktd5.img' index='1'>
        <host name='**IP**' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
# virsh list --all
 Id   Name             State
--------------------------------
 2    avocado-vt-vm1   running

3. Create three disk-only snapshots.
# for i in {1..3}; do virsh snapshot-create-as avocado-vt-vm1 s$i --disk-only --diskspec vda,file=/tmp/rbd.s$i; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created

4. Check the status of the guest after a while.
# virsh list --all
 Id   Name             State
---------------------------------
 -    avocado-vt-vm1   shut off
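
Steps 3 and 4 can be sketched as two small shell helpers, which may be handy when re-testing a fixed build. This is only a sketch: the function names `create_snapshots` and `wait_for_shutoff` and the `VIRSH` override variable are hypothetical and not part of this report; the `virsh` subcommands themselves (`snapshot-create-as`, `domstate`) are the regular ones.

```shell
# Hypothetical helpers; the names are illustrative and not from this report.
VIRSH="${VIRSH:-virsh}"   # set VIRSH=echo for a dry run that only prints commands

# Step 3: create COUNT disk-only snapshots of disk vda, with overlays in /tmp.
create_snapshots() (
    dom=$1 count=$2 i=1
    while [ "$i" -le "$count" ]; do
        "$VIRSH" snapshot-create-as "$dom" "s$i" --disk-only \
            --diskspec "vda,file=/tmp/rbd.s$i" || exit 1
        i=$((i + 1))
    done
)

# Step 4: poll `virsh domstate` up to TRIES times, DELAY seconds apart.
# Returns 0 once the domain reports "shut off", 1 if it never does,
# and 2 if the state query itself fails.
wait_for_shutoff() (
    dom=$1 tries=$2 delay=$3
    while [ "$tries" -gt 0 ]; do
        state=$("$VIRSH" domstate "$dom") || exit 2
        [ "$state" = "shut off" ] && exit 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    exit 1
)
```

For example, `create_snapshots avocado-vt-vm1 3 && wait_for_shutoff avocado-vt-vm1 60 5` would return 0 if the guest shuts off within about five minutes. The polling interval and count are arbitrary, since the report only says "after a while".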

Actual results:
The guest shuts off a while after the snapshots are created.

Expected results:
The guest keeps running.

Additional info:
1)# cat /var/log/libvirt/qemu/avocado-vt-vm1.log
...
qemu-kvm: ../block/rbd.c:1355: int qemu_rbd_co_block_status(BlockDriverState *, _Bool, int64_t, int64_t, int64_t *, int64_t *, BlockDriverState **): Assertion `req.bytes <= bytes' failed.
2021-12-30 10:49:18.282+0000: shutting down, reason=crashed
2)# coredumpctl dump 110922
...
                Stack trace of thread 110922:
                #0  0x00007f7a8d3fa7fc __pthread_kill_implementation (libc.so.6 + 0x8f7fc)
                #1  0x00007f7a8d3ad676 __GI_raise (libc.so.6 + 0x42676)
                #2  0x00007f7a8d3977d3 __GI_abort (libc.so.6 + 0x2c7d3)
                #3  0x00007f7a8d3976fb __assert_fail_base (libc.so.6 + 0x2c6fb)
                #4  0x00007f7a8d3a6396 __GI___assert_fail (libc.so.6 + 0x3b396)
                #5  0x00007f7a8a98d021 qemu_rbd_co_block_status (block-rbd.so + 0x5021)
                #6  0x000056340d0fd38e bdrv_co_block_status (qemu-kvm + 0x7d338e)
                #7  0x000056340d0fd545 bdrv_co_block_status (qemu-kvm + 0x7d3545)
                #8  0x000056340d0fceeb bdrv_co_common_block_status_above (qemu-kvm + 0x7d2eeb)
                #9  0x000056340d0b5530 bdrv_common_block_status_above (qemu-kvm + 0x78b530)
                #10 0x000056340d130720 qcow2_co_pwritev_task_entry (qemu-kvm + 0x806720)
                #11 0x000056340d12b177 qcow2_co_pwritev_part (qemu-kvm + 0x801177)
                #12 0x000056340d0fa9ed bdrv_driver_pwritev (qemu-kvm + 0x7d09ed)
                #13 0x000056340d0fc360 bdrv_aligned_pwritev (qemu-kvm + 0x7d2360)
                #14 0x000056340d0fb753 bdrv_co_pwritev_part (qemu-kvm + 0x7d1753)
                #15 0x000056340d0e7362 blk_co_do_pwritev_part (qemu-kvm + 0x7bd362)
                #16 0x000056340d0e77d7 blk_aio_write_entry (qemu-kvm + 0x7bd7d7)
                #17 0x000056340d2a3016 coroutine_trampoline (qemu-kvm + 0x979016)
                #18 0x00007f7a8d3c2810 n/a (libc.so.6 + 0x57810)
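
To compare this crash against other reports, the assertion and the call path can be pulled out of the two outputs above with standard tools. A minimal sketch, assuming the log and backtrace have the same layout as shown here; the function names `failed_assertions` and `trace_functions` are hypothetical.

```shell
# Print "source:line: expression" for each failed assertion in a QEMU log
# read from stdin (line layout assumed to match the log excerpt above).
failed_assertions() {
    sed -n "s/^qemu-kvm: \([^ ]*:[0-9][0-9]*\): .*Assertion \`\(.*\)' failed\.\$/\1: \2/p"
}

# Print the function name of each frame from a coredumpctl stack trace
# read from stdin, innermost frame first.
trace_functions() {
    awk '$1 ~ /^#[0-9]+$/ { print $3 }'
}
```

Usage would be along the lines of `failed_assertions < /var/log/libvirt/qemu/avocado-vt-vm1.log` and `coredumpctl dump 110922 2>&1 | trace_functions`; the `2>&1` is an assumption about where the trace text is written.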

Comment 1 Klaus Heinrich Kiwi 2022-01-03 13:55:23 UTC
(In reply to Meina Li from comment #0)
> Description of problem:
> Guest shutoff after creating snapshot with rbd disk for a while
> 
> Version-Release number of selected component (if applicable):
> libvirt-7.10.0-1.el9.x86_64
> qemu-kvm-6.2.0-1.el9.x86_64
> kernel-5.14.0-39.el9.x86_64
> 


Submitter, can you tell us whether this is a new test case or a regression? If it's a regression, is there any record of when it was last tested and worked?


> 2. Start a guest with the rbd disk.
> # virsh start avocado-vt-vm1
> Domain 'avocado-vt-vm1' started
> # virsh dumpxml avocado-vt-vm1 | grep /disk -B8
>     <disk type='network' device='disk'>
>       <driver name='qemu' type='raw' cache='none'/>
[...]
> # for i in {1..3}; do virsh snapshot-create-as avocado-vt-vm1 s$i
> --disk-only --diskspec vda,file=/tmp/rbd.s$i; done
> Domain snapshot s1 created
> Domain snapshot s2 created
> Domain snapshot s3 created
[...]
> qemu-kvm: ../block/rbd.c:1355: int qemu_rbd_co_block_status(BlockDriverState
> *, _Bool, int64_t, int64_t, int64_t *, int64_t *, BlockDriverState **):
> Assertion `req.bytes <= bytes' failed.
> 2021-12-30 10:49:18.282+0000: shutting down, reason=crashed
> 2)# coredumpctl dump 110922
> ...
>                 Stack trace of thread 110922:
>                 #0  0x00007f7a8d3fa7fc __pthread_kill_implementation
> (libc.so.6 + 0x8f7fc)
>                 #1  0x00007f7a8d3ad676 __GI_raise (libc.so.6 + 0x42676)
>                 #2  0x00007f7a8d3977d3 __GI_abort (libc.so.6 + 0x2c7d3)
>                 #3  0x00007f7a8d3976fb __assert_fail_base (libc.so.6 +
> 0x2c6fb)
>                 #4  0x00007f7a8d3a6396 __GI___assert_fail (libc.so.6 +
> 0x3b396)
>                 #5  0x00007f7a8a98d021 qemu_rbd_co_block_status
> (block-rbd.so + 0x5021)
>                 #6  0x000056340d0fd38e bdrv_co_block_status (qemu-kvm +
> 0x7d338e)
>                 #7  0x000056340d0fd545 bdrv_co_block_status (qemu-kvm +
> 0x7d3545)
>                 #8  0x000056340d0fceeb bdrv_co_common_block_status_above
> (qemu-kvm + 0x7d2eeb)
>                 #9  0x000056340d0b5530 bdrv_common_block_status_above
> (qemu-kvm + 0x78b530)
>                 #10 0x000056340d130720 qcow2_co_pwritev_task_entry (qemu-kvm
> + 0x806720)
>                 #11 0x000056340d12b177 qcow2_co_pwritev_part (qemu-kvm +
> 0x801177)

It's interesting that even though the RBD image is in raw format and the snapshots were created with snapshot-create-as --disk-only, the failing assertion is apparently still reached through qcow2 routines. Maybe they are common to the copy-on-write operation rather than to the format?

Stefano, can you take a look?

Comment 2 Stefano Garzarella 2022-01-03 15:14:10 UTC
(In reply to Klaus Heinrich Kiwi from comment #1)
> 
> It's interesting that even though the RBD image is in RAW format and using
> --disk-only to create-snapshot-as, this assertion fail is apparently still
> going through qcow2 routines? Maybe they are common (to the copy-on-write
> operation instead of the format)?

I guess it's because the local snapshots are qcow2, while the backend is RBD.
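
To make that layering concrete, here is a small sketch that prints the image chain one would expect after the reproducer's three snapshots. The function name `expected_chain` is hypothetical, and the chain is inferred from the steps in comment #0, not read from a live system.

```shell
# Print the expected image chain, top overlay first, for COUNT snapshots
# named PREFIX1..PREFIXn layered on BASE. Formats follow this report:
# the local overlays are qcow2, the RBD base image is raw.
expected_chain() (
    base=$1 prefix=$2 i=$3
    while [ "$i" -ge 1 ]; do
        printf '%s (qcow2 overlay)\n' "$prefix$i"
        i=$((i - 1))
    done
    printf '%s (raw, rbd)\n' "$base"
)
```

Calling `expected_chain rbd:blockpull-pool/rbd_blockpull_ktd5.img /tmp/rbd.s 3` lists `/tmp/rbd.s3`, `/tmp/rbd.s2`, `/tmp/rbd.s1` and then the RBD image. Guest writes land in the top qcow2 overlay, which would explain the qcow2 frames above `qemu_rbd_co_block_status` in the backtrace. On a real host, `qemu-img info --backing-chain` on the top overlay should show the actual chain.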

> 
> Stefano, can you take a look?

This seems to be the same issue as BZ2034791. For now I'm setting this BZ as TestOnly and depending on BZ2034791, but maybe we can close it as DUPLICATE.

Comment 3 Meina Li 2022-01-04 01:34:59 UTC
This bug cannot be reproduced with qemu-kvm-6.1.0-8.el9.
After checking the steps in BZ2034791, I can confirm that this bug is a duplicate of it, so I am closing this one as DUPLICATE.

*** This bug has been marked as a duplicate of bug 2034791 ***