Bug 2147617 - qemu-img finishes successfully while having errors in commit or bitmaps operations
Summary: qemu-img finishes successfully while having errors in commit or bitmaps opera...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: CentOS Stream
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Kevin Wolf
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks: 2150180
Reported: 2022-11-24 12:25 UTC by Albert Esteve
Modified: 2023-05-25 02:28 UTC

Fixed In Version: qemu-kvm-6.2.0-30.module+el8.8.0+18165+621caf3a
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2150180 (view as bug list)
Environment:
Last Closed: 2023-05-16 08:16:35 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | qemu-project qemu issues 1330 | 0 | None | opened | qemu-img finishes successfully while having errors in commit or bitmaps operations | 2022-11-24 12:25:36 UTC
Gitlab | redhat/rhel/src/qemu-kvm qemu-kvm merge_requests 251 | 0 | None | None | None | 2023-02-06 16:11:11 UTC
Red Hat Issue Tracker | RHELPLAN-140385 | 0 | None | None | None | 2022-11-24 13:14:08 UTC
Red Hat Product Errata | RHSA-2023:2757 | 0 | None | None | None | 2023-05-16 08:17:42 UTC

Description Albert Esteve 2022-11-24 12:25:37 UTC
Description of problem:
The problem arises when trying to merge two images where the top image is
almost full and the base image has stale bitmaps (bitmaps missing from
the top image).
In our use case, the size of the LV that contains the base image does not
account for the stale bitmaps, and therefore, when we run commit or
bitmap --merge, it fails with:

> qcow2_free_clusters failed: No space left on device
> qemu-img: Lost persistent bitmaps during inactivation of node '#block308': Failed to write bitmap 'stale-bitmap-002' to file: No space left on device
> qemu-img: Failed to flush the refcount block cache: No space left on device

However, in both cases qemu-img exited successfully, even though it
printed errors to stderr and the merge failed.

Two cases:
- qemu-img commit: the data was committed successfully, but
  it failed while adding the bitmaps. Still, the process
  exits with success. This can be OK, since bitmaps are
  basically an optimization.
- qemu-img bitmap: the process should return an error code.
  Failing to write a bitmap is a fatal error.
  Also, bitmaps in the base image are left with the in-use
  flag set.

Still, it is probably best to fail in both cases.

It would be best to allocate the bitmap storage upfront so failing
with ENOSPC is not possible at the end.
A user cannot recover from this failure since the bitmaps
are left in in-use state, and there is no way to fix them.
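
Until a fixed build is available, a caller could guard against the silent failure described above. The following is a minimal sketch, not something from this report: the helper name run_strict and the policy "any stderr output means failure" are assumptions, and the policy would also flag harmless warnings. A stub stands in for qemu-img so the sketch runs without LVM or QEMU installed.

```sh
#!/bin/sh
# Sketch of a caller-side guard: treat a command as failed if it exits
# non-zero OR writes anything to stderr. The "stderr means failure" policy
# is an assumption for illustration, not documented qemu-img behaviour.

run_strict() {
    # Hypothetical helper. Runs "$@", leaves stdout alone, captures stderr.
    err_file=$(mktemp) || return 2
    "$@" 2>"$err_file"
    status=$?
    if [ -s "$err_file" ]; then
        # Replay the captured messages so they are not lost.
        cat "$err_file" >&2
        # Promote a "successful" exit to a failure when errors were logged.
        [ "$status" -eq 0 ] && status=1
    fi
    rm -f "$err_file"
    return "$status"
}

# Stand-in for an affected qemu-img run: logs an error but exits 0.
fake_qemu_img() {
    echo "qemu-img: Failed to flush the refcount block cache: No space left on device" >&2
    return 0
}

if run_strict fake_qemu_img; then
    echo "merge ok"
else
    echo "merge failed"
fi
```

On a real system the stub would be replaced by the actual qemu-img invocation, for example run_strict qemu-img commit followed by the usual arguments.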

Version-Release number of selected component (if applicable):
6.2.0

How reproducible:
100%

Steps to Reproduce:
Check the linked upstream ticket for steps to reproduce.

Actual results:
Operations return with success exit code.

Expected results:
Operations shall fail.

Additional info:

Comment 1 Kevin Wolf 2022-11-24 13:13:06 UTC
Okay, after looking at the code, the problem seems clear to me.

This is an error that happens only while closing the image. Unfortunately closing the image is done by bdrv_unref(), a code path that doesn't return an error code, so qemu-img never knows about it. Explicitly inactivating the image first should allow qemu-img to see the error, so maybe that's an approach to try. The other option would be allowing bdrv_unref() to fail - this would require more code changes, but potentially cover more problematic cases.

Eric, Hanna, any preferences on which approach to take?
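
The difference between the two call patterns in the comment above can be modelled with a toy shell analogy. This is only a sketch of the control flow: none of the function names below are QEMU APIs, and the real code is C inside the block layer.

```sh
#!/bin/sh
# Toy model only: these functions are hypothetical stand-ins, not QEMU APIs.

# Stand-in for writing bitmaps back when the image is inactivated;
# in this model it always fails.
inactivate_image() {
    echo "Failed to write bitmap: No space left on device" >&2
    return 1
}

# Stand-in for a close path with no error return: it inactivates the image
# but can only log the failure, so callers always see success.
close_image() {
    inactivate_image
    return 0
}

# Before: the tool only closes, so it cannot observe the failure.
tool_close_only() {
    close_image
}

# After: the tool inactivates explicitly and checks the result before closing.
tool_inactivate_first() {
    ret=0
    inactivate_image || ret=1
    close_image 2>/dev/null
    return "$ret"
}

tool_close_only 2>/dev/null;       echo "close only: exit $?"
tool_inactivate_first 2>/dev/null; echo "inactivate first: exit $?"
```

In this model the first call reports exit 0 and the second reports exit 1, which mirrors why explicit inactivation would let qemu-img see the error.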

Comment 2 Hanna Czenczek 2022-11-24 16:56:26 UTC
Would you have bdrv_unref() fail outright (i.e. refuse to actually close the image on error, and force the caller to retain its refcount somehow) or just return an error message but still succeed in its operation of dropping the refcount and closing the image?

The former sounds weird and difficult (not necessarily wrong, though), the latter sounds more sensible.  I wonder what callers would do with the error message, though.  It’s clear what to do in the root of a user-facing function, but when it’s deeply nested?  If we can always push this upwards to a user-facing function (blockdev-del[1], qemu-img, ...), it seems like the better approach, but without trying, I don’t know.

[1] Even though I don’t know what to do in case of blockdev-del.  Depending on the answer to my first question, we probably don’t want it to return an error, even if something went wrong when the node was closed.

If it turns out that qemu-img is the only place where we can really do something useful with the information of whether something went wrong during bdrv_unref(), explicit inactivation seems better to me.

Hanna

Comment 3 aihua liang 2022-12-01 11:05:15 UTC
Can reproduce this issue with qemu-kvm-6.2.0-26.module+el8.8.0+17341+68372c23.

Steps:
 1. Create lv devices
   #qemu-img create -f raw test.img 400M
   #losetup /dev/loop0 test.img
   #pvcreate /dev/loop0
   #vgcreate test /dev/loop0
   #lvcreate -n base --size 128M test
   #lvcreate -n top --size 128M test

 2. Create base image and add 6 bitmaps to it.
   #qemu-img create -f qcow2 /dev/test/base 128M
   #qemu-img bitmap --add /dev/test/base stale-bitmap-1
   #qemu-img bitmap --add /dev/test/base stale-bitmap-2
   #qemu-img bitmap --add /dev/test/base stale-bitmap-3
   #qemu-img bitmap --add /dev/test/base stale-bitmap-4
   #qemu-img bitmap --add /dev/test/base stale-bitmap-5
   #qemu-img bitmap --add /dev/test/base stale-bitmap-6

 3. Create snapshot image, add a bitmap to it
   #qemu-img create -f qcow2 /dev/test/top -F qcow2 -b /dev/test/base
   #qemu-img bitmap --add /dev/test/top good-bitmap

 4. Fullwrite top
   # qemu-io -f qcow2 /dev/test/top -c "write 0 126M"
   wrote 132120576/132120576 bytes at offset 0
126 MiB, 1 ops; 00.20 sec (624.019 MiB/sec and 4.9525 ops/sec)

 5. Commit top into base
   #qemu-img commit -f qcow2 -t none -b /dev/test/base -d -p /dev/test/top
   (100.00/100%)
Image committed.

 6. Add bitmap "good-bitmap" to base
   #qemu-img bitmap --add /dev/test/base good-bitmap

 7. Merge bitmap
   #qemu-img bitmap --merge "bitmap-good" -F qcow2 -b /dev/test/top /dev/test/base "bitmap-good"
qcow2_free_clusters failed: No space left on device
qemu-img: Lost persistent bitmaps during inactivation of node '#block119': Failed to write bitmap 'bitmap-good' to file: No space left on device
qemu-img: Failed to flush the refcount block cache: No space left on device

Comment 6 aihua liang 2023-02-14 06:22:17 UTC
Test on qemu-kvm-6.2.0-30.module+el8.8.0+18165+621caf3a, both commit and bitmap merge failed as expected.

Scenario 1: commit with error
Test Steps:
 1. Create lv devices
   #qemu-img create -f raw test.img 400M
   #losetup /dev/loop0 test.img
   #pvcreate /dev/loop0
   #vgcreate test /dev/loop0
   #lvcreate -n base --size 128M test
   #lvcreate -n top --size 128M test

 2. Create base image and add 7 bitmaps to it.
   #qemu-img create -f qcow2 /dev/test/base 128M
   #qemu-img bitmap --add /dev/test/base stale-bitmap-1
   #qemu-img bitmap --add /dev/test/base stale-bitmap-2
   #qemu-img bitmap --add /dev/test/base stale-bitmap-3
   #qemu-img bitmap --add /dev/test/base stale-bitmap-4
   #qemu-img bitmap --add /dev/test/base stale-bitmap-5
   #qemu-img bitmap --add /dev/test/base stale-bitmap-6
   #qemu-img bitmap --add /dev/test/base stale-bitmap-7

 3. Create snapshot image, add a bitmap to it
   #qemu-img create -f qcow2 /dev/test/top -F qcow2 -b /dev/test/base
   #qemu-img bitmap --add /dev/test/top good-bitmap

 4. Fullwrite top
   # qemu-io -f qcow2 /dev/test/top -c "write 0 126M"
   wrote 132120576/132120576 bytes at offset 0
126 MiB, 1 ops; 00.20 sec (624.019 MiB/sec and 4.9525 ops/sec)

 5. Commit top into base
   #qemu-img commit -f qcow2 -t none -b /dev/test/base -d -p /dev/test/top
   (100.00/100%)
qemu-img: Lost persistent bitmaps during inactivation of node '#block397': Failed to write bitmap 'stale-bitmap-7' to file: No space left on device
qemu-img: Lost persistent bitmaps during inactivation of node '#block397': Failed to write bitmap 'stale-bitmap-7' to file: No space left on device
qemu-img: Error while closing the image: Invalid argument


Scenario 2: bitmap merge with error
1. Create lv devices
   #qemu-img create -f raw test.img 400M
   #losetup /dev/loop0 test.img
   #pvcreate /dev/loop0
   #vgcreate test /dev/loop0
   #lvcreate -n base --size 128M test
   #lvcreate -n top --size 128M test

 2. Create base image and add 6 bitmaps to it.
   #qemu-img create -f qcow2 /dev/test/base 128M
   #qemu-img bitmap --add /dev/test/base stale-bitmap-1
   #qemu-img bitmap --add /dev/test/base stale-bitmap-2
   #qemu-img bitmap --add /dev/test/base stale-bitmap-3
   #qemu-img bitmap --add /dev/test/base stale-bitmap-4
   #qemu-img bitmap --add /dev/test/base stale-bitmap-5
   #qemu-img bitmap --add /dev/test/base stale-bitmap-6

 3. Create snapshot image, add a bitmap to it
   #qemu-img create -f qcow2 /dev/test/top -F qcow2 -b /dev/test/base
   #qemu-img bitmap --add /dev/test/top good-bitmap

 4. Fullwrite top
   # qemu-io -f qcow2 /dev/test/top -c "write 0 126M"
   wrote 132120576/132120576 bytes at offset 0
126 MiB, 1 ops; 00.20 sec (624.019 MiB/sec and 4.9525 ops/sec)

 5. Commit top into base
    # qemu-img commit -f qcow2 -t none -b /dev/test/base -d -p /dev/test/top
    (100.00/100%)
Image committed.

 6. Add bitmap good-bitmap to base
   #qemu-img bitmap --add /dev/test/base good-bitmap

 7. Merge bitmap
   #qemu-img bitmap --merge good-bitmap -F qcow2 -b /dev/test/top /dev/test/base good-bitmap
qemu-img: Lost persistent bitmaps during inactivation of node '#block147': Failed to write bitmap 'good-bitmap' to file: No space left on device
qemu-img: Error while closing the image: Invalid argument
qemu-img: Lost persistent bitmaps during inactivation of node '#block147': Failed to write bitmap 'good-bitmap' to file: No space left on device
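
The scenarios above show the new error messages but not the exit status. A test script could assert on the status directly. The sketch below assumes a hypothetical helper named expect_failure and uses the standard false command as a stand-in so it runs without LVM or qemu-img.

```sh
#!/bin/sh
# expect_failure is a hypothetical helper: it succeeds only if the given
# command exits with a non-zero status.
expect_failure() {
    if "$@"; then
        echo "FAIL: '$*' exited 0" >&2
        return 1
    fi
    echo "PASS: '$*' exited non-zero"
    return 0
}

# Stand-in command so the sketch is runnable anywhere.
expect_failure false

# On the setup from the steps above one might instead run, for example:
# expect_failure qemu-img bitmap --merge good-bitmap -F qcow2 \
#     -b /dev/test/top /dev/test/base good-bitmap
```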

Comment 7 Yanan Fu 2023-02-15 06:53:55 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 10 aihua liang 2023-02-17 07:13:28 UTC
Per comment 6 and comment 7, setting the bug's status to "VERIFIED".

Comment 12 errata-xmlrpc 2023-05-16 08:16:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2757

