Bug 1631052

Summary: x-block-dirty-bitmap-merge into a disabled bitmap dest causes assert
Product: Red Hat Enterprise Linux 7 Reporter: John Snow <jsnow>
Component: qemu-kvm-rhevAssignee: John Snow <jsnow>
Status: CLOSED ERRATA QA Contact: aihua liang <aliang>
Severity: high Docs Contact:
Priority: high    
Version: 7.7CC: chayang, coli, eblake, juzhang, ngu, virt-maint
Target Milestone: rcKeywords: TestOnly
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1638690 (view as bug list) Environment:
Last Closed: 2019-08-22 09:18:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1658343    
Bug Blocks: 1636224, 1638690, 1651787    
Attachments:
Description Flags
gdb_debug_info_for_coredump-76-bz1631052-10122018 none

Description John Snow 2018-09-19 19:59:19 UTC
Description of problem:

Merging into a disabled bitmap (i.e., one that has been turned 'off') causes an assertion.

Version-Release number of selected component (if applicable):

qemu 3.0,
qemu-kvm-rhev-2.12.0-17.el7

How reproducible:

100%


Steps to Reproduce:
1. {"execute":"x-block-dirty-bitmap-merge","arguments":{"node":"n","src_name":"existing_map","dst_name":"disabled_map"}}
2.
3.

Actual results:

{"execute":"x-block-dirty-bitmap-merge","arguments":{"node":"n","src_name":"existing_map","dst_name":"disabled_map"}}


Expected results:

The command must either be rejected, OR it must complete successfully. 

Additional info:

Reported-by: Eric Blake

Comment 2 Eric Blake 2018-09-19 20:15:56 UTC
Preferably complete successfully. Merging into a frozen bitmap (one in use by a block job) should probably be forbidden, but merging into a disabled one (where it is merely not tracking new writes) makes sense in at least the following scenario:

If the image is keeping a linear sequence of bitmaps in order to track various points in time which can each serve as the reference point at the start of a differential backup, where all but the most recent bitmap is disabled, then creating the differential backup is a matter of creating a temporary live bitmap, then live-merging all of the earlier bitmaps in. In that context, merges can always be performed into a live bitmap, immediately prior to the transaction that kicks off the blockdev-backup job acting on the temporary bitmap.

But once the user decides that a point in time is no longer important (that is, that no further differential backups will be made from a given point in time), the two bitmaps on either side of that point in time need to be merged into a single bitmap. If both of those bitmaps are disabled, and this bug is not fixed, then the only workaround I can think of to be able to perform the merge without triggering the assertion would be to temporarily pause the guest (to ensure no further I/O), then re-enable the destination bitmap, perform the merge, disable the destination bitmap, resume guest I/O, and finally delete the unused bitmap.  But pausing the guest I/O is contrary to the goal of bitmap management being non-invasive to a running guest, compared to just allowing the merge of one disabled bitmap into another.

I'm also trying to figure if merging a live bitmap into a disabled one ever makes sense.  I suppose it does, if the merge is atomic, by starting with bitmap1 tracking the sectors touched between times T1-T2 and bitmap2 from T2-present. Then we perform a transaction that does both a merge of bitmap2 into bitmap1, followed by a clear of bitmap2.  The point in time of our transaction is T3, and serves to reshuffle things so that now bitmap1 tracks sectors touched between T1-T3, and bitmap2 from T3-present.

Comment 3 Eric Blake 2018-09-19 21:44:12 UTC
(In reply to John Snow from comment #1)
> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg02358.html

In addition to the relaxation of the assertion for disabled bitmaps (in that linked thread), we probably should not declare this bug complete without also including a second patch that freezes any bitmap tied to an NBD export with x-nbd-server-add-bitmap (freeze goes away at subsequent nbd-server-remove), and double-check that merging to a frozen bitmap is a graceful error. Perhaps it also warrants iotests enhancements.

Comment 4 Eric Blake 2018-09-21 19:17:44 UTC
We also need to permit clearing of a disabled bitmap. And just as merge should not be permitted on a locked bitmap, neither should enable/disable be allowed on a locked bitmap, and it would not hurt if enable on an enabled bitmap, or disable on a disabled bitmap, reports an error about a redundant operation instead of being a no-op.

Comment 5 John Snow 2018-10-09 20:58:29 UTC
(In reply to Eric Blake from comment #4)
> We also need to permit clearing of a disabled bitmap. And just as merge
> should not be permitted on a locked bitmap, neither should enable/disable be
> allowed on a locked bitmap, and it would not hurt if enable on an enabled
> bitmap, or disable on a disabled bitmap, reports an error about a redundant
> operation instead of being a no-op.

https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00406.html
Awaiting merge upstream.

- Clearing is permitted on disabled bitmaps after this series
- Merge/Enable/Disable are all prohibited on locked/frozen bitmaps
- Redundant operations are not prohibited (new BZ if needed.)

"we probably should not declare this bug complete without also including a second patch that freezes any bitmap tied to an NBD export with x-nbd-server-add-bitmap (freeze goes away at subsequent nbd-server-remove), and double-check that merging to a frozen bitmap is a graceful error. Perhaps it also warrants iotests enhancements."

-NBD exports "lock" the bitmap, instead of "freezing" it
-merging to a locked/frozen bitmap is a graceful error.

Comment 6 Gu Nini 2018-10-12 09:15:28 UTC
Created attachment 1493178 [details]
gdb_debug_info_for_coredump-76-bz1631052-10122018

I reproduced the bug with following steps:

{"execute":"block-dirty-bitmap-add","arguments":{"node":"disk0","name":"bitmap0"}} 
{"return": {}}
{"execute":"block-dirty-bitmap-add","arguments":{"node":"disk0","name":"bitmap1"}}
{"return": {}}
{"execute":"x-block-dirty-bitmap-disable","arguments":{"node":"disk0","name":"bitmap1"}}
{"return": {}}
{"execute":"x-block-dirty-bitmap-merge","arguments":{"node":"disk0","src_name":"bitmap0","dst_name":"bitmap1"}}
Ncat: Connection reset by peer.


(qemu) 
(qemu) qemu-kvm: block/dirty-bitmap.c:775: bdrv_merge_dirty_bitmap: Assertion `bdrv_dirty_bitmap_enabled(dest)' failed.
./vm1.sh: line 30:  9204 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name

Comment 7 John Snow 2018-12-12 00:27:19 UTC
This is addressed by the patchset posted against BZ #1658343. Setting depends-on and TestOnly flag.

Eric, if the upstream behavior needs to be addressed in some way WRT your Libvirt API, can you open new bug(s) as needed?

Comment 9 aihua liang 2019-02-14 09:21:54 UTC
Test on qemu-kvm-rhev-2.12.0-23.el7.x86_64, don't this issue any more.
 
  Test steps:
    1.start guest with qemu cmds:
      -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 \
      -blockdev driver=file,node-name=file_base,filename=/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2,auto-read-only=on \
      -blockdev driver=qcow2,file=file_base,node-name=drive_image1,auto-read-only=on \
      -device scsi-hd,id=image1,drive=drive_image1 \

    2.add bitmap0 on drive_image1
       {"execute": "block-dirty-bitmap-add","arguments":{ "node": "drive_image1", "name": "bitmap0"}}

    3.add a disabled bitmap bitmap_tmp
       {"execute": "block-dirty-bitmap-add","arguments":{ "node": "drive_image1", "name": "bitmap_tmp","x-disabled": true }}

    4.check bitmap info:
       {"execute":"query-block"}
{"return": [{"device": "", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 21474836480, "filename": "/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 4326883328, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "drive_image1", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2", "encryption_key_missing": false}, "qdev": "image1", "dirty-bitmaps": [{"name": "bitmap_tmp", "status": "disabled", "granularity": 65536, "count": 0}, {"name": "bitmap0", "status": "active", "granularity": 65536, "count": 9961472}], "type": "unknown"}]}

     5.merge bitmap0 to bitmap_tmp
       {"execute": "x-block-dirty-bitmap-merge","arguments":{"node": "drive_image1", "src_name": "bitmap0", "dst_name":"bitmap_tmp" } }

     6.check bitmap info:
       {"execute":"query-block"}
{"return": [{"device": "", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 21474836480, "filename": "/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 4326883328, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "drive_image1", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2", "encryption_key_missing": false}, "qdev": "image1", "dirty-bitmaps": [{"name": "bitmap_tmp", "status": "disabled", "granularity": 65536, "count": 10027008}, {"name": "bitmap0", "status": "active", "granularity": 65536, "count": 10223616}], "type": "unknown"}]}

Additional info:
  kernel version:3.10.0-993.el7.x86_64

Comment 10 aihua liang 2019-02-15 04:08:26 UTC
I'll set bug's status to "Verified" after it changes to "ON_QA".

Comment 11 aihua liang 2019-03-19 06:36:35 UTC
Set bug's status to "Verified", for it has keywords "TestOnly", thanks.

Comment 13 errata-xmlrpc 2019-08-22 09:18:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553