Bug 1631052
Summary: | x-block-dirty-bitmap-merge into a disabled bitmap dest causes assert | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | John Snow <jsnow> | ||||
Component: | qemu-kvm-rhev | Assignee: | John Snow <jsnow> | ||||
Status: | CLOSED ERRATA | QA Contact: | aihua liang <aliang> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 7.7 | CC: | chayang, coli, eblake, juzhang, ngu, virt-maint | ||||
Target Milestone: | rc | Keywords: | TestOnly | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1638690 (view as bug list) | Environment: | |||||
Last Closed: | 2019-08-22 09:18:53 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1658343 | ||||||
Bug Blocks: | 1636224, 1638690, 1651787 | ||||||
Attachments: |
|
Description
John Snow
2018-09-19 19:59:19 UTC
Preferably complete successfully. Merging into a frozen bitmap (one in use by a block job) should probably be forbidden, but merging into a disabled one (where it is merely not tracking new writes) makes sense in at least the following scenario: If the image is keeping a linear sequence of bitmaps in order to track various points in time which can each serve as the reference point at the start of a differential backup, where all but the most recent bitmap is disabled, then creating the differential backup is a matter of creating a temporary live bitmap, then live-merging all of the earlier bitmaps in. In that context, merges can always be performed into a live bitmap, immediately prior to the transaction that kicks off the blockdev-backup job acting on the temporary bitmap. But once the user decides that a point in time is no longer important (that is, that no further differential backups will be made from a given point in time), the two bitmaps on either side of that point in time need to be merged into a single bitmap. If both of those bitmaps are disabled, and this bug is not fixed, then the only workaround I can think of to be able to perform the merge without triggering the assertion would be to temporarily pause the guest (to ensure no further I/O), then re-enable the destination bitmap, perform the merge, disable the destination bitmap, resume guest I/O, and finally delete the unused bitmap. But pausing the guest I/O is contrary to the goal of bitmap management being non-invasive to a running guest, compared to just allowing the merge of one disabled bitmap into another. I'm also trying to figure if merging a live bitmap into a disabled one ever makes sense. I suppose it does, if the merge is atomic, by starting with bitmap1 tracking the sectors touched between times T1-T2 and bitmap2 from T2-present. Then we perform a transaction that does both a merge of bitmap2 into bitmap1, followed by a clear of bitmap2. The point in time of our transaction is T3, and serves to reshuffle things so that now bitmap1 tracks sectors touched between T1-T3, and bitmap2 from T3-present. (In reply to John Snow from comment #1) > https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg02358.html In addition to the relaxation of the assertion for disabled bitmaps (in that linked thread), we probably should not declare this bug complete without also including a second patch that freezes any bitmap tied to an NBD export with x-nbd-server-add-bitmap (freeze goes away at subsequent nbd-server-remove), and double-check that merging to a frozen bitmap is a graceful error. Perhaps it also warrants iotests enhancements. We also need to permit clearing of a disabled bitmap. And just as merge should not be permitted on a locked bitmap, neither should enable/disable be allowed on a locked bitmap, and it would not hurt if enable on an enabled bitmap, or disable on a disabled bitmap, reports an error about a redundant operation instead of being a no-op. (In reply to Eric Blake from comment #4) > We also need to permit clearing of a disabled bitmap. And just as merge > should not be permitted on a locked bitmap, neither should enable/disable be > allowed on a locked bitmap, and it would not hurt if enable on an enabled > bitmap, or disable on a disabled bitmap, reports an error about a redundant > operation instead of being a no-op. https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00406.html Awaiting merge upstream. - Clearing is permitted on disabled bitmaps after this series - Merge/Enable/Disable are all prohibited on locked/frozen bitmaps - Redundant operations are not prohibited (new BZ if needed.) "we probably should not declare this bug complete without also including a second patch that freezes any bitmap tied to an NBD export with x-nbd-server-add-bitmap (freeze goes away at subsequent nbd-server-remove), and double-check that merging to a frozen bitmap is a graceful error. Perhaps it also warrants iotests enhancements." -NBD exports "lock" the bitmap, instead of "freezing" it -merging to a locked/frozen bitmap is a graceful error. Created attachment 1493178 [details] gdb_debug_info_for_coredump-76-bz1631052-10122018 I reproduced the bug with following steps: {"execute":"block-dirty-bitmap-add","arguments":{"node":"disk0","name":"bitmap0"}} {"return": {}} {"execute":"block-dirty-bitmap-add","arguments":{"node":"disk0","name":"bitmap1"}} {"return": {}} {"execute":"x-block-dirty-bitmap-disable","arguments":{"node":"disk0","name":"bitmap1"}} {"return": {}} {"execute":"x-block-dirty-bitmap-merge","arguments":{"node":"disk0","src_name":"bitmap0","dst_name":"bitmap1"}} Ncat: Connection reset by peer. (qemu) (qemu) qemu-kvm: block/dirty-bitmap.c:775: bdrv_merge_dirty_bitmap: Assertion `bdrv_dirty_bitmap_enabled(dest)' failed. ./vm1.sh: line 30: 9204 Aborted (core dumped) /usr/libexec/qemu-kvm -name This is addressed by the patchset posted against BZ #1658343. Setting depends-on and TestOnly flag. Eric, if the upstream behavior needs to be addressed in some way WRT your Libvirt API, can you open new bug(s) as needed? Test on qemu-kvm-rhev-2.12.0-23.el7.x86_64, don't this issue any more. Test steps: 1.start guest with qemu cmds: -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 \ -blockdev driver=file,node-name=file_base,filename=/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2,auto-read-only=on \ -blockdev driver=qcow2,file=file_base,node-name=drive_image1,auto-read-only=on \ -device scsi-hd,id=image1,drive=drive_image1 \ 2.add bitmap0 on drive_image1 {"execute": "block-dirty-bitmap-add","arguments":{ "node": "drive_image1", "name": "bitmap0"}} 3.add a disabled bitmap bitmap_tmp {"execute": "block-dirty-bitmap-add","arguments":{ "node": "drive_image1", "name": "bitmap_tmp","x-disabled": true }} 4.check bitmap info: {"execute":"query-block"} {"return": [{"device": "", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 21474836480, "filename": "/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 4326883328, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "drive_image1", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2", "encryption_key_missing": false}, "qdev": "image1", "dirty-bitmaps": [{"name": "bitmap_tmp", "status": "disabled", "granularity": 65536, "count": 0}, {"name": "bitmap0", "status": "active", "granularity": 65536, "count": 9961472}], "type": "unknown"}]} 5.merge bitmap0 to bitmap_tmp {"execute": "x-block-dirty-bitmap-merge","arguments":{"node": "drive_image1", "src_name": "bitmap0", "dst_name":"bitmap_tmp" } } 6.check bitmap info: {"execute":"query-block"} {"return": [{"device": "", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 21474836480, "filename": "/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 4326883328, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "drive_image1", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2", "encryption_key_missing": false}, "qdev": "image1", "dirty-bitmaps": [{"name": "bitmap_tmp", "status": "disabled", "granularity": 65536, "count": 10027008}, {"name": "bitmap0", "status": "active", "granularity": 65536, "count": 10223616}], "type": "unknown"}]} Additional info: kernel version:3.10.0-993.el7.x86_64 I'll set bug's status to "Verified" after it changes to "ON_QA". Set bug's status to "Verified", for it has keywords "TestOnly", thanks. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:2553 |