Bug 1775462
Summary: | Creating luks-inside-qcow2 images with cluster_size=2k/4k will get a corrupted image | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Tingting Mao <timao> |
Component: | qemu-kvm | Assignee: | Daniel Berrangé <berrange> |
qemu-kvm sub component: | General | QA Contact: | Xueqiang Wei <xuwei> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | areis, berrange, coli, ehadley, jinzhao, juzhang, qzhang, virt-maint, zhenyzha |
Version: | 8.2 | Keywords: | Regression, Triaged |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-07-28 07:12:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Tingting Mao
2019-11-22 03:00:51 UTC
*** Bug 1775480 has been marked as a duplicate of this bug. *** git bisect blames this commit upstream a5fff8d4b4d928311a5005efa12d0991fe3b66f9 is the first bad commit commit a5fff8d4b4d928311a5005efa12d0991fe3b66f9 Author: Vladimir Sementsov-Ogievskiy <vsementsov> Date: Wed Feb 27 16:14:30 2019 +0300 qcow2-refcount: avoid eating RAM qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat an unpredictable amount of memory on corrupted table entries, which are referencing regions far beyond the end of file. Prevent this, by skipping such regions from further processing. Interesting that iotest 138 checks exactly the behavior which we fix here. So, change the test appropriately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov> Reviewed-by: Max Reitz <mreitz> Message-id: 20190227131433.197063-3-vsementsov Signed-off-by: Max Reitz <mreitz> block/qcow2-refcount.c | 19 +++++++++++++++++++ tests/qemu-iotests/138 | 12 +++++------- tests/qemu-iotests/138.out | 5 ++++- 3 files changed, 28 insertions(+), 8 deletions(-) IIUC, what's happening here is that the qcow2_crypto_hdr_init_func() is allocating a series of clusters big enough to hold the LUKS header, and key material that follows it. We'll only ever initialize key material for the first slot, however, so many of the clusters will remain unwritten, presumed to return all zeroes on future reads but it doesn't really matter if not. With a5fff8d4b4d928311a5005efa12d0991fe3b66f9 there was some optimization done which appears to believe that all allocated clusters will have had some data written to them. This assumption is invalid for LUKS headers and so it mistakenly complains about corrupt files. As a workaround we can make the image creation function explicitly write zeros to all clusters allocated for the LUKS header & key material. *** Bug 1775480 has been marked as a duplicate of this bug. *** QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks Merged in 5.0.0 release upstream as commit 087ab8e775f48766068e65de1bc99d03b40d1670 Author: Daniel P. Berrangé <berrange> Date: Fri Feb 7 13:55:20 2020 +0000 block: always fill entire LUKS header space with zeros When initializing the LUKS header the size with default encryption parameters will currently be 2068480 bytes. This is rounded up to a multiple of the cluster size, 2081792, with 64k sectors. If the end of the header is not the same as the end of the cluster we fill the extra space with zeros. This was forgetting that not even the space allocated for the header will be fully initialized, as we only write key material for the first key slot. The space left for the other 7 slots is never written to. An optimization to the ref count checking code: commit a5fff8d4b4d928311a5005efa12d0991fe3b66f9 (refs/bisect/bad) Author: Vladimir Sementsov-Ogievskiy <vsementsov> Date: Wed Feb 27 16:14:30 2019 +0300 qcow2-refcount: avoid eating RAM made the assumption that every cluster which was allocated would have at least some data written to it. This was violated by way the LUKS header is only partially written, with much space simply reserved for future use. Depending on the cluster size this problem was masked by the logic which wrote zeros between the end of the LUKS header and the end of the cluster. $ qemu-img create --object secret,id=cluster_encrypt0,data=123456 \ -f qcow2 -o cluster_size=2k,encrypt.iter-time=1,\ encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 \ cluster_size_check.qcow2 100M Formatting 'cluster_size_check.qcow2', fmt=qcow2 size=104857600 encrypt.format=luks encrypt.key-secret=cluster_encrypt0 encrypt.iter-time=1 cluster_size=2048 lazy_refcounts=off refcount_bits=16 $ qemu-img check --object secret,id=cluster_encrypt0,data=redhat \ 'json:{"driver": "qcow2", "encrypt.format": "luks", \ "encrypt.key-secret": "cluster_encrypt0", \ "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x2000 size 0x1f9000 Leaked cluster 4 refcount=1 reference=0 ...snip... Leaked cluster 130 refcount=1 reference=0 1 errors were found on the image. Data may be corrupted, or further writes to the image may corrupt it. 127 leaked clusters were found on the image. This means waste of disk space, but no harm to data. Image end offset: 268288 The problem only exists when the disk image is entirely empty. Writing data to the disk image payload will solve the problem by causing the end of the file to be extended further. The change fixes it by ensuring that the entire allocated LUKS header region is fully initialized with zeros. The qemu-img check will still fail for any pre-existing disk images created prior to this change, unless at least 1 byte of the payload is written to. Fully writing zeros to the entire LUKS header is a good idea regardless as it ensures that space has been allocated on the host filesystem (or whatever block storage backend is used). Signed-off-by: Daniel P. Berrangé <berrange> Message-Id: <20200207135520.2669430-1-berrange> Reviewed-by: Eric Blake <eblake> Signed-off-by: Max Reitz <mreitz> Tested with qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2, not hit this issue. So set status to VERIFIED. Details: Version: kernel-4.18.0-193.5.1.el8_2.x86_64 qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2 spice-server-0.14.2-1.el8.x86_64 seavgabios-bin-1.13.0-1.module+el8.2.0+5520+4e5817f3.noarch seabios-1.13.0-1.module+el8.2.0+5520+4e5817f3.x86_64 Scenario 1(With cluster_size=2k) # qemu-img create --object secret,id=cluster_encrypt0,data=redhat -f qcow2 -o cluster_size=2k,encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 cluster_size_check.qcow2 1G # qemu-img check --object secret,id=cluster_encrypt0,data=redhat 'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' No errors were found on the image. Image end offset: 2091008 Scenario 2(With cluster_size=4K) # qemu-img create --object secret,id=cluster_encrypt0,data=redhat -f qcow2 -o cluster_size=4k,encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 cluster_size_check.qcow2 1G # qemu-img check --object secret,id=cluster_encrypt0,data=redhat 'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' No errors were found on the image. Image end offset: 2084864 Tried with all the supported cluster_size via below script. All work well. #!/bin/bash for ((i=9; i<=21; i++)) do cluster=$(( 2 ** ${i} )) qemu-img create --object secret,id=cluster_encrypt0,data=redhat -f qcow2 -o cluster_size=${cluster},encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 cluster_size_check.qcow2 1G > /dev/null qemu-img check --object secret,id=cluster_encrypt0,data=redhat 'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' > /dev/null 2>&1 if [ $? != 0 ]; then echo "Check failed with cluster_size=${cluster}!" fi done # sh test.sh # echo $? 0 Tried with plain qcow2 images without encryption. All work well. #!/bin/bash for ((i=9; i<=21; i++)) do cluster=$(( 2 ** ${i} )) qemu-img create -f qcow2 -o cluster_size=${cluster} cluster_size_check.qcow2 1G > /dev/null qemu-img check cluster_size_check.qcow2 > /dev/null 2>&1 if [ $? != 0 ]; then echo "Check failed with cluster_size=${cluster}!" fi done # sh test.sh # echo $? 0 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3172 |