Description of problem:
As subject. On a freshly created luks-inside-qcow2 image with cluster_size=2k or cluster_size=4k, "qemu-img check" reports a corruption error and many leaked clusters. luks-inside-qcow2 images with any other supported cluster size (512, 1k, and 8k up to 2M) do not hit the issue, and neither do plain qcow2 images without encryption.

Version-Release number of selected component (if applicable):
qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb
kernel-4.18.0-148.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
Scenario 1 (cluster_size=2k)
# qemu-img create --object secret,id=cluster_encrypt0,data=redhat -f qcow2 -o cluster_size=2k,encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 cluster_size_check.qcow2 1G
# qemu-img check --object secret,id=cluster_encrypt0,data=redhat 'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}'
ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x5800 size 0x1f9000
Leaked cluster 11 refcount=1 reference=0
Leaked cluster 12 refcount=1 reference=0
...snip...
Leaked cluster 137 refcount=1 reference=0

1 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

127 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
Image end offset: 282624

Scenario 2 (cluster_size=4k)
# qemu-img create --object secret,id=cluster_encrypt0,data=redhat -f qcow2 -o cluster_size=4k,encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 cluster_size_check.qcow2 1G
# qemu-img check --object secret,id=cluster_encrypt0,data=redhat 'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}'
ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x4000 size 0x1f9000
Leaked cluster 4 refcount=1 reference=0
Leaked cluster 5 refcount=1 reference=0
...snip...
Leaked cluster 67 refcount=1 reference=0

1 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

64 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
Image end offset: 278528

Actual results:
As above, qemu-img check reports the newly created image as corrupted.

Expected results:
qemu-img check finds no errors on the image.

Additional info:
1. Tried every supported cluster_size with the script below. Only 2k and 4k hit the issue.

#!/bin/bash
for ((i=9; i<=21; i++))
do
    cluster=$(( 2 ** ${i} ))
    qemu-img create --object secret,id=cluster_encrypt0,data=redhat -f qcow2 -o cluster_size=${cluster},encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 cluster_size_check.qcow2 1G > /dev/null
    qemu-img check --object secret,id=cluster_encrypt0,data=redhat 'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "Check failed with cluster_size=${cluster}!"
    fi
done

# sh test.sh
Check failed with cluster_size=2048!
Check failed with cluster_size=4096!

2. Plain qcow2 images without encryption do not hit the issue.

#!/bin/bash
for ((i=9; i<=21; i++))
do
    cluster=$(( 2 ** ${i} ))
    qemu-img create -f qcow2 -o cluster_size=${cluster} cluster_size_check.qcow2 1G > /dev/null
    qemu-img check cluster_size_check.qcow2 > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "Check failed with cluster_size=${cluster}!"
    fi
done

# sh test.sh
# echo $?
0
*** Bug 1775480 has been marked as a duplicate of this bug. ***
git bisect blames this upstream commit:

a5fff8d4b4d928311a5005efa12d0991fe3b66f9 is the first bad commit
commit a5fff8d4b4d928311a5005efa12d0991fe3b66f9
Author: Vladimir Sementsov-Ogievskiy <vsementsov>
Date:   Wed Feb 27 16:14:30 2019 +0300

qcow2-refcount: avoid eating RAM

qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat an unpredictable amount of memory on corrupted table entries, which are referencing regions far beyond the end of file.

Prevent this, by skipping such regions from further processing.

Interesting that iotest 138 checks exactly the behavior which we fix here. So, change the test appropriately.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov>
Reviewed-by: Max Reitz <mreitz>
Message-id: 20190227131433.197063-3-vsementsov
Signed-off-by: Max Reitz <mreitz>

 block/qcow2-refcount.c     | 19 +++++++++++++++++++
 tests/qemu-iotests/138     | 12 +++++-------
 tests/qemu-iotests/138.out |  5 ++++-
 3 files changed, 28 insertions(+), 8 deletions(-)
IIUC, what's happening here is that qcow2_crypto_hdr_init_func() allocates a series of clusters big enough to hold the LUKS header and the key material that follows it. We only ever initialize key material for the first key slot, however, so many of those clusters remain unwritten. They are presumed to read back as all zeroes, though it would not really matter if they did not.

The refcount-checking change in commit a5fff8d4b4d928311a5005efa12d0991fe3b66f9 ("qcow2-refcount: avoid eating RAM") appears to assume that every allocated cluster has had some data written to it. That assumption is invalid for LUKS headers, so qemu-img check mistakenly reports freshly created images as corrupt.

As a workaround, we can make the image creation function explicitly write zeros to all clusters allocated for the LUKS header and key material.
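A rough bash sketch of the condition behind the error message may make the failure mode concrete. It assumes the check compares the end of a referenced region against the current file length and rejects the region when it overshoots by at least one full cluster; the helper name `region_exceeds_eof` and the exact comparison are illustrative assumptions based on the error text, not QEMU's code. The file length used is hypothetical, since the report does not give the on-disk size of the empty image.

```bash
#!/bin/bash
# Hypothetical helper approximating the check reported as "counting reference
# for region exceeding the end of the file by one cluster or more".
# Not QEMU's implementation. All arguments are in bytes:
#   offset size cluster_size file_length
# Returns success (0) if the region is flagged.
region_exceeds_eof() {
    local offset=$1 size=$2 cluster_size=$3 file_len=$4
    # Flag the region only if it ends a full cluster or more past EOF.
    (( offset + size - file_len >= cluster_size ))
}

# Region from Scenario 1 (offset 0x5800, size 0x1f9000, 2k clusters) against
# a hypothetical 256 KiB file: only the first key slot was written, so the
# file ends long before the header region does.
if region_exceeds_eof $((0x5800)) $((0x1f9000)) 2048 262144; then
    echo "region flagged"
fi

# If the whole header region is zero-filled, the file reaches offset + size
# and the same region is no longer flagged.
if ! region_exceeds_eof $((0x5800)) $((0x1f9000)) 2048 $((0x5800 + 0x1f9000)); then
    echo "region accepted"
fi
```

Under these assumptions the script prints "region flagged" followed by "region accepted".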
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks
Merged upstream in the 5.0.0 release as:

commit 087ab8e775f48766068e65de1bc99d03b40d1670
Author: Daniel P. Berrangé <berrange>
Date:   Fri Feb 7 13:55:20 2020 +0000

block: always fill entire LUKS header space with zeros

When initializing the LUKS header the size with default encryption parameters will currently be 2068480 bytes. This is rounded up to a multiple of the cluster size, 2081792, with 64k sectors. If the end of the header is not the same as the end of the cluster we fill the extra space with zeros. This was forgetting that not even the space allocated for the header will be fully initialized, as we only write key material for the first key slot. The space left for the other 7 slots is never written to.

An optimization to the ref count checking code:

  commit a5fff8d4b4d928311a5005efa12d0991fe3b66f9 (refs/bisect/bad)
  Author: Vladimir Sementsov-Ogievskiy <vsementsov>
  Date:   Wed Feb 27 16:14:30 2019 +0300

      qcow2-refcount: avoid eating RAM

made the assumption that every cluster which was allocated would have at least some data written to it. This was violated by way the LUKS header is only partially written, with much space simply reserved for future use.

Depending on the cluster size this problem was masked by the logic which wrote zeros between the end of the LUKS header and the end of the cluster.

$ qemu-img create --object secret,id=cluster_encrypt0,data=123456 \
    -f qcow2 \
    -o cluster_size=2k,encrypt.iter-time=1,encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 \
    cluster_size_check.qcow2 100M
Formatting 'cluster_size_check.qcow2', fmt=qcow2 size=104857600 encrypt.format=luks encrypt.key-secret=cluster_encrypt0 encrypt.iter-time=1 cluster_size=2048 lazy_refcounts=off refcount_bits=16

$ qemu-img check --object secret,id=cluster_encrypt0,data=redhat \
    'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}'
ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x2000 size 0x1f9000
Leaked cluster 4 refcount=1 reference=0
...snip...
Leaked cluster 130 refcount=1 reference=0

1 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

127 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
Image end offset: 268288

The problem only exists when the disk image is entirely empty. Writing data to the disk image payload will solve the problem by causing the end of the file to be extended further.

The change fixes it by ensuring that the entire allocated LUKS header region is fully initialized with zeros. The qemu-img check will still fail for any pre-existing disk images created prior to this change, unless at least 1 byte of the payload is written to.

Fully writing zeros to the entire LUKS header is a good idea regardless as it ensures that space has been allocated on the host filesystem (or whatever block storage backend is used).

Signed-off-by: Daniel P. Berrangé <berrange>
Message-Id: <20200207135520.2669430-1-berrange>
Reviewed-by: Eric Blake <eblake>
Signed-off-by: Max Reitz <mreitz>
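The masking effect mentioned in the commit message can be illustrated with a small bash calculation. Assuming the header region starts on a cluster boundary (true for the 0x5800, 0x4000 and 0x2000 offsets above with 2k and 4k clusters), the pre-fix code only zero-filled from the end of the 2068480-byte (0x1f9000) header to the end of its last cluster. The hypothetical `tail_padding` helper computes that amount for each supported cluster size:

```bash
#!/bin/bash
# Hypothetical helper: bytes the pre-fix code would zero-fill after the LUKS
# header, i.e. from the end of the header to the end of its last cluster.
# Assumes the header region starts on a cluster boundary.
tail_padding() {
    local header_len=$1 cluster_size=$2
    local rem=$(( header_len % cluster_size ))
    if (( rem == 0 )); then
        echo 0                          # header ends exactly on a cluster boundary
    else
        echo $(( cluster_size - rem ))  # bytes up to the next cluster boundary
    fi
}

header_len=$((0x1f9000))  # 2068480 bytes, the region size in the error messages

# Same cluster size range as the reproducer scripts (512 bytes to 2M).
for ((i=9; i<=21; i++)); do
    cluster=$(( 2 ** i ))
    echo "cluster_size=${cluster} tail_padding=$(tail_padding "$header_len" "$cluster")"
done
```

The padding is 0 bytes for 512, 1k, 2k and 4k clusters, because 2068480 is a multiple of 4096, so nothing was written at the end of the header region; from 8k upwards it is non-zero. That is consistent with the 2k and 4k failures, but on its own it does not explain why 512-byte and 1k clusters were unaffected.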
Tested with qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2 and the issue no longer reproduces, so setting status to VERIFIED.

Details:
Version:
kernel-4.18.0-193.5.1.el8_2.x86_64
qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2
spice-server-0.14.2-1.el8.x86_64
seavgabios-bin-1.13.0-1.module+el8.2.0+5520+4e5817f3.noarch
seabios-1.13.0-1.module+el8.2.0+5520+4e5817f3.x86_64

Scenario 1 (cluster_size=2k)
# qemu-img create --object secret,id=cluster_encrypt0,data=redhat -f qcow2 -o cluster_size=2k,encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 cluster_size_check.qcow2 1G
# qemu-img check --object secret,id=cluster_encrypt0,data=redhat 'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}'
No errors were found on the image.
Image end offset: 2091008

Scenario 2 (cluster_size=4k)
# qemu-img create --object secret,id=cluster_encrypt0,data=redhat -f qcow2 -o cluster_size=4k,encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 cluster_size_check.qcow2 1G
# qemu-img check --object secret,id=cluster_encrypt0,data=redhat 'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}'
No errors were found on the image.
Image end offset: 2084864

Tried every supported cluster_size with the script below. All pass.

#!/bin/bash
for ((i=9; i<=21; i++))
do
    cluster=$(( 2 ** ${i} ))
    qemu-img create --object secret,id=cluster_encrypt0,data=redhat -f qcow2 -o cluster_size=${cluster},encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 cluster_size_check.qcow2 1G > /dev/null
    qemu-img check --object secret,id=cluster_encrypt0,data=redhat 'json:{"driver": "qcow2", "encrypt.format": "luks", "encrypt.key-secret": "cluster_encrypt0", "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "Check failed with cluster_size=${cluster}!"
    fi
done

# sh test.sh
# echo $?
0

Tried plain qcow2 images without encryption. All pass.

#!/bin/bash
for ((i=9; i<=21; i++))
do
    cluster=$(( 2 ** ${i} ))
    qemu-img create -f qcow2 -o cluster_size=${cluster} cluster_size_check.qcow2 1G > /dev/null
    qemu-img check cluster_size_check.qcow2 > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "Check failed with cluster_size=${cluster}!"
    fi
done

# sh test.sh
# echo $?
0
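The end offsets in the verification output line up with the regions flagged before the fix: in each scenario, the old error's offset plus size equals the new "Image end offset". This is an observation from the figures in this report, not something stated in the comments. The short bash sketch below, using a hypothetical `check_end` helper, does that arithmetic:

```bash
#!/bin/bash
# Hypothetical helper: compare offset + size of a region reported by the
# pre-fix "counting reference for region ..." error with the post-fix
# "Image end offset" value.
check_end() {
    local label=$1 offset=$2 size=$3 reported_end=$4
    local end=$(( offset + size ))
    if (( end == reported_end )); then
        echo "${label}: ${end} matches"
    else
        echo "${label}: ${end} != ${reported_end}"
    fi
}

# Values copied from the scenarios in this report.
check_end "cluster_size=2k" $((0x5800)) $((0x1f9000)) 2091008
check_end "cluster_size=4k" $((0x4000)) $((0x1f9000)) 2084864
```

Both lines report a match (2091008 and 2084864), which is consistent with the whole LUKS header region now being written when the image is created.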
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3172