Bug 1527085

Summary: The copied flag should be updated during '-r leaks'
Product: Red Hat Enterprise Linux 7
Reporter: Ping Li <pingl>
Component: qemu-kvm-rhev
Assignee: Hanna Czenczek <hreitz>
Status: CLOSED ERRATA
QA Contact: Tingting Mao <timao>
Severity: low
Docs Contact:
Priority: low
Version: 7.5
CC: aliang, chayang, coli, hreitz, juzhang, knoel, kwolf, michen, mrezanin, ngu, pingl, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-5.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1527122
Environment:
Last Closed: 2018-11-01 11:04:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1518738    
Bug Blocks: 1527122    

Description Ping Li 2017-12-18 14:13:00 UTC
Description of problem:
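After repairing cluster leaks with 'qemu-img check -r leaks', the check still reports corruptions on the image: the repair lowers the refcounts, but OFLAG_COPIED in the L1/L2 entries is not updated to match. OFLAG_COPIED should therefore also be updated during the repair.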
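
For context, the qcow2 format expects OFLAG_COPIED to be set on an L1/L2 entry exactly when the refcount of the cluster it points to is 1. The following is a minimal sketch of that rule only, not qemu code: `copied_flag_ok` is a hypothetical helper, and the flag is passed in as 0 or 1 instead of being decoded from the entry. The "before" state (flag clear while refcount=2) is an assumption inferred from the first check reporting no OFLAG_COPIED errors.

```shell
#!/bin/sh
# Hypothetical helper, not part of qemu: checks the rule that OFLAG_COPIED
# is set if and only if the cluster refcount is exactly 1.
# $1 = flag as stored in the L1/L2 entry (0 or 1), $2 = cluster refcount.
copied_flag_ok() {
    flag=$1
    refcount=$2
    if [ "$refcount" -eq 1 ]; then
        expected=1
    else
        expected=0
    fi
    [ "$flag" -eq "$expected" ]
}

# Assumed state before the repair: refcount=2, flag clear -> consistent.
copied_flag_ok 0 2 && echo "refcount=2 flag=0: consistent"
# State after '-r leaks' on the affected build: refcount=1, flag still clear.
copied_flag_ok 0 1 || echo "refcount=1 flag=0: OFLAG_COPIED mismatch"
```

This matches the pattern in the report: lowering the refcount from 2 to 1 without touching the flag turns a consistent entry into a mismatch.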

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-12.el7
kernel-3.10.0-823.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create image
# qemu-img create -f qcow2 base.qcow2 100M
Formatting 'base.qcow2', fmt=qcow2 size=104857600 cluster_size=65536 lazy_refcounts=off refcount_bits=16
# qemu-io -c "write 0 1M" base.qcow2 
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0973 sec (10.271 MiB/sec and 10.2712 ops/sec)

2. Create internal snapshot
# qemu-img snapshot -c sn base.qcow2 

3. Delete internal snapshot through blkdebug
# cat >> blkdebug.cfg << eof
[inject-error]
event = "cluster_free"
errno = "28"
immediately = "off"
eof
# qemu-img snapshot -d sn blkdebug:blkdebug.cfg:base.qcow2 
qcow2_free_clusters failed: No space left on device
qemu-img: Could not delete snapshot 'sn': Failed to free the cluster and L1 table: No space left on device

4. Check the image
# qemu-img check base.qcow2 
Leaked cluster 4 refcount=2 reference=1
Leaked cluster 5 refcount=2 reference=1
Leaked cluster 6 refcount=2 reference=1
Leaked cluster 7 refcount=2 reference=1
Leaked cluster 8 refcount=2 reference=1
Leaked cluster 9 refcount=2 reference=1
Leaked cluster 10 refcount=2 reference=1
Leaked cluster 11 refcount=2 reference=1
Leaked cluster 12 refcount=2 reference=1
Leaked cluster 13 refcount=2 reference=1
Leaked cluster 14 refcount=2 reference=1
Leaked cluster 15 refcount=2 reference=1
Leaked cluster 16 refcount=2 reference=1
Leaked cluster 17 refcount=2 reference=1
Leaked cluster 18 refcount=2 reference=1
Leaked cluster 19 refcount=2 reference=1
Leaked cluster 20 refcount=2 reference=1
Leaked cluster 21 refcount=1 reference=0
Leaked cluster 22 refcount=1 reference=0

19 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
16/1600 = 1.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 1507328
# echo $?
3   -------->  Check completed, image has leaked clusters, but is good otherwise

5. Repair cluster leaks
# qemu-img check -r leaks base.qcow2
Leaked cluster 4 refcount=2 reference=1
Leaked cluster 5 refcount=2 reference=1
Leaked cluster 6 refcount=2 reference=1
Leaked cluster 7 refcount=2 reference=1
Leaked cluster 8 refcount=2 reference=1
Leaked cluster 9 refcount=2 reference=1
Leaked cluster 10 refcount=2 reference=1
Leaked cluster 11 refcount=2 reference=1
Leaked cluster 12 refcount=2 reference=1
Leaked cluster 13 refcount=2 reference=1
Leaked cluster 14 refcount=2 reference=1
Leaked cluster 15 refcount=2 reference=1
Leaked cluster 16 refcount=2 reference=1
Leaked cluster 17 refcount=2 reference=1
Leaked cluster 18 refcount=2 reference=1
Leaked cluster 19 refcount=2 reference=1
Leaked cluster 20 refcount=2 reference=1
Leaked cluster 21 refcount=1 reference=0
Leaked cluster 22 refcount=1 reference=0
Repairing cluster 4 refcount=2 reference=1
Repairing cluster 5 refcount=2 reference=1
Repairing cluster 6 refcount=2 reference=1
Repairing cluster 7 refcount=2 reference=1
Repairing cluster 8 refcount=2 reference=1
Repairing cluster 9 refcount=2 reference=1
Repairing cluster 10 refcount=2 reference=1
Repairing cluster 11 refcount=2 reference=1
Repairing cluster 12 refcount=2 reference=1
Repairing cluster 13 refcount=2 reference=1
Repairing cluster 14 refcount=2 reference=1
Repairing cluster 15 refcount=2 reference=1
Repairing cluster 16 refcount=2 reference=1
Repairing cluster 17 refcount=2 reference=1
Repairing cluster 18 refcount=2 reference=1
Repairing cluster 19 refcount=2 reference=1
Repairing cluster 20 refcount=2 reference=1
Repairing cluster 21 refcount=1 reference=0
Repairing cluster 22 refcount=1 reference=0
ERROR OFLAG_COPIED L2 cluster: l1_index=0 l1_entry=40000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=50000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=60000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=70000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=80000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=90000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=a0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=b0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=c0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=d0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=e0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=f0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=100000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=110000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=120000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=130000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=140000 refcount=1
The following inconsistencies were found and repaired:

    19 leaked clusters
    0 corruptions

Double checking the fixed image now...
ERROR OFLAG_COPIED L2 cluster: l1_index=0 l1_entry=40000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=50000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=60000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=70000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=80000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=90000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=a0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=b0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=c0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=d0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=e0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=f0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=100000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=110000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=120000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=130000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=140000 refcount=1

17 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.
16/1600 = 1.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 1376256
# echo $?
2   -------->  Check completed, image is corrupted
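
The two exit statuses seen above (3 in step 4, 2 in step 5) follow the status list in the qemu-img man page. A small sketch of a lookup that could be used when scripting the reproducer; `describe_check_status` is a hypothetical helper name, and the wording of each message is paraphrased:

```shell
#!/bin/sh
# Maps a 'qemu-img check' exit status to its documented meaning.
# describe_check_status is a hypothetical helper, not a qemu tool.
describe_check_status() {
    case "$1" in
        0)  echo "consistent" ;;
        1)  echo "check not completed (internal error)" ;;
        2)  echo "corrupted" ;;
        3)  echo "leaked clusters, not corrupted" ;;
        63) echo "checks not supported by the image format" ;;
        *)  echo "unknown status $1" ;;
    esac
}

describe_check_status 3   # status seen in step 4
describe_check_status 2   # status seen in step 5
```

In a script this would typically be called right after the check, for example `qemu-img check base.qcow2; describe_check_status $?`.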

Actual results:
The image is reported as corrupted after the leaks are repaired (17 OFLAG_COPIED errors, exit status 2).

Expected results:
No errors are reported after the repair.

Additional info:
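
The numbers in the output above can be cross-checked with cluster arithmetic. The sketch below assumes the 65536-byte cluster size printed by 'qemu-img create' and treats the l1_entry/l2_entry values as hexadecimal host offsets; `offset_to_cluster` is a hypothetical helper.

```shell
#!/bin/sh
# Cluster arithmetic for the reproducer. cluster_size is taken from the
# 'qemu-img create' output; offset_to_cluster is a hypothetical helper.
cluster_size=65536

# Converts a hexadecimal host offset (as printed in l1_entry/l2_entry,
# without the 0x prefix) to a cluster index.
offset_to_cluster() {
    echo $(( 0x$1 / $cluster_size ))
}

offset_to_cluster 40000                 # L2 table cluster
offset_to_cluster 50000                 # first data cluster
offset_to_cluster 140000                # last data cluster
echo $(( 104857600 / $cluster_size ))   # guest clusters in a 100M image
echo $(( 1507328 / $cluster_size ))     # file size in clusters before the repair
echo $(( 1376256 / $cluster_size ))     # file size in clusters after the repair
```

The offsets map to clusters 4 through 20, which are the 17 clusters reported with refcount=2 reference=1, so the 17 OFLAG_COPIED errors cover exactly the clusters whose refcount the repair lowered to 1. The file shrinking from 23 to 21 clusters is consistent with clusters 21 and 22 (refcount=1 reference=0) being freed.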

Comment 3 Hanna Czenczek 2018-04-28 17:06:02 UTC
Sent an upstream series: http://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg05252.html

Comment 5 Miroslav Rezanina 2018-06-25 14:07:57 UTC
Fix included in qemu-kvm-rhev-2.12.0-5.el7

Comment 7 Ping Li 2018-06-27 02:28:50 UTC
Verified the fix with the packages and test steps below.

Packages tested:
kernel-3.10.0-915.el7.x86_64
qemu-kvm-rhev-2.12.0-5.el7

Test steps:
1. Create image and write data into the image
# qemu-img create -f qcow2 base.qcow2 100M
# qemu-io -c "write 0 1M" base.qcow2

2. Create internal snapshot
# qemu-img snapshot -c sn base.qcow2

3. Delete internal snapshot through blkdebug
# cat >> blkdebug.cfg << eof
[inject-error]
event = "cluster_free"
errno = "28"
immediately = "off"
eof
# qemu-img snapshot -d sn blkdebug:blkdebug.cfg:base.qcow2
qcow2_free_clusters failed: No space left on device
qemu-img: Could not delete snapshot 'sn': Failed to free the cluster and L1 table: No space left on device

4. Check the image
# qemu-img check base.qcow2 
Leaked cluster 4 refcount=2 reference=1
Leaked cluster 5 refcount=2 reference=1
Leaked cluster 6 refcount=2 reference=1
Leaked cluster 7 refcount=2 reference=1
Leaked cluster 8 refcount=2 reference=1
Leaked cluster 9 refcount=2 reference=1
Leaked cluster 10 refcount=2 reference=1
Leaked cluster 11 refcount=2 reference=1
Leaked cluster 12 refcount=2 reference=1
Leaked cluster 13 refcount=2 reference=1
Leaked cluster 14 refcount=2 reference=1
Leaked cluster 15 refcount=2 reference=1
Leaked cluster 16 refcount=2 reference=1
Leaked cluster 17 refcount=2 reference=1
Leaked cluster 18 refcount=2 reference=1
Leaked cluster 19 refcount=2 reference=1
Leaked cluster 20 refcount=2 reference=1
Leaked cluster 21 refcount=1 reference=0
Leaked cluster 22 refcount=1 reference=0

19 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
16/1600 = 1.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 1507328

5. Repair cluster leaks
# qemu-img check -r leaks base.qcow2
Leaked cluster 4 refcount=2 reference=1
Leaked cluster 5 refcount=2 reference=1
Leaked cluster 6 refcount=2 reference=1
Leaked cluster 7 refcount=2 reference=1
Leaked cluster 8 refcount=2 reference=1
Leaked cluster 9 refcount=2 reference=1
Leaked cluster 10 refcount=2 reference=1
Leaked cluster 11 refcount=2 reference=1
Leaked cluster 12 refcount=2 reference=1
Leaked cluster 13 refcount=2 reference=1
Leaked cluster 14 refcount=2 reference=1
Leaked cluster 15 refcount=2 reference=1
Leaked cluster 16 refcount=2 reference=1
Leaked cluster 17 refcount=2 reference=1
Leaked cluster 18 refcount=2 reference=1
Leaked cluster 19 refcount=2 reference=1
Leaked cluster 20 refcount=2 reference=1
Leaked cluster 21 refcount=1 reference=0
Leaked cluster 22 refcount=1 reference=0
Repairing cluster 4 refcount=2 reference=1
Repairing cluster 5 refcount=2 reference=1
Repairing cluster 6 refcount=2 reference=1
Repairing cluster 7 refcount=2 reference=1
Repairing cluster 8 refcount=2 reference=1
Repairing cluster 9 refcount=2 reference=1
Repairing cluster 10 refcount=2 reference=1
Repairing cluster 11 refcount=2 reference=1
Repairing cluster 12 refcount=2 reference=1
Repairing cluster 13 refcount=2 reference=1
Repairing cluster 14 refcount=2 reference=1
Repairing cluster 15 refcount=2 reference=1
Repairing cluster 16 refcount=2 reference=1
Repairing cluster 17 refcount=2 reference=1
Repairing cluster 18 refcount=2 reference=1
Repairing cluster 19 refcount=2 reference=1
Repairing cluster 20 refcount=2 reference=1
Repairing cluster 21 refcount=1 reference=0
Repairing cluster 22 refcount=1 reference=0
Repairing OFLAG_COPIED L2 cluster: l1_index=0 l1_entry=40000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=50000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=60000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=70000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=80000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=90000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=a0000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=b0000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=c0000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=d0000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=e0000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=f0000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=100000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=110000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=120000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=130000 refcount=1
Repairing OFLAG_COPIED data cluster: l2_entry=140000 refcount=1
The following inconsistencies were found and repaired:

    19 leaked clusters
    17 corruptions

Double checking the fixed image now...
No errors were found on the image.   -----> Leaked clusters were repaired
16/1600 = 1.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 1376256
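
When re-running this verification, the long output can be reduced to two counters. The following is a sketch only: `summarize_check` is a hypothetical helper that counts the "Leaked cluster" and "ERROR OFLAG_COPIED" lines shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: reads 'qemu-img check' output on stdin and prints
# how many leak lines and OFLAG_COPIED error lines it contains.
summarize_check() {
    awk '
        /^Leaked cluster /     { leaks++ }
        /^ERROR OFLAG_COPIED / { copied++ }
        END { printf "leaks=%d copied_errors=%d\n", leaks, copied }
    '
}

# Example with a few lines taken from the output in this report.
summarize_check <<'EOF'
Leaked cluster 20 refcount=2 reference=1
Leaked cluster 21 refcount=1 reference=0
ERROR OFLAG_COPIED data cluster: l2_entry=50000 refcount=1
EOF
```

Against a real image this could be used as `qemu-img check base.qcow2 2>&1 | summarize_check`; the `2>&1` is there on the assumption that these diagnostic lines may be written to stderr.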

Comment 11 Ping Li 2018-07-02 05:29:47 UTC
Reproduced the issue mentioned in comment 8 with the latest build, qemu-kvm-rhev-2.12.0-6.el7.

Comment 12 Tingting Mao 2018-07-23 02:21:13 UTC
Reproduced the issue in comment 8 with the latest qemu-kvm-rhev-2.12.0-7.el7 package.

Comment 13 Tingting Mao 2018-07-25 07:31:46 UTC
The issue is also present in qemu-kvm-rhev-2.12.0-8.el7. Besides 217, cases 214, 215, 222, 223 and 226 also lack execute permission, so I reported a new bug [1] covering all of them and set this bug to verified.

[1] Bug 1608229 - There is no execution permission for several cases in qemu-iotests directory

Comment 14 errata-xmlrpc 2018-11-01 11:04:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443