Bug 1052093

Summary:	qcow2 corruptions (leaked clusters after installing a rhel7 guest using virtio_scsi)
Product:	Red Hat Enterprise Linux 7	Reporter:	ShupingCui <scui>
Component:	qemu-kvm	Assignee:	Hanna Czenczek <hreitz>
Status:	CLOSED ERRATA	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	high	Docs Contact:
Priority:	urgent
Version:	7.0	CC:	areis, coli, fyang, hhuang, huding, jherrman, juzhang, knoel, kwolf, lmiksik, michen, mrezanin, pbonzini, rbalakri, scui, tdosek, virt-maint, xwei, ypu
Target Milestone:	rc	Keywords:	Regression, ZStream
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	qemu-kvm-1.5.3-63.el7	Doc Type:	Bug Fix
Doc Text:	Previously, QEMU did not free pre-allocated zero clusters correctly and the clusters under some circumstances leaked. With this update, pre-allocated zero clusters are freed appropriately and the cluster leaks no longer occur.	Story Points:	---
Clone Of:
Clones:	1110188		Environment:
Last Closed:	2015-03-05 08:03:33 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	833649, 1076185, 1110188

Description ShupingCui 2014-01-13 10:07:54 UTC

Description of problem:
Leaked clusters were found after installing a RHEL 7 guest using virtio_scsi. They were not found with virtio_blk.

Version-Release number of selected component (if applicable):
Host:
# uname -r
3.10.0-67.el7.x86_64
# rpm -qa | grep qemu-kvm
qemu-kvm-common-rhev-1.5.3-34.el7.x86_64
qemu-kvm-tools-1.5.3-34.el7.x86_64
qemu-kvm-rhev-1.5.3-34.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install a RHEL 7 guest:
/usr/libexec/qemu-kvm \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -M pc-q35-rhel7.0.0  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -device intel-hda,bus=pcie.0,addr=02 \
    -device hda-duplex  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140110-101500-bAb4rvXq,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140110-101500-bAb4rvXq,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pcie.0,addr=03  \
    -chardev socket,id=devvs,path=/tmp/virtio_port-vs-20140110-101500-bAb4rvXq,server,nowait \
    -device virtserialport,chardev=devvs,name=vs,id=vs,bus=virtio_serial_pci0.0  \
    -chardev socket,id=seabioslog_id_20140110-101500-bAb4rvXq,path=/tmp/seabios-20140110-101500-bAb4rvXq,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140110-101500-bAb4rvXq,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pcie.0,addr=04 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,addr=05 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.0-64-virtio.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:08:09:0a:0b:0c,id=idqiB5iJ,netdev=idlmunRA,bus=pcie.0,addr=06  \
    -netdev tap,id=idlmunRA,vhost=on,vhostfd=28,fd=27  \
    -m 2048  \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
    -cpu 'Penryn' \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,media=cdrom,file=/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/isos/linux/RHEL7.0-Server-x86_64.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
    -drive id=drive_fl,if=none,cache=none,snapshot=off,readonly=off,aio=native,file=/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/rhel70-64/ks.vfd \
    -global isa-fdc.driveA=drive_fl \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -kernel '/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/rhel70-64/vmlinuz'  \
    -append 'ks=hd:fd0:/ks.cfg nicdelay=60 console=ttyS0,115200 console=tty0'  \
    -initrd '/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/rhel70-64/initrd.img'  \
    -spice port=3000,password=123456,addr=0,tls-port=3200,x509-dir=/tmp/spice_x509d,tls-channel=main,tls-channel=inputs,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=d,menu=off  \
    -no-kvm-pit-reinjection \
    -no-shutdown \
    -enable-kvm
2. Shut down the guest when the installation has finished.
3. Check the guest image:
# qemu-img check /usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.0-64-virtio.qcow2

Actual results:
Leaked cluster 8 refcount=1 reference=0
Leaked cluster 9 refcount=1 reference=0
Leaked cluster 10 refcount=1 reference=0
Leaked cluster 11 refcount=1 reference=0
Leaked cluster 12 refcount=1 reference=0
Leaked cluster 13 refcount=1 reference=0
Leaked cluster 14 refcount=1 reference=0
Leaked cluster 15 refcount=1 reference=0
Leaked cluster 16 refcount=1 reference=0
Leaked cluster 17 refcount=1 reference=0
Leaked cluster 18 refcount=1 reference=0
Leaked cluster 19 refcount=1 reference=0
Leaked cluster 20 refcount=1 reference=0
Leaked cluster 21 refcount=1 reference=0
Leaked cluster 22 refcount=1 reference=0
Leaked cluster 23 refcount=1 reference=0
Leaked cluster 315 refcount=1 reference=0

17 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
60799/327680 = 18.55% allocated, 22.30% fragmented, 0.00% compressed clusters
Image end offset: 3988389888


Expected results:
No leaked clusters are found.

Additional info:
[root@localhost ~]# qemu-img info /home/kvm_autotest_root/images/RHEL-Server-7.0-64-virtio.qcow2
image: /home/kvm_autotest_root/images/RHEL-Server-7.0-64-virtio.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 3.7G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false


The guest boots successfully after the installation has finished.
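
The leak report above can be tallied mechanically. The following is a minimal sketch, not part of the original report, that counts the "Leaked cluster" lines in qemu-img check text output and multiplies the count by the cluster size of 65536 bytes reported by qemu-img info. The names leaked_clusters and wasted_bytes are hypothetical, and SAMPLE is only an excerpt of the output above.

```python
import re

# Matches lines such as "Leaked cluster 8 refcount=1 reference=0".
LEAK_RE = re.compile(r"^Leaked cluster (\d+) refcount=(\d+) reference=(\d+)$")


def leaked_clusters(check_output):
    """Return the cluster indexes reported as leaked in qemu-img check text output."""
    indexes = []
    for line in check_output.splitlines():
        match = LEAK_RE.match(line.strip())
        if match:
            indexes.append(int(match.group(1)))
    return indexes


def wasted_bytes(check_output, cluster_size):
    """Disk space held by leaked clusters: one full cluster per leaked index."""
    return len(leaked_clusters(check_output)) * cluster_size


# Excerpt of the output in this report (three of the 17 leaked clusters).
SAMPLE = """\
Leaked cluster 8 refcount=1 reference=0
Leaked cluster 9 refcount=1 reference=0
Leaked cluster 315 refcount=1 reference=0
"""

if __name__ == "__main__":
    print(leaked_clusters(SAMPLE))
    print(wasted_bytes(SAMPLE, 65536))
```

For the full report, 17 leaked clusters at 65536 bytes each come to 1114112 bytes, slightly more than 1 MiB. That matches the tool's own remark that the leaks waste disk space but do not harm data.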

Comment 2 xhan 2014-01-13 10:39:13 UTC

Hit the same problem on the following host:

cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x19
cpu MHz		: 3842.265
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 6783.93
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Comment 3 CongLi 2014-01-14 09:02:23 UTC

Hit this bug on these versions:
kernel-3.10.0-67.el7.x86_64
qemu-kvm-rhev-1.5.3-35.el7.x86_64

I ran a migration test loop using autotest.

1. At the beginning of the test, the image had no errors.

2. After some test cases had run, the image check reported the warning 'Leaked clusters were noticed during image check. No data integrity problem was found though.'

3. After more test cases had run, the image was corrupted:
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR cluster 105675 refcount=0 reference=1
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR cluster 105676 refcount=0 reference=1
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR cluster 105677 refcount=0 reference=1
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR OFLAG_COPIED data cluster: l2_entry=8000000181b80000 refcount=0
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR OFLAG_COPIED data cluster: l2_entry=8000000181b90000 refcount=0
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR OFLAG_COPIED data cluster: l2_entry=8000000181ba0000 refcount=0
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR OFLAG_COPIED data cluster: l2_entry=8000000181bb0000 refcount=0
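
The messages in this log differ from the leak messages in comment 0 in the direction of the mismatch: a leak has a stored refcount higher than the number of references found, while these errors have a stored refcount lower than the references found. The sketch below classifies such lines and decodes the l2_entry values. It assumes the standard qcow2 L2 entry layout (bit 63 is the COPIED flag, bits 9-55 hold the host cluster offset) and the 64 KiB cluster size from comment 0. The names classify and decode_l2_entry are hypothetical and this is not the check code QEMU itself uses.

```python
import re

# Assumed layout of a standard (uncompressed) qcow2 L2 entry.
OFLAG_COPIED = 1 << 63                 # bit 63: refcount is expected to be exactly 1
L2E_OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55: host cluster offset

MISMATCH_RE = re.compile(r"cluster (\d+) refcount=(\d+) reference=(\d+)")


def classify(line):
    """Classify one refcount-mismatch line as 'leak', 'corruption', or None."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    refcount, reference = int(match.group(2)), int(match.group(3))
    if refcount > reference:
        return "leak"          # space is wasted, data is intact
    if refcount < reference:
        return "corruption"    # a referenced cluster is not accounted for
    return None


def decode_l2_entry(hex_entry, cluster_size=65536):
    """Split an l2_entry value into its COPIED flag, host offset, and cluster index."""
    entry = int(hex_entry, 16)
    offset = entry & L2E_OFFSET_MASK
    return {
        "copied": bool(entry & OFLAG_COPIED),
        "host_offset": offset,
        "cluster_index": offset // cluster_size,
    }
```

Decoding l2_entry=8000000181b80000 under these assumptions gives a set COPIED flag and host offset 0x181b80000. The COPIED flag asserts a refcount of exactly 1, so a stored refcount of 0 for that cluster is reported as an error.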

Thanks,
Cong

Comment 4 CongLi 2014-01-14 09:04:50 UTC

(In reply to CongLi from comment #3)
> Hit this bug on version:
> kernel-3.10.0-67.el7.x86_64
> qemu-kvm-rhev-1.5.3-35.el7.x86_64

Note that the drive format in this test was virtio_blk, not virtio_scsi.

Command line:
/home/staf-kvm-devel/autotest-devel/client/tests/virt/qemu/qemu \
    -S  \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -M pc  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140114-142327-dOX77U24,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140114-142327-dOX77U24,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20140114-142327-dOX77U24,path=/tmp/seabios-20140114-142327-dOX77U24,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140114-142327-dOX77U24,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/home/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.0-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -device virtio-net-pci,mac=9a:bd:be:bf:c0:c1,id=idqyFAse,netdev=idSIxMtI,bus=pci.0,addr=05  \
    -netdev tap,id=idSIxMtI,vhost=on,vhostfd=23,fd=22  \
    -m 2048  \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2  \
    -cpu 'Opteron_G3',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=3000,password=123456,addr=0,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off  \
    -no-kvm-pit-reinjection \
    -enable-kvm

Comment 10 xhan 2014-01-26 08:34:15 UTC

This bug does not reproduce on qemu-kvm-1.5.3-41.el7.x86_64.

It still reproduces on qemu-kvm-1.5.3-40.el7.x86_64.

Comment 11 Kevin Wolf 2014-01-27 12:09:41 UTC

The only commit between -40 and -41 that looks vaguely related is:

de979b1 scsi-disk: add UNMAP limits to block limits VPD page

The commit message says that this commit makes Linux send UNMAP commands (which
use bdrv_discard) rather than WRITE SAME (which uses bdrv_write_zeroes). Perhaps
something is wrong in the qcow2 discard code.

Comment 12 Paolo Bonzini 2014-01-28 16:09:59 UTC

Please try reproducing on -40 with the additional command line option -global scsi-hd.discard_granularity=0 - thanks!

> Perhaps something is wrong in the qcow2 discard code.

Or write_zeroes.
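
For test harnesses that build the QEMU command line as an argument list, the requested option can be appended as in the sketch below. The helper with_global and the shortened BASE_ARGV are hypothetical. Only the option string -global scsi-hd.discard_granularity=0 comes from this comment.

```python
def with_global(argv, driver, prop, value):
    """Return a copy of argv with '-global driver.prop=value' appended."""
    return list(argv) + ["-global", "%s.%s=%s" % (driver, prop, value)]


# Shortened stand-in for the full command line in comment 0.
BASE_ARGV = [
    "/usr/libexec/qemu-kvm",
    "-device", "virtio-scsi-pci,id=virtio_scsi_pci0",
    "-device", "scsi-hd,id=image1,drive=drive_image1",
]

if __name__ == "__main__":
    print(" ".join(with_global(BASE_ARGV, "scsi-hd", "discard_granularity", 0)))
```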

Comment 17 Kevin Wolf 2014-03-06 16:23:59 UTC

Upstream commit 8f730dd2 fixes a cluster leak for overwriting preallocated zero
clusters. qcow2_co_write_zeroes() does create such clusters, so that might be
related to the originally reported leaks.

It still isn't a hint for the corruption case, though.

On the other hand, the report for the corrupted image in comment 3 involved
live migration, so we may be looking at two entirely different bugs here.
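
To illustrate how overwriting a preallocated zero cluster can produce a "refcount=1 reference=0" leak, here is a toy model. It is not QEMU code and does not reproduce the logic of commit 8f730dd2. It only mimics the bookkeeping under the assumption that a write to a zero cluster allocates a new host cluster, and that the bug was a missing release of the old one. All names are hypothetical.

```python
class ToyImage:
    """Toy refcount model. This is NOT QEMU code; it only mimics the bookkeeping."""

    def __init__(self, free_preallocated):
        self.refcounts = {}   # host cluster index -> stored refcount
        self.l2 = {}          # guest cluster index -> (host cluster index, is_zero)
        self.next_free = 0
        self.free_preallocated = free_preallocated

    def _alloc(self):
        host = self.next_free
        self.next_free += 1
        self.refcounts[host] = 1
        return host

    def write_zeroes(self, guest):
        """Create a zero cluster that keeps a preallocated host cluster."""
        self.l2[guest] = (self._alloc(), True)

    def write_data(self, guest):
        """Write data to a guest cluster."""
        old = self.l2.get(guest)
        if old is not None and not old[1]:
            return  # ordinary data cluster: overwritten in place
        if old is not None and self.free_preallocated:
            self.refcounts[old[0]] -= 1  # release the preallocated cluster
        self.l2[guest] = (self._alloc(), False)

    def leaked(self):
        """Host clusters whose stored refcount exceeds the references found in L2."""
        referenced = {}
        for host, _ in self.l2.values():
            referenced[host] = referenced.get(host, 0) + 1
        return sorted(h for h, rc in self.refcounts.items()
                      if rc > referenced.get(h, 0))


def leak_after_overwrite(free_preallocated):
    """Zero one guest cluster, overwrite it with data, and report leaked clusters."""
    image = ToyImage(free_preallocated)
    image.write_zeroes(0)
    image.write_data(0)
    return image.leaked()
```

In this model, leak_after_overwrite(False) leaves the old host cluster with a refcount of 1 and no reference, while leak_after_overwrite(True) leaves nothing leaked.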

Comment 18 CongLi 2014-03-27 03:18:33 UTC

(In reply to Kevin Wolf from comment #17)
> Upstream commit 8f730dd2 fixes a cluster leak for overwriting preallocated
> zero
> clusters. qcow2_co_write_zeroes() does create such clusters, so that might be
> related to the originally reported leaks.
> 
> It still isn't a hint for the corruption case, though.
> 
> On the other hand, the report for the corrupted image in comment 3 involved
> live migration, so we may be looking at two entirely different bugs here.

Hi Kevin,

I have filed a new bug based on comment 3:
Bug 1081326 - qcow2 image corrupted after a migration loop
https://bugzilla.redhat.com/show_bug.cgi?id=1081326

I'm still trying to reproduce it with the latest build and will update the test results as soon as possible.

Thanks,
Cong

Comment 20 Ademar Reis 2014-04-28 21:40:39 UTC

(In reply to Kevin Wolf from comment #17)
> Upstream commit 8f730dd2 fixes a cluster leak for overwriting preallocated
> zero
> clusters. qcow2_co_write_zeroes() does create such clusters, so that might be
> related to the originally reported leaks.
> 

Max: please backport the commit above then.

> It still isn't a hint for the corruption case, though.
> 
> On the other hand, the report for the corrupted image in comment 3 involved
> live migration, so we may be looking at two entirely different bugs here.

Cong, you've opened Bug 1081326, which was closed as dupe of bug 1048575, VERIFIED already. Is that right?

If that's the case, the backport of the leaked clusters patch should be enough to close this BZ.

Comment 21 CongLi 2014-04-29 02:26:57 UTC

(In reply to Ademar Reis from comment #20)

> Cong, you've opened Bug 1081326, which was closed as dupe of bug 1048575,
> VERIFIED already. Is that right?
> 
> If that's the case, the backport of the leaked clusters patch should be
> enough to close this BZ.

Hi Ademar,

According to my test results, I think the fix for BZ1048575 has resolved this problem. I have not hit this bug again.

But as Kevin said (comment 17), they may be different bugs. I'm not sure about it.

Thanks,
Cong

Comment 22 CongLi 2014-04-29 02:38:11 UTC

(In reply to CongLi from comment #21)
> (In reply to Ademar Reis from comment #20)
> 
> > Cong, you've opened Bug 1081326, which was closed as dupe of bug 1048575,
> > VERIFIED already. Is that right?
> > 
> > If that's the case, the backport of the leaked clusters patch should be
> > enough to close this BZ.
> 
> Hi Ademar,
> 
> According to my test result, I think BZ1048575 has fixed this problem, have
> not met this bug again.
> 
> But as Kevin said (comment 17), they are maybe different bugs, I'm not sure
> about it.

I mean that my problem in comment 3 has been fixed, but I'm not sure whether the original problem (comment 0) has been fixed.

If the original problem has also been fixed, I think that is enough to close this bug.

> Thanks,
> Cong

Comment 23 Kevin Wolf 2014-04-29 08:32:31 UTC

(In reply to CongLi from comment #22)
> I mean my problem in comment 3 has been fixed, but I'm not sure whether the
> origin problem (comment 0) has been fixed.
> 
> If the origin problem also has been fixed, I think it's enough to close it.

Comment 0 will be fixed when we backport the commit that Ademar mentioned.

Comment 26 Miroslav Rezanina 2014-06-17 12:40:22 UTC

Fix included in qemu-kvm-1.5.3-63.el7

Comment 30 huiqingding 2014-08-12 05:23:08 UTC

Tested this bug on an AMD host using the following versions:
qemu-kvm-rhev-2.1.0-1.el7.x86_64
kernel-3.10.0-143.el7.x86_64

Steps to Test:
1. Install a win7sp1-32 guest using a virtio-scsi disk:
# /usr/libexec/qemu-kvm \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -M pc  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -device nec-usb-xhci,id=usb1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=win7sp1-32.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:08:09:0a:0b:0c,id=idqiB5iJ,netdev=idlmunRA  \
    -netdev tap,id=idlmunRA,vhost=on,script=/etc/qemu-ifup  \
    -m 2048  \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
    -cpu 'SandyBridge' \
    -device usb-tablet,id=usb-tablet1  \
    -vnc :10  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=d,menu=off  \
    -no-kvm-pit-reinjection \
    -enable-kvm \
    -monitor stdio \
    -drive file=/home/en_windows_7_ultimate_with_sp1_x86_dvd_u_677460.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
    -device ide-drive,bus=ide.1,unit=1,drive=drive-ide0-1-0,id=ide0-1-0,bus=ide.0,unit=0 \
    -cdrom /home/driver.iso \
    -boot menu=on
2. Shut down the guest when the installation has finished.
3. Check the guest image:
# qemu-img check win7sp1-32.qcow2 
No errors were found on the image.
84142/655360 = 12.84% allocated, 26.75% fragmented, 0.00% compressed clusters
Image end offset: 5515771904
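
A verification run like this one can also be judged from the exit status of qemu-img check rather than from its text. The sketch below uses the exit codes listed in the upstream qemu-img man page (0, 1, 2, 3, 63). Whether a particular build follows that table is an assumption worth confirming. The helper names are hypothetical.

```python
import subprocess

# Exit statuses of "qemu-img check" as listed in the qemu-img man page.
CHECK_EXIT_MEANINGS = {
    0: "consistent",
    1: "check not completed (internal error)",
    2: "corrupted",
    3: "leaked clusters only",
    63: "checks not supported by this image format",
}


def describe_check_status(returncode):
    """Translate a qemu-img check exit status into a short description."""
    return CHECK_EXIT_MEANINGS.get(returncode, "unknown status %d" % returncode)


def check_image(path):
    """Run qemu-img check on path and return (exit status, description)."""
    result = subprocess.run(["qemu-img", "check", path],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode, describe_check_status(result.returncode)
```

Under that table, the originally reported image would be expected to yield status 3 (leaks only) and the image in this comment status 0.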

Comment 34 errata-xmlrpc 2015-03-05 08:03:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0349.html