Bug 1052093 - qcow2 corruptions (leaked clusters after installing a rhel7 guest using virtio_scsi)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Max Reitz
QA Contact: Virtualization Bugs
Keywords: Regression, ZStream
Depends On:
Blocks: 833649 RHEL7.0Virt-PostBeta(z-stream) 1110188
Reported: 2014-01-13 05:07 EST by ShupingCui
Modified: 2015-03-05 03:03 EST
CC List: 19 users

See Also:
Fixed In Version: qemu-kvm-1.5.3-63.el7
Doc Type: Bug Fix
Doc Text:
Previously, QEMU did not free pre-allocated zero clusters correctly, and under some circumstances these clusters leaked. With this update, pre-allocated zero clusters are freed appropriately and the cluster leaks no longer occur.
Clone Of:
Clones: 1110188
Environment:
Last Closed: 2015-03-05 03:03:33 EST
Type: Bug




External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2015:0349
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: qemu-kvm security, bug fix, and enhancement update
Last Updated: 2015-03-05 07:27:34 EST

Description ShupingCui 2014-01-13 05:07:54 EST
Description of problem:
Leaked clusters were found after installing a RHEL 7 guest using virtio_scsi; they were not found with virtio_blk.

Version-Release number of selected component (if applicable):
Host:
# uname -r
3.10.0-67.el7.x86_64
# rpm -qa | grep qemu-kvm
qemu-kvm-common-rhev-1.5.3-34.el7.x86_64
qemu-kvm-tools-1.5.3-34.el7.x86_64
qemu-kvm-rhev-1.5.3-34.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install a RHEL 7 guest:
/usr/libexec/qemu-kvm \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -M pc-q35-rhel7.0.0  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -device intel-hda,bus=pcie.0,addr=02 \
    -device hda-duplex  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140110-101500-bAb4rvXq,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140110-101500-bAb4rvXq,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pcie.0,addr=03  \
    -chardev socket,id=devvs,path=/tmp/virtio_port-vs-20140110-101500-bAb4rvXq,server,nowait \
    -device virtserialport,chardev=devvs,name=vs,id=vs,bus=virtio_serial_pci0.0  \
    -chardev socket,id=seabioslog_id_20140110-101500-bAb4rvXq,path=/tmp/seabios-20140110-101500-bAb4rvXq,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140110-101500-bAb4rvXq,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pcie.0,addr=04 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,addr=05 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.0-64-virtio.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:08:09:0a:0b:0c,id=idqiB5iJ,netdev=idlmunRA,bus=pcie.0,addr=06  \
    -netdev tap,id=idlmunRA,vhost=on,vhostfd=28,fd=27  \
    -m 2048  \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
    -cpu 'Penryn' \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,media=cdrom,file=/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/isos/linux/RHEL7.0-Server-x86_64.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
    -drive id=drive_fl,if=none,cache=none,snapshot=off,readonly=off,aio=native,file=/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/rhel70-64/ks.vfd \
    -global isa-fdc.driveA=drive_fl \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -kernel '/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/rhel70-64/vmlinuz'  \
    -append 'ks=hd:fd0:/ks.cfg nicdelay=60 console=ttyS0,115200 console=tty0'  \
    -initrd '/usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/rhel70-64/initrd.img'  \
    -spice port=3000,password=123456,addr=0,tls-port=3200,x509-dir=/tmp/spice_x509d,tls-channel=main,tls-channel=inputs,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=d,menu=off  \
    -no-kvm-pit-reinjection \
    -no-shutdown \
    -enable-kvm
2. Shut down the guest when the installation has finished.
3. Check the guest image:
# qemu-img check /usr/local/staf/test/RHEV/kvm/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.0-64-virtio.qcow2
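Step 3 can be scripted so a test harness tells leaks apart from real corruption. A minimal sketch, assuming the exit-code convention documented in the qemu-img man page (0 = consistent, 1 = check not completed, 2 = corrupted, 3 = leaked clusters only, 63 = checks not supported); please confirm the qemu-kvm build under test follows the same convention. The function name `describe_check_status` is hypothetical.

```sh
#!/bin/sh
# Map a `qemu-img check` exit status to a short description.
# Assumes the exit-code convention from the qemu-img man page;
# verify it against the build under test.
describe_check_status() {
    case "$1" in
        0)  echo "consistent" ;;
        1)  echo "check not completed (internal error)" ;;
        2)  echo "corrupted" ;;
        3)  echo "leaked clusters only" ;;
        63) echo "checks not supported by this format" ;;
        *)  echo "unknown status $1" ;;
    esac
}

# Usage (hypothetical image path):
#   qemu-img check /path/to/image.qcow2
#   describe_check_status "$?"
```

Under that convention, an image that reports only leaked clusters, like the one in the actual results below, would exit with status 3.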

Actual results:
Leaked cluster 8 refcount=1 reference=0
Leaked cluster 9 refcount=1 reference=0
Leaked cluster 10 refcount=1 reference=0
Leaked cluster 11 refcount=1 reference=0
Leaked cluster 12 refcount=1 reference=0
Leaked cluster 13 refcount=1 reference=0
Leaked cluster 14 refcount=1 reference=0
Leaked cluster 15 refcount=1 reference=0
Leaked cluster 16 refcount=1 reference=0
Leaked cluster 17 refcount=1 reference=0
Leaked cluster 18 refcount=1 reference=0
Leaked cluster 19 refcount=1 reference=0
Leaked cluster 20 refcount=1 reference=0
Leaked cluster 21 refcount=1 reference=0
Leaked cluster 22 refcount=1 reference=0
Leaked cluster 23 refcount=1 reference=0
Leaked cluster 315 refcount=1 reference=0

17 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
60799/327680 = 18.55% allocated, 22.30% fragmented, 0.00% compressed clusters
Image end offset: 3988389888
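Each line above compares two numbers for one host cluster: `refcount`, the value stored in the image's refcount table, and `reference`, the number of references the check found while walking the image metadata. A refcount higher than the reference count is a leak (space is wasted, data is intact); a lower one, as in the `ERROR cluster` lines of comment 3, means a cluster in use could be handed out again and overwritten. A minimal sketch applying that rule to check output on stdin; the function name is hypothetical and the awk pattern assumes the exact line format shown in this report.

```sh
#!/bin/sh
# Classify "cluster N refcount=R reference=F" lines from qemu-img check
# output read on stdin. Assumes the line format shown in this report.
#   refcount > reference: leaked (wasted space only)
#   refcount < reference: corrupted (in-use cluster may be reused)
classify_check_lines() {
    awk '
    / cluster [0-9]+ refcount=[0-9]+ reference=[0-9]+/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^refcount=/)  { r = substr($i, 10) + 0 }
            if ($i ~ /^reference=/) { f = substr($i, 11) + 0 }
        }
        if (r > f) leaked++
        else if (r < f) corrupt++
    }
    END { printf "leaked=%d corrupted=%d\n", leaked, corrupt }'
}
```

Fed the 17 `Leaked cluster` lines above, it would print `leaked=17 corrupted=0`, matching the summary line.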


Expected results:
No leaked clusters are found.

Additional info:
[root@localhost ~]# qemu-img info /home/kvm_autotest_root/images/RHEL-Server-7.0-64-virtio.qcow2
image: /home/kvm_autotest_root/images/RHEL-Server-7.0-64-virtio.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 3.7G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
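The `qemu-img info` figures and the check summary can be cross-checked: with 64 KiB clusters, the 20 GiB virtual disk spans 327680 guest clusters, which is the denominator of the "18.55% allocated" figure, and the image end offset falls exactly on a cluster boundary. A small POSIX sh sketch of that arithmetic; the helper name `percent2` is hypothetical.

```sh
#!/bin/sh
# Cross-check the qemu-img info/check figures quoted in this report
# using only POSIX shell integer arithmetic (64-bit on typical Linux hosts).

# Print $1 as a percentage of $2 with two decimals, rounded half up.
percent2() {
    x=$(( ($1 * 10000 + $2 / 2) / $2 ))
    printf '%d.%02d\n' $((x / 100)) $((x % 100))
}

cluster_size=65536           # cluster_size from qemu-img info
virtual_size=21474836480     # virtual size in bytes from qemu-img info
allocated=60799              # allocated clusters from qemu-img check
end_offset=3988389888        # "Image end offset" from qemu-img check

total_clusters=$((virtual_size / cluster_size))
pct=$(percent2 "$allocated" "$total_clusters")
end_clusters=$((end_offset / cluster_size))
end_remainder=$((end_offset % cluster_size))

echo "total_clusters=$total_clusters allocated=${pct}%"
echo "end_offset_clusters=$end_clusters remainder=$end_remainder"
```

This prints `total_clusters=327680 allocated=18.55%` and `end_offset_clusters=60858 remainder=0`. The same helper reproduces the figure from comment 30: `percent2 84142 655360` prints `12.84`.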


The guest boots successfully after the installation has finished.
Comment 2 xhan 2014-01-13 05:39:13 EST
Hit the same problem on this host:

cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x19
cpu MHz		: 3842.265
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 6783.93
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:
Comment 3 CongLi 2014-01-14 04:02:23 EST
Hit this bug on version:
kernel-3.10.0-67.el7.x86_64
qemu-kvm-rhev-1.5.3-35.el7.x86_64

I ran a migration test loop using autotest.

1. At the beginning of the test, the image had no errors.

2. After some test cases had run, the check printed the warning 'Leaked clusters were noticed during image check. No data integrity problem was found though.'

3. After more cases had run, the image was corrupted:
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR cluster 105675 refcount=0 reference=1
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR cluster 105676 refcount=0 reference=1
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR cluster 105677 refcount=0 reference=1
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR OFLAG_COPIED data cluster: l2_entry=8000000181b80000 refcount=0
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR OFLAG_COPIED data cluster: l2_entry=8000000181b90000 refcount=0
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR OFLAG_COPIED data cluster: l2_entry=8000000181ba0000 refcount=0
01/14 14:23:37 ERROR|qemu_stora:0431| [stderr] ERROR OFLAG_COPIED data cluster: l2_entry=8000000181bb0000 refcount=0

Thanks,
Cong
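The `OFLAG_COPIED` lines in comment 3 print raw 64-bit L2 table entries. Per the qcow2 format specification, bit 63 of an L2 entry is the COPIED flag (the cluster's refcount should be exactly 1), bit 62 marks a compressed cluster, bits 9-55 hold the host cluster offset, and bit 0 is the "reads as zeros" flag of version 3 (compat=1.1) images. A minimal POSIX sh sketch decoding such an entry, assuming a standard (uncompressed) descriptor and 64 KiB clusters by default; it splits the 16-digit hex string instead of using full 64-bit arithmetic, because values with bit 63 set overflow signed shell integers. The function name is hypothetical.

```sh
#!/bin/sh
# Decode a 16-hex-digit qcow2 L2 entry such as 8000000181b80000.
# $2 is the cluster size in bytes (defaults to 65536, as in this report).
decode_l2_entry() {
    entry=$1
    cs=${2:-65536}
    low=${entry#??}                            # bits 0-55 (last 14 hex digits)
    high=${entry%"$low"}                       # bits 56-63 (first 2 hex digits)
    flags=$((0x$high))
    copied=$(( (flags & 0x80) != 0 ))          # bit 63: COPIED (refcount == 1)
    compressed=$(( (flags & 0x40) != 0 ))      # bit 62: compressed cluster
    offset=$(( 0x$low & 0xfffffffffffe00 ))    # bits 9-55: host cluster offset
    zero=$(( 0x$low & 1 ))                     # bit 0: reads as zeros (v3)
    printf 'offset=0x%x cluster=%d copied=%d compressed=%d zero=%d\n' \
        "$offset" $((offset / cs)) "$copied" "$compressed" "$zero"
}
```

For `8000000181b80000` this prints `offset=0x181b80000 cluster=98744 copied=1 compressed=0 zero=0`: the L2 entry claims sole ownership of that host cluster while the refcount table records 0 for it, which is the inconsistency `qemu-img check` reports.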
Comment 4 CongLi 2014-01-14 04:04:50 EST
(In reply to CongLi from comment #3)
> Hit this bug on version:
> kernel-3.10.0-67.el7.x86_64
> qemu-kvm-rhev-1.5.3-35.el7.x86_64

Note: the drive format in this run is virtio_blk, not virtio_scsi.

Command line:
/home/staf-kvm-devel/autotest-devel/client/tests/virt/qemu/qemu \
    -S  \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -M pc  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140114-142327-dOX77U24,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140114-142327-dOX77U24,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20140114-142327-dOX77U24,path=/tmp/seabios-20140114-142327-dOX77U24,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140114-142327-dOX77U24,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/home/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.0-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -device virtio-net-pci,mac=9a:bd:be:bf:c0:c1,id=idqyFAse,netdev=idSIxMtI,bus=pci.0,addr=05  \
    -netdev tap,id=idSIxMtI,vhost=on,vhostfd=23,fd=22  \
    -m 2048  \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2  \
    -cpu 'Opteron_G3',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=3000,password=123456,addr=0,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off  \
    -no-kvm-pit-reinjection \
    -enable-kvm
Comment 10 xhan 2014-01-26 03:34:15 EST
Cannot reproduce this bug on qemu-kvm-1.5.3-41.el7.x86_64.

It still reproduces on qemu-kvm-1.5.3-40.el7.x86_64.
Comment 11 Kevin Wolf 2014-01-27 07:09:41 EST
The only commit between -40 and -41 that looks vaguely related is:

de979b1 scsi-disk: add UNMAP limits to block limits VPD page

The commit message says that this commit makes Linux send UNMAP commands (which
use bdrv_discard) rather than WRITE SAME commands (which use bdrv_write_zeroes).
Perhaps something is wrong in the qcow2 discard code.
Comment 12 Paolo Bonzini 2014-01-28 11:09:59 EST
Please try reproducing on -40 with the additional command line option -global scsi-hd.discard_granularity=0 - thanks!

> Perhaps something is wrong in the qcow2 discard code.

Or write_zeroes.
Comment 17 Kevin Wolf 2014-03-06 11:23:59 EST
Upstream commit 8f730dd2 fixes a cluster leak for overwriting preallocated zero
clusters. qcow2_co_write_zeroes() does create such clusters, so that might be
related to the originally reported leaks.

It still isn't a hint for the corruption case, though.

On the other hand, the report for the corrupted image in comment 3 involved
live migration, so we may be looking at two entirely different bugs here.
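The leak mechanism in comment 17 and the Doc Text can be illustrated with a toy model. In qcow2 version 3 images (compat=1.1, as here) an L2 entry may carry the zero flag while still pointing at a host cluster; that is a preallocated zero cluster. The following is a deliberately simplified, hypothetical model, not QEMU code (it ignores refcount blocks, L2 table allocation and copy-on-write): the guest overwrites a preallocated zero cluster, the data goes to a newly allocated host cluster, and the old host cluster's refcount is either left alone ("buggy") or decremented ("fixed"). A check comparing refcounts with references then reports the difference.

```sh
#!/bin/sh
# Hypothetical toy model of the leak described in comment 17; not QEMU code.
# refcount_N mimics the refcount table entry of host cluster N;
# l2_host/l2_zero mimic the single guest cluster's L2 entry.
simulate_overwrite() {
    mode=$1                       # "buggy" or "fixed"
    # Before: guest cluster 0 is a preallocated zero cluster whose L2 entry
    # points at host cluster 1 (zero flag set, refcount 1).
    l2_host=1; l2_zero=1
    refcount_1=1; refcount_2=0
    # Guest write: data goes to a newly allocated host cluster 2 and the
    # L2 entry is rewritten to point there (zero flag cleared).
    refcount_2=1
    l2_host=2; l2_zero=0
    # The fix: free the host cluster the old zero-cluster entry owned.
    if [ "$mode" = "fixed" ]; then
        refcount_1=$((refcount_1 - 1))
    fi
    # Check: compare each host cluster's refcount with its references.
    leaks=0
    for h in 1 2; do
        eval "rc=\$refcount_$h"
        if [ "$l2_host" -eq "$h" ]; then ref=1; else ref=0; fi
        if [ "$rc" -gt "$ref" ]; then
            echo "Leaked cluster $h refcount=$rc reference=$ref"
            leaks=$((leaks + 1))
        fi
    done
    echo "leaks=$leaks"
}
```

In "buggy" mode the model prints `Leaked cluster 1 refcount=1 reference=0` followed by `leaks=1`, the same shape as the lines in the original report (wasted space, no data damage); in "fixed" mode it prints only `leaks=0`.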
Comment 18 CongLi 2014-03-26 23:18:33 EDT
(In reply to Kevin Wolf from comment #17)
> Upstream commit 8f730dd2 fixes a cluster leak for overwriting preallocated
> zero clusters. qcow2_co_write_zeroes() does create such clusters, so that might be
> related to the originally reported leaks.
> 
> It still isn't a hint for the corruption case, though.
> 
> On the other hand, the report for the corrupted image in comment 3 involved
> live migration, so we may be looking at two entirely different bugs here.

Hi Kevin,

I have filed a new bug for the problem in comment 3:
Bug 1081326 - qcow2 image corrupted after a migration loop
https://bugzilla.redhat.com/show_bug.cgi?id=1081326

I am still trying to reproduce it with the latest build and will update the test result as soon as possible.

Thanks,
Cong
Comment 20 Ademar Reis 2014-04-28 17:40:39 EDT
(In reply to Kevin Wolf from comment #17)
> Upstream commit 8f730dd2 fixes a cluster leak for overwriting preallocated
> zero clusters. qcow2_co_write_zeroes() does create such clusters, so that might be
> related to the originally reported leaks.
> 

Max: please backport the commit above then.

> It still isn't a hint for the corruption case, though.
> 
> On the other hand, the report for the corrupted image in comment 3 involved
> live migration, so we may be looking at two entirely different bugs here.

Cong, you've opened Bug 1081326, which was closed as dupe of bug 1048575, VERIFIED already. Is that right?

If that's the case, the backport of the leaked clusters patch should be enough to close this BZ.
Comment 21 CongLi 2014-04-28 22:26:57 EDT
(In reply to Ademar Reis from comment #20)

> Cong, you've opened Bug 1081326, which was closed as dupe of bug 1048575,
> VERIFIED already. Is that right?
> 
> If that's the case, the backport of the leaked clusters patch should be
> enough to close this BZ.

Hi Ademar,

According to my test results, I think BZ1048575 has fixed this problem; I have not hit this bug again.

But as Kevin said in comment 17, they may be different bugs; I'm not sure.

Thanks,
Cong
Comment 22 CongLi 2014-04-28 22:38:11 EDT
(In reply to CongLi from comment #21)
> (In reply to Ademar Reis from comment #20)
> 
> > Cong, you've opened Bug 1081326, which was closed as dupe of bug 1048575,
> > VERIFIED already. Is that right?
> > 
> > If that's the case, the backport of the leaked clusters patch should be
> > enough to close this BZ.
> 
> Hi Ademar,
> 
> According to my test results, I think BZ1048575 has fixed this problem; I have
> not hit this bug again.
> 
> But as Kevin said in comment 17, they may be different bugs; I'm not sure.

I mean that my problem in comment 3 has been fixed, but I'm not sure whether the original problem (comment 0) has been fixed.

If the original problem has also been fixed, I think it's fine to close this bug.

> Thanks,
> Cong
Comment 23 Kevin Wolf 2014-04-29 04:32:31 EDT
(In reply to CongLi from comment #22)
> I mean that my problem in comment 3 has been fixed, but I'm not sure whether the
> original problem (comment 0) has been fixed.
> 
> If the original problem has also been fixed, I think it's fine to close this bug.

Comment 0 will be fixed when we backport the commit that Ademar mentioned.
Comment 26 Miroslav Rezanina 2014-06-17 08:40:22 EDT
Fix included in qemu-kvm-1.5.3-63.el7
Comment 30 huiqingding 2014-08-12 01:23:08 EDT
Tested this bug on an AMD host with the following versions:
qemu-kvm-rhev-2.1.0-1.el7.x86_64
kernel-3.10.0-143.el7.x86_64

Steps to Test:
1. Install a win7sp1-32 guest using a virtio-scsi disk:
# /usr/libexec/qemu-kvm \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -M pc  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -device nec-usb-xhci,id=usb1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=win7sp1-32.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:08:09:0a:0b:0c,id=idqiB5iJ,netdev=idlmunRA  \
    -netdev tap,id=idlmunRA,vhost=on,script=/etc/qemu-ifup  \
    -m 2048  \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
    -cpu 'SandyBridge' \
    -device usb-tablet,id=usb-tablet1  \
    -vnc :10  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=d,menu=off  \
    -no-kvm-pit-reinjection \
    -enable-kvm \
    -monitor stdio \
    -drive file=/home/en_windows_7_ultimate_with_sp1_x86_dvd_u_677460.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
    -device ide-drive,bus=ide.1,unit=1,drive=drive-ide0-1-0,id=ide0-1-0,bus=ide.0,unit=0 \
    -cdrom /home/driver.iso \
    -boot menu=on
2. Shut down the guest when the installation has finished.
3. Check the guest image:
# qemu-img check win7sp1-32.qcow2 
No errors were found on the image.
84142/655360 = 12.84% allocated, 26.75% fragmented, 0.00% compressed clusters
Image end offset: 5515771904
Comment 34 errata-xmlrpc 2015-03-05 03:03:33 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0349.html
