Red Hat Bugzilla – Bug 1142331
qemu-img convert intermittently corrupts output images
Last modified: 2015-03-18 05:10:18 EDT
See the Launchpad bug report:
https://bugs.launchpad.net/qemu/+bug/1368815

It seems FIEMAP is not working reliably and qemu-img convert is producing junk.

I have not tried to reproduce this bug, so it is not yet clear whether RHEL is affected. Nevertheless, I'm creating this BZ so that someone in our team can investigate and this critical problem does not get forgotten.
Is this something we need an async release for? Bumping up priority since this is a data corrupter.

My suggested fix is now upstream:
http://git.qemu.org/?p=qemu.git;a=commit;h=38c4d0ae
http://git.qemu.org/?p=qemu.git;a=commit;h=7c159037
From a RHEL-OSP point of view I would say yes.
POST as of October 24.
Fix included in qemu-kvm-rhev-2.1.2-11.el7
Hi Stefan,

Could you give a method to reproduce this issue? QE tried the following:

Version of components:
qemu-kvm-rhev-2.1.2-8.el7.x86_64

# cat test.sh
#! /bin/sh
SRC_PATH=/mnt/RHEL-Server-7.0-64-virtio.qcow2
TMP_PATH=/mnt/test.qcow2
DST_PATH=/mnt/test.raw
QEMU_IMG_PATH=qemu-img
cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH $DST_PATH && cksum $DST_PATH

Steps:
1. Mount an ext4 file system block device at /mnt.
2. sh test.sh

After step 2, the bug does not reproduce. Could you give some suggestions?

Thanks.
Best Regards,
Jun Li
This is awkward to reproduce. The main thing is the source file must be sparse.
The following case was known to trigger the issue on ext4 from linux 2.6 time at least, though I'm not seeing the issue on my 3.17 kernel here on ext4 or xfs.
Now the issue is dependent on the kernel generating unwritten extents, so you may need cache pressure or other activity on the file system etc. to trigger this these days. The upstream openstack trigger was with other openstack services hitting the file system also.

QEMU_IMG_PATH=qemu-img
for i in $(seq 1 2 21); do
  for j in 1 2 31 100; do
    perl -e '$n = '$i' * 1024; *F = *STDOUT;' \
         -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
         -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > f1
    $QEMU_IMG_PATH convert -O raw f1 f2
    cmp f1 f2 || { echo "data loss i=$i j=$j" >&2; exit 1; }
  done
done
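Since the reproducer depends on the source file being sparse, it can help to confirm that the test file really has holes before running the conversion. The following is a minimal sketch, not part of the original report: the function names `is_sparse` and `make_sparse_sample` are hypothetical, and it assumes GNU coreutils (`stat -c`, `truncate`) and a file system that supports sparse files.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, GNU coreutils is assumed.

# Succeeds if FILE occupies fewer bytes on disk than its apparent size.
# stat format: %s = apparent size in bytes, %b = allocated blocks,
# %B = size in bytes of each block counted by %b.
is_sparse() {
    info=$(stat -c '%s %b %B' "$1") || return 2
    set -- $info
    [ $(( $2 * $3 )) -lt "$1" ]
}

# Creates a 1 MiB file that is one large hole, then writes 4 bytes
# of data in the middle without truncating the file.
make_sparse_sample() {
    truncate -s 1M "$1" &&
    printf 'data' | dd of="$1" bs=1 seek=524288 conv=notrunc 2>/dev/null
}
```

For example, `make_sparse_sample f1 && is_sparse f1 && echo "f1 has holes"` could be run before `qemu-img convert -O raw f1 f2`, with `cmp f1 f2` afterwards as in the loop above. Note that this only shows the file is sparse; it does not show whether the kernel reports unwritten extents for it.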
(In reply to Pádraig Brady from comment #9)
> This is awkward to reproduce. The main thing is the source file must be
> sparse.
> The following case was known to trigger the issue on ext4 from linux 2.6
> time at least, though I'm not seeing the issue on my 3.17 kernel here on
> ext4 or xfs.

I also could not reproduce this issue. I tried your script with cache pressure disabled:

# cat /proc/sys/vm/swappiness
10
# cat /proc/sys/vm/vfs_cache_pressure
100
# echo 0 > /proc/sys/vm/vfs_cache_pressure
# echo 0 > /proc/sys/vm/swappiness
# cat /proc/sys/vm/swappiness
0
# cat /proc/sys/vm/vfs_cache_pressure
0

> Now the issue is dependent on the kernel generating unwritten extents,
> so you may need cache pressure or other activity on the file system etc. to
> trigger this these days.

Could you show me in detail how to create cache pressure or other activity on the file system, or provide another method to verify this issue? Thanks in advance.

> The upstream openstack trigger was with other
> openstack services hitting the file system also.
>
> QEMU_IMG_PATH=qemu-img
> for i in $(seq 1 2 21); do
>   for j in 1 2 31 100; do
>     perl -e '$n = '$i' * 1024; *F = *STDOUT;' \
>          -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
>          -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > f1
>     $QEMU_IMG_PATH convert -O raw f1 f2
>     cmp f1 f2 || { echo "data loss i=$i j=$j" >&2; exit 1; }
>   done
> done
This one may be difficult to reproduce. Please just verify that the patch is included in the RPM.
(In reply to Stefan Hajnoczi from comment #11)
> This one may be difficult to reproduce. Please just verify that the patch
> is included in the RPM.

Thanks for the important information. Verified this issue on qemu-kvm-rhev-2.1.2-14.el7.x86_64.

host info:
# uname -r && rpm -q qemu-kvm-rhev
3.10.0-205.el7.x86_64
qemu-kvm-rhev-2.1.2-14.el7.x86_64
# rpm -ql --changelog qemu-kvm-rhev-2.1.2-14.el7.x86_64 | grep 1142331
- kvm-block-raw-posix-Fix-disk-corruption-in-try_fiemap.patch [bz#1142331]
- kvm-block-raw-posix-use-seek_hole-ahead-of-fiemap.patch [bz#1142331]
- kvm-raw-posix-Fix-raw_co_get_block_status-after-EOF.patch [bz#1142331]
- kvm-raw-posix-raw_co_get_block_status-return-value.patch [bz#1142331]
- kvm-raw-posix-SEEK_HOLE-suffices-get-rid-of-FIEMAP.patch [bz#1142331]
- kvm-raw-posix-The-SEEK_HOLE-code-is-flawed-rewrite-it.patch [bz#1142331]
- Resolves: bz#1142331

Based on the above, the fix patches are included in the RPM build. Moving to VERIFIED status; please correct me if I made any mistake. Thanks.

Best Regards,
sluo
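The changelog check above can be wrapped in a small helper so that the number of backported patches for a bug is easy to compare between builds. This is only a sketch: the function name `count_bz_patches` is hypothetical, and it assumes the changelog tags each patch line with `[bz#NNN]` as in the output above.

```shell
#!/bin/sh
# Sketch only: the helper name is hypothetical.
# Reads RPM changelog text on stdin and prints how many lines carry
# the literal tag "[bz#ID]". -F makes grep match the tag as a fixed
# string, so the brackets are not treated as a regular expression.
# The "Resolves: bz#ID" summary line has no brackets and is not counted.
count_bz_patches() {
    grep -cF -- "[bz#$1]"
}
```

A possible invocation is `rpm -q --changelog qemu-kvm-rhev | count_bz_patches 1142331`. For the build shown above this would be expected to count the six patch lines.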
One more question: why is one of the fix patches missing? Where is kvm-block-raw-posix-Try-both-FIEMAP-and-SEEK_HOLE.patch? Moving back to ON_QA first.
Hi Sluo,

this patch is missing from the RHEV backport because that commit was already included in the original upstream 2.1.2 (which qemu-kvm-rhev-2.1.2 is based on). Therefore, it was unnecessary.

Max
(In reply to Max Reitz from comment #14)
> Hi Sluo,
>
> this patch is missing from the RHEV backport because that commit was already
> included in the original upstream 2.1.2 (which qemu-kvm-rhev-2.1.2 is based
> on). Therefore, it was unnecessary.
>
> Max

OK, thanks for your kind explanation. Moving to VERIFIED status again; please correct me if I made any mistake. Thanks.

Best Regards,
sluo
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0624.html