Bug 1142331
| Summary: | qemu-img convert intermittently corrupts output images |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Reporter: | Stefan Hajnoczi <stefanha> |
| Component: | qemu-kvm-rhev |
| Assignee: | Hanna Czenczek <hreitz> |
| Status: | CLOSED ERRATA |
| QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high |
| Priority: | high |
| Version: | 7.1 |
| CC: | hhuang, hreitz, huding, juli, juzhang, knoel, kwolf, pbonzini, pbrady, sgordon, sluo, stefanha, tdosek, virt-maint, xfu |
| Target Milestone: | beta |
| Target Release: | --- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | qemu-kvm-rhev-2.1.2-11.el7 |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| Clones: | 1160237 (view as bug list) |
| Last Closed: | 2015-03-05 09:55:36 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
| Bug Blocks: | 1160237 |
Description
Stefan Hajnoczi
2014-09-16 15:12:01 UTC
Is this something we need an async release for?

---

Bumping up priority since this is a data corrupter. My suggested fix is now upstream:

- http://git.qemu.org/?p=qemu.git;a=commit;h=38c4d0ae
- http://git.qemu.org/?p=qemu.git;a=commit;h=7c159037

---

From a RHEL-OSP point of view I would say yes.

---

POST as of October 24. Fix included in qemu-kvm-rhev-2.1.2-11.el7.

---

Hi Stefan,

Could you give a method for reproducing this issue? QE tried the following:

Version of components: qemu-kvm-rhev-2.1.2-8.el7.x86_64

```sh
# cat test.sh
#!/bin/sh
SRC_PATH=/mnt/RHEL-Server-7.0-64-virtio.qcow2
TMP_PATH=/mnt/test.qcow2
DST_PATH=/mnt/test.raw
QEMU_IMG_PATH=qemu-img
cat $SRC_PATH > $TMP_PATH &&
$QEMU_IMG_PATH convert -O raw $TMP_PATH $DST_PATH &&
cksum $DST_PATH
```

Steps:
1. Mount an ext4 filesystem on a block device at /mnt.
2. Run `sh test.sh`.

After step 2 the bug did not reproduce. Could you give some suggestions? Thanks.

Best Regards,
Jun Li

---

This is awkward to reproduce. The main thing is that the source file must be sparse.
The following case was known to trigger the issue on ext4 since at least Linux 2.6, though I'm not seeing it on my 3.17 kernel here on either ext4 or xfs.
The issue now depends on the kernel generating unwritten extents, so these days you may need cache pressure or other activity on the file system to trigger it. The upstream OpenStack trigger involved other OpenStack services hitting the file system at the same time.
```sh
QEMU_IMG_PATH=qemu-img
for i in $(seq 1 2 21); do
  for j in 1 2 31 100; do
    perl -e '$n = '$i' * 1024; *F = *STDOUT;' \
         -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
         -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > f1
    $QEMU_IMG_PATH convert -O raw f1 f2
    cmp f1 f2 || { echo "data loss i=$i j=$j" >&2; exit 1; }
  done
done
```
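The reproducer above hinges on the kernel's hole reporting: qemu-img convert skips ranges that the raw-posix driver reports as holes, and the fix patches listed in this bug move that detection from FIEMAP to lseek(2) with SEEK_HOLE/SEEK_DATA. As a minimal sketch (not part of this thread; `map_extents` is a hypothetical helper name), here is how a sparse file's data and hole runs can be probed that way on Linux, assuming the filesystem supports SEEK_HOLE:

```python
import os

def map_extents(path):
    """Return (offset, length, kind) runs for a file, probed via SEEK_DATA/SEEK_HOLE."""
    runs = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        pos = 0
        while pos < size:
            try:
                data = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError:
                # ENXIO: no data past pos, so the remainder is one big hole
                runs.append((pos, size - pos, "hole"))
                break
            if data > pos:
                runs.append((pos, data - pos, "hole"))
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            runs.append((data, hole - data, "data"))
            pos = hole
    finally:
        os.close(fd)
    return runs

# Build a sparse file: roughly a 1 MiB hole followed by 4 KiB of data,
# similar in spirit to what the perl reproducer creates.
with open("f1", "wb") as f:
    f.seek(1024 * 1024)
    f.write(b"\x01" * 4096)

print(map_extents("f1"))
```

On ext4 or xfs this typically prints one hole run followed by one data run; on a filesystem without SEEK_HOLE support, lseek falls back to reporting the whole file as data, which is the safe direction. As I understand the fix, the danger in the old FIEMAP-based code was the opposite error: freshly written but not-yet-flushed extents could be reported as unallocated, so convert zero-filled them in the output, producing the corruption reported here.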
(In reply to Pádraig Brady from comment #9)
> This is awkward to reproduce. The main thing is the source file must be sparse.
> The following case was known to trigger the issue on ext4 from linux 2.6 time at least, though I'm not seeing the issue on my 3.17 kernel here on ext4 or xfs.

I also could not reproduce this issue; I tried your script with cache pressure disabled:

```sh
# cat /proc/sys/vm/swappiness
10
# cat /proc/sys/vm/vfs_cache_pressure
100
# echo 0 > /proc/sys/vm/vfs_cache_pressure
# echo 0 > /proc/sys/vm/swappiness
# cat /proc/sys/vm/swappiness
0
# cat /proc/sys/vm/vfs_cache_pressure
0
```

> Now the issue is dependent on the kernel generating unwritten extents, so you may need cache pressure or other activity on the file system etc. to trigger this these days.

Could you show me in detail how to generate cache pressure or other activity on the file system, or provide a method to verify this issue? Thanks in advance.

---

This one may be difficult to reproduce. Please just verify that the patch is included in the RPM.

---

(In reply to Stefan Hajnoczi from comment #11)
> This one may be difficult to reproduce. Please just verify that the patch is included in the RPM.

Thanks for the information. Verified this issue on qemu-kvm-rhev-2.1.2-14.el7.x86_64.
Host info:

```sh
# uname -r && rpm -q qemu-kvm-rhev
3.10.0-205.el7.x86_64
qemu-kvm-rhev-2.1.2-14.el7.x86_64
# rpm -ql --changelog qemu-kvm-rhev-2.1.2-14.el7.x86_64 | grep 1142331
- kvm-block-raw-posix-Fix-disk-corruption-in-try_fiemap.patch [bz#1142331]
- kvm-block-raw-posix-use-seek_hole-ahead-of-fiemap.patch [bz#1142331]
- kvm-raw-posix-Fix-raw_co_get_block_status-after-EOF.patch [bz#1142331]
- kvm-raw-posix-raw_co_get_block_status-return-value.patch [bz#1142331]
- kvm-raw-posix-SEEK_HOLE-suffices-get-rid-of-FIEMAP.patch [bz#1142331]
- kvm-raw-posix-The-SEEK_HOLE-code-is-flawed-rewrite-it.patch [bz#1142331]
- Resolves: bz#1142331
```

Based on the above, the fix patches are included in the RPM build. Moving to VERIFIED status; please correct me if I have made any mistake, thanks.

Best Regards,
sluo

---

One additional question: why is one of the fix patches missing, namely kvm-block-raw-posix-Try-both-FIEMAP-and-SEEK_HOLE.patch? Moving back to ON_QA for now.

---

Hi Sluo,

this patch is missing from the RHEV backport because that commit was already included in the original upstream 2.1.2 release (which qemu-kvm-rhev-2.1.2 is based on). Therefore, it was unnecessary.

Max

---

(In reply to Max Reitz from comment #14)
> this patch is missing from the RHEV backport because that commit was already included in the original upstream 2.1.2 (which qemu-kvm-rhev-2.1.2 is based on). Therefore, it was unnecessary.

OK, thanks for the kind explanation; moving back to VERIFIED status. Please correct me if I have made any mistake, thanks.

Best Regards,
sluo

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html