Bug 1142331 - qemu-img convert intermittently corrupts output images
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: Max Reitz
QA Contact: Virtualization Bugs
Blocks: 1160237
Reported: 2014-09-16 15:12 UTC by Stefan Hajnoczi
Modified: 2015-03-18 09:10 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.1.2-11.el7
Doc Type: Bug Fix
Clones: 1160237
Last Closed: 2015-03-05 09:55:36 UTC


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2015:0624 | normal | SHIPPED_LIVE | Important: qemu-kvm-rhev security, bug fix, and enhancement update | 2015-03-05 14:37:36 UTC
Launchpad | 1368815 | None | None | None | Never

Internal Links: 1167249

Description Stefan Hajnoczi 2014-09-16 15:12:01 UTC
See the Launchpad bug report:
https://bugs.launchpad.net/qemu/+bug/1368815

It seems FIEMAP is not working reliably and qemu-img convert is producing junk.

I have not tried to reproduce this bug so whether RHEL is affected or not is not yet clear.  Nevertheless I'm creating this BZ so someone in our team can investigate and this critical problem does not get forgotten.

Comment 2 Pádraig Brady 2014-10-30 16:08:32 UTC
Is this something we need an async release for?
Bumping up priority since this is a data corrupter.

My suggested fix is now upstream:
http://git.qemu.org/?p=qemu.git;a=commit;h=38c4d0ae
http://git.qemu.org/?p=qemu.git;a=commit;h=7c159037

Comment 3 Stephen Gordon 2014-10-30 17:00:39 UTC
From a RHEL-OSP point of view I would say yes.

Comment 4 Max Reitz 2014-10-31 13:36:40 UTC
POST as of October 24.

Comment 6 Miroslav Rezanina 2014-11-21 09:46:05 UTC
Fix included in qemu-kvm-rhev-2.1.2-11.el7

Comment 8 Jun Li 2014-11-27 10:12:24 UTC
Hi Stefan,

  Could you describe how to reproduce this issue? QE tried the following:

Version of components:
qemu-kvm-rhev-2.1.2-8.el7.x86_64

# cat test.sh 
#! /bin/sh
SRC_PATH=/mnt/RHEL-Server-7.0-64-virtio.qcow2
TMP_PATH=/mnt/test.qcow2
DST_PATH=/mnt/test.raw
QEMU_IMG_PATH=qemu-img

cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH $DST_PATH && cksum $DST_PATH

Steps:
1. Mount a block device formatted with ext4 on /mnt.
2. sh test.sh

After step 2, the bug does not reproduce.

Could you give some suggestions? Thanks.


Best Regards,
Jun Li

Comment 9 Pádraig Brady 2014-11-27 15:48:40 UTC
This is awkward to reproduce. The main thing is that the source file must be sparse.
The following case was known to trigger the issue on ext4 since at least Linux 2.6, though I'm not seeing it on my 3.17 kernel here on ext4 or xfs.
The issue now depends on the kernel generating unwritten extents,
so you may need cache pressure or other activity on the file system to trigger it these days. The upstream OpenStack trigger involved other OpenStack services hitting the file system as well.

  QEMU_IMG_PATH=qemu-img
  for i in $(seq 1 2 21); do
    for j in 1 2 31 100; do
      perl -e '$n = '$i' * 1024; *F = *STDOUT;' \
           -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
           -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > f1
      $QEMU_IMG_PATH convert -O raw f1 f2
      cmp f1 f2 || { echo "data loss i=$i j=$j" >&2; exit 1; }
    done
  done
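
A perl-free variant of the same check can be sketched with coreutils alone. This is only an illustration under assumptions, not the upstream test: the helper names make_sparse and check_convert, the CONVERT variable and the temporary directory are made up here, every chunk is filled with the byte 'x' instead of a varying byte, and plain cp stands in for the converter so the sketch also runs on hosts without qemu-img.

```shell
#!/bin/sh
# Sketch only: helper names, CONVERT and the temp directory are illustrative.

# make_sparse FILE KIB COUNT
# Write COUNT chunks of KIB kibibytes each, skipping a KIB-sized gap before
# every chunk (the same hole/data layout as the perl one-liner above).
make_sparse() {
    file=$1
    kib=$2
    count=$3
    : > "$file"
    c=1
    while [ "$c" -le "$count" ]; do
        # Chunk c starts at block 2c-1, so block 2c-2 is never written.
        head -c "$(( $kib * 1024 ))" /dev/zero | tr '\0' 'x' |
            dd of="$file" bs="${kib}k" seek="$(( 2 * $c - 1 ))" conv=notrunc 2>/dev/null
        c=$(( $c + 1 ))
    done
}

# check_convert SRC DST
# Run the converter, then compare source and result byte for byte.
# Set CONVERT='qemu-img convert -O raw' to exercise qemu-img itself.
check_convert() {
    ${CONVERT:-cp} "$1" "$2" && cmp -s "$1" "$2"
}

dir=$(mktemp -d) || exit 1
for i in 1 3 21; do
    for j in 1 2 31; do
        make_sparse "$dir/f1" "$i" "$j"
        check_convert "$dir/f1" "$dir/f2" ||
            { echo "data loss i=$i j=$j" >&2; exit 1; }
    done
done
rm -rf "$dir"
echo "all copies identical"
```

Running it as CONVERT='qemu-img convert -O raw' sh script.sh on the file system under test would use the real converter. A clean run does not show that the bug is absent, for the reasons given above.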

Comment 10 Sibiao Luo 2014-11-28 08:56:46 UTC
(In reply to Pádraig Brady from comment #9)
> This is awkward to reproduce. The main thing is the source file must be
> sparse.
> The following case was known to trigger the issue on ext4 from linux 2.6
> time at least, though I'm not seeing the issue on my 3.17 kernel here on
> ext4 or xfs.
I could not reproduce this issue either. I tried your script with vfs_cache_pressure and swappiness set to 0:
# cat /proc/sys/vm/swappiness
10
# cat /proc/sys/vm/vfs_cache_pressure
100
# echo 0 > /proc/sys/vm/vfs_cache_pressure
# echo 0 > /proc/sys/vm/swappiness
# cat /proc/sys/vm/swappiness
0
# cat /proc/sys/vm/vfs_cache_pressure
0
> Now the issue is dependent on the kernel generating unwritten extents,
> so you may need cache pressure or other activity on the file system etc. to
> trigger this these days.
Could you describe in detail how to create cache pressure or other activity on the file system, or suggest another way to verify this issue? Thanks in advance.

> The upstream openstack trigger was with other
> openstack services hitting the file system also.
> 
>   QEMU_IMG_PATH=qemu-img
>   for i in $(seq 1 2 21); do
>     for j in 1 2 31 100; do
>       perl -e '$n = '$i' * 1024; *F = *STDOUT;' \
>            -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
>            -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > f1
>       $QEMU_IMG_PATH convert -O raw f1 f2
>       cmp f1 f2 || { echo "data loss i=$i j=$j" >&2; exit 1; }
>     done
>   done

Comment 11 Stefan Hajnoczi 2014-11-28 09:24:39 UTC
This one may be difficult to reproduce.  Please just verify that the patch is included in the RPM.

Comment 12 Sibiao Luo 2014-11-28 10:08:02 UTC
(In reply to Stefan Hajnoczi from comment #11)
> This one may be difficult to reproduce.  Please just verify that the patch
> is included in the RPM.

Thanks for the information.

Verified this issue on qemu-kvm-rhev-2.1.2-14.el7.x86_64.
Host info:
# uname -r && rpm -q qemu-kvm-rhev
3.10.0-205.el7.x86_64
qemu-kvm-rhev-2.1.2-14.el7.x86_64

# rpm -ql --changelog qemu-kvm-rhev-2.1.2-14.el7.x86_64 | grep 1142331
- kvm-block-raw-posix-Fix-disk-corruption-in-try_fiemap.patch [bz#1142331]
- kvm-block-raw-posix-use-seek_hole-ahead-of-fiemap.patch [bz#1142331]
- kvm-raw-posix-Fix-raw_co_get_block_status-after-EOF.patch [bz#1142331]
- kvm-raw-posix-raw_co_get_block_status-return-value.patch [bz#1142331]
- kvm-raw-posix-SEEK_HOLE-suffices-get-rid-of-FIEMAP.patch [bz#1142331]
- kvm-raw-posix-The-SEEK_HOLE-code-is-flawed-rewrite-it.patch [bz#1142331]
- Resolves: bz#1142331

Based on the above, the fix patches are included in the RPM build. Moving to VERIFIED; please correct me if I am mistaken. Thanks.

Best Regards,
sluo
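
The changelog check in this comment can be wrapped in a small helper so that the patch names are extracted rather than read by eye. This is a sketch under assumptions: list_bz_patches is a name invented for illustration, and it relies on changelog entries having the "- name.patch [bz#ID]" shape shown above.

```shell
#!/bin/sh
# Sketch only: list_bz_patches is an illustrative helper, not part of rpm.

# list_bz_patches BZ_ID
# Read an RPM changelog on stdin and print the patch file names tagged
# with [bz#BZ_ID], one per line, sorted and without duplicates.
list_bz_patches() {
    grep -F "[bz#$1]" |
        sed -n 's/^- \(.*\.patch\) \[bz#.*$/\1/p' |
        sort -u
}

# Typical use on a host with the package installed:
#   rpm -q --changelog qemu-kvm-rhev | list_bz_patches 1142331
```

The "Resolves:" line is skipped on purpose because it carries no bracketed tag, so only patch entries are listed.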

Comment 13 Sibiao Luo 2014-12-02 06:46:05 UTC
One more question: one patch for this bug seems to be missing. Where is kvm-block-raw-posix-Try-both-FIEMAP-and-SEEK_HOLE.patch? Moving back to ON_QA for now.

Comment 14 Max Reitz 2014-12-02 10:09:24 UTC
Hi Sluo,

This patch is missing from the RHEV backport because that commit was already included in the original upstream 2.1.2 (which qemu-kvm-rhev-2.1.2 is based on). Therefore, backporting it was unnecessary.

Max

Comment 15 Sibiao Luo 2014-12-03 06:30:03 UTC
(In reply to Max Reitz from comment #14)
> Hi Sluo,
> 
> this patch is missing from the RHEV backport because that commit was already
> included in the original upstream 2.1.2 (which qemu-kvm-rhev-2.1.2 is based
> on). Therefore, it was unnecessary.
> 
> Max

OK, thanks for the kind explanation. Moving to VERIFIED again; please correct me if I am mistaken. Thanks.

Best Regards,
sluo

Comment 17 errata-xmlrpc 2015-03-05 09:55:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html

