Bug 1142331 - qemu-img convert intermittently corrupts output images
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Assigned To: Max Reitz
QA Contact: Virtualization Bugs
Blocks: 1160237
Reported: 2014-09-16 11:12 EDT by Stefan Hajnoczi
Modified: 2015-03-18 05:10 EDT
CC List: 15 users

Fixed In Version: qemu-kvm-rhev-2.1.2-11.el7
Doc Type: Bug Fix
Clones: 1160237
Last Closed: 2015-03-05 04:55:36 EST
Type: Bug

External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Launchpad | 1368815 | None | None | None | Never
Red Hat Product Errata | RHSA-2015:0624 | normal | SHIPPED_LIVE | Important: qemu-kvm-rhev security, bug fix, and enhancement update | 2015-03-05 09:37:36 EST

Description Stefan Hajnoczi 2014-09-16 11:12:01 EDT
See the Launchpad bug report:
https://bugs.launchpad.net/qemu/+bug/1368815

It seems FIEMAP is not working reliably and qemu-img convert is producing junk.

I have not tried to reproduce this bug, so it is not yet clear whether RHEL is affected.  Nevertheless, I'm creating this BZ so that someone on our team can investigate and this critical problem does not get forgotten.
Comment 2 Pádraig Brady 2014-10-30 12:08:32 EDT
Is this something we need an async release for?
Bumping up priority since this is a data corrupter.

My suggested fix is now upstream:
http://git.qemu.org/?p=qemu.git;a=commit;h=38c4d0ae
http://git.qemu.org/?p=qemu.git;a=commit;h=7c159037
Comment 3 Stephen Gordon 2014-10-30 13:00:39 EDT
From a RHEL-OSP point of view I would say yes.
Comment 4 Max Reitz 2014-10-31 09:36:40 EDT
POST as of October 24.
Comment 6 Miroslav Rezanina 2014-11-21 04:46:05 EST
Fix included in qemu-kvm-rhev-2.1.2-11.el7
Comment 8 Jun Li 2014-11-27 05:12:24 EST
Hi Stefan,

  Could you describe how to reproduce this issue? QE tried the following:

Version of components:
qemu-kvm-rhev-2.1.2-8.el7.x86_64

# cat test.sh
#!/bin/sh
# Copy the source image, convert the copy to raw, and checksum the result.
SRC_PATH=/mnt/RHEL-Server-7.0-64-virtio.qcow2
TMP_PATH=/mnt/test.qcow2
DST_PATH=/mnt/test.raw
QEMU_IMG_PATH=qemu-img

cat "$SRC_PATH" > "$TMP_PATH" && "$QEMU_IMG_PATH" convert -O raw "$TMP_PATH" "$DST_PATH" && cksum "$DST_PATH"

Steps:
1. Mount a block device with an ext4 file system on /mnt.
2. Run: sh test.sh

After step 2, the issue did not reproduce.

Could you give some suggestions? Thanks.


Best Regards,
Jun Li
Comment 9 Pádraig Brady 2014-11-27 10:48:40 EST
This is awkward to reproduce. The main thing is the source file must be sparse.
The following case was known to trigger the issue on ext4 since at least the Linux 2.6 era, though I'm not seeing the issue on my 3.17 kernel here on ext4 or xfs.
The issue now depends on the kernel generating unwritten extents,
so you may need cache pressure or other activity on the file system to trigger it these days. The upstream OpenStack trigger involved other OpenStack services hitting the file system as well.

  QEMU_IMG_PATH=qemu-img
  for i in $(seq 1 2 21); do
    for j in 1 2 31 100; do
      perl -e '$n = '$i' * 1024; *F = *STDOUT;' \
           -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
           -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > f1
      $QEMU_IMG_PATH convert -O raw f1 f2
      cmp f1 f2 || { echo "data loss i=$i j=$j" >&2; exit 1; }
    done
  done
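
Since the reproducer depends on the source file being sparse, it may be worth confirming that the input really has holes before drawing conclusions from a clean run. The following is a minimal sketch and not part of the original report: `is_sparse` is a hypothetical helper name, and it assumes GNU `stat` from coreutils. It compares the allocated bytes with the apparent size, which is only a heuristic, since compressing file systems or inline data can also make a file look sparse.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): succeed if FILE occupies
# fewer bytes on disk than its apparent size.
# Assumes GNU stat: %s = apparent size in bytes, %b = allocated blocks,
# %B = size in bytes of each block counted by %b.
# Returns 0 if the file looks sparse, 1 if not, 2 if stat fails.
is_sparse() {
    apparent=$(stat -c %s "$1") || return 2
    blocks=$(stat -c %b "$1") || return 2
    block_size=$(stat -c %B "$1") || return 2
    [ $((blocks * block_size)) -lt "$apparent" ]
}
```

For example, `is_sparse f1 && echo "f1 looks sparse"`. A copy made with `cat`, as in comment 8, writes every byte of the source, so the copy is normally fully allocated, that is, not sparse.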
Comment 10 Sibiao Luo 2014-11-28 03:56:46 EST
(In reply to Pádraig Brady from comment #9)
> This is awkward to reproduce. The main thing is the source file must be
> sparse.
> The following case was known to trigger the issue on ext4 from linux 2.6
> time at least, though I'm not seeing the issue on my 3.17 kernel here on
> ext4 or xfs.
I could not reproduce this issue either. I tried your script with vfs_cache_pressure and swappiness set to 0:
# cat /proc/sys/vm/swappiness
10
# cat /proc/sys/vm/vfs_cache_pressure
100
# echo 0 > /proc/sys/vm/vfs_cache_pressure
# echo 0 > /proc/sys/vm/swappiness
# cat /proc/sys/vm/swappiness
0
# cat /proc/sys/vm/vfs_cache_pressure
0
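
When experimenting with these tunables it is easy to leave the host in a changed state. The following is a small sketch, assuming a POSIX shell and root privileges for the real /proc/sys/vm files; `with_vm_knob` and `VM_DIR` are hypothetical names introduced here for illustration. Note that, according to the kernel's vm sysctl documentation, vfs_cache_pressure=0 tells the kernel never to reclaim dentries and inodes, so it lowers rather than raises cache pressure, while values above 100 make that reclaim more aggressive.

```shell
#!/bin/sh
# Directory holding the VM tunables. Overridable (hypothetical variable) so
# the sketch can be tried against a scratch directory without root.
VM_DIR=${VM_DIR:-/proc/sys/vm}

# Hypothetical helper: run a command with one tunable temporarily changed,
# then restore the previous value and return the command's exit status.
# usage: with_vm_knob NAME VALUE COMMAND [ARGS...]
# The body is a subshell so its variables do not leak and calls can be nested.
with_vm_knob() (
    knob=$VM_DIR/$1
    new=$2
    shift 2
    old=$(cat "$knob") || exit 2
    echo "$new" > "$knob" || exit 2
    status=0
    "$@" || status=$?
    echo "$old" > "$knob"
    exit "$status"
)
```

A possible invocation, where repro.sh is a placeholder for the reproducer script: `with_vm_knob vfs_cache_pressure 200 sh repro.sh`.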
> Now the issue is dependent on the kernel generating unwritten extents,
> so you may need cache pressure or other activity on the file system etc. to
> trigger this these days.
Could you show me in detail how to create cache pressure or other activity on the file system, or provide a method to verify this issue? Thanks in advance.

> The upstream openstack trigger was with other
> openstack services hitting the file system also.
> 
>   QEMU_IMG_PATH=qemu-img
>   for i in $(seq 1 2 21); do
>     for j in 1 2 31 100; do
>       perl -e '$n = '$i' * 1024; *F = *STDOUT;' \
>            -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
>            -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > f1
>       $QEMU_IMG_PATH convert -O raw f1 f2
>       cmp f1 f2 || { echo "data loss i=$i j=$j" >&2; exit 1; }
>     done
>   done
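
Because the corruption is intermittent, a single clean pass shows little. The following sketch repeats the convert-and-compare step from the quoted script; `repeat_until_fail` and `convert_and_compare` are hypothetical helper names, and `QEMU_IMG_PATH` is the variable already used above. Whether repetition alone provokes the bug on a given kernel is not established in this report.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to COUNT times and stop at the
# first failure, reporting the iteration on stderr.
# usage: repeat_until_fail COUNT COMMAND [ARGS...]
repeat_until_fail() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            echo "failed on iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# Hypothetical helper: convert SRC to a raw image at DST and compare the two
# byte for byte. Only meaningful when SRC is itself a raw file, as f1 is in
# the quoted script.
convert_and_compare() {
    src=$1
    dst=$2
    "${QEMU_IMG_PATH:-qemu-img}" convert -O raw "$src" "$dst" && cmp "$src" "$dst"
}
```

For example: `repeat_until_fail 100 convert_and_compare f1 f2`.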
Comment 11 Stefan Hajnoczi 2014-11-28 04:24:39 EST
This one may be difficult to reproduce.  Please just verify that the patch is included in the RPM.
Comment 12 Sibiao Luo 2014-11-28 05:08:02 EST
(In reply to Stefan Hajnoczi from comment #11)
> This one may be difficult to reproduce.  Please just verify that the patch
> is included in the RPM.

Thanks for the information.

Verified this issue on qemu-kvm-rhev-2.1.2-14.el7.x86_64.
host info:
# uname -r && rpm -q qemu-kvm-rhev
3.10.0-205.el7.x86_64
qemu-kvm-rhev-2.1.2-14.el7.x86_64

# rpm -ql --changelog qemu-kvm-rhev-2.1.2-14.el7.x86_64 | grep 1142331
- kvm-block-raw-posix-Fix-disk-corruption-in-try_fiemap.patch [bz#1142331]
- kvm-block-raw-posix-use-seek_hole-ahead-of-fiemap.patch [bz#1142331]
- kvm-raw-posix-Fix-raw_co_get_block_status-after-EOF.patch [bz#1142331]
- kvm-raw-posix-raw_co_get_block_status-return-value.patch [bz#1142331]
- kvm-raw-posix-SEEK_HOLE-suffices-get-rid-of-FIEMAP.patch [bz#1142331]
- kvm-raw-posix-The-SEEK_HOLE-code-is-flawed-rewrite-it.patch [bz#1142331]
- Resolves: bz#1142331

Based on the above, the fix is included in the RPM build. Moving to VERIFIED status; please correct me if I am mistaken. Thanks.

Best Regards,
sluo
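
The changelog check in this comment can be wrapped so that it is easy to repeat for other builds or bug IDs. This is a sketch only: `patches_for_bz` is a hypothetical helper name, and it assumes the changelog entries keep the `name.patch [bz#ID]` layout shown above. `rpm -q --changelog` is enough to print the changelog; the `-l` in the command above additionally lists the package's files.

```shell
#!/bin/sh
# Hypothetical helper: read an RPM changelog on stdin and print the patch
# entries tagged with the given Bugzilla ID, one per line.
# Uses a fixed-string match so the brackets are not treated as a regex.
# usage: rpm -q --changelog PACKAGE | patches_for_bz BUG_ID
patches_for_bz() {
    grep -F -- ".patch [bz#$1]"
}
```

For example: `rpm -q --changelog qemu-kvm-rhev | patches_for_bz 1142331`. The `Resolves:` line is deliberately not matched, so the output contains only patch file names.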
Comment 13 Sibiao Luo 2014-12-02 01:46:05 EST
One more question: why is one patch missing from the fix for this bug? Where is kvm-block-raw-posix-Try-both-FIEMAP-and-SEEK_HOLE.patch? Moving back to ON_QA for now.
Comment 14 Max Reitz 2014-12-02 05:09:24 EST
Hi Sluo,

this patch is missing from the RHEV backport because that commit was already included in the original upstream 2.1.2 (which qemu-kvm-rhev-2.1.2 is based on). Therefore, it was unnecessary.

Max
Comment 15 Sibiao Luo 2014-12-03 01:30:03 EST
(In reply to Max Reitz from comment #14)
> Hi Sluo,
> 
> this patch is missing from the RHEV backport because that commit was already
> included in the original upstream 2.1.2 (which qemu-kvm-rhev-2.1.2 is based
> on). Therefore, it was unnecessary.
> 
> Max

OK, thanks for the kind explanation. Moving back to VERIFIED status; please correct me if I am mistaken. Thanks.

Best Regards,
sluo
Comment 17 errata-xmlrpc 2015-03-05 04:55:36 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html
