Bug 920020 - qemu-img delete snapshot causes corruption under high IO load
Summary: qemu-img delete snapshot causes corruption under high IO load
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Kevin Wolf
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 810856
Reported: 2013-03-11 07:47 UTC by Roman Hodain
Modified: 2018-11-30 19:24 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-20 11:31:45 UTC
Target Upstream Version:



Description Roman Hodain 2013-03-11 07:47:14 UTC
Description of problem:
When an internal snapshot of a virtual machine under high IO load is created and then removed, the QCOW2 image is corrupted.

Version-Release number of selected component (if applicable):
   qemu-kvm-0.12.1.2-2.355.el6_4.1.x86_64

How reproducible:
   Always with high IO load

Steps to Reproduce:
1. Create a VM with a QCOW2 disk
2. Install RHEL on that VM
3. Generate IO load inside the guest
   example:
     for i in `seq 1 20`; do
	dd if=/dev/zero of=/dev/VolGroup/test&
	dd if=/dev/VolGroup/test of=/dev/null&
     done 

4. proceed with the following script on the host system:

    #!/bin/bash -x
    SNAPDATE=`date +%d-%m-%Y`
    TS=`date +%d%m%y-%H%m%S`
    i=/var/lib/libvirt/images/test.img
    qemu-img snapshot -c "$SNAPDATE" $i
    qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" $i $i-snapshot
    virsh suspend test
    qemu-img snapshot -d "$SNAPDATE" $i
    virsh resume test
  
Actual results:
qemu-img check returns errors:
   # qemu-img check /var/lib/libvirt/images/test.img 
      ERROR OFLAG_COPIED: l2_offset=8000000000040000 refcount=2
      ERROR OFLAG_COPIED: offset=8000000000050000 refcount=2
      ERROR OFLAG_COPIED: offset=8000000000060000 refcount=2
      ...
      Leaked cluster 4 refcount=2 reference=1
      Leaked cluster 5 refcount=2 reference=1
      ...
      28220 errors were found on the image.
      Data may be corrupted, or further writes to the image may corrupt it.

      28222 leaked clusters were found on the image.
      This means waste of disk space, but no harm to data.



Expected results:
No errors detected

Additional info:

Comment 5 Kevin Wolf 2013-03-20 11:31:45 UTC
(In reply to comment #0)
> 4. proceed with the following script on the host system:
> 
>     #!/bin/bash -x
>     SNAPDATE=`date +%d-%m-%Y`
>     TS=`date +%d%m%y-%H%m%S`
>     i=/var/lib/libvirt/images/test.img
>     qemu-img snapshot -c "$SNAPDATE" $i
>     qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" $i $i-snapshot
>     virsh suspend test
>     qemu-img snapshot -d "$SNAPDATE" $i
>     virsh resume test

qcow2 images must not be used in read-write mode from two processes at the same
time. You can have them opened either by one read-write process or by many
read-only processes. Having one (paused) read-write process (the running VM)
and additional read-only processes (copying out a snapshot with qemu-img) may
happen to work in practice, but you're on your own and we won't give support
for such attempts.
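
A minimal sketch of a guard for the rule above, assuming a Linux host with
/proc. image_in_use is a hypothetical helper, not part of qemu or libvirt. It
only sees processes whose file descriptors the caller is allowed to read (run
it as root to see all of them), and it cannot tell a read-only open from a
read-write one, so treat any hit as a reason not to modify the image.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if some visible process has the
# given file open, exit 1 if none does, exit 2 if the path cannot be resolved.
image_in_use() {
    local target fd
    target=$(readlink -f "$1") || return 2
    # Each /proc/PID/fd/N is a symlink to the file that descriptor refers to.
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Path taken from the report; override with IMG=... for other images.
IMG=${IMG:-/var/lib/libvirt/images/test.img}
if image_in_use "$IMG"; then
    echo "refusing to modify $IMG: another process has it open"
else
    echo "no visible process has $IMG open"
fi
```

With a check like this in front of it, the `qemu-img snapshot -d` call in the
script from comment #0 would have been skipped, because the qemu process of
the suspended VM still holds the image open.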

Additionally, internal snapshots are not supported in RHEL either. If you can
do without support, this is how it _should_ work on an upstream qemu:

  1. Pause the VM
  2. Take an internal snapshot with the 'savevm' command of the qemu monitor
     of the running VM, not with an external qemu-img process. virsh may or may
     not provide an interface for this.
  3. You can resume the VM now
  4. qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" $i $i-snapshot
  5. Pause the VM again
  6. 'delvm' in the qemu monitor
  7. Resume the VM
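
The steps above can be sketched as a host-side script. This is an unsupported
illustration, not a tested procedure: it assumes the domain is named "test" as
in the original script, and that `virsh qemu-monitor-command --hmp` is
available to pass 'savevm' and 'delvm' to the monitor of the running VM. The
helper names (run, monitor_paused, snapshot_cycle) and the DRY_RUN switch are
made up for this example. DRY_RUN defaults to 1, so the commands are only
printed. The `-s` option of qemu-img convert is taken from step 4; check that
your qemu-img build has it.

```shell
#!/bin/bash
# Assumed names and paths; override through the environment as needed.
DRY_RUN=${DRY_RUN:-1}
DOMAIN=${DOMAIN:-test}
IMG=${IMG:-/var/lib/libvirt/images/test.img}
SNAP=${SNAP:-$(date +%d-%m-%Y)}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pause the VM, send one monitor command, then resume even if that
# command failed, so the guest is not left suspended.
# Note: the exit status is virsh's own; an error reported by the monitor
# itself may only show up in the printed output.
monitor_paused() {
    run virsh suspend "$DOMAIN" || return
    run virsh qemu-monitor-command "$DOMAIN" --hmp "$1"
    local rc=$?
    run virsh resume "$DOMAIN" || return
    return "$rc"
}

snapshot_cycle() {
    # Steps 1-3: snapshot from inside the qemu process that owns the image.
    monitor_paused "savevm $SNAP" || return
    # Step 4: copy the snapshot out; this qemu-img process only reads.
    run qemu-img convert -f qcow2 -O qcow2 -s "$SNAP" "$IMG" "$IMG-snapshot" || return
    # Steps 5-7: delete the snapshot, again through the monitor.
    monitor_paused "delvm $SNAP"
}

snapshot_cycle
```

The point of the design is that every operation that writes to the image
(creating and deleting the snapshot) goes through the one qemu process that
already has it open read-write; the only external process, qemu-img convert,
reads.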

Note that I said in upstream qemu. This is because RHEL's qemu-img doesn't even
have the -s option. This also means that your customer can't possibly have used
the RHEL 6 version of qemu-img if his script didn't give him errors.

So to summarize, we have three reasons why we can't accept this bug:

- Opening a qcow2 image r/w from two processes is always wrong and corruption
  is expected in such scenarios
- Internal snapshots are unsupported on RHEL
- The customer obviously didn't use RHEL binaries

Closing as NOTABUG for the first and third one. The second one would make it
WONTFIX, if in doubt.

