Description of problem:
When creating an internal snapshot of a virtual machine under high I/O load and then removing this snapshot, the QCOW2 image is corrupted.
Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.355.el6_4.1.x86_64
How reproducible:
Always, with high I/O load
Steps to Reproduce:
1. Create a VM with a qcow2-format disk
2. Install RHEL on that VM
3. Generate I/O load, for example:
# run inside the guest: 20 concurrent writer/reader pairs keep the disk busy
for i in `seq 1 20`; do
    dd if=/dev/zero of=/dev/VolGroup/test &
    dd if=/dev/VolGroup/test of=/dev/null &
done
4. Proceed with the following script on the host system:
#!/bin/bash -x
SNAPDATE=`date +%d-%m-%Y`
TS=`date +%d%m%y-%H%m%S`
i=/var/lib/libvirt/images/test.img
qemu-img snapshot -c "$SNAPDATE" $i
qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" $i $i-snapshot
virsh suspend test
qemu-img snapshot -d "$SNAPDATE" $i
virsh resume test
Actual results:
qemu-img check returns errors:
# qemu-img check /var/lib/libvirt/images/test.img
ERROR OFLAG_COPIED: l2_offset=8000000000040000 refcount=2
ERROR OFLAG_COPIED: offset=8000000000050000 refcount=2
ERROR OFLAG_COPIED: offset=8000000000060000 refcount=2
...
Leaked cluster 4 refcount=2 reference=1
Leaked cluster 5 refcount=2 reference=1
...
28220 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.
28222 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
Expected results:
No errors detected
Additional info:
(In reply to comment #0)
> 4. proceed with the following script on the host system:
>
> #!/bin/bash -x
> SNAPDATE=`date +%d-%m-%Y`
> TS=`date +%d%m%y-%H%m%S`
> i=/var/lib/libvirt/images/test.img
> qemu-img snapshot -c "$SNAPDATE" $i
> qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" $i $i-snapshot
> virsh suspend test
> qemu-img snapshot -d "$SNAPDATE" $i
> virsh resume test
qcow2 images must not be used in read-write mode from two processes at the same
time. You can have them opened either by one read-write process or by many
read-only processes. Having one (paused) read-write process (the running VM)
and additional read-only processes (copying out a snapshot with qemu-img) may
happen to work in practice, but you're on your own and we won't give support
for such attempts.
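A minimal sketch of the only pattern that stays within these rules, assuming
the same domain name "test" and image path as in the reproducer above
(illustrative only): run qemu-img read-write operations against the image only
while no QEMU process has it open.

# Hypothetical example: make sure the guest is completely shut off before
# modifying the image with qemu-img.
SNAPDATE=`date +%d-%m-%Y`
i=/var/lib/libvirt/images/test.img
virsh shutdown test
while [ "`virsh domstate test`" != "shut off" ]; do sleep 1; done
qemu-img snapshot -d "$SNAPDATE" "$i"
virsh start test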
Additionally, internal snapshots are not supported in RHEL either. If you can
do without support, this is how it _should_ work on an upstream qemu (a rough
shell sketch follows the numbered steps):
1. Pause the VM
2. Take an internal snapshot with the 'savevm' command of the qemu monitor
of the running VM, not with an external qemu-img process. virsh may or may
not provide an interface for this.
3. You can resume the VM now
4. qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" $i $i-snapshot
5. Pause the VM again
6. 'delvm' in the qemu monitor
7. Resume the VM
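A rough sketch of those steps as a host-side script. Assumptions: the domain
name "test" and image path come from the reproducer above, and virsh
qemu-monitor-command is used as one possible way to reach the qemu monitor;
this is illustrative only, not a supported procedure.

#!/bin/bash -x
DOM=test
i=/var/lib/libvirt/images/test.img
SNAPDATE=`date +%d-%m-%Y`

virsh suspend "$DOM"                                        # 1. pause the VM
# 2. internal snapshot via the monitor of the running QEMU, not via qemu-img
virsh qemu-monitor-command --hmp "$DOM" "savevm $SNAPDATE"
virsh resume "$DOM"                                         # 3. resume the VM
# 4. copy the snapshot out (requires an upstream qemu-img that has -s)
qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" "$i" "$i-snapshot"
virsh suspend "$DOM"                                        # 5. pause again
virsh qemu-monitor-command --hmp "$DOM" "delvm $SNAPDATE"   # 6. delete the snapshot
virsh resume "$DOM"                                         # 7. resume the VM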
Note that I said upstream qemu. This is because RHEL's qemu-img doesn't even
have the -s option. This also means that your customer can't possibly have used
the RHEL 6 version of qemu-img if his script didn't give him errors.
So to summarize, we have three reasons why we can't accept this bug:
- Opening a qcow2 image r/w from two processes is always wrong and corruption
is expected in such scenarios
- Internal snapshots are unsupported on RHEL
- The customer obviously didn't use RHEL binaries
Closing as NOTABUG for the first and third one. The second one would make it
WONTFIX, if in doubt.