Bug 920020
| Summary: | qemu-img delete snapshot causes corruption under high IO load | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Roman Hodain <rhodain> |
| Component: | qemu-kvm | Assignee: | Kevin Wolf <kwolf> |
| Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.4 | CC: | acathrow, areis, bsarathy, dyasny, juzhang, kwolf, michen, mkenneth, rhod, stefanha, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-03-20 11:31:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 810856 | | |
(In reply to comment #0)
> 4. proceed with the following script on the host system:
>
> ```bash
> #!/bin/bash -x
> SNAPDATE=`date +%d-%m-%Y`
> TS=`date +%d%m%y-%H%m%S`
> i=/var/lib/libvirt/images/test.img
> qemu-img snapshot -c "$SNAPDATE" $i
> qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" $i $i-snapshot
> virsh suspend test
> qemu-img snapshot -d "$SNAPDATE" $i
> virsh resume test
> ```

qcow2 images must not be used in read-write mode by two processes at the same time. They can be opened either by one read-write process or by many read-only processes. Having one (paused) read-write process (the running VM) plus additional read-only processes (copying out a snapshot with qemu-img) may happen to work in practice. However, you're on your own, and we won't support such attempts.

Additionally, internal snapshots are not supported in RHEL either. If you can do without support, this is how it _should_ work with upstream qemu:

1. Pause the VM.
2. Take an internal snapshot with the 'savevm' command in the qemu monitor of the running VM, not with an external qemu-img process. virsh may or may not provide an interface for this.
3. You can resume the VM now.
4. `qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" $i $i-snapshot`
5. Pause the VM again.
6. Run 'delvm' in the qemu monitor.
7. Resume the VM.

Note that I said upstream qemu. This is because RHEL's qemu-img doesn't even have the -s option. This also means that your customer can't possibly have used the RHEL 6 version of qemu-img if his script didn't give him errors.

So to summarize, there are three reasons why we can't accept this bug:

- Opening a qcow2 image r/w from two processes is always wrong, and corruption is expected in such scenarios.
- Internal snapshots are unsupported on RHEL.
- The customer obviously didn't use RHEL binaries.

Closing as NOTABUG for the first and third reasons. The second one would make it WONTFIX, if in doubt.
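The seven-step upstream procedure from the reply can be sketched as a script. This is a minimal sketch under several assumptions. The libvirt domain is named `test`, as in the report. The monitor is reached through `virsh qemu-monitor-command --hmp`, which is only one possible interface, and the reply leaves open whether virsh offers one. The helper names `run` and `snapshot_copy_out` and the `DRY_RUN` switch are hypothetical. The procedure itself remains unsupported on RHEL 6, whose qemu-img lacks `-s`.

```bash
#!/bin/bash
# Sketch only: the upstream-qemu procedure from the reply above.
# Hypothetical (not from the report): the helper names run and
# snapshot_copy_out, the DRY_RUN switch, and the use of
# `virsh qemu-monitor-command --hmp` to reach the qemu monitor.

# Execute a command, or only print it when DRY_RUN is non-empty.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: snapshot_copy_out DOMAIN IMAGE SNAPNAME
# The body runs in a subshell with `set -e`, so it stops at the first failing
# step. That may leave the VM paused; error handling is deliberately minimal.
snapshot_copy_out() (
    set -e
    domain=$1 image=$2 snap=$3

    run virsh suspend "$domain"                                    # 1. pause the VM
    # 2. snapshot through the VM's own qemu process, not a second qemu-img writer.
    #    HMP may report errors as text rather than through the exit status.
    run virsh qemu-monitor-command --hmp "$domain" "savevm $snap"
    run virsh resume "$domain"                                     # 3. resume
    # 4. copy the snapshot out; qemu-img only reads the image here
    run qemu-img convert -f qcow2 -O qcow2 -s "$snap" "$image" "$image-snapshot"
    run virsh suspend "$domain"                                    # 5. pause again
    run virsh qemu-monitor-command --hmp "$domain" "delvm $snap"   # 6. delete the snapshot via the monitor
    run virsh resume "$domain"                                     # 7. resume
)
```

For a review pass, `DRY_RUN=1 snapshot_copy_out test /var/lib/libvirt/images/test.img "$(date +%d-%m-%Y)"` prints the seven commands without executing them. The design point is that `savevm` and `delvm` go through the qemu process that already holds the image read-write, so the image never has a second writer.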
Description of problem:
When an internal snapshot of a virtual machine under high I/O load is created and then deleted, the qcow2 image becomes corrupted.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.355.el6_4.1.x86_64

How reproducible:
Always, under high I/O load.

Steps to Reproduce:
1. Create a VM with a qcow2 disk.
2. Install RHEL in the VM.
3. Generate I/O load inside the guest, for example:

   ```bash
   # 20 background writer/reader pairs against the guest LV /dev/VolGroup/test
   for i in `seq 1 20`; do
       dd if=/dev/zero of=/dev/VolGroup/test &
       dd if=/dev/VolGroup/test of=/dev/null &
   done
   ```

4. Run the following script on the host system:

   ```bash
   #!/bin/bash -x
   SNAPDATE=`date +%d-%m-%Y`
   TS=`date +%d%m%y-%H%m%S`    # unused below
   i=/var/lib/libvirt/images/test.img
   # create an internal snapshot while the VM "test" is still running
   qemu-img snapshot -c "$SNAPDATE" $i
   # copy the snapshot out to a new qcow2 image
   qemu-img convert -f qcow2 -O qcow2 -s "$SNAPDATE" $i $i-snapshot
   virsh suspend test
   # delete the snapshot while the VM is paused
   qemu-img snapshot -d "$SNAPDATE" $i
   virsh resume test
   ```

Actual results:
`qemu-img check` returns errors:

```
# qemu-img check /var/lib/libvirt/images/test.img
ERROR OFLAG_COPIED: l2_offset=8000000000040000 refcount=2
ERROR OFLAG_COPIED: offset=8000000000050000 refcount=2
ERROR OFLAG_COPIED: offset=8000000000060000 refcount=2
...
Leaked cluster 4 refcount=2 reference=1
Leaked cluster 5 refcount=2 reference=1
...

28220 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

28222 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
```

Expected results:
No errors are detected.

Additional info:
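The `qemu-img check` lines under Actual results print raw qcow2 table entries with their flag bits included. A small helper can split such an entry according to the qcow2 format specification; its name, `decode_qcow2_entry`, is hypothetical. Bit 63 is QCOW_OFLAG_COPIED, which asserts that the cluster's refcount is exactly 1. Bit 62 marks a compressed cluster in L2 entries, and bits 9-55 hold the host offset. The cluster size is an assumption: 64 KiB is the qcow2 default, and `qemu-img info` shows the image's real value.

```bash
#!/bin/bash
# Decode one L1/L2 table entry as printed by `qemu-img check` (hex, no 0x).
# Bit layout per the qcow2 spec: bit 63 = QCOW_OFLAG_COPIED (refcount is
# exactly 1), bit 62 = compressed (L2 entries only), bits 9-55 = host offset.
# ASSUMPTION: cluster_bits defaults to 16 (64 KiB clusters, the qcow2 default).
# Bash arithmetic is 64-bit and wraps without overflow checks, so entries with
# bit 63 set become negative numbers; the bit operations below still work.
decode_qcow2_entry() {
    local entry=$((16#$1))
    local cluster_bits=${2:-16}
    local copied=$(( (entry >> 63) & 1 ))
    local compressed=$(( (entry >> 62) & 1 ))
    local offset=$(( entry & 0x00fffffffffffe00 ))   # keep bits 9-55
    printf 'copied=%d compressed=%d offset=0x%x cluster=%d\n' \
        "$copied" "$compressed" "$offset" $(( offset >> cluster_bits ))
}
```

Under the 64 KiB assumption, `decode_qcow2_entry 8000000000040000` prints `copied=1 compressed=0 offset=0x40000 cluster=4`, and `8000000000050000` decodes to cluster 5. In other words, the ERROR OFLAG_COPIED lines report entries whose COPIED bit claims a refcount of 1 while the refcount table holds 2. Those cluster numbers are consistent with the "Leaked cluster 4" and "Leaked cluster 5" lines further down.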