Bug 619330

Summary: Image(snapshot) corrupted after power failure
Product: Red Hat Enterprise Linux 5
Reporter: Golita Yue <gyue>
Component: kernel
Assignee: chellwig <chellwig>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: medium
Version: 5.6
CC: chellwig, esandeen, kwolf, michen, mkenneth, rhod, virt-maint
Target Milestone: rc
Keywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-10-31 07:14:28 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 580948    

Description Golita Yue 2010-07-29 09:50:27 UTC
Description of problem:
The image (snapshot) is corrupted after a power failure that occurs while iozone is running.

Version-Release number of selected component (if applicable):
kvm-83-191.el5
kernel 2.6.18-203.el5

How reproducible:
50%

Steps to Reproduce:
0. run qemu-img create -f qcow2 -F qcow2 -b rhel5.qcow2  rhel5_snap.qcow2
   to create a snapshot
1. start the snapshot
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -m 2G -smp 2 -drive file=rhel5_snap.qcow2,if=ide,boot=on,cache=none,format=qcow2,werror=stop -net nic,vlan=0,macaddr=00:20:20:11:8a:2c,model=virtio -net tap,vlan=0,script=/etc/qemu-ifup -uuid `uuidgen` -cpu qemu64,+sse2 -balloon none -boot c -vnc :1 -notify all
2. run iozone in VM
3. cut off the host power while iozone is running
4. boot the host again
5. use 'qemu-img check' to check image and boot up snapshot
  
Actual results:
#qemu-img check rhel5_snap.qcow2
ERROR OFLAG_COPIED: offset=8000000002fe0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000002fb0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000002fc0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000002fd0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000002ff0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000003010000 refcount=0
ERROR OFLAG_COPIED: offset=8000000003020000 refcount=0
ERROR OFLAG_COPIED: offset=8000000003030000 refcount=0
ERROR OFLAG_COPIED: offset=8000000003050000 refcount=0
ERROR OFLAG_COPIED: offset=8000000003060000 refcount=0
ERROR OFLAG_COPIED: offset=8000000003090000 refcount=0
ERROR OFLAG_COPIED: offset=80000000030a0000 refcount=0
ERROR OFLAG_COPIED: offset=80000000030e0000 refcount=0
ERROR OFLAG_COPIED: offset=80000000030c0000 refcount=0
ERROR OFLAG_COPIED: offset=80000000030d0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000003100000 refcount=0
ERROR: invalid cluster offset=0x3100000
ERROR OFLAG_COPIED: offset=8000000003330000 refcount=0
ERROR: invalid cluster offset=0x3330000
ERROR OFLAG_COPIED: offset=8000000003370000 refcount=0
ERROR: invalid cluster offset=0x3370000
ERROR OFLAG_COPIED: offset=8000000003340000 refcount=0
ERROR: invalid cluster offset=0x3340000
ERROR OFLAG_COPIED: offset=8000000003350000 refcount=0
ERROR: invalid cluster offset=0x3350000
ERROR OFLAG_COPIED: offset=8000000003360000 refcount=0
ERROR: invalid cluster offset=0x3360000
ERROR OFLAG_COPIED: offset=8000000003380000 refcount=0
ERROR: invalid cluster offset=0x3380000
ERROR OFLAG_COPIED: offset=80000000033a0000 refcount=0
ERROR: invalid cluster offset=0x33a0000
ERROR OFLAG_COPIED: offset=80000000033e0000 refcount=0
ERROR: invalid cluster offset=0x33e0000
ERROR OFLAG_COPIED: offset=80000000033b0000 refcount=0
ERROR: invalid cluster offset=0x33b0000
ERROR OFLAG_COPIED: offset=80000000033c0000 refcount=0
ERROR: invalid cluster offset=0x33c0000
ERROR OFLAG_COPIED: offset=80000000033d0000 refcount=0
ERROR: invalid cluster offset=0x33d0000
ERROR OFLAG_COPIED: offset=80000000033f0000 refcount=0
ERROR: invalid cluster offset=0x33f0000
ERROR OFLAG_COPIED: offset=8000000003410000 refcount=0
ERROR: invalid cluster offset=0x3410000
ERROR OFLAG_COPIED: offset=8000000003450000 refcount=0
ERROR: invalid cluster offset=0x3450000
ERROR OFLAG_COPIED: offset=8000000003420000 refcount=0
ERROR: invalid cluster offset=0x3420000
ERROR OFLAG_COPIED: offset=8000000003430000 refcount=0
ERROR: invalid cluster offset=0x3430000
ERROR OFLAG_COPIED: offset=8000000003440000 refcount=0
ERROR: invalid cluster offset=0x3440000
ERROR OFLAG_COPIED: offset=8000000003460000 refcount=0
ERROR: invalid cluster offset=0x3460000
ERROR OFLAG_COPIED: offset=8000000003480000 refcount=0
ERROR: invalid cluster offset=0x3480000
ERROR OFLAG_COPIED: offset=80000000034c0000 refcount=0
ERROR: invalid cluster offset=0x34c0000
ERROR OFLAG_COPIED: offset=8000000003490000 refcount=0
ERROR: invalid cluster offset=0x3490000
ERROR OFLAG_COPIED: offset=80000000034a0000 refcount=0
ERROR: invalid cluster offset=0x34a0000
ERROR OFLAG_COPIED: offset=80000000034b0000 refcount=0
ERROR: invalid cluster offset=0x34b0000
ERROR OFLAG_COPIED: offset=80000000034d0000 refcount=0
ERROR: invalid cluster offset=0x34d0000
ERROR OFLAG_COPIED: offset=80000000034f0000 refcount=0
ERROR: invalid cluster offset=0x34f0000
ERROR OFLAG_COPIED: offset=8000000003530000 refcount=0
ERROR: invalid cluster offset=0x3530000
ERROR OFLAG_COPIED: offset=8000000003500000 refcount=0
ERROR: invalid cluster offset=0x3500000
ERROR OFLAG_COPIED: offset=8000000003510000 refcount=0
ERROR: invalid cluster offset=0x3510000
ERROR OFLAG_COPIED: offset=8000000003520000 refcount=0
ERROR: invalid cluster offset=0x3520000
ERROR OFLAG_COPIED: offset=8000000003540000 refcount=0
ERROR: invalid cluster offset=0x3540000
ERROR OFLAG_COPIED: offset=8000000003560000 refcount=0
ERROR: invalid cluster offset=0x3560000
ERROR OFLAG_COPIED: offset=80000000035e0000 refcount=0
ERROR: invalid cluster offset=0x35e0000
ERROR OFLAG_COPIED: offset=80000000030b0000 refcount=0
ERROR OFLAG_COPIED: offset=80000000031e0000 refcount=0
ERROR: invalid cluster offset=0x31e0000
ERROR cluster 759 refcount=1 reference=0
ERROR cluster 763 refcount=0 reference=1
ERROR cluster 764 refcount=0 reference=1
ERROR cluster 765 refcount=0 reference=1
ERROR cluster 766 refcount=0 reference=1
ERROR cluster 767 refcount=0 reference=1
ERROR cluster 769 refcount=0 reference=1
ERROR cluster 770 refcount=0 reference=1
ERROR cluster 771 refcount=0 reference=1
ERROR cluster 773 refcount=0 reference=1
ERROR cluster 774 refcount=0 reference=1
ERROR cluster 777 refcount=0 reference=1
ERROR cluster 778 refcount=0 reference=1
ERROR cluster 779 refcount=0 reference=1
ERROR cluster 780 refcount=0 reference=1
ERROR cluster 781 refcount=0 reference=1
ERROR cluster 782 refcount=0 reference=1
101 errors were found on the image.
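
The output above mixes three kinds of error lines. As a rough aid for reading it, the following Python sketch counts each kind and converts an offset into a cluster index. It relies on bit 63 of a qcow2 L2 entry being the "copied" flag (QCOW_OFLAG_COPIED) and ASSUMES the default 64 KiB cluster size, which this report does not state. The function names and the three-line sample are illustrative only and are not part of qemu-img.

```python
import re
from collections import Counter

# Bit 63 of a qcow2 L2 table entry is the "copied" flag (QCOW_OFLAG_COPIED).
OFLAG_COPIED = 1 << 63
# ASSUMPTION: default qcow2 cluster size of 64 KiB; the report does not say.
CLUSTER_SIZE = 64 * 1024

# One pattern per error line shape seen in the 'qemu-img check' output.
PATTERNS = {
    "oflag_copied": re.compile(r"^ERROR OFLAG_COPIED: offset=([0-9a-f]+) refcount=(\d+)"),
    "invalid_offset": re.compile(r"^ERROR: invalid cluster offset=0x([0-9a-f]+)"),
    "refcount_mismatch": re.compile(r"^ERROR cluster (\d+) refcount=(\d+) reference=(\d+)"),
}


def cluster_index(offset_hex, cluster_size=CLUSTER_SIZE):
    """Clear the copied flag from a hex offset and return the cluster index."""
    offset = int(offset_hex, 16) & ~OFLAG_COPIED
    return offset // cluster_size


def summarize(lines):
    """Count how many lines match each known error shape."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            if pattern.match(line.strip()):
                counts[kind] += 1
                break
    return counts


if __name__ == "__main__":
    # Three lines copied from the output in this report.
    sample = [
        "ERROR OFLAG_COPIED: offset=8000000002fe0000 refcount=0",
        "ERROR: invalid cluster offset=0x3100000",
        "ERROR cluster 766 refcount=0 reference=1",
    ]
    print(dict(summarize(sample)))
    print(cluster_index("8000000002fe0000"))
```

Under that assumption, offset=8000000002fe0000 decodes to cluster 766, which matches the later line "ERROR cluster 766 refcount=0 reference=1". This suggests the OFLAG_COPIED lines and the refcount mismatch lines describe the same clusters.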

The snapshot cannot be booted after the power failure:

(qemu) # VM is stopped due to disk write error: virtio0: Invalid argument
# VM is stopped due to disk write error: virtio0: Invalid argument
(qemu) info status
VM status: paused


Expected results:
No error messages when checking the image.
The snapshot can be booted after the power failure.

Additional info:
When I ran the above steps with the base image instead of the snapshot, the error messages could still be found.

Comment 2 RHEL Program Management 2011-01-11 21:15:34 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 3 RHEL Program Management 2011-01-11 22:52:42 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 4 Kevin Wolf 2011-01-14 16:32:16 UTC
Can you still reproduce this?

Comment 5 Golita Yue 2011-01-17 06:22:01 UTC
(In reply to comment #4)
> Can you still reproduce this?

This bug is from a long time ago.

Do you need me to test this again?

Comment 6 Kevin Wolf 2011-01-17 11:16:49 UTC
It would be helpful. Another thing that might be worth trying is to do the same on RHEL 6 and check if the problem still exists.

Comment 9 Golita Yue 2011-01-27 10:57:52 UTC
(In reply to comment #6)
> It would be helpful. Another thing that might be worth trying is to do the same
> on RHEL 6 and check if the problem still exists.

Tested 5 times on RHEL6 with the same steps as in the Description; the problem did not reproduce.
kernel 2.6.32-94.el6.x86_64
qemu-kvm-0.12.1.2-2.129.el6.x86_64

Comment 13 Kevin Wolf 2011-03-17 09:29:49 UTC
Golita, on which file system did you do your tests?

Comment 14 Golita Yue 2011-03-23 03:05:01 UTC
(In reply to comment #13)
> Golita, on which file system did you do your tests?

ext3 (Default installation for host, so the fstype is default.)

Comment 18 chellwig@redhat.com 2011-08-12 00:02:20 UTC
Unfortunately, data loss with ext3 in RHEL5 is completely expected if you have a volatile write cache on your disk.  Did you run this test on a normal consumer SATA disk or on some more expensive equipment?

What does a:

  cat /sys/block/sda/device/scsi_disk/0\:0\:0\:0/cache_type

on your system say?  (adapt the path if the test filesystem is on a different device)
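
For hosts with several disks, the same check can be run over every device at once. This is a minimal Python sketch that assumes the sysfs layout shown in the path above (block/<dev>/device/scsi_disk/<address>/cache_type). The function names are hypothetical. Treating a value that starts with "write back" as a volatile write cache is a simplification: sysfs does not reveal whether the cache is battery-backed.

```python
import glob
import os


def read_cache_types(sysfs_root="/sys"):
    """Map each block device name (e.g. 'sda') to its cache_type string."""
    block_dir = os.path.join(sysfs_root, "block")
    # Same layout as the path quoted in this comment; an assumption elsewhere.
    pattern = os.path.join(block_dir, "*", "device", "scsi_disk", "*", "cache_type")
    result = {}
    for path in sorted(glob.glob(pattern)):
        # First path component below block/ is the device name.
        device = os.path.relpath(path, block_dir).split(os.sep)[0]
        with open(path) as f:
            result[device] = f.read().strip()
    return result


def has_volatile_write_cache(cache_type):
    """Simplified check: 'write back' means the drive's write cache is enabled."""
    return cache_type.startswith("write back")


if __name__ == "__main__":
    for device, cache_type in read_cache_types().items():
        flag = "write cache enabled" if has_volatile_write_cache(cache_type) else "ok"
        print(f"{device}: {cache_type} ({flag})")
```

A disk reporting "write through" would not be flagged by this sketch.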

Comment 19 Ronen Hod 2011-10-31 07:14:28 UTC
Since we didn't find the capacity to fix it for 5.8, it doesn't make sense to
continue dragging it to the next version. Closing.