Bug 533390 - RHEL5.4 VM image corruption with an IDE v-disk
Summary: RHEL5.4 VM image corruption with an IDE v-disk
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Kevin Wolf
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 531827
Blocks: 540406
Reported: 2009-11-06 15:08 UTC by Dan Yasny
Modified: 2023-09-14 01:18 UTC (History)

Fixed In Version: kvm-83-136.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 07:52:29 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport from guest (461.92 KB, application/x-bzip), 2009-11-06 15:24 UTC, Dan Yasny


Links
System ID: Red Hat Product Errata RHSA-2010:0271
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kvm security, bug fix and enhancement update
Last Updated: 2010-03-29 13:19:48 UTC

Description Dan Yasny 2009-11-06 15:08:25 UTC
Description of problem:
Corruption similar to IT#348848 / BZ#531827, but on an IDE device. Since we've found some VirtIO-specific bugs that need to be addressed there, we're opening a new case to work on IDE-specific failures.

Found another instance of this, but using the IDE interface this time:

Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113668
Nov  4 08:00:55 222f-Cow kernel: Aborting journal on device hda2.
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113669
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113670
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113671
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113672
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113673
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113674
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113675
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113676
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113677
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113678
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113679
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_free_blocks_sb: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_free_blocks_sb: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_reserve_inode_write: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_truncate: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_reserve_inode_write: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_orphan_del: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_reserve_inode_write: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_delete_inode: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: __journal_remove_journal_head: freeing b_committed_data
Nov  4 08:00:56 222f-Cow last message repeated 4 times
Nov  4 08:00:56 222f-Cow kernel: ext3_abort called.
Nov  4 08:00:56 222f-Cow kernel: EXT3-fs error (device hda2): ext3_journal_start_sb: Detected aborted journal
Nov  4 08:00:56 222f-Cow kernel: Remounting filesystem read-only


The guest is using a templated disk image, the same one as several other guests on this host. I've attached a sosreport from the guest.

Version-Release number of selected component (if applicable):
rhev-sp215

How reproducible:
random

Steps to Reproduce:
1. Put a VM under load (in this case, compiling gcc in a loop)
2. Generate an NFS storage outage
  
Actual results:
See the EXT3 errors above: the journal on hda2 aborts and the guest filesystem is remounted read-only.

Expected results:
VM should pause and not get corrupted

Additional info:

Comment 1 Dan Yasny 2009-11-06 15:24:12 UTC
Created attachment 367832 [details]
sosreport from guest

Comment 11 Miya Chen 2009-12-28 08:50:04 UTC
Tested again with build kvm-83-137: the VM stops on read errors.

steps:

1. create a local NFS server on the host, then:
mount localhost:/root/test-nfs /mnt -o rw,soft,timeo=1,retrans=0
cd /mnt
qemu-img create test-533390.qcow2 -f qcow2 10G


2. start guest:
/usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -usbdevice tablet -cpu qemu64,+sse2
-drive
file=RHEL-Server-5.4-64-virtio.qcow2,if=ide,format=qcow2,cache=off,werror=stop
-smp 2 -m 2G -vnc :1 -net nic,macaddr=20:20:20:11:12:56,model=e1000,vlan=0
-net tap,script=/etc/qemu-ifup,vlan=0 -monitor stdio -drive
file=/mnt/test-533390.qcow2,if=ide,format=qcow2,cache=off,werror=stop


3. in guest:
fdisk /dev/hdb
mkfs.ext3 /dev/hdb1
dd if=/dev/hdb1 of=/dev/null


4. in host:
service nfs stop

5. In host dmesg:
nfs: server localhost not responding, timed out


6. In qemu monitor
(qemu)info status
VM status: paused  

7. in host:
service nfs start

8. In qemu monitor
(qemu)c 

The guest works fine after being resumed.
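The check in step 6 can be scripted if the monitor output is available as text. A minimal sketch, assuming the output of `info status` has already been captured to a file or pipe (how it is captured is left out here); `vm_is_paused` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if monitor output on stdin reports a paused VM.
# Matches the "VM status: paused" line shown in step 6, tolerating trailing
# whitespace after "paused".
vm_is_paused() {
    grep -q '^VM status: paused[[:space:]]*$'
}
```

For example, `vm_is_paused < monitor.log && echo "guest stopped on I/O error"`, where `monitor.log` is an assumed capture file.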

Comment 15 Chris Ward 2010-02-11 10:22:04 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 16 Miya Chen 2010-03-02 07:13:08 UTC
Tested in kvm-83-160.el5 with both raw and qcow2: the guest stops on read errors (tried 5 times for each format).

steps:
1. mount nfs server and create test disk:
# mount 10.66.91.156:/root/test-nfs /mnt -o rw,soft,timeo=1,retrans=0
# qemu-img create test-552487.raw -f raw 200M
Formatting 'test-552487.raw', fmt=raw, size=204800 kB
# qemu-io test-552487.raw
qemu-io> write -P 97 0 50M
wrote 52428800/52428800 bytes at offset 0
50 MiB, 1 ops; 0.0000 sec (69.333 MiB/sec and 1.3867 ops/sec)
qemu-io> write -P 98 50M 50M
wrote 52428800/52428800 bytes at offset 52428800
50 MiB, 1 ops; 0.0000 sec (75.489 MiB/sec and 1.5098 ops/sec)
qemu-io> write -P 99 100M 50M
wrote 52428800/52428800 bytes at offset 104857600
50 MiB, 1 ops; 0.0000 sec (74.699 MiB/sec and 1.4940 ops/sec)
qemu-io> write -P 100 150M 50M
wrote 52428800/52428800 bytes at offset 157286400
50 MiB, 1 ops; 0.0000 sec (74.988 MiB/sec and 1.4998 ops/sec)
qemu-io> quit
# md5sum test-552487.raw
ab5593b62c6e9fb1448e778bdd3c4d00  test-552487.raw

2. start guest:
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -smp 2 -m 4G -drive file=RHEL-Server-5.4-64-virtio.qcow2,if=virtio,boot=on -net nic,vlan=0,macaddr=20:88:99:11:20:61,model=e1000 -net tap,vlan=0,script=/etc/qemu-ifup -uuid `uuidgen` -cpu qemu64,+sse2 -vnc :10 -monitor stdio -notify all -M rhel5.5.0 -startdate now -drive file=/mnt/test-552487.raw,cache=off,if=ide,werror=stop

3. in guest:
dd if=/dev/hda of=/dev/null

4. in host:
service nfs stop

5. In host dmesg:
nfs: server localhost not responding, timed out


6. In qemu monitor
(qemu) # VM is stopped due to disk write error: ide0-hd0: Input/output error
(qemu)info status
VM status: paused  

7. in host:
service nfs start

8. In qemu monitor
(qemu)c 

9. Repeated 5 times, then checked that the image checksum is unchanged:
# md5sum test-552487.raw
ab5593b62c6e9fb1448e778bdd3c4d00  test-552487.raw
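The checksum comparison in steps 1 and 9 can be scripted, and the expected value can also be derived from the write pattern alone. A minimal sketch, assuming GNU coreutils (`head -c`, `md5sum`); `pattern_md5` and `image_unchanged` are hypothetical helper names:

```shell
# Hypothetical helper: MD5 of consecutive runs of constant bytes.
# $1: length of each run in bytes; remaining args: decimal byte values,
# in the same form as the qemu-io "write -P" patterns in step 1.
pattern_md5() {
    segment_bytes=$1
    shift
    for byte in "$@"; do
        # Turn segment_bytes zero bytes into the requested byte value
        # (tr receives an octal escape such as \141 for 97).
        head -c "$segment_bytes" /dev/zero | tr '\000' "\\$(printf '%03o' "$byte")"
    done | md5sum | awk '{ print $1 }'
}

# Hypothetical helper: succeed if the image still has the recorded checksum.
# $1: image path; $2: MD5 recorded before the test.
image_unchanged() {
    [ "$(md5sum "$1" | awk '{ print $1 }')" = "$2" ]
}
```

With the values from step 1, `pattern_md5 $((50 * 1024 * 1024)) 97 98 99 100` describes the same 200 MiB layout as the four qemu-io writes, so it would be expected to match the checksum recorded there; that has not been verified here. After each outage, `image_unchanged /mnt/test-552487.raw ab5593b62c6e9fb1448e778bdd3c4d00` repeats the check from step 9.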

Comment 19 errata-xmlrpc 2010-03-30 07:52:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0271.html

Comment 20 Red Hat Bugzilla 2023-09-14 01:18:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

