Bug 533390 - RHEL5.4 VM image corruption with an IDE v-disk [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Kevin Wolf
QA Contact: Virtualization Bugs
Keywords: ZStream
Depends On: 531827
Blocks: 540406
Reported: 2009-11-06 10:08 EST by Dan Yasny
Modified: 2018-10-20 00:01 EDT

See Also:
Fixed In Version: kvm-83-136.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-30 03:52:29 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
cward: needinfo? (dyasny)


Attachments
sosreport from guest (461.92 KB, application/x-bzip)
2009-11-06 10:24 EST, Dan Yasny


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0271 normal SHIPPED_LIVE Important: kvm security, bug fix and enhancement update 2010-03-29 09:19:48 EDT

Description Dan Yasny 2009-11-06 10:08:25 EST
Description of problem:
Corruption similar to IT#348848 / BZ#531827, but on an IDE device. Since we've found some bugs specific to VirtIO that need to be addressed, we're opening a new case to work on IDE-specific failures.

Found another instance of this, but using the IDE interface this time:

Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113668
Nov  4 08:00:55 222f-Cow kernel: Aborting journal on device hda2.
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113669
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113670
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113671
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113672
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113673
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113674
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113675
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113676
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113677
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113678
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113679
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_free_blocks_sb: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_free_blocks_sb: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_reserve_inode_write: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_truncate: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_reserve_inode_write: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_orphan_del: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_reserve_inode_write: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_delete_inode: Journal has aborted
Nov  4 08:00:55 222f-Cow kernel: __journal_remove_journal_head: freeing b_committed_data
Nov  4 08:00:56 222f-Cow last message repeated 4 times
Nov  4 08:00:56 222f-Cow kernel: ext3_abort called.
Nov  4 08:00:56 222f-Cow kernel: EXT3-fs error (device hda2): ext3_journal_start_sb: Detected aborted journal
Nov  4 08:00:56 222f-Cow kernel: Remounting filesystem read-only
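
As an illustration only (not part of the original report), the repeated bitmap errors above can be condensed with a small shell helper. The function name summarize_ext3_bitmap_errors is hypothetical, and the pattern assumes the exact message wording shown in this excerpt; other kernels may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: summarize ext3 "bit already cleared" errors.
# Reads syslog lines on stdin and prints one line per device:
#   <device> <distinct block count> <lowest block> <highest block>
summarize_ext3_bitmap_errors() {
    # Keep only the matching lines, reduced to "<device> <block>".
    sed -n 's/.*EXT3-fs error (device \([^)]*\)): ext3_free_blocks_sb: bit already cleared for block \([0-9][0-9]*\).*/\1 \2/p' |
    # Sort by device, then numerically by block, dropping duplicates.
    sort -k1,1 -k2,2n -u |
    # Emit count and block range whenever the device changes, and at the end.
    awk '
        $1 != dev { if (dev != "") print dev, n, first, last; dev = $1; n = 0; first = $2 }
        { n++; last = $2 }
        END { if (dev != "") print dev, n, first, last }
    '
}
```

Fed the excerpt above (for example via `summarize_ext3_bitmap_errors < /var/log/messages` inside the guest), this should report `hda2 12 113668 113679`, i.e. twelve consecutive blocks on hda2.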


The guest is using a templated disk image, the same one as several other guests on this host. I've attached a sosreport from the guest.

Version-Release number of selected component (if applicable):
rhev-sp215

How reproducible:
random

Steps to Reproduce:
1. Put a VM under load (in this case, compiling gcc in a loop)
2. Cause an NFS storage outage

Actual results:
see above

Expected results:
The VM should pause, and its image should not get corrupted

Additional info:
Comment 1 Dan Yasny 2009-11-06 10:24:12 EST
Created attachment 367832 [details]
sosreport from guest
Comment 11 Miya Chen 2009-12-28 03:50:04 EST
Retested with build kvm-83-137; the VM stops on read errors.

steps:

1. create a local nfs server in host, then
mount localhost:/root/test-nfs /mnt -o rw,soft,timeo=1,retrans=0
cd /mnt
qemu-img create test-533390.qcow2 -f qcow2 10G


2. start guest:
/usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -usbdevice tablet -cpu qemu64,+sse2
-drive
file=RHEL-Server-5.4-64-virtio.qcow2,if=ide,format=qcow2,cache=off,werror=stop
-smp 2 -m 2G -vnc :1 -net nic,macaddr=20:20:20:11:12:56,model=e1000,vlan=0
-net tap,script=/etc/qemu-ifup,vlan=0 -monitor stdio -drive
file=/mnt/test-533390.qcow2,if=ide,format=qcow2,cache=off,werror=stop


3. in guest:
fdisk /dev/hdb
mkfs.ext3 /dev/hdb1
dd if=/dev/hdb1 of=/dev/null


4. in host:
service nfs stop

5. In host dmesg:
nfs: server localhost not responding, timed out


6. In qemu monitor
(qemu)info status
VM status: paused  

7. in host:
service nfs start

8. In qemu monitor
(qemu)c 

The guest works fine after being resumed.
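
For anyone scripting this check, the pass condition in step 6 can be sketched as below. This is an assumption-laden sketch, not part of the original test: the function names are hypothetical, and it only inspects monitor text that has already been captured (for example, saved from the `-monitor stdio` session), relying on the "VM status: paused" wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if captured monitor output (stdin)
# contains the "VM status: paused" line printed by "info status".
vm_is_paused() {
    grep -q '^VM status: paused'
}

# Hypothetical wrapper: print a PASS/FAIL verdict for the outage test.
report_pause_check() {
    if vm_is_paused; then
        echo "PASS: guest paused on I/O error"
    else
        echo "FAIL: guest not paused"
        return 1
    fi
}
```

Usage would look like `report_pause_check < monitor.log`, where monitor.log is an assumed file holding the captured monitor session.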
Comment 15 Chris Ward 2010-02-11 05:22:04 EST
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.
Comment 16 Miya Chen 2010-03-02 02:13:08 EST
Tested in kvm-83-160.el5 with both raw and qcow2; the guest stops on read errors. (Tried 5 times for each format.)

steps:
1. mount nfs server and create test disk:
# mount 10.66.91.156:/root/test-nfs /mnt -o rw,soft,timeo=1,retrans=0
# qemu-img create test-552487.raw -f raw 200M
Formatting 'test-552487.raw', fmt=raw, size=204800 kB
# qemu-io test-552487.raw
qemu-io> write -P 97 0 50M
wrote 52428800/52428800 bytes at offset 0
50 MiB, 1 ops; 0.0000 sec (69.333 MiB/sec and 1.3867 ops/sec)
qemu-io> write -P 98 50M 50M
wrote 52428800/52428800 bytes at offset 52428800
50 MiB, 1 ops; 0.0000 sec (75.489 MiB/sec and 1.5098 ops/sec)
qemu-io> write -P 99 100M 50M
wrote 52428800/52428800 bytes at offset 104857600
50 MiB, 1 ops; 0.0000 sec (74.699 MiB/sec and 1.4940 ops/sec)
qemu-io> write -P 100 150M 50M
wrote 52428800/52428800 bytes at offset 157286400
50 MiB, 1 ops; 0.0000 sec (74.988 MiB/sec and 1.4998 ops/sec)
qemu-io> quit
# md5sum test-552487.raw
ab5593b62c6e9fb1448e778bdd3c4d00  test-552487.raw

2. start guest:
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -smp 2 -m 4G -drive file=RHEL-Server-5.4-64-virtio.qcow2,if=virtio,boot=on -net nic,vlan=0,macaddr=20:88:99:11:20:61,model=e1000 -net tap,vlan=0,script=/etc/qemu-ifup -uuid `uuidgen` -cpu qemu64,+sse2 -vnc :10 -monitor stdio -notify all -M rhel5.5.0 -startdate now -drive file=/mnt/test-552487.raw,cache=off,if=ide,werror=stop

3. in guest:
dd if=/dev/hda of=/dev/null

4. in host:
service nfs stop

5. In host dmesg:
nfs: server localhost not responding, timed out


6. In qemu monitor
(qemu) # VM is stopped due to disk write error: ide0-hd0: Input/output error
(qemu)info status
VM status: paused  

7. in host:
service nfs start

8. In qemu monitor
(qemu)c 

9. Repeat steps 3-8 five times, then check:
# md5sum test-552487.raw
ab5593b62c6e9fb1448e778bdd3c4d00  test-552487.raw
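
The before/after comparison in steps 1 and 9 can be automated along these lines. This is a minimal sketch under stated assumptions: the function names image_checksum and image_unchanged are hypothetical, and it assumes the guest is shut down or paused while the image is hashed so the file is not changing underneath md5sum.

```shell
#!/bin/sh
# Hypothetical helper: print only the MD5 digest of the given file.
image_checksum() {
    md5sum "$1" | awk '{print $1}'
}

# Hypothetical helper: succeed if the image still has the digest
# recorded before the outage test.
#   $1 = image path, $2 = digest recorded beforehand
image_unchanged() {
    [ "$(image_checksum "$1")" = "$2" ]
}
```

In the test above, the digest from step 1 would be recorded with `before=$(image_checksum test-552487.raw)` and checked after the last iteration with `image_unchanged test-552487.raw "$before"`.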
Comment 19 errata-xmlrpc 2010-03-30 03:52:29 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0271.html
