Description of problem:
Corruption similar to IT#348848 / BZ#531827 on an IDE device. Since we've found some bugs that need to be addressed specific to VirtIO, we're going to open a new case to work on IDE-specific failures.

Found another instance of this, but using the IDE interface this time:

Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113668
Nov 4 08:00:55 222f-Cow kernel: Aborting journal on device hda2.
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113669
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113670
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113671
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113672
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113673
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113674
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113675
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113676
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113677
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113678
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113679
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_free_blocks_sb: Journal has aborted
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_free_blocks_sb: Journal has aborted
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_reserve_inode_write: Journal has aborted
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_truncate: Journal has aborted
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_reserve_inode_write: Journal has aborted
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_orphan_del: Journal has aborted
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_reserve_inode_write: Journal has aborted
Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_delete_inode: Journal has aborted
Nov 4 08:00:55 222f-Cow kernel: __journal_remove_journal_head: freeing b_committed_data
Nov 4 08:00:56 222f-Cow last message repeated 4 times
Nov 4 08:00:56 222f-Cow kernel: ext3_abort called.
Nov 4 08:00:56 222f-Cow kernel: EXT3-fs error (device hda2): ext3_journal_start_sb: Detected aborted journal
Nov 4 08:00:56 222f-Cow kernel: Remounting filesystem read-only

The guest is using a templated disk image, the same one as several other guests on this host. I've attached a sosreport from the guest.

Version-Release number of selected component (if applicable):
rhev-sp215

How reproducible:
random

Steps to Reproduce:
1. load a VM (in this case - compiling gcc in a loop)
2. generate an NFS storage outage
3.

Actual results:
see above

Expected results:
VM should pause and not get corrupted

Additional info:
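For triage, the kernel messages above can be condensed into the affected block range and the functions that reported the aborted journal. The following is only a rough sketch: the function name summarize_ext3_errors and the regular expressions are illustrative assumptions written against the message formats quoted in this report, not part of any existing tool.

```python
import re
from collections import Counter

# Matches messages of the form quoted in this report:
#   EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113668
BIT_CLEARED = re.compile(
    r"EXT3-fs error \(device (?P<dev>\w+)\): ext3_free_blocks_sb: "
    r"bit already cleared for block (?P<block>\d+)"
)

# Matches messages of the form quoted in this report:
#   EXT3-fs error (device hda2) in ext3_truncate: Journal has aborted
JOURNAL_ABORTED = re.compile(
    r"EXT3-fs error \(device (?P<dev>\w+)\) in (?P<func>\w+): Journal has aborted"
)


def summarize_ext3_errors(lines):
    """Hypothetical helper: collect block numbers and aborting functions."""
    blocks = []
    aborted_in = Counter()
    for line in lines:
        match = BIT_CLEARED.search(line)
        if match:
            blocks.append(int(match.group("block")))
            continue
        match = JOURNAL_ABORTED.search(line)
        if match:
            aborted_in[match.group("func")] += 1
    ordered = sorted(blocks)
    # The range counts as contiguous only if every block between the lowest
    # and highest reported block appears exactly once.
    contiguous = bool(ordered) and ordered == list(range(ordered[0], ordered[-1] + 1))
    return {"blocks": ordered, "contiguous": contiguous, "aborted_in": dict(aborted_in)}


# A few lines copied from the log in this report.
SAMPLE = [
    "Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113668",
    "Nov 4 08:00:55 222f-Cow kernel: Aborting journal on device hda2.",
    "Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2): ext3_free_blocks_sb: bit already cleared for block 113669",
    "Nov 4 08:00:55 222f-Cow kernel: EXT3-fs error (device hda2) in ext3_truncate: Journal has aborted",
]

if __name__ == "__main__":
    print(summarize_ext3_errors(SAMPLE))
```

Feeding the guest's /var/log/messages through the same function would show whether the freed blocks form one contiguous run, as they appear to in the excerpt above.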
Created attachment 367832: sosreport from guest
Tested again with build kvm-83-137: the VM stops on read errors.

Steps:
1. Create a local NFS server on the host, then:
   mount localhost:/root/test-nfs /mnt -o rw,soft,timeo=1,retrans=0
   cd /mnt
   qemu-img create test-533390.qcow2 -f qcow2 10G
2. Start the guest:
   /usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -usbdevice tablet -cpu qemu64,+sse2 -drive file=RHEL-Server-5.4-64-virtio.qcow2,if=ide,format=qcow2,cache=off,werror=stop -smp 2 -m 2G -vnc :1 -net nic,macaddr=20:20:20:11:12:56,model=e1000,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0 -monitor stdio -drive file=/mnt/test-533390.qcow2,if=ide,format=qcow2,cache=off,werror=stop
3. In the guest:
   fdisk /dev/hdb
   mkfs.ext3 /dev/hdb1
   dd if=/dev/hdb1 of=/dev/null
4. On the host:
   service nfs stop
5. In host dmesg:
   nfs: server localhost not responding, timed out
6. In the qemu monitor:
   (qemu)info status
   VM status: paused
7. On the host:
   service nfs start
8. In the qemu monitor:
   (qemu)c

The guest works fine after it is resumed.
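The outage part of the steps above (stop NFS on the host, wait, start it again, then confirm the monitor reports a paused VM) could be scripted roughly as follows. This is a sketch under assumptions: nfs_outage and is_paused are hypothetical helper names, the outage length is arbitrary, and the monitor output still has to be captured by whatever means the test setup provides. The only external commands used are the "service nfs stop" and "service nfs start" calls quoted in the steps.

```python
import subprocess
import time


def nfs_outage(seconds, run=subprocess.check_call, sleep=time.sleep):
    """Hypothetical helper: stop the host NFS service for `seconds`, then restart it.

    `run` and `sleep` are injectable so the sequence can be exercised
    without touching a real service.
    """
    run(["service", "nfs", "stop"])
    try:
        sleep(seconds)
    finally:
        # Restart NFS even if the wait is interrupted, so the host is not
        # left without storage.
        run(["service", "nfs", "start"])


def is_paused(info_status_output):
    """Hypothetical helper: check the text printed by `info status` in the qemu monitor."""
    return "VM status: paused" in info_status_output


if __name__ == "__main__":
    # Assumed duration; long enough for the soft mount (timeo=1,retrans=0) to time out.
    nfs_outage(30)
```

The guest still has to be resumed with "c" in the monitor afterwards, as in step 8.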
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~ RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here, by March 3rd 2010 (2010-03-03) or sooner. Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value. If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.
Tested in kvm-83-160.el5 with both raw and qcow2: the guest stops on read errors (tried 5 times for each format).

Steps:
1. Mount the NFS server and create the test disk:
   # mount 10.66.91.156:/root/test-nfs /mnt -o rw,soft,timeo=1,retrans=0
   # qemu-img create test-552487.raw -f raw 200M
   Formatting 'test-552487.raw', fmt=raw, size=204800 kB
   # qemu-io test-552487.raw
   qemu-io> write -P 97 0 50M
   wrote 52428800/52428800 bytes at offset 0
   50 MiB, 1 ops; 0.0000 sec (69.333 MiB/sec and 1.3867 ops/sec)
   qemu-io> write -P 98 50M 50M
   wrote 52428800/52428800 bytes at offset 52428800
   50 MiB, 1 ops; 0.0000 sec (75.489 MiB/sec and 1.5098 ops/sec)
   qemu-io> write -P 99 100M 50M
   wrote 52428800/52428800 bytes at offset 104857600
   50 MiB, 1 ops; 0.0000 sec (74.699 MiB/sec and 1.4940 ops/sec)
   qemu-io> write -P 100 150M 50M
   wrote 52428800/52428800 bytes at offset 157286400
   50 MiB, 1 ops; 0.0000 sec (74.988 MiB/sec and 1.4998 ops/sec)
   qemu-io> quit
   # md5sum test-552487.raw
   ab5593b62c6e9fb1448e778bdd3c4d00 test-552487.raw
2. Start the guest:
   /usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -smp 2 -m 4G -drive file=RHEL-Server-5.4-64-virtio.qcow2,if=virtio,boot=on -net nic,vlan=0,macaddr=20:88:99:11:20:61,model=e1000 -net tap,vlan=0,script=/etc/qemu-ifup -uuid `uuidgen` -cpu qemu64,+sse2 -vnc :10 -monitor stdio -notify all -M rhel5.5.0 -startdate now -drive file=/mnt/test-552487.raw,cache=off,if=ide,werror=stop
3. In the guest:
   dd if=/dev/hda of=/dev/null
4. On the host:
   service nfs stop
5. In host dmesg:
   nfs: server localhost not responding, timed out
6. In the qemu monitor:
   (qemu) # VM is stopped due to disk write error: ide0-hd0: Input/output error
   (qemu)info status
   VM status: paused
7. On the host:
   service nfs start
8. In the qemu monitor:
   (qemu)c
9. Repeated 5 times, then checked the image:
   # md5sum test-552487.raw
   ab5593b62c6e9fb1448e778bdd3c4d00 test-552487.raw
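The integrity check in this test relies on the image consisting of four 50M regions filled with the byte values 97, 98, 99 and 100, so that md5sum can be compared before and after the outages. A rough Python equivalent of that comparison is sketched below. The helper names md5_of_file and md5_of_pattern are hypothetical, and the sketch assumes that "write -P <n>" fills the whole region with the single byte value n; whether the computed pattern digest equals the md5sum quoted above has not been verified here.

```python
import hashlib


def md5_of_file(path, bufsize=1 << 20):
    """Hypothetical helper: MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(bufsize), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_pattern(patterns=(97, 98, 99, 100), region_size=50 * 1024 * 1024):
    """Hypothetical helper: MD5 of consecutive regions, each filled with one byte value.

    Assumption: this mirrors the four `write -P` commands in step 1,
    with 50M taken to mean 50 MiB (52428800 bytes, as in the qemu-io output).
    """
    digest = hashlib.md5()
    block = 1 << 20
    for value in patterns:
        remaining = region_size
        while remaining > 0:
            size = min(block, remaining)
            digest.update(bytes([value]) * size)
            remaining -= size
    return digest.hexdigest()


if __name__ == "__main__":
    # Path taken from the steps above; compare against the expected pattern.
    print(md5_of_file("/mnt/test-552487.raw") == md5_of_pattern())
```

Comparing against a computed pattern rather than a stored checksum only works for the raw image; a qcow2 file also contains format metadata, so the before/after md5sum comparison from the steps is the applicable check there.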
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0271.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days