Bug 626684
Summary: Filesystem corruption in both xfs & ext4 with KVM guest

Product: Fedora
Component: kvm
Version: 13
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: low
Reporter: Michael Hagmann <michael.hagmann>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: anton, aquini, berrange, clalance, dougsland, ehabkost, extras-orphan, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, markmc, quintela, virt-maint
Doc Type: Bug Fix
Last Closed: 2010-08-27 16:37:55 UTC
Created attachment 440564 [details]
Host Sosreport Enif
Created attachment 440565 [details]
Raid controller Logs Host System
(The RAID controller logs are all from months ago, but that's OK.)

So it shut down on this path:

    sys_open
      do_sys_open
        do_filp_open
          vfs_create
            xfs_vn_create
              xfs_vn_mknod
                xfs_create
                  xfs_trans_cancel

We got the error and shut down due to cancelling a dirty transaction. It's not clear where in xfs_create things failed, but it's interesting that you've gone down the mknod path, i.e. creating a device special file. This rings a bell for me, but I can't remember why. :(

Is it always failing in mknod? If you do cp -v, can you see which file it's copying and provide details (name, permissions, major/minor, etc.)?

If you mount and unmount the filesystem that shut down, then run xfs_repair (-n for a dry run), does it find corruption?

-Eric
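A minimal sketch of that dry-run check, assuming the affected filesystem is /dev/vdb mounted at /export/data1 (the names used elsewhere in this report):

    # Mount and unmount once so the XFS log is replayed before checking
    mount /dev/vdb /export/data1
    umount /export/data1

    # No-modify mode: report corruption without changing anything on disk
    xfs_repair -n /dev/vdb

    # Only if the dry run reports problems (and ideally after taking a backup):
    # xfs_repair /dev/vdb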
Hi Eric

No idea why I have special files in my data/home? I'll try to rerun this test. xfs_repair found errors, but I had to discard the xfs log first.

In the meantime I'm trying another test. From my four 2 TB disks I built two RAID1 arrays of 2 TB gross each and formatted them with ext4 (on the assumption that the xfs filesystem, or the disk size, was the problem). The disks are presented to the host system Enif (Fedora 13), and one is exported to the guest Scheat (Fedora 13), with LVM on it and then ext4 as a disk.

    [root@enif ~]# pvs
      PV         VG       Fmt  Attr PSize   PFree
      /dev/sda3  vg_local lvm2 a-   288.01g 68.01g
      /dev/sdb   vg_data1 lvm2 a-     1.82t      0
      /dev/sdc   vg_data2 lvm2 a-     1.82t      0
    [root@enif ~]# vgs
      VG       #PV #LV #SN Attr   VSize   VFree
      vg_data1   1   1   0 wz--n-   1.82t      0
      vg_data2   1   1   0 wz--n-   1.82t      0
      vg_local   1   4   0 wz--n- 288.01g 68.01g
    [root@enif ~]# lvs
      LV               VG       Attr   LSize   Origin Snap%  Move Log Copy%  Convert
      lv_data1         vg_data1 -wi-ao   1.82t
      lv_data2         vg_data2 -wi-ao   1.82t
      lv_enif_crash    vg_local -wi-ao  10.00g
      lv_enif_root     vg_local -wi-ao  30.00g
      lv_old_enif_root vg_local -wi-a-  30.00g
      lv_virt          vg_local -wi-ao 150.00g
    [root@enif ~]#

Config snippet from the KVM XML:

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/mapper/vg_data1-lv_data1'/>
      <target dev='vdb' bus='virtio'/>
    </disk>

Now I copy the data to the 2 TB disk inside the guest and to the other 2 TB disk on the host, as follows:

- mount the data over NFS on both the host and the guest
- rsync the data from the NFS-mounted share to the attached disk

Result:

- on the host Enif, no problem: all the data is there without errors!
- on the KVM guest Scheat, lots of problems!! fsck ran very long:

    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #67502197: rec_len is too small for name_len - offset=0, inode=8388608, rec_len=16, name_len=128
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #69206100: inode out of bounds - offset=0, inode=4294967295, rec_len=4096, name_len=255
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #69206099: inode out of bounds - offset=0, inode=4294967295, rec_len=4096, name_len=255
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #70516741: inode out of bounds - offset=0, inode=4294967295, rec_len=4096, name_len=255
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #70516739: inode out of bounds - offset=0, inode=4294967295, rec_len=4096, name_len=255
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #69206101: inode out of bounds - offset=0, inode=4294967295, rec_len=4096, name_len=255
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73400854: rec_len is too small for name_len - offset=0, inode=12582912, rec_len=16, name_len=192
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531481: directory entry across blocks - offset=0, inode=262672436, rec_len=248684, name_len=169
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531484: directory entry across blocks - offset=0, inode=3230352654, rec_len=119404, name_len=187
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531941: directory entry across blocks - offset=0, inode=1284650619, rec_len=56464, name_len=143
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531488: directory entry across blocks - offset=0, inode=2094283311, rec_len=176972, name_len=73
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531945: directory entry across blocks - offset=0, inode=4200826031, rec_len=195440, name_len=11
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531949: directory entry across blocks - offset=0, inode=2307910799, rec_len=40712, name_len=108
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73533776: rec_len is too small for name_len - offset=0, inode=12582912, rec_len=16, name_len=192
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531955: directory entry across blocks - offset=0, inode=2974024306, rec_len=45480, name_len=211
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531954: directory entry across blocks - offset=0, inode=2359655960, rec_len=64764, name_len=19
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531953: directory entry across blocks - offset=0, inode=2773650414, rec_len=40312, name_len=7
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73531957: directory entry across blocks - offset=0, inode=2563861061, rec_len=125588, name_len=200
    EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory #73269736: inode out of bounds - offset=0, inode=4294967295, rec_len=4096, name_len=2

IMHO it looks like the virtualisation with KVM is causing the problem.

Mike
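For the record, the copy test above boils down to something like this; the NFS server name and export path are hypothetical, since the report doesn't name them:

    # On both the host (enif) and the guest (scheat):
    mount -t nfs nfsserver:/data /mnt/source        # hypothetical server/export

    # Guest side: copy from the NFS share onto the attached 2 TB virtio disk
    mount /dev/vdb /export/data1
    rsync -a /mnt/source/ /export/data1/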
Created attachment 441261 [details]
Controller Logs
Created attachment 441262 [details]
SOSreport KVM Host Enif
Created attachment 441263 [details]
SOSreport Guest System Scheat
No idea what went wrong before; fsck found lots of errors and restarted a few times:

    Directory inode 71174624, block #4, offset 0: directory corrupted
    Salvage? yes

    Entry 'M-d^Zm_^\Fu|M-}}M-cM-WM-k>M-;M-^M-qM-*GM-VM-dM-]M-X[M-^TM-^GkM-EM-PM-tM-$M-FM-^Z^HM-^u^HM-9%^TM-^GM-+M-jFU@b-' in ??? (71174624) references inode 1310720 in group 159 where _INODE_UNINIT is set.
    Fix? yes

    Entry 'M-d^Zm_^\Fu|M-}}M-cM-WM-k>M-;M-^M-qM-*GM-VM-dM-]M-X[M-^TM-^GkM-EM-PM-tM-$M-FM-^Z^HM-^u^HM-9%^TM-^GM-+M-jFU@b-' in ??? (71174624) has deleted/unused inode 1310720.
    Clear? yes

    Directory inode 71177913, block #6, offset 0: directory corrupted
    Salvage? yes

    Directory inode 73273050, block #1, offset 0: directory corrupted
    Salvage? yes

    Directory inode 71177913, block #11, offset 0: directory corrupted
    Salvage? yes

    Entry '^E)^AM-^IM-CM-x^@^@^AM-`^GM-lM-^A^@^@M-^]M-l@[M-^LM-^[M-,^NM-W?^[M-^B^M?w GM-^D^L@^HM-OM-#M-B^R^TM-nM-^YM-^BRM-]' in ??? (71570850) has invalid inode #: 3120627712.
    Clear? yes

    Directory inode 71570850, block #2, offset 1604: directory corrupted
    Salvage? yes

    Directory inode 71571305, block #8, offset 0: directory corrupted
    Salvage? yes

    Directory inode 71177644, block #8, offset 0: directory corrupted
    Salvage? yes

    Restarting e2fsck from the beginning...

    Group descriptor 7 checksum is invalid.  FIXED.
    Group descriptor 159 checksum is invalid.  FIXED.
    Group descriptor 6055 checksum is invalid.  FIXED.
    Group descriptor 8768 checksum is invalid.  FIXED.
    Group descriptor 8784 checksum is invalid.  FIXED.
    /dev/vdb contains a file system with errors, check forced.
    Pass 1: Checking inodes, blocks, and sizes
    Illegal block number passed to ext2fs_test_block_bitmap #3341098265 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #1023337331 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #2457660973 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #3334471979 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #666343607 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #3543293982 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #2248889662 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #615565675 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #3403100805 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #2778754129 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #3211825734 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #777616486 for multiply claimed block map
    Illegal block number passed to ext2fs_test_block_bitmap #823570077 for multiply claimed block map
    Multiply-claimed block(s) in inode 77463665: 26810069
    Pass 1C: Scanning directories for inodes with multiply-claimed blocks
    Pass 1D: Reconciling multiply-claimed blocks

What do you think, Eric: should I replace the LVM layer? I really have no idea what the problem is.

thanks Mike

OK, sorry, I was wrong about mknod. Ignore that part.

In any case, yes, this looks a lot like a KVM storage/setup error, not a filesystem error. Any chance you have something else accessing the backing storage for the guest? You may need to talk to some KVM folks to see if there is anything wrong with your setup or any known bugs...

I'm changing the subject since it's not xfs-specific.
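A quick way to check for something else touching the backing storage, assuming the guest disk is backed by /dev/mapper/vg_data1-lv_data1 as configured above (only the qemu-kvm process should show up):

    # On the host: list processes that have the LV open
    fuser -v /dev/mapper/vg_data1-lv_data1
    lsof /dev/mapper/vg_data1-lv_data1

    # Make sure the LV is not also mounted on the host while the guest runs
    grep vg_data1-lv_data1 /proc/mounts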
OK, yes, on the host (the disks are in the host). I'll try to find someone.

Mike

That's very bad. As far as I know, Fedora 13 is the base for RHEL 6, and we are evaluating KVM as a successor to VMware, but with these problems I don't feel very comfortable.

Mike

Anyone from the KVM guys who could help with this problem?

thanks Mike

> Now I copy the data to the 2 TB disk inside the guest and to the other
> 2 TB disk on the host

Ah, the magic phrase: "2 TB guest disk". Might well be hitting this bug: "2tb virtio disk gets massively corrupted filesystems" https://bugzilla.redhat.com/show_bug.cgi?id=605757

thanks! Update in progress...

    Installed:
      kernel.x86_64 0:2.6.33.8-149.fc13

    Updated:
      SDL.x86_64 0:1.2.14-7.fc13
      augeas-libs.x86_64 0:0.7.3-1.fc13
      cronie.x86_64 0:1.4.5-2.fc13
      cronie-anacron.x86_64 0:1.4.5-2.fc13
      curl.x86_64 0:7.20.1-4.fc13
      dbus-glib.x86_64 0:0.86-4.fc13
      gnupg2.x86_64 0:2.0.14-6.fc13
      gpxe-roms-qemu.noarch 0:1.0.1-1.fc13
      grubby.x86_64 0:7.0.16-1.fc13
      kernel-headers.x86_64 0:2.6.33.8-149.fc13
      libcurl.x86_64 0:7.20.1-4.fc13
      libudev.x86_64 0:153-3.fc13
      libusb.x86_64 0:0.1.12-23.fc13
      linux-firmware.noarch 0:20100806-4.fc13
      mc.x86_64 1:4.7.3-1.fc13
      nss.x86_64 0:3.12.6-12.fc13
      nss-sysinit.x86_64 0:3.12.6-12.fc13
      openldap.x86_64 0:2.4.21-10.fc13
      patch.x86_64 0:2.6.1-4.fc13
      qemu.x86_64 2:0.12.5-1.fc13
      qemu-common.x86_64 2:0.12.5-1.fc13
      qemu-img.x86_64 2:0.12.5-1.fc13
      qemu-kvm.x86_64 2:0.12.5-1.fc13
      qemu-system-arm.x86_64 2:0.12.5-1.fc13
      qemu-system-cris.x86_64 2:0.12.5-1.fc13
      qemu-system-m68k.x86_64 2:0.12.5-1.fc13
      qemu-system-mips.x86_64 2:0.12.5-1.fc13
      qemu-system-ppc.x86_64 2:0.12.5-1.fc13
      qemu-system-sh4.x86_64 2:0.12.5-1.fc13
      qemu-system-sparc.x86_64 2:0.12.5-1.fc13
      qemu-system-x86.x86_64 2:0.12.5-1.fc13
      qemu-user.x86_64 2:0.12.5-1.fc13
      ruby-libs.x86_64 0:1.8.6.399-6.fc13
      seabios-bin.noarch 0:0.6.0-1.fc13
      selinux-policy.noarch 0:3.7.19-49.fc13
      selinux-policy-targeted.noarch 0:3.7.19-49.fc13
      system-config-firewall-base.noarch 0:1.2.27-1.fc13
      udev.x86_64 0:153-3.fc13
      yum.noarch 0:3.2.28-3.fc13

    Complete!

Now it looks OK!

    [root@scheat ~]# umount /export/data1/
    [root@scheat ~]# fsck /dev/vdb
    fsck from util-linux-ng 2.17.2
    e2fsck 1.41.10 (10-Feb-2009)
    /dev/vdb: clean, 960601/122077184 files, 160360170/488278016 blocks
    [root@scheat ~]#

No problem at all.

Mike

Also from the host, no problem any more:

    [root@enif data2]# fsck /dev/mapper/vg_data1-lv_data1
    fsck from util-linux-ng 2.17.2
    e2fsck 1.41.10 (10-Feb-2009)
    /dev/mapper/vg_data1-lv_data1: clean, 960601/122077184 files, 160360170/488278016 blocks
    [root@enif data2]#

thanks Mike

*** This bug has been marked as a duplicate of bug 605757 ***
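Why 2 TB is the magic number: 2 TiB is exactly 2^32 512-byte sectors, so a disk at or beyond that size needs sector numbers that no longer fit in 32 bits, and a 32-bit truncation anywhere in the I/O path wraps writes back onto low offsets. The arithmetic below is straightforward; attributing bug 605757 to this exact overflow is an assumption, not something stated in this report.

    # 2 TiB expressed in 512-byte sectors is exactly 2^32,
    # the first value that overflows a 32-bit sector field
    echo $((2 * 1024 * 1024 * 1024 * 1024 / 512))   # 4294967296
    echo $((2 ** 32))                                # 4294967296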
Created attachment 440563 [details]
Guest Sosreport Scheat

Description of problem:

I have a Fedora 13 KVM host (Enif, 2 cores / 6 GB memory) with a Fedora 13 KVM guest (Scheat, 1 core / 2 GB memory). The guest has a 4 TB LVM LUN (from a 3Ware 3690SA-8i controller, RAID 10) attached, with xfs on it.

I try to copy a complete directory with cp -a data data-test; after a few minutes, xfs shuts down:

    Aug 23 23:47:08 scheat kernel: Pid: 13255, comm: cp Not tainted 2.6.33.6-147.2.4.fc13.x86_64 #1
    Aug 23 23:47:08 scheat kernel: Call Trace:
    Aug 23 23:47:08 scheat kernel: [<ffffffffa009c3aa>] xfs_error_report+0x3c/0x3e [xfs]
    Aug 23 23:47:08 scheat kernel: [<ffffffffa00b80d9>] ? xfs_create+0x4b8/0x547 [xfs]
    Aug 23 23:47:08 scheat kernel: [<ffffffffa00b39d8>] xfs_trans_cancel+0x5f/0xea [xfs]
    Aug 23 23:47:08 scheat kernel: [<ffffffffa00b80d9>] xfs_create+0x4b8/0x547 [xfs]
    Aug 23 23:47:08 scheat kernel: [<ffffffffa00c120b>] xfs_vn_mknod+0xd0/0x16d [xfs]
    Aug 23 23:47:08 scheat kernel: [<ffffffffa00c12c3>] xfs_vn_create+0xb/0xd [xfs]
    Aug 23 23:47:08 scheat kernel: [<ffffffff81109e66>] vfs_create+0x73/0x95
    Aug 23 23:47:08 scheat kernel: [<ffffffff8110c445>] do_filp_open+0x36c/0xad5
    Aug 23 23:47:08 scheat kernel: [<ffffffff8120396d>] ? might_fault+0x1c/0x1e
    Aug 23 23:47:08 scheat kernel: [<ffffffff81114fdd>] ? alloc_fd+0x76/0x11f
    Aug 23 23:47:08 scheat kernel: [<ffffffff810ff79a>] do_sys_open+0x5e/0x10a
    Aug 23 23:47:08 scheat kernel: [<ffffffff810ff86f>] sys_open+0x1b/0x1d
    Aug 23 23:47:08 scheat kernel: [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
    Aug 23 23:47:08 scheat kernel: xfs_force_shutdown(vdb,0x8) called from line 1163 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa00b39f1
    Aug 23 23:47:08 scheat kernel: Filesystem "vdb": Corruption of in-memory data detected.  Shutting down filesystem: vdb
    Aug 23 23:47:08 scheat kernel: Please umount the filesystem, and rectify the problem(s)
    Aug 23 23:47:11 scheat kernel: Filesystem "vdb": xfs_log_force: error 5 returned.
    Aug 23 23:47:41 scheat kernel: Filesystem "vdb": xfs_log_force: error 5 returned.
    Aug 23 23:48:05 scheat abrtd: Can't load '/usr/lib64/abrt/libKerneloopsScanner.so': /usr/lib64/abrt/libKerneloopsScanner.so: cannot open shared object file: No such file or directory
    Aug 23 23:48:05 scheat abrtd: Plugin 'KerneloopsScanner' is not registered
    Aug 23 23:48:11 scheat kernel: Filesystem "vdb": xfs_log_force: error 5 returned.
    Aug 23 23:48:41 scheat kernel: Filesystem "vdb": xfs_log_force: error 5 returned.
    Aug 23 23:49:11 scheat kernel: Filesystem "vdb": xfs_log_force: error 5 returned.
    Aug 23 23:49:41 scheat kernel: Filesystem "vdb": xfs_log_force: error 5 returned.

Version-Release number of selected component (if applicable):

How reproducible:
cp -a data data-test

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
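A condensed sketch of the setup that reproduces this, with hypothetical volume and mount-point names (the report doesn't show how the 4 TB LUN was carved up):

    # Host: create a >2 TB LV and attach it to the guest as a raw virtio disk
    lvcreate -L 4T -n lv_data vg_data
    virsh attach-disk scheat /dev/vg_data/lv_data vdb --driver qemu --subdriver raw

    # Guest: put xfs on the disk and copy a large directory tree
    mkfs.xfs /dev/vdb
    mount /dev/vdb /mnt/data
    cp -a /mnt/data/data /mnt/data/data-test   # triggers the shutdown within minutes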