Description of problem:
I created and mounted an ext2 filesystem on a 100G lv. I then started genesis to that filesystem and then attempted two 100M snapshots. During the second attempt, ext2 reported corruption errors and the snap cmd hung.

[root@link-10 ~]# lvscan
  ACTIVE   Original '/dev/mirror_9_9574/lvol0' [100.00 GB] inherit
  ACTIVE   Snapshot '/dev/mirror_9_9574/lvol1' [100.00 MB] inherit
[root@link-10 ~]# lvcreate -s -L 100M /dev/mirror_9_9574/lvol0
[HANG]

Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 13303808
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 13303809
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_free_blocks: Freeing blocks in system zones - Block = 13303810, count = 7
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 13664256
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 13303810
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 9043968
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 2064580
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_free_blocks: Freeing blocks in system zones - Block = 2064581, count = 7
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 13303811
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 13664260
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_free_blocks: Freeing blocks in system zones - Block = 13664261, count = 7
Nov 14 12:23:05 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 13303812
Nov 14 12:23:05 link-10 kernel: EXT2-fs error (device dm-1): ext2_free_blocks: Freeing blocks in system zones - Block = 13303813, count = 7
Nov 14 12:23:05 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 13303820
Nov 14 12:23:05 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block: Allocating block in system zone - block = 9043972

Version-Release number of selected component (if applicable):
[root@link-10 ~]# lvcreate --version
  LVM version:     2.02.00 (2005-11-10)
  Library version: 1.02.00 (2005-11-10)
  Driver version:  4.4.0
[root@link-10 tmp]# rpm -qa | grep device
device-mapper-1.02.00-1.0.RHEL4

How reproducible:
will attempt
This is easily reproduced without genesis; all it takes is a looping file copy.
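A looping file copy of the kind mentioned above could be sketched as follows. The exact workload used in the test is not given in this report, so the function name `copy_loop`, the two alternating destination names, and the iteration count are all illustrative assumptions.

```python
import shutil
from pathlib import Path


def copy_loop(source: Path, target_dir: Path, iterations: int) -> int:
    """Copy `source` into `target_dir` repeatedly; return total bytes written.

    Hypothetical helper: it only keeps the filesystem under constant write
    load so that a snapshot can be attempted from another shell.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for i in range(iterations):
        # Alternate between two destination names so that earlier copies
        # are overwritten and the target does not fill up.
        dest = target_dir / f"copy_{i % 2}"
        shutil.copyfile(source, dest)
        written += dest.stat().st_size
    return written
```

For example, `copy_loop(Path("/tmp/seed"), Path("/mnt/origin/load"), 10000)` would keep writing to the origin filesystem while `lvcreate -s` is run elsewhere; both paths here are placeholders.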
Found a userspace bug that might cause this.
Currently the userspace code reloads device tables every time, even when the existing table is unchanged. The FIXME in the code needs resolving, ideally by comparing the live table with the newly required one and suppressing the reload if they are identical.
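The comparison idea can be illustrated with a small sketch. This is not the lvm2 or libdevmapper code; it assumes tables are available as text in the `start length type params` form that `dmsetup table` prints for a single device, and the names `Target`, `parse_table` and `needs_reload` are made up for the example.

```python
from typing import List, NamedTuple


class Target(NamedTuple):
    start: int    # first sector covered by this target
    length: int   # number of sectors covered
    ttype: str    # target type, e.g. "linear" or "snapshot-origin"
    params: str   # target-specific parameters, whitespace-normalised


def parse_table(text: str) -> List[Target]:
    """Parse one device's table text into a list of targets."""
    targets = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if not fields:
            continue  # skip blank lines
        start, length, ttype = fields[:3]
        params = fields[3] if len(fields) > 3 else ""
        targets.append(
            Target(int(start), int(length), ttype, " ".join(params.split()))
        )
    return targets


def needs_reload(live: str, wanted: str) -> bool:
    """Return True only if the wanted table differs from the live one."""
    return parse_table(live) != parse_table(wanted)
```

With this, a caller would skip the reload whenever `needs_reload(live, wanted)` is false, so an unchanged device is left alone instead of being reloaded on every operation.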
Also, an ordering constraint is missing: the new snapshot LV needs to be resumed before the original LV is resumed.
The dm CVS code now suppresses reloading tables that haven't changed.
Ordering changed so snapshot origins now get resumed last (i.e. *after* new snapshots of them begin).
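The ordering rule can be shown with a minimal sketch, assuming the origin-to-snapshot relationships are already known as a simple mapping. The function `resume_order` is hypothetical and does not reflect how lvm2 tracks its devices internally; `lvol2` in the usage note is an invented name.

```python
from typing import Dict, List


def resume_order(snapshots_of: Dict[str, List[str]]) -> List[str]:
    """Return device names in resume order.

    For each origin, every snapshot of it is listed first and the origin
    itself comes last, so writes to the origin cannot restart before its
    snapshots are live.
    """
    order: List[str] = []
    for origin, snapshots in snapshots_of.items():
        order.extend(snapshots)  # snapshots first
        order.append(origin)     # origin resumed last
    return order
```

For the volumes in this report, `resume_order({"lvol0": ["lvol1", "lvol2"]})` gives `["lvol1", "lvol2", "lvol0"]`.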
Need to retest with lvm2 2.02.01 and dm 1.02.01 which I'll release shortly.
I still see corruption with the new rpms.

[root@link-08 bin]# rpm -q device-mapper
device-mapper-1.02.02-1.0.RHEL4
[root@link-08 bin]# rpm -q lvm2
lvm2-2.02.01-1.1.RHEL4

Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
Buffer I/O error on device dm-7, logical block 0
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
Buffer I/O error on device dm-7, logical block 0
lost page write due to I/O error on dm-7
EXT3-fs error (device dm-5): read_inode_bitmap: Cannot read inode bitmap - block_group = 0, inode_bitmap = 642
Aborting journal on device dm-5.
Buffer I/O error on device dm-5, logical block 1161
lost page write due to I/O error on dm-5
Buffer I/O error on device dm-5, logical block 1161
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
EXT3-fs error (device dm-5) in ext3_new_inode: IO failure
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
EXT3-fs error (device dm-5) in ext3_create: IO failure
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
EXT3-fs error (device dm-7): read_inode_bitmap: Cannot read inode bitmap - block_group = 0, inode_bitmap = 642
Aborting journal on device dm-7.
Buffer I/O error on device dm-7, logical block 1161
lost page write due to I/O error on dm-7
EXT3-fs error (device dm-7) in ext3_new_inode: IO failure
EXT3-fs error (device dm-7) in ext3_create: IO failure
lost page write due to I/O error on dm-5
printk: 17 messages suppressed.
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
device-mapper: Could not create kcopyd client
device-mapper: error adding target to table

[root@link-08 bin]# touch /mnt/snap2/foo
touch: cannot touch `/mnt/snap2/foo': Read-only file system

EXT3-fs error (device dm-7): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
Dec 20 09:39:35 link-08 kernel: ext3_abort called.
Dec 20 09:39:35 link-08 kernel: EXT3-fs error (device dm-7): ext3_journal_start_sb: Detected aborted journal
Dec 20 09:39:35 link-08 kernel: Remounting filesystem read-only
ext3_abort called.

[root@link-08 bin]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda5 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/mapper/snapper-snap1 on /mnt/snap1 type ext3 (rw)
/dev/mapper/snapper-snap2 on /mnt/snap2 type ext3 (rw)
/dev/mapper/snapper-origin on /mnt/origin type ext3 (rw)
/dev/mapper/snapper-snap3 on /mnt/snap3 type ext3 (rw)
And exactly which kernel?
[root@link-08 bin]# uname -ar
Linux link-08 2.6.9-25.ELsmp #1 SMP Mon Dec 12 17:29:54 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
Once Jason has incorporated the two snapshot patches I sent him yesterday in connection with bug 172839 and bug 177620, please repeat the test next week with his new kernel. If that fails, we can try out some further kernel patches.
This appears to be fixed with the latest kern/rpms

[root@link-08 bin]# uname -ar
Linux link-08 2.6.9-28.ELsmp #1 SMP Fri Jan 13 17:08:22 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
[root@link-08 bin]# rpm -q lvm2
lvm2-2.02.01-1.3.RHEL4
[root@link-08 bin]# rpm -q device-mapper
device-mapper-1.02.02-3.0.RHEL4
So one or other of the kernel patches fixed it after the lvm2 packages were also fixed. Closing.