Bug 173209

Summary: filesystem corruption while creating 2nd snapshot with I/O to origin
Product: Red Hat Enterprise Linux 4
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Alasdair Kergon <agk>
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: medium    
Version: 4.0   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-01-19 17:57:25 UTC

Description Corey Marthaler 2005-11-14 23:39:15 UTC
Description of problem:
I created and mounted an ext2 filesystem on a 100G LV. I then started genesis
running against that filesystem and attempted to create two 100M snapshots.
During the second attempt, ext2 reported corruption errors and the lvcreate
command hung.

[root@link-10 ~]# lvscan
  ACTIVE   Original '/dev/mirror_9_9574/lvol0' [100.00 GB] inherit
  ACTIVE   Snapshot '/dev/mirror_9_9574/lvol1' [100.00 MB] inherit
[root@link-10 ~]# lvcreate -s -L 100M /dev/mirror_9_9574/lvol0
[HANG]

Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 13303808
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 13303809
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_free_blocks:
Freeing blocks in system zones - Block = 13303810, count = 7
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 13664256
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 13303810
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 9043968
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 2064580
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_free_blocks:
Freeing blocks in system zones - Block = 2064581, count = 7
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 13303811
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 13664260
Nov 14 12:23:04 link-10 kernel: EXT2-fs error (device dm-1): ext2_free_blocks:
Freeing blocks in system zones - Block = 13664261, count = 7
Nov 14 12:23:05 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 13303812
Nov 14 12:23:05 link-10 kernel: EXT2-fs error (device dm-1): ext2_free_blocks:
Freeing blocks in system zones - Block = 13303813, count = 7
Nov 14 12:23:05 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 13303820
Nov 14 12:23:05 link-10 kernel: EXT2-fs error (device dm-1): ext2_new_block:
Allocating block in system zone - block = 9043972

Version-Release number of selected component (if applicable):
[root@link-10 ~]# lvcreate --version
  LVM version:     2.02.00 (2005-11-10)
  Library version: 1.02.00 (2005-11-10)
  Driver version:  4.4.0
[root@link-10 tmp]# rpm -qa | grep device
device-mapper-1.02.00-1.0.RHEL4


How reproducible:
will attempt

Comment 1 Corey Marthaler 2005-11-14 23:51:02 UTC
This is easily reproduced without genesis; all it takes is a looping file copy.

Comment 2 Alasdair Kergon 2005-11-15 18:00:41 UTC
Found a userspace bug that might cause this.

Comment 3 Alasdair Kergon 2005-11-15 18:05:16 UTC
Currently the userspace code reloads device tables every time, even when the
existing table is the same.  The FIXME in the code needs resolving, ideally by
comparing the live table with the new one required and suppressing the reload
operation if they are identical.
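
The comparison described above can be sketched as follows. This is only an
illustration in Python, not the lvm2 or device-mapper code: the function names
normalize_table and needs_reload are hypothetical, the device numbers are made
up, and the tables are assumed to be text in the "start length target args"
form that dmsetup table prints.

```python
def normalize_table(table_text):
    """Split a table into rows of whitespace-separated fields, skipping blank lines."""
    return [line.split() for line in table_text.splitlines() if line.strip()]


def needs_reload(live_table, new_table):
    """Return True only when the required table differs from the live one."""
    return normalize_table(live_table) != normalize_table(new_table)


# Hypothetical tables; 253:2 is an invented device number.
live = "0 209715200 snapshot-origin 253:2\n"
same = "0  209715200 snapshot-origin 253:2"   # differs only in whitespace
changed = "0 209715200 linear 253:2 0"

print(needs_reload(live, same))     # False: the reload can be suppressed
print(needs_reload(live, changed))  # True: the table really changed
```

Comparing parsed fields rather than raw strings keeps a purely cosmetic
difference in spacing from triggering a reload.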

Comment 4 Alasdair Kergon 2005-11-22 19:32:14 UTC
Also, there's a missing ordering constraint: the new snapshot LV needs to be
resumed before the original LV is resumed.

Comment 5 Alasdair Kergon 2005-11-22 19:33:37 UTC
dm cvs code now suppresses reloading tables that haven't changed

Comment 6 Alasdair Kergon 2005-11-22 19:59:00 UTC
Ordering changed so snapshot origins now get resumed last (i.e. *after* new
snapshots of them begin).
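
The ordering rule above can be sketched like this. It is a simplified
illustration in Python, not the lvm2 implementation: resume_order and the
origin_of mapping are hypothetical names, and the sketch only decides the
order, it does not resume any device.

```python
def resume_order(devices, origin_of):
    """Order devices so that every snapshot origin is resumed after all other devices.

    devices: list of device names awaiting resume.
    origin_of: maps each snapshot name to the name of its origin.
    """
    origins = set(origin_of.values())
    # The key is False for non-origins and True for origins, so origins sort
    # last; sorted() is stable, so the order within each group is preserved.
    return sorted(devices, key=lambda name: name in origins)


# Hypothetical names: one origin with an existing and a newly created snapshot.
pending = ["origin", "snap1", "snap2"]
snapshots = {"snap1": "origin", "snap2": "origin"}

print(resume_order(pending, snapshots))  # ['snap1', 'snap2', 'origin']
```

Resuming the origin last means the new snapshot is already active by the time
writes to the origin are allowed to proceed again.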

Comment 7 Alasdair Kergon 2005-11-23 16:07:02 UTC
Need to retest with lvm2 2.02.01 and dm 1.02.01 which I'll release shortly.

Comment 8 Corey Marthaler 2005-12-20 21:03:59 UTC
I still see corruption with the new rpms.

[root@link-08 bin]# rpm -q device-mapper
device-mapper-1.02.02-1.0.RHEL4
[root@link-08 bin]# rpm -q lvm2
lvm2-2.02.01-1.1.RHEL4


Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
Buffer I/O error on device dm-7, logical block 0
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
Buffer I/O error on device dm-7, logical block 0
lost page write due to I/O error on dm-7
EXT3-fs error (device dm-5): read_inode_bitmap: Cannot read inode bitmap -
block_group = 0, inode_bitmap = 642
Aborting journal on device dm-5.
Buffer I/O error on device dm-5, logical block 1161
lost page write due to I/O error on dm-5
Buffer I/O error on device dm-5, logical block 1161
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
EXT3-fs error (device dm-5) in ext3_new_inode: IO failure
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
EXT3-fs error (device dm-5) in ext3_create: IO failure
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
EXT3-fs error (device dm-7): read_inode_bitmap: Cannot read inode bitmap -
block_group = 0, inode_bitmap = 642
Aborting journal on device dm-7.
Buffer I/O error on device dm-7, logical block 1161
lost page write due to I/O error on dm-7
EXT3-fs error (device dm-7) in ext3_new_inode: IO failure
EXT3-fs error (device dm-7) in ext3_create: IO failure
lost page write due to I/O error on dm-5
printk: 17 messages suppressed.
Buffer I/O error on device dm-5, logical block 0
lost page write due to I/O error on dm-5
device-mapper: Could not create kcopyd client
device-mapper: error adding target to table

[root@link-08 bin]# touch /mnt/snap2/foo
touch: cannot touch `/mnt/snap2/foo': Read-only file system

EXT3-fs error (device dm-7): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
Dec 20 09:39:35 link-08 kernel: ext3_abort called.
Dec 20 09:39:35 link-08 kernel: EXT3-fs error (device dm-7):
ext3_journal_start_sb: Detected aborted journal
Dec 20 09:39:35 link-08 kernel: Remounting filesystem read-only
ext3_abort called.


[root@link-08 bin]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda5 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/mapper/snapper-snap1 on /mnt/snap1 type ext3 (rw)
/dev/mapper/snapper-snap2 on /mnt/snap2 type ext3 (rw)
/dev/mapper/snapper-origin on /mnt/origin type ext3 (rw)
/dev/mapper/snapper-snap3 on /mnt/snap3 type ext3 (rw)


Comment 9 Alasdair Kergon 2005-12-20 21:13:26 UTC
And exactly which kernel?

Comment 10 Corey Marthaler 2005-12-20 21:46:05 UTC
[root@link-08 bin]# uname -ar
Linux link-08 2.6.9-25.ELsmp #1 SMP Mon Dec 12 17:29:54 EST 2005 x86_64 x86_64
x86_64 GNU/Linux


Comment 11 Alasdair Kergon 2006-01-13 20:07:23 UTC
Once Jason has incorporated the two snapshot patches I sent him yesterday in
connection with bug 172839 and bug 177620 please repeat the test next week with
his new kernel.

If that fails, we can try out some further kernel patches.


Comment 12 Corey Marthaler 2006-01-18 21:55:17 UTC
This appears to be fixed with the latest kernel and rpms.

[root@link-08 bin]# uname -ar
Linux link-08 2.6.9-28.ELsmp #1 SMP Fri Jan 13 17:08:22 EST 2006 x86_64 x86_64
x86_64 GNU/Linux
[root@link-08 bin]# rpm -q lvm2
lvm2-2.02.01-1.3.RHEL4
[root@link-08 bin]# rpm -q device-mapper
device-mapper-1.02.02-3.0.RHEL4


Comment 13 Alasdair Kergon 2006-01-19 17:57:25 UTC
So one or other of the kernel patches fixed it after the lvm2 packages were also
fixed.  Closing.