Bug 174742

Summary:	corrupted snapshot filesystems: Error reading/writing snapshot
Product:	Red Hat Enterprise Linux 4	Reporter:	Corey Marthaler <cmarthal>
Component:	lvm2	Assignee:	Alasdair Kergon <agk>
Status:	CLOSED NOTABUG	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.3	CC:	agk, jbrassow, kanderso, rkenna
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2006-01-25 19:11:41 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Corey Marthaler 2005-12-01 20:43:20 UTC

Description of problem:
Unlike bz 173209 where the origin fs becomes corrupt, here the 2nd snapshot of
that fs becomes corrupt

# snapper test output:

Creating origin volume
  Logical volume "origin" created

Placing an ext filesystem on origin volume
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
1310720 inodes, 2621440 blocks
131072 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2684354560
80 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Mounting origin volume

Writing files ABC to origin volume

Making 1st snapshot of origin volume
  Rounding up size to full physical extent 52.00 MB
  Logical volume "snap1" created

Mounting 1st snap volume

Checking files on 1st snap volume
/mnt/snap1/A
/mnt/snap1/B
/mnt/snap1/C

Writing files DEF to origin volume

Making 2nd snapshot of origin volume
  Rounding up size to full physical extent 52.00 MB
  Logical volume "snap2" created

Mounting 2nd snap volume
/dev/snapper/snap2: Input/output error
mount: /dev/snapper/snap2: can't read superblock
couldn't mount 2nd snap on /mnt/snap2


SYSLOG after 2nd snap mount attempt:
Dec  1 14:29:37 morph-01 kernel: attempt to access beyond end of device
Dec  1 14:29:37 morph-01 kernel: dm-6: rw=17, want=2736122528, limit=106496
Dec  1 14:29:37 morph-01 kernel: device-mapper: Error reading/writing snapshot
Dec  1 14:29:38 morph-01 kernel: device-mapper: snapshot is marked invalid
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 0
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 1
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 2
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 3
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 4
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 5
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 6
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 7
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 8
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 9
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 10
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 11
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 12
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 13
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 14
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 15
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 0
Dec  1 14:29:38 morph-01 kernel: Buffer I/O error on device dm-7, logical block 8
Dec  1 14:29:38 morph-01 kernel: FAT: unable to read boot sector


Version-Release number of selected component (if applicable):
device-mapper-1.02.01-1.0.RHEL4
lvm2-2.02.01-1.0.RHEL4


How reproducible:
I've only seen this twice now.

Comment 1 Alasdair Kergon 2005-12-12 19:56:19 UTC

I've just got an 'end of device' error after writing a single large piece of
data to the origin that was too big for the snapshot.

Comment 2 Alasdair Kergon 2005-12-12 21:08:40 UTC

But haven't succeeded in reproducing it:-(

Comment 3 Jonathan Earl Brassow 2005-12-14 01:16:01 UTC

See comments 17 & 18 of bug 174636

I've been unable to reproduce this on kernel 2.6.9-25.EL

Comment 4 Alasdair Kergon 2006-01-13 20:34:02 UTC

Is this reproducible enough that you can leave a test running for a certain
period of time and be fairly sure you'll see it happen?

Are the snapshots being mounted read-write?
If so, can you change the mount options to mount them *read only* and see if you
can still get it to go wrong?
Try with the 1st read-write like now and the 2nd read-only.
Then with both read-only.

Comment 5 Corey Marthaler 2006-01-18 21:22:35 UTC

I'm still seeing the filesystem corruption when running the test snapper on the
lastest kern/rpms. Though, not necessarily after the second snapshot attempt but
after creating a few snaps, and then recreating the origin and then attempting
to write to the snaps. 

[root@link-08 ~]# uname -ar
Linux link-08 2.6.9-28.ELsmp #1 SMP Fri Jan 13 17:08:22 EST 2006 x86_64 x86_64
x86_64 GNU/Linux
[root@link-08 ~]# rpm -q device-mapper
device-mapper-1.02.02-3.0.RHEL4
[root@link-08 ~]# rpm -q lvm2
lvm2-2.02.01-1.3.RHEL4
[root@link-08 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.01-1.2.RHEL4


The snapshots are obviously being mounted read-write and are able to be written
to before the failures start.

/dev/mapper/snapper-snap1 on /mnt/snap1 type ext3 (rw)
/dev/mapper/snapper-snap2 on /mnt/snap2 type ext3 (rw)
/dev/mapper/snapper-origin on /mnt/origin type ext3 (rw)
/dev/mapper/snapper-snap3 on /mnt/snap3 type ext3 (rw)

I can try this read-only like you suggest and then see if I get read errors...

Comment 6 Alasdair Kergon 2006-01-19 18:06:26 UTC

Yes, please try read-only.

We need to try to narrow down what could be happening.  Try adding 'sync' or
'sleep' at multiple places in the test and see if it goes away.

How many CPUs in the test machine?

Can you attach the actual sequence of commands you're running when you see the
problem?  Does everything run in a single sequence, or are multiple commands
issued in parallel?

Comment 7 Alasdair Kergon 2006-01-19 18:45:11 UTC

(I'm wondering if this could be the first reproducible test for bug 175830 - but
need to find out more of its characteristics first - there are still other
simpler things that might be wrong giving similar symptoms.)

Comment 8 Corey Marthaler 2006-01-23 20:35:08 UTC

I've got this narrowed down to re'mkfsing a snapshoted origin with a mounted
shapshot. This will work whetheror not the snapshot is mounted read only or read
write.

A. Write to an origin fs
B. Snapshot the origin
C. Mount the snapshot fs
C. Remake the origin fs
D. Attempt to write to the snapshot (if rw)
or
D. Attempt to umount and then remount the snapshot (if ro)

[the snapshot filesystem will them be corrupt]

Comment 10 Alasdair Kergon 2006-01-25 15:53:07 UTC

I've still not managed to reproduce this yet.

Can you post the actual script you're using, as I had to assume the order of the
steps you didn't list.  (eg Can't do the second C without unmounting the origin
first - but at which point do you do that unmount?)

Comment 11 Alasdair Kergon 2006-01-25 15:55:32 UTC

Also try repeating the test with 'sync; sleep 10' inserted between every command.

Comment 12 Alasdair Kergon 2006-01-25 16:13:19 UTC

Aha!  By changing the placement of the umount I've found something that goes
wrong...

Comment 13 Alasdair Kergon 2006-01-25 16:33:05 UTC

But only the "unchecked fs" warnings on bug 178705 - not identified any actual
snapshot corruption yet.

Comment 14 Alasdair Kergon 2006-01-25 16:40:49 UTC

And I don't see the warning with ext3.

Comment 15 Alasdair Kergon 2006-01-25 17:23:40 UTC

As well the sync;sleep test, can you reproduce with ext3 with a journal rather
than ext2?

Comment 16 Corey Marthaler 2006-01-25 17:37:36 UTC

I can't try anything right now as my machines are dedicated for regression
testing of RHEL4U3. 

I was also able to hit this by mkfs'ing an *unmounted* snapshot. So here are the
more detailed steps I did to reproduce this every time.


A. Write to an origin fs
# echo "AAAAAAAAAAAAAAAAAAAAAAAAAAAA" > /mnt/origin
# echo "BBBBBBBBBBBBBBBBBBBBBBBBBBBB" > /mnt/origin
# echo "CCCCCCCCCCCCCCCCCCCCCCCCCCCC" > /mnt/origin

B. Snapshot the origin
# lvcreate -s /dev/snapper/origin -L 50M -n snap1

C. Mount the snapshot fs
# mount /dev/snapper/snap1 /mnt/snap1

D. Unmount origin
# umount /mnt/origin

E. Remake the origin fs
# mkfs /dev/snapper/origin

F. Choose one of the following:

F1. Attempt to write to the snapshot (if rw)
# echo "DDDDDDDDDDDDDDDDDDDDDDDD" > /mnt/snap1

or

F2. Attempt to umount and then remount the snapshot (if ro)
# umount /mnt/snap1
# mount /dev/snapper/snap1 /mnt/snap1

or 

F3. Mkfs a snapshot
# umount /mnt/snap1
# mkfs /dev/snapper/snap1

Comment 17 Alasdair Kergon 2006-01-25 17:43:20 UTC

# echo "AAAAAAAAAAAAAAAAAAAAAAAAAAAA" > /mnt/origin
# echo "BBBBBBBBBBBBBBBBBBBBBBBBBBBB" > /mnt/origin
# echo "CCCCCCCCCCCCCCCCCCCCCCCCCCCC" > /mnt/origin

/mnt/origin/A B and C ?

Comment 18 Alasdair Kergon 2006-01-25 17:44:56 UTC

With F1, F2 and F3 how does the corruption show up?
Do you have to run something else to see it, or do the commands or kernel give
errors?

Comment 19 Corey Marthaler 2006-01-25 17:51:32 UTC

yes. 

# echo "AAAAAAAAAAAAAAAAAAAAAAAAAAAA" > /mnt/origin/A

The corruption shows up just as mentioned in original bug report.
You wait like 1 - 3 seconds and then boom tons of I/O errors and ext errors in
the log and input/output errors on any read/write attempt of the snapshot.

Comment 20 Alasdair Kergon 2006-01-25 19:08:27 UTC

From "1310720 inodes, 2621440 blocks" I deduce the origin is about 10GB.  From
comment 16 the snapshot is 50MB.  The mkfs is making 160MB of changes to the
origin.  That doesn't fit into 50MB, so the snapshot simply gets dropped.

The fix here is to allocate more space for the snapshot in the test script.
Running out of space is a hazard when you're using snapshots - it's important to
monitor the amount of free disk space left (e.g. with 'lvs') and extend them if
necessary before they fill up.  This is mentioned in the HOWTO
http://www.tldp.org/HOWTO/LVM-HOWTO/snapshots_backup.html and elsewhere.

Maybe one day there'll be automatic handling of situations like this.
In the mean time, we can close this one - not a bug.