152162 – LVM snapshots over md raid1 cause corruption

Summary: LVM snapshots over md raid1 cause corruption

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Alasdair Kergon
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	164696
Depends On:
Blocks:	156322

Reported:	2005-03-25 14:51 UTC by Chris Adams
Modified:	2007-11-30 22:11 UTC
CC List:	4 users
Fixed In Version:	RHSA-2005-514
Clone Of:
Environment:
Last Closed:	2005-10-05 12:53:37 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
lvm command output (12.60 KB, text/plain) 2005-03-29 20:08 UTC, Chris Adams
script to demonstrate the problem (3.10 KB, text/plain) 2005-06-21 22:44 UTC, Chris Adams

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2005:514	0	qe-ready	SHIPPED_LIVE	Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 2	2005-10-05 04:00:00 UTC

Description Chris Adams 2005-03-25 14:51:27 UTC

I use LVM snapshots as part of my backup process (to get "moment in time"
backups).  I just set up a new FC3 (with all updates) system, and when I create
a snapshot, I start getting random corruption of reads from the "real"
(non-snapshot) filesystem.

For example, I created a snapshot of /usr and mounted it at /backup/usr.  Once I
did that, I tried to just dump a copy of /backup/usr to /dev/null with:

find /backup/usr | time cpio -oa -H crc > /dev/null

I immediately got an error that "time" was an invalid binary.  I removed the
snapshot and rebooted, and everything is okay (it seems that files buffered,
such as /usr/sbin/lvm, continue to work).

The system has two SATA drives (ata_piix driver), mirrored with Linux software
RAID, and the LV sitting on top of /dev/md1.

When I had this happen with the root filesystem, the system eventually crashed;
I was unable to remove the snapshot (umount wouldn't run).  Also, the system
wouldn't boot with a snapshot of /; it panicked trying to run init (I checked and
dm-snapshot.ko is included in the initramfs and appears to be loading okay).  I
had to boot a rescue CD and remove the snapshot of / (and then it booted just fine).

Comment 1 Chris Adams 2005-03-25 17:08:36 UTC

Just to rule out the common things, I booted a non-SMP kernel as well as the
latest kernel in updates-testing; both give the same result.

The snapshot doesn't have to be mounted either; as soon as I "lvcreate", things
go bad.  Removing the snapshot doesn't fix things (probably because the bad data
is buffered); I have to reboot.

Comment 2 Alasdair Kergon 2005-03-29 18:15:18 UTC

  Snapshots of / etc. aren't supported properly yet: you might have more success
if you mount noatime.   [e.g. kernel blocks updating atime on hotplug binary]

  What version of device-mapper/lvm2 packages are you using?  Using the most
up-to-date ones (1.01.01 / 2.01.08) might help:  some older versions can get
confused with md.  [Do not have md_component_detection set to 0 in lvm.conf]

Comment 3 Chris Adams 2005-03-29 18:23:35 UTC

It isn't just /; if I lvcreate a snapshot of /usr, I have the same problem.

I'm running the latest FC3 versions of device-mapper (1.00.19-2) and lvm2
(2.00.25-1.01).  Will the versions from rawhide apply cleanly to an otherwise
FC3 system or will they need a rebuild?

Does RHEL4 have the same problem with root snapshots?  My RHEL servers are all
still running RHEL3 but I was looking at RHEL4 for another new server.

Comment 4 Alasdair Kergon 2005-03-29 18:42:10 UTC

Not the *same* problem - I'm suggesting you're seeing *different* problems:-)

Rawhide versions should be fine on FC3.

RHEL4 suffers from similar problems.

Comment 5 Chris Adams 2005-03-29 18:56:16 UTC

Okay, I see the exact same _symptoms_ whether I snapshot / or /usr; binaries
that have not been run (i.e. not buffered) prior to making the snapshot won't
run after making the snapshot until the system is rebooted.

I tried installing the rawhide device-mapper and lvm2 packages on FC3 and got a
long train of dependencies (it looks like it starts with the fact that
device-mapper now links with libreadline).  Also, since the libdevmapper.so
library changed version, I can't build RPMs without a scratch system (I can
build device-mapper-1.01.00-1.1, but I can't install it because of dependencies
from the older lvm2, but I can't build the newer lvm2 until the newer
device-mapper is installed).

Comment 6 Alasdair Kergon 2005-03-29 19:03:25 UTC

Hmmm.  You should just need the new libdevmapper.h file from the dm package in
order to build the lvm2 package the first time.  [then after installing them
both together, rebuild with new library]

Comment 7 Alasdair Kergon 2005-03-29 19:07:12 UTC

No, runtime linking may still fail; maybe force install your newly-built dm
package but make sure that it doesn't delete libdevmapper.so.1.00 (or put it
back afterwards)

Comment 8 Chris Adams 2005-03-29 19:46:43 UTC

Okay, I got them built and installed; no change.  I rebuilt the initrd (to make
sure the new lvm was used to set things up) and rebooted, but as soon as I
create the snapshot of /usr, attempting to run a binary from /usr/bin (that
hadn't been touched since boot) failed.

Comment 9 Alasdair Kergon 2005-03-29 19:50:55 UTC

If the /usr LV is getting activated by the initrd rather than initscripts, then
you may need to run mkinitrd to build a new initrd with the updated lvm2 version.

Comment 10 Alasdair Kergon 2005-03-29 19:53:28 UTC

Next thing then will be to post the output of 'pvs -v' 'vgs -v' and 'lvs -v' here,
and also the (long) output of the problematic lvcreate with '-vvvv' added to the
command.

Comment 11 Alasdair Kergon 2005-03-29 19:56:38 UTC

And what filesystem (ext3?) mounted with what options?

Comment 12 Chris Adams 2005-03-29 20:08:56 UTC

Created attachment 112428
lvm command output

Comment 13 Chris Adams 2005-03-29 20:10:01 UTC

Read what I wrote - I did rebuild the initrd just to make sure.

The corruption appears to be on block boundaries.  I just took a snapshot of
/usr (after a reboot) and then copied /usr/share/doc/fedora-release-3/GPL from
/usr to /tmp/GPL-usr-after-snap.  I mounted the snapshot on /mnt and copied
/mnt/share/doc/fedora-release-3/GPL to /tmp/GPL-snap.  Then I removed the
snapshot and rebooted.

After a clean boot, I copied /usr/share/doc/fedora-release-3/GPL to /tmp/GPL-usr
and compared.  The two post-snapshot versions chop off the first 16384 bytes and
add garbage to the end (that is different between the two files).  I've tarred
up the three versions and put them at:

http://hiwaay.net/~cmadams/misc/snapshot-GPL.tar.gz

if you are interested.

The filesystems are all ext3 with default options, except that /usr is mounted
read only.

Command output attached.
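
The comparison described in Comment 13 (post-snapshot copies missing their first 16384 bytes) can be checked mechanically. The following is a minimal sketch, assuming a POSIX shell with `head -c` available; the function name `is_shifted` is hypothetical and not part of any tool mentioned in this report.

```shell
# is_shifted GOOD BAD OFFSET
# Succeeds if BAD starts with the contents of GOOD from byte OFFSET onward,
# i.e. BAD looks like GOOD with its first OFFSET bytes chopped off.
# Any trailing bytes in BAD (the "garbage at the end") are ignored.
is_shifted() {
    good=$1; bad=$2; offset=$3
    good_size=$(wc -c < "$good")
    remaining=$((good_size - offset))
    [ "$remaining" -gt 0 ] || return 1
    tmp=$(mktemp) || return 1
    # Keep only as many bytes of BAD as GOOD has left after OFFSET.
    head -c "$remaining" "$bad" > "$tmp"
    # tail -c +N starts output at byte N (1-based), hence OFFSET + 1.
    tail -c +"$((offset + 1))" "$good" | cmp -s - "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

With the file names used above, a check might look like `is_shifted /tmp/GPL-usr /tmp/GPL-snap 16384 && echo "shifted by 16384 bytes"`.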

Comment 14 Alasdair Kergon 2005-03-29 20:15:17 UTC

Hardware: all   -  What platform?

Comment 15 Alasdair Kergon 2005-03-29 20:22:48 UTC

What is the version of the newest kernel you tried from updates-testing?

Comment 16 Chris Adams 2005-03-29 20:29:15 UTC

Penguin Computing Relion 1Z server
Supermicro P4SC8 motherboard
Intel Pentium 4 2.8 GHz CPU (HyperThreading enabled)
512MB RAM
Western Digital 80G SATA hard drives (two)
US Robotics v.92 PCI modem (model 5610)

The modem is the only add-in card; everything else is integrated in the motherboard:
dual Intel gigabit ethernet interfaces
ATI Rage XL video
Intel 6300ESB SATA Storage Controller

The system is running headless with a serial console (so no X).

I tried 2.6.11-1.7_FC3 (SMP).  It didn't make any difference so I switched back
to the latest released kernel, 2.6.10-1.770_FC3 (SMP).

Comment 17 Alasdair Kergon 2005-03-29 20:32:19 UTC

What you've posted and tried so far has eliminated lots of known issues.

To simplify your testing, I suggest you try to reproduce the problem directly on
a block device i.e. see if you can eliminate the filesystem from the problem.

e.g. Create new LV, tar something onto it directly; blockdev --flushbufs;
snapshot it; see if you can reproduce the corruption.

There were some dm/md issues a month or two ago (causing oopses rather than
corruption) - I'll check if they got fixed in that kernel.
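
The raw block-device test suggested in Comment 17 could be scripted roughly as below. This is only a sketch: the LV names `test` and `testsnap`, the 50M snapshot size, the scratch directory, and the `EXECUTE` switch are all assumptions made for illustration. By default the function only prints the commands; they are run (as root, against a volume group you can afford to scribble on) only when `EXECUTE=1` is set.

```shell
# Print each command, or run it when EXECUTE=1 is set in the environment.
run() {
    if [ "${EXECUTE:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# raw_lv_snapshot_test VG [SCRATCH_DIR]
# Writes a tar archive straight onto a new LV (no filesystem), flushes
# buffers, snapshots the LV, then reads the archive back and compares.
raw_lv_snapshot_test() {
    vg=$1
    scratch=${2:-/tmp/snaptest}
    lv=/dev/$vg/test
    run lvcreate -L 100M -n test "$vg"
    run tar -cf "$lv" -C / boot
    run blockdev --flushbufs "$lv"
    run lvcreate -s -L 50M -n testsnap "$lv"
    run mkdir -p "$scratch"
    run tar -xf "$lv" -C "$scratch"
    run diff -r /boot "$scratch/boot"
}
```

Running `raw_lv_snapshot_test vg0` without `EXECUTE=1` lists the seven commands so they can be reviewed first; `vg0` here is a placeholder volume group name.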

Comment 18 Chris Adams 2005-03-29 20:56:45 UTC

Okay, I created a 100M LV "test".  I tarred /boot onto it, did a blockdev
--flushbufs, and created a snapshot.  I did a tar -xf under a temp directory (no
errors) and diffed it against /boot (no difference).  Just to make sure, I
removed the snapshot, rebooted, recreated the snapshot, and compared - again, no
problem.

So, it would seem to be filesystem related somehow.  I tried to create a little
ext2 (instead of ext3) filesystem and test, and after I create the snapshot and
try to diff, I get from the kernel:

attempt to access beyond end of device
dm-6: rw=0, want=317711250, limit=262144
Buffer I/O error on device dm-6, logical block 158855624
attempt to access beyond end of device
dm-6: rw=0, want=317711282, limit=262144
Buffer I/O error on device dm-6, logical block 158855640
attempt to access beyond end of device
dm-6: rw=0, want=317711146, limit=262144
Buffer I/O error on device dm-6, logical block 158855572
attempt to access beyond end of device
dm-6: rw=0, want=317711250, limit=262144
Buffer I/O error on device dm-6, logical block 158855624
attempt to access beyond end of device
dm-6: rw=0, want=317711250, limit=262144
Buffer I/O error on device dm-6, logical block 158855624
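
To put the kernel messages above in perspective, the sector numbers can be converted by hand. The sketch below assumes 512-byte sectors and assumes that "want" is the sector just past the end of the failed request; the helper names are hypothetical. Under those assumptions the "logical block" values are consistent with 1 KiB buffers (2 sectors each).

```shell
SECTOR_BYTES=512

# Convert a count of 512-byte sectors to whole MiB (rounded down).
sectors_to_mib() {
    echo $(( $1 * SECTOR_BYTES / 1048576 ))
}

# want_to_block WANT SECTORS_PER_BUFFER
# Print the buffer-sized block index at which the request started,
# given the end sector of the request and the buffer size in sectors.
want_to_block() {
    echo $(( ($1 - $2) / $2 ))
}

sectors_to_mib 262144        # device size: prints 128
sectors_to_mib 317711250     # target of the read: prints 155132
want_to_block 317711250 2    # prints 158855624, matching the log
```

In other words, the reads were aimed roughly 151 GiB into a 128 MiB device, which fits bad block addresses being handed down rather than bad data on disk.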

Comment 19 Chris Adams 2005-03-30 15:01:30 UTC

I decided to try some additional filesystems.

With XFS, JFS, and reiserfs, I got similar results to ext3 (corrupted file
reads, apparently on block boundaries).  In each case, a filesystem was created
with the default options of mkfs.<fstype> (I'm not really familiar with any of
them so I just assumed the defaults were okay).

I then tried VFAT.  I had to create a filesystem in a file and then dd it to the
device (mkfs.vfat doesn't understand LVM :) ).  It appears to work fine; I can
copy files to it, umount it, flush the device, remount it, snapshot it, and
everything still diffs with no changes.

Comment 20 Chris Adams 2005-03-30 15:30:12 UTC

One other thing: thinking about this, it does seem likely that it is something
at the filesystem layer.  I have no problem going through the directories (I can
"find <mnt> -print" with no errors); the only problem comes in when I try to
read files.  If the problem was just in the LVM or DM layer, directories would
be corrupted as well.

Comment 21 Alasdair Kergon 2005-03-30 15:32:48 UTC

Other things - which tests above had md involvement?
ie was everything on top of md, or were some with/some without?

Comment 22 Chris Adams 2005-03-30 15:34:38 UTC

All tests were on top of md.  The drives have 2 partitions each: one for /boot
and one for an LVM PV.  The partitions are then mirrored between the drives;
/boot sits on /dev/md0 and /dev/md1 is the LVM PV.

Comment 23 Alasdair Kergon 2005-03-30 15:49:49 UTC

Can you reproduce any of the problems without md?

Comment 24 Chris Adams 2005-03-30 15:51:16 UTC

Okay, I broke the /boot mirror (removed the second drive) and created a new
volume group with that partition as its only PV.  I don't seem to have any
problem with snapshots on the non-md device.

I then created a RAID1 device (with one drive "missing") out of that partition
and did the same tests; I got the failure I saw before.  The problem does appear
to be related to LVM on top of MD.
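
For anyone wanting to repeat the single-partition setup from Comment 24, a degraded RAID1 with an LVM volume group on top can be created along the following lines. This is a sketch, not the script attached to this bug: the function name is hypothetical, and the defaults `/dev/md0` and `lvtest0` are placeholders. The function only prints the commands so they can be reviewed before being run as root; they will destroy the contents of the partition.

```shell
# degraded_raid1_vg_cmds PARTITION [MD_DEVICE] [VG_NAME]
# Print the commands that build a two-member RAID1 with one member
# "missing", then put an LVM physical volume and volume group on it.
degraded_raid1_vg_cmds() {
    part=$1
    md=${2:-/dev/md0}
    vg=${3:-lvtest0}
    cat <<EOF
mdadm --create $md --run --level=1 --raid-devices=2 $part missing
pvcreate $md
vgcreate $vg $md
EOF
}
```

For example, `degraded_raid1_vg_cmds /dev/sda9` prints the three commands for that partition; snapshot tests can then be run against logical volumes created in the resulting volume group.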

Comment 25 Alasdair Kergon 2005-03-30 16:09:39 UTC

Is there any chance you can test this with the latest Linus kernel to see if the
problem is still there?

Comment 26 Chris Adams 2005-03-30 16:22:34 UTC

Do you mean 2.6.12-rc1 or a bk snapshot (if a bk snap, I don't remember where to
fetch those from, so please send me a URL)?

Comment 27 Alasdair Kergon 2005-03-30 16:47:14 UTC

try 2.6.12-rc1

Comment 28 Chris Adams 2005-03-30 21:07:51 UTC

Same problem with 2.6.12-rc1.

Comment 29 Alasdair Kergon 2005-03-31 14:02:31 UTC

Asking on dm-devel mailing list if anyone else has tried this md raid1/snapshot
combination recently.

Comment 30 Chris Adams 2005-03-31 15:06:29 UTC

FYI: I tried the initial FC3 release kernel, 2.6.9-1.667, and it worked with no
corruption.

Comment 31 Chris Adams 2005-06-21 22:43:58 UTC

Any update on this?  I tried again with 2.6.11-1.27_FC3 and it still fails.

I wrote a script to demonstrate the problem; just run it and specify an unused
partition as an option.

Comment 32 Chris Adams 2005-06-21 22:44:35 UTC

Created attachment 115787
script to demonstrate the problem

Comment 33 Dave Jones 2005-07-15 20:12:58 UTC

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 34 Gino LV. Ledesma 2005-07-15 21:43:43 UTC

As an addition, this bug also affects RHEL 4.x (Centos 4.0/4.1).

I updated to the suggested kernel (kernel-2.6.12-1.1372_FC3) and the problem
still persists. In my case, a simple matter of doing lvcreate -L 512M -s -n test
/dev/VolGroup01/LogVolRoot is enough to cause the /bin and /usr/bin applications
to be corrupted.

To replicate this scenario much faster, I set it up on a VMware host. It seems
that after the snapshot is created, attempts to access the disk cause a lot of
disk I/O on one of the disks in the array (I have 2 virtual disks forming a
RAID 1). Eventually, the drive is reset (hda: dma_timer_expiry: dma status ==
0x21; timeout error). This doesn't happen on a real host, though (disk access
is much faster).

I don't see anything odd/strange in dmesg or syslog, but none of the apps are
usable.

Comment 35 Chris Adams 2005-07-16 02:59:20 UTC

This is still a problem.  Output from my test script:

# ./raid-lvm-corrupt.sh /dev/sda9
Using RAID device /dev/md0
Using volume group lvtest0
This will DESTROY the contents of /dev/sda9
Are you sure you want to continue? (y/N) y

Creating RAID device
Creating LVM
Creating filesystem
Mounting and testing
---> Compare failed; LVM/FS corruption
Unmounting and testing
Non-snapshot compare is okay
Cleaning up

Comment 36 Gino LV. Ledesma 2005-07-21 01:09:29 UTC

I modified the shell script to do a diff before the snapshot is made and it
seems that the results are cached:

diff -ur /boot /lvmtest <-- ok
lvcreate -s -n ...
diff -ur /boot /lvmtest <-- passes (cached from previous diff); but in reality bad

Initially, giving it the entire disk (/dev/hdc) instead of a partition "seems to
work" but problems creep up later (after reboot, for example).

Comment 37 Chris Adams 2005-07-24 00:37:20 UTC

I did some testing with Linus kernels, and it appears to be something in
patch-2.6.11-rc2.bz2.  I'm tracking through the 2.6.11-rc1-bk* patches now.

Comment 38 Alasdair Kergon 2005-07-24 01:03:34 UTC

Patches that might be related, suggesting areas of code to check (as well as dm):

# ChangeSet
#   2005/02/11 20:42:01-08:00 dmo
#   [PATCH] raid5 and raid6 fixes to current bk tree
#
#   This fixes the raid5 and raid6 prolems that crept in with the recent
#   introduction of "bi_max_vecs".
#
#   Since raid5/raid6 allocate their own bio's, they need to make sure
#   that bi_max_vecs is updated along with bi_vcnt.


cfq changes e.g. http://linux.bkbits.net:8080/linux-2.6/cset@1.3192.9.1 fixed a
read-ahead bug - maybe still other bugs here [which I/O scheduler are you using?]

Also try some recent dm.c patches I sent to linux-kernel, not all yet upstream
(but in latest -mm).

Subject: [PATCH] device-mapper: [1/4] Fix deadlocks in core
(3 of them, not 4)
plus
[PATCH] device-mapper: Fix target suspension oops.


And for good measure
Subject: [PATCH] device-mapper snapshots: Handle origin extension
Subject: [PATCH] device-mapper: Fix dm_swap_table error cases

Comment 39 Alasdair Kergon 2005-07-24 01:07:20 UTC

Gino, which filesystem(s)?

Comment 40 Chris Adams 2005-07-24 02:49:49 UTC

I think the problem appeared with patch-2.6.11-rc1-bk3.

I'm using whatever the default I/O scheduler is (I'm not changing anything).

I'll try the latest -mm.

Comment 41 Chris Adams 2005-07-24 03:33:11 UTC

2.6.13-rc3-mm1 still fails.

Comment 42 John Marco 2005-07-27 22:34:55 UTC

About two months ago, I had a similar meltdown using EVMS 2.5.1.
My root volume was an LVM2 volume on a MD Raid 1 mirror.  When I
snapshotted the root volume and tried to copy the FS on the mounted
snapshot, the copy froze partway through, and many/most files
on my real root volume became corrupted or missing.

This was on a Gentoo 2.6.11-gentoo-r6, I think.  After reboot
the root volume was so messed up, I had to blow it away and reinstall
it.  Fortunately, there wasn't any critical data on it yet. 
When I looked at the smoldering remains from the Gentoo LiveCD,
evmsn showed the root snapshot, but in the details it looked as if
the backing store was somehow set to the root volume and not the 
snapshot region I created for backing store.  I wonder if there could
be a problem with the LVM2 utilities/libraries that is causing snapshot
creation to get confused and to accidentally use the _original_
volume's region as backing store.

One other note: I have two production servers running at work using
2.6.10-hardened and EVMS 2.5.1 with no problems.  One is using MD
for Raid 1+0, and the other using MD for Raid 1.  Both have nightly
backup scripts that automatically snapshot the root and other volumes
with no corruption.  I suspect the problems surfaced somewhere in 2.6.11.

Comment 43 Alasdair Kergon 2005-07-28 16:05:37 UTC

http://bugzilla.kernel.org/show_bug.cgi?id=4946 ?

> --- devel/fs/bio.c~bio_clone-fix      2005-07-28 00:39:40.000000000 -0700
> +++ devel-akpm/fs/bio.c       2005-07-28 01:02:34.000000000 -0700
> @@ -261,6 +261,7 @@ inline void __bio_clone(struct bio *bio,
>        */
>       bio->bi_vcnt = bio_src->bi_vcnt;
>       bio->bi_size = bio_src->bi_size;
> +     bio->bi_idx = bio_src->bi_idx;
>       bio_phys_segments(q, bio);
>       bio_hw_segments(q, bio);
>  }

Comment 44 Chris Adams 2005-07-28 18:21:05 UTC

I think this fixes it for me.  My script runs and passes now.

Comment 45 Gino LV. Ledesma 2005-07-28 21:44:18 UTC

This patch works for me, too. I got the 2.6.12-1372FC3 kernel and applied the
above patch. I am able to execute the lvsnaptest.sh script, and my own snapshot
tests involving / and /home/db (for MySQL) are working fine. :-)

Thanks a lot!

Comment 47 Bastien Nocera 2005-08-05 10:31:42 UTC

*** Bug 164696 has been marked as a duplicate of this bug. ***

Comment 50 Jason Baron 2005-08-09 18:49:11 UTC

*** Bug 164696 has been marked as a duplicate of this bug. ***

Comment 51 Chris Adams 2005-08-10 03:37:28 UTC

The 2.6.12-1.1373_FC3 in updates-testing fixes this problem for me.

Comment 54 Red Hat Bugzilla 2005-10-05 12:53:38 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-514.html
