Bug 152162
Summary: LVM snapshots over md raid1 cause corruption

Product: Fedora
Component: kernel
Version: 3
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Chris Adams <linux>
Assignee: Alasdair Kergon <agk>
CC: gledesma, jmarco, simon.matter, ville.lindfors
Fixed In Version: RHSA-2005-514
Doc Type: Bug Fix
Bug Blocks: 156322
Last Closed: 2005-10-05 12:53:37 UTC

Attachments:
  - lvm command output (attachment 112428)
  - script to demonstrate the problem (attachment 115787)
Description
Chris Adams
2005-03-25 14:51:27 UTC
Just to rule out the common things, I booted a non-SMP kernel as well as the latest kernel in updates-testing; both give the same result. The snapshot doesn't have to be mounted either; as soon as I "lvcreate", things go bad. Removing the snapshot doesn't fix things (probably because the bad data is buffered); I have to reboot.

Snapshots of / etc. aren't supported properly yet: you might have more success if you mount noatime. [e.g. kernel blocks updating atime on hotplug binary] What version of device-mapper/lvm2 packages are you using? Using the most up-to-date ones (1.01.01 / 2.01.08) might help: some older versions can get confused with md. [Do not have md_component_detection set to 0 in lvm.conf]

It isn't just /; if I lvcreate a snapshot of /usr, I have the same problem. I'm running the latest FC3 versions of device-mapper (1.00.19-2) and lvm2 (2.00.25-1.01). Will the versions from rawhide apply cleanly to an otherwise FC3 system, or will they need a rebuild? Does RHEL4 have the same problem with root snapshots? My RHEL servers are all still running RHEL3, but I was looking at RHEL4 for another new server.

Not the *same* problem - I'm suggesting you're seeing *different* problems :-) Rawhide versions should be fine on FC3. RHEL4 suffers from similar problems.

Okay, I see the exact same _symptoms_ whether I snapshot / or /usr; binaries that have not been run (i.e. not buffered) prior to making the snapshot won't run after making the snapshot until the system is rebooted. I tried installing the rawhide device-mapper and lvm2 packages on FC3 and got a long train of dependencies (it looks like it starts with the fact that device-mapper now links with libreadline). Also, since the libdevmapper.so library changed version, I can't build RPMs without a scratch system (I can build device-mapper-1.01.00-1.1, but I can't install it because of dependencies from the older lvm2, and I can't build the newer lvm2 until the newer device-mapper is installed).

Hmmm. You should just need the new libdevmapper.h file from the dm package in order to build the lvm2 package the first time. [Then, after installing them both together, rebuild with the new library.] No, runtime linking may still fail; maybe force-install your newly built dm package, but make sure that it doesn't delete libdevmapper.so.1.00 (or put it back afterwards).

Okay, I got them built and installed; no change. I rebuilt the initrd (to make sure the new lvm was used to set things up) and rebooted, but as soon as I create the snapshot of /usr, attempting to run a binary from /usr/bin (that hadn't been touched since boot) failed.

If the /usr LV is getting activated by the initrd rather than initscripts, then you may need to run mkinitrd to build a new initrd with the updated lvm2 version. Next thing then will be to post the output of 'pvs -v', 'vgs -v' and 'lvs -v' here, and also the (long) output of the problematic lvcreate with '-vvvv' added to the command. And what filesystem (ext3?) mounted with what options?

Created attachment 112428 [details]
lvm command output
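For reference, the requested diagnostics could be captured with something like the following sketch; the volume group and volume names (VolGroup00, usr, usrsnap), the snapshot size, and the output file names are placeholders, not values taken from this report.

  # Verbose LVM state: physical volumes, volume groups, logical volumes.
  {
    pvs -v
    vgs -v
    lvs -v
  } > lvm-state.txt 2>&1

  # Re-run the failing snapshot creation with full debug output (-vvvv)
  # and capture everything, including stderr.
  lvcreate -vvvv -L 512M -s -n usrsnap /dev/VolGroup00/usr > lvcreate-debug.txt 2>&1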
Read what I wrote - I did rebuild the initrd just to make sure. The corruption appears to be on block boundaries. I just took a snapshot of /usr (after a reboot) and then copied /usr/share/doc/fedora-release-3/GPL from /usr to /tmp/GPL-usr-after-snap. I mounted the snapshot on /mnt and copied /mnt/share/doc/fedora-release-3/GPL to /tmp/GPL-snap. Then I removed the snapshot and rebooted. After a clean boot, I copied /usr/share/doc/fedora-release-3/GPL to /tmp/GPL-usr and compared. The two post-snapshot versions chop off the first 16384 bytes and add garbage to the end (that is different between the two files). I've tarred up the three versions and put them at http://hiwaay.net/~cmadams/misc/snapshot-GPL.tar.gz if you are interested. The filesystems are all ext3 with default options, except that /usr is mounted read-only. Command output attached.

Hardware: all - What platform? What is the version of the newest kernel you tried from updates-testing?

  - Penguin Computing Relion 1Z server
  - Supermicro P4SC8 motherboard
  - Intel Pentium 4 2.8 GHz CPU (HyperThreading enabled)
  - 512MB RAM
  - Western Digital 80G SATA hard drives (two)
  - US Robotics v.92 PCI modem (model 5610)

The modem is the only add-in card; everything else is integrated in the motherboard: dual Intel gigabit ethernet interfaces, ATI Rage XL video, and the Intel 6300ESB SATA Storage Controller. The system is running headless with a serial console (so no X). I tried 2.6.11-1.7_FC3 (SMP). It didn't make any difference, so I switched back to the latest released kernel, 2.6.10-1.770_FC3 (SMP).

What you've posted and tried so far has eliminated lots of known issues. To simplify your testing, I suggest you try to reproduce the problem directly on a block device, i.e. see if you can eliminate the filesystem from the problem. E.g. create a new LV, tar something onto it directly; blockdev --flushbufs; snapshot it; see if you can reproduce the corruption. There were some dm/md issues a month or two ago (causing oopses rather than corruption) - I'll check if they got fixed in that kernel.

Okay, I created a 100M LV "test". I tarred /boot onto it, did a blockdev --flushbufs, and created a snapshot. I did a tar -xf under a temp directory (no errors) and diffed it against /boot (no difference). Just to make sure, I removed the snapshot, rebooted, recreated the snapshot, and compared - again, no problem. So, it would seem to be filesystem related somehow.
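A rough sketch of the raw-device test described above, for anyone reproducing it: the volume group name, LV sizes, and scratch paths below are assumptions, not taken from this report.

  # Filesystem-free check: write a tar stream straight to a new LV, flush,
  # snapshot it, then extract from both devices and compare against /boot.
  lvcreate -L 100M -n test VolGroup00
  tar -cf /dev/VolGroup00/test /boot          # raw archive, no filesystem
  blockdev --flushbufs /dev/VolGroup00/test
  lvcreate -L 100M -s -n testsnap /dev/VolGroup00/test

  mkdir -p /tmp/from-origin /tmp/from-snap
  tar -xf /dev/VolGroup00/test -C /tmp/from-origin
  tar -xf /dev/VolGroup00/testsnap -C /tmp/from-snap
  diff -r /tmp/from-origin/boot /boot && echo "origin ok"
  diff -r /tmp/from-snap/boot /boot && echo "snapshot ok"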
I tried to create a little ext2 (instead of ext3) filesystem to test, and after I create the snapshot and try to diff, I get this from the kernel:

  attempt to access beyond end of device
  dm-6: rw=0, want=317711250, limit=262144
  Buffer I/O error on device dm-6, logical block 158855624
  attempt to access beyond end of device
  dm-6: rw=0, want=317711282, limit=262144
  Buffer I/O error on device dm-6, logical block 158855640
  attempt to access beyond end of device
  dm-6: rw=0, want=317711146, limit=262144
  Buffer I/O error on device dm-6, logical block 158855572
  attempt to access beyond end of device
  dm-6: rw=0, want=317711250, limit=262144
  Buffer I/O error on device dm-6, logical block 158855624
  attempt to access beyond end of device
  dm-6: rw=0, want=317711250, limit=262144
  Buffer I/O error on device dm-6, logical block 158855624

I decided to try some additional filesystems. With XFS, JFS, and reiserfs, I got similar results to ext3 (corrupted file reads, apparently on block boundaries). In each case, a filesystem was created with the default options of mkfs.<fstype> (I'm not really familiar with any of them, so I just assumed the defaults were okay). I then tried VFAT. I had to create a filesystem in a file and then dd it to the device (mkfs.vfat doesn't understand LVM :) ). It appears to work fine; I can copy files to it, umount it, flush the device, remount it, snapshot it, and everything still diffs with no changes.

One other thing: thinking about this, it does seem likely that it is something at the filesystem layer. I have no problem going through the directories (I can "find <mnt> -print" with no errors); the only problem comes in when I try to read files. If the problem were just in the LVM or DM layer, directories would be corrupted as well.

Other things - which tests above had md involvement? I.e. was everything on top of md, or were some with and some without?

All tests were on top of md. The drives have 2 partitions each: one for /boot and one for an LVM PV. The partitions are then mirrored between the drives; /boot sits on /dev/md0 and /dev/md1 is the LVM PV.

Can you reproduce any of the problems without md?

Okay, I broke the /boot mirror (removed the second drive) and created a new volume group with that partition as its only PV. I don't seem to have any problem with snapshots on the non-md device. I then created a RAID1 device (with one drive "missing") out of that partition and did the same tests; I got the failure I saw before. The problem does appear to be related to LVM on top of MD.

Is there any chance you can test this with the latest Linus kernel to see if the problem is still there?

Do you mean 2.6.12-rc1 or a bk snapshot? (If a bk snap, I don't remember where to fetch those from, so please send me a URL.)

Try 2.6.12-rc1.

Same problem with 2.6.12-rc1.

Asking on the dm-devel mailing list if anyone else has tried this md raid1/snapshot combination recently.

FYI: I tried the initial FC3 release kernel, 2.6.9-1.667, and it worked with no corruption.

Any update on this? I tried again with 2.6.11-1.27_FC3 and it still fails. I wrote a script to demonstrate the problem; just run it and specify an unused partition as an argument.

Created attachment 115787 [details]
script to demonstrate the problem
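The attached script itself is not reproduced in this report. Based on the steps described above and the output shown further down, a rough equivalent might look like the sketch below; every device name, volume name, size, and mount point in it is an assumption rather than the reporter's actual script.

  #!/bin/sh
  # Sketch: degraded md raid1 -> LVM PV -> ext3 -> snapshot -> compare.
  PART=${1:?usage: $0 /dev/<unused-partition>}

  # RAID1 with one member "missing", matching the reporter's degraded setup.
  mdadm --create /dev/md9 --level=1 --raid-devices=2 "$PART" missing

  pvcreate /dev/md9
  vgcreate lvtest0 /dev/md9
  lvcreate -L 200M -n data lvtest0
  mkfs.ext3 -q /dev/lvtest0/data

  # Populate the filesystem with known data (/boot is small and convenient).
  mkdir -p /mnt/lvtest
  mount /dev/lvtest0/data /mnt/lvtest
  mkdir /mnt/lvtest/boot
  cp -a /boot/. /mnt/lvtest/boot/
  umount /mnt/lvtest
  blockdev --flushbufs /dev/lvtest0/data

  # Create the snapshot, then read the origin back through the filesystem.
  lvcreate -L 100M -s -n datasnap /dev/lvtest0/data
  mount /dev/lvtest0/data /mnt/lvtest
  diff -r /boot /mnt/lvtest/boot \
    && echo "compare ok" \
    || echo "---> Compare failed; LVM/FS corruption"
  umount /mnt/lvtest

  # Clean up.
  lvremove -f /dev/lvtest0/datasnap /dev/lvtest0/data
  vgremove lvtest0
  pvremove /dev/md9
  mdadm --stop /dev/md9

On an unaffected kernel both compares pass; on the affected kernels, reading files from the origin after the snapshot exists returns bad data, which matches the reporter's observation that reads go wrong on block boundaries once a snapshot is created.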
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.

As an addition, this bug also affects RHEL 4.x (CentOS 4.0/4.1). I updated to the suggested kernel (kernel-2.6.12-1.1372_FC3) and the problem still persists. In my case, simply doing

  lvcreate -L 512M -s -n test /dev/VolGroup01/LogVolRoot

is enough to cause the /bin and /usr/bin applications to be corrupted. To replicate this scenario much faster, I set it up on a VMware host. It seems that after the snapshot is created, attempts to access the disk cause a lot of disk I/O for one of the disks in the cluster (I have 2 virtual disks to form RAID 1). Eventually, the drive is reset (hda: dma_timer_expiry: dma status == 0x21; timeout error). This doesn't happen on a real host, though (disk access is much faster). I don't see anything odd/strange in dmesg or syslog, but none of the apps are usable.

This is still a problem. Output from my test script:

  # ./raid-lvm-corrupt.sh /dev/sda9
  Using RAID device /dev/md0
  Using volume group lvtest0
  This will DESTROY the contents of /dev/sda9
  Are you sure you want to continue? (y/N) y
  Creating RAID device
  Creating LVM
  Creating filesystem
  Mounting and testing
  ---> Compare failed; LVM/FS corruption
  Unmounting and testing
  Non-snapshot compare is okay
  Cleaning up

I modified the shell script to do a diff before the snapshot is made, and it seems that the results are cached:

  diff -ur /boot /lvmtest   <-- ok
  lvcreate -s -n ...
  diff -ur /boot /lvmtest   <-- passes (cached from the previous diff), but in reality bad

Initially, giving it the entire disk (/dev/hdc) instead of a partition "seems to work", but problems creep up later (after a reboot, for example).

I did some testing with Linus kernels, and it appears to be something in patch-2.6.11-rc2.bz2. I'm tracking through the 2.6.11-rc1-bk* patches now.

Patches that might be related, suggesting areas of code to check (as well as dm):

  # ChangeSet
  #   2005/02/11 20:42:01-08:00 dmo
  #   [PATCH] raid5 and raid6 fixes to current bk tree
  #
  #   This fixes the raid5 and raid6 problems that crept in with the recent
  #   introduction of "bi_max_vecs".
  #
  #   Since raid5/raid6 allocate their own bio's, they need to make sure
  #   that bi_max_vecs is updated along with bi_vcnt.

cfq changes, e.g. http://linux.bkbits.net:8080/linux-2.6/cset@1.3192.9.1, fixed a read-ahead bug - maybe there are still other bugs here. [Which I/O scheduler are you using?]

Also try some recent dm.c patches I sent to linux-kernel, not all yet upstream (but in the latest -mm):

  Subject: [PATCH] device-mapper: [1/4] Fix deadlocks in core (3 of them, not 4)

plus

  Subject: [PATCH] device-mapper: Fix target suspension oops.

And for good measure:

  Subject: [PATCH] device-mapper snapshots: Handle origin extension
  Subject: [PATCH] device-mapper: Fix dm_swap_table error cases

Gino, which filesystem(s)?

I think the problem appeared with patch-2.6.11-rc1-bk3. I'm using whatever the default I/O scheduler is (I'm not changing anything). I'll try the latest -mm.

2.6.13-rc3-mm1 still fails.
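As an aside on the I/O scheduler question above: on 2.6 kernels with runtime elevator switching, the scheduler in use for a given queue can be read from sysfs. The device name below is an assumption.

  # The entry shown in [brackets] is the scheduler currently in use.
  cat /sys/block/sda/queue/scheduler
  # e.g.:  noop anticipatory deadline [cfq]

  # A different elevator can also be selected system-wide at boot with the
  # elevator= kernel parameter, e.g. elevator=deadline.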
About two months ago, I had a similar meltdown using EVMS 2.5.1. My root volume was an LVM2 volume on an MD RAID 1 mirror. When I snapshotted the root volume and tried to copy the FS on the mounted snapshot, the copy froze part way through, and many/most files on my real root volume became corrupted or missing. This was on a Gentoo 2.6.11-gentoo-r6, I think. After reboot the root volume was so messed up that I had to blow it away and reinstall it. Fortunately, there wasn't any critical data on it yet. When I looked at the smoldering remains from the Gentoo LiveCD, evmsn showed the root snapshot, but in the details it looked as if the backing store was somehow set to the root volume and not the snapshot region I created for backing store. I wonder if there could be a problem with the LVM2 utilities/libraries that is causing snapshot creation to get confused and to accidentally use the _original_ volume's region as backing store.

One other note: I have two production servers running at work using 2.6.10-hardened and EVMS 2.5.1 with no problems. One is using MD for RAID 1+0, and the other is using MD for RAID 1. Both have nightly backup scripts that automatically snapshot the root and other volumes with no corruption. I suspect the problems surfaced somewhere in 2.6.11.

http://bugzilla.kernel.org/show_bug.cgi?id=4946 ?

> --- devel/fs/bio.c~bio_clone-fix 2005-07-28 00:39:40.000000000 -0700
> +++ devel-akpm/fs/bio.c 2005-07-28 01:02:34.000000000 -0700
> @@ -261,6 +261,7 @@ inline void __bio_clone(struct bio *bio,
>  	 */
>  	bio->bi_vcnt = bio_src->bi_vcnt;
>  	bio->bi_size = bio_src->bi_size;
> +	bio->bi_idx = bio_src->bi_idx;
>  	bio_phys_segments(q, bio);
>  	bio_hw_segments(q, bio);
>  }

I think this fixes it for me. My script runs and passes now.

This patch works for me, too. I got the 2.6.12-1372FC3 kernel and applied the above patch. I am able to execute the lvsnaptest.sh script, and my own snapshot tests involving / and /home/db (for MySQL) are working fine. :-) Thanks a lot!

*** Bug 164696 has been marked as a duplicate of this bug. ***

The 2.6.12-1.1373_FC3 kernel in updates-testing fixes this problem for me.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-514.html
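For completeness, confirming the fix on an updated system amounts to checking the running kernel and re-running the reproduction script; the script name matches the output shown earlier, and the partition is an assumption.

  uname -r                          # expect 2.6.12-1.1373_FC3 (or later)
  ./raid-lvm-corrupt.sh /dev/sda9   # should now report no corruption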