157629 – kernel: dm snapshot oops when using 1024k chunksize

Summary: kernel: dm snapshot oops when using 1024k chunksize

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	device-mapper-obsolete
Sub Component:
Version:	4
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Alasdair Kergon
QA Contact:
Docs Contact:
URL:	http://www.dsb.net/~schweizer/kernel2...
Whiteboard:
Depends On:
Blocks:

Reported:	2005-05-13 08:38 UTC by Benjamin Schweizer
Modified:	2008-03-17 16:18 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-03-17 16:18:50 UTC
Type:	---
Embargoed:
Dependent Products:


Description Benjamin Schweizer 2005-05-13 08:38:06 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1

Description of problem:
I created my volumes with lvm1 and converted them to lvm2 metadata (rebooting afterwards). When I try to create a snapshot of any filesystem (root or others), the kernel oopses. This behaviour is reproducible.

I've a (bad) screenshot at http://www.flickr.com/photos/moongate_moblog/13583066/

Version-Release number of selected component (if applicable):
kernel-2.6.11-1.1290_FC4, lvm2-2.01.08-2.1, device-mapper-1.01.01-1.0

How reproducible:
Always

Steps to Reproduce:
1. Set up lvm1 with kernel 2.4, convert the metadata to lvm2, and reboot (I assume this also happens with a plain lvm2 setup, but that has not been tested yet)
2. Create a snapshot: lvcreate -L 1024M -c 1024k -s -n srv-snapshot /dev/storage/srv
  

Actual Results:  The kernel oopses (see http://www.flickr.com/photos/moongate_moblog/13583066/) and the lvm metadata gets corrupted. lvm now crashes on every boot and causes another kernel panic.

Expected Results:  A snapshot should be created or some error should be reported.

Additional info:

http://www.flickr.com/photos/moongate_moblog/13583066/
(I can reproduce this, so I can post a better/hi-res screenshot on demand)

Comment 1 Benjamin Schweizer 2005-05-13 08:46:18 UTC

There seems to be a similar bug reported on Kerneltrap
(http://kerneltrap.org/node/4458). However, the filesystem in that report is
XFS, whereas I'm using ext3.

Comment 2 Benjamin Schweizer 2005-05-13 13:37:41 UTC

I've found another bug report
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152162) that describes lvm
problems in combination with md raid1. Chris finally installed
kernel-2.6.9-1.667, which solved his problems. After booting into
2.6.9-1.667-i686, lvcreate segfaulted (no kernel panic) and unmounting failed on
shutdown. However, the effect in this scenario was quite similar: the volumes
got corrupted and further reboots end with a kernel oops.

Comment 3 Alasdair Kergon 2005-05-13 19:02:01 UTC

Please try to capture the full details of the oops in readable form.

Also confirm the setup:
  No software raid - just lvm?
  The snapshot is of a data volume, not a system partition like / or /var ?

Comment 4 Alasdair Kergon 2005-05-13 19:03:22 UTC

And how much memory is in the machine, what architecture, how big is the filesystem, etc.?

Comment 5 Benjamin Schweizer 2005-05-18 16:14:21 UTC

Hello Alasdair,

I've uploaded another screenshot of better quality to
http://www.dsb.net/~schweizer/kernel26-lvm2-snapshot.jpg
For this I installed the latest kernel, which is kernel.i686 2.6.11-1.1312_FC4.
The bug looks the same; however, I've figured out that this is not a total
freeze: I can still type until something accesses the disk.

The software setup is as follows: hda1=/boot, hda2=lvm (one single vg called
'storage', no softraid/md/crypto/whatever). The volume group is split into the
logical volumes /root, /usr, /var, /home, /srv and swap. /srv was snapshotted
with an initial size of 4gigs and a snapshot size of 2gigs. The volume usage was
less than 5 percent and no measurable I/O occurred. The oops appeared
immediately. I should mention again that the lvm was created as lvm1 and then
converted with vgconvert to lvm2.

I've done this on two separate machines:
- Pentium III w/ 500 MHz, 256megs RAM and 250gig ATA disk
(this is the machine that I took the screenshot from)
- Pentium 4 w/ 3 GHz, 1gig RAM and 120gig ATA disk
Both machines have been tested and work well, so I would rule out hardware faults.

Comment 6 Benjamin Schweizer 2005-07-15 07:12:19 UTC

Hmm, still no response. Do I have to reopen this bug report against another release?

Comment 7 Alasdair Kergon 2005-07-29 20:33:15 UTC

From looking at the oops, this might (a) be something fixed in 2.6.13-rc4
upstream or (b) another manifestation of bug 132057.

So try 2.6.13-rc4; if it still fails you'll have to wait for the larger-scale
snapshot changes to be made, I'm afraid.

Comment 8 Benjamin Schweizer 2005-09-22 12:02:08 UTC

Well, I've done some testing on SLES9. This distro suffers from the same
problems. Furthermore, I've tested against Linux 2.6.13.2 - no change so far.

Comment 9 Benjamin Schweizer 2005-09-23 15:52:40 UTC

For the archive:
Finally, I've found out that the crash occurs on all tested kernels
(2.6.5-sles9, 2.6.12-fc4, 2.6.11-fc4-betas and 2.6.13.2-vanilla) whenever I
use a chunk size of 1024k (which is the maximum). I've successfully created a
snapshot on 2.6.5-sles9 using a chunk size of 512k and using the default chunk
size. I assume an out-of-bounds error.
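
A minimal sketch of the workaround described above, assuming the same volume names as in the reproduction step. The helper name snapshot_cmd is hypothetical, and it only prints the lvcreate command instead of running it. The 1024k cutoff reflects what was observed in this report, not a documented kernel limit.

```shell
#!/bin/sh
# Hypothetical helper (not part of lvm2): print an lvcreate snapshot command,
# refusing the 1024k chunk size that oopsed on the kernels tested in this report.
# Usage: snapshot_cmd [chunk_size_in_KiB]
snapshot_cmd() {
    if [ $# -eq 0 ]; then
        # No chunk size given: omit -c so lvcreate uses its default,
        # which worked on 2.6.5-sles9.
        echo "lvcreate -L 1024M -s -n srv-snapshot /dev/storage/srv"
        return 0
    fi
    chunk_kib=$1
    case $chunk_kib in
        ''|*[!0-9]*)
            echo "chunk size must be a whole number of KiB" >&2
            return 2
            ;;
    esac
    if [ "$chunk_kib" -ge 1024 ]; then
        echo "refusing chunk size ${chunk_kib}k: 1024k triggered the oops" >&2
        return 1
    fi
    echo "lvcreate -L 1024M -c ${chunk_kib}k -s -n srv-snapshot /dev/storage/srv"
}

snapshot_cmd 512   # 512k worked on 2.6.5-sles9
snapshot_cmd       # default chunk size also worked
```

Review the printed command before running it as root. On a kernel that carries the fix, the guard is unnecessary.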

Comment 10 Christian Iseli 2007-01-22 10:56:53 UTC

This report targets the FC3 or FC4 products, which have now been EOL'd.

Could you please check that it still applies to a current Fedora release, and
either update the target product or close it ?

Thanks.

Comment 11 Matthew Miller 2007-04-06 15:25:53 UTC

Fedora Core 3 and Fedora Core 4 are no longer supported. If you could retest
this issue on a current release or on the latest development / test version, we
would appreciate that. Otherwise, this bug will be marked as CANTFIX one month
from now. Thanks for your help and for your patience.

Comment 12 petrosyan 2008-02-16 02:51:21 UTC

Fedora Core 4 is no longer maintained.

Setting status to "INSUFFICIENT_DATA". If you can reproduce this bug in the
current Fedora release, please reopen this bug and assign it to the
corresponding Fedora version.

Comment 13 Benjamin Schweizer 2008-02-27 07:16:27 UTC

This bug was fixed in newer kernels; agk provided a patch that addressed this
issue, it is available at
ftp://ftp.sickos.org/pub/linux/linux2.6.14rc2-dm-snapshot-chunksize-fix.patch
