Bug 157629

Summary: kernel: dm snapshot oops when using 1024k chunksize
Product: [Fedora] Fedora
Reporter: Benjamin Schweizer <mail>
Component: device-mapper-obsolete
Assignee: Alasdair Kergon <agk>
Status: CLOSED UPSTREAM
Severity: high
Priority: medium
Version: 4
CC: agk, dwysocha, mattdm, mbroz
Keywords: Reopened
Hardware: i686
OS: Linux
URL: http://www.dsb.net/~schweizer/kernel26-lvm2-snapshot.jpg
Doc Type: Bug Fix
Last Closed: 2008-03-17 16:18:50 UTC

Description Benjamin Schweizer 2005-05-13 08:38:06 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1

Description of problem:
I created my volumes with LVM1 and converted them to LVM2 metadata (and rebooted afterwards). When I try to create a snapshot of any filesystem (root and others), the kernel oopses. This behaviour is reproducible.

I have a (bad) screenshot at http://www.flickr.com/photos/moongate_moblog/13583066/

Version-Release number of selected component (if applicable):
kernel-2.6.11-1.1290_FC4, lvm2-2.01.08-2.1, device-mapper-1.01.01-1.0

How reproducible:
Always

Steps to Reproduce:
1. Set up LVM1 with kernel 2.4, convert the metadata to LVM2, and reboot (I assume this also happens with a plain LVM2 setup, but I have not tested that yet).
2. Create a snapshot: lvcreate -L 1024M -c 1024k -s -n srv-snapshot /dev/storage/srv
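For context on the `-c 1024k` argument in step 2: the device-mapper snapshot target takes its chunk size in 512-byte sectors, so 1024k ends up as 2048 sectors in the snapshot table line (viewable with `dmsetup table`). The sketch below is a hypothetical helper (`chunk_to_sectors` is not part of LVM2) that performs that conversion and rejects sizes that are not a power of two. It assumes the value is given in KiB with an optional `k` suffix and does not handle other units.

```shell
#!/bin/sh
# Hypothetical helper: convert an lvcreate -c value such as "1024k"
# (assumed to be KiB, optional k/K suffix) into 512-byte sectors,
# the unit used by the dm "snapshot" target's chunk size field.
chunk_to_sectors() {
    kib=${1%[kK]}                      # strip an optional k/K suffix
    case $kib in
        ''|*[!0-9]*) echo "invalid chunk size: $1" >&2; return 1 ;;
    esac
    # A power of two has exactly one bit set, so n & (n - 1) == 0.
    if [ "$kib" -eq 0 ] || [ $(( kib & (kib - 1) )) -ne 0 ]; then
        echo "chunk size must be a power of two: $1" >&2
        return 1
    fi
    echo $(( kib * 2 ))                # 1 KiB = 2 sectors of 512 bytes
}
```

With this helper, `chunk_to_sectors 1024k` prints 2048 and `chunk_to_sectors 512k` prints 1024.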
  

Actual Results:  The kernel oopses (see http://www.flickr.com/photos/moongate_moblog/13583066/) and the LVM metadata gets corrupted. LVM now crashes on every boot and causes another kernel panic.

Expected Results:  A snapshot should be created or some error should be reported.

Additional info:

http://www.flickr.com/photos/moongate_moblog/13583066/
(I can reproduce this, so I can post a better/hi-res screenshot on demand)

Comment 1 Benjamin Schweizer 2005-05-13 08:46:18 UTC
There seems to be a similar bug reported on Kerneltrap
(http://kerneltrap.org/node/4458). However, the filesystem in that report is XFS,
whereas I'm using ext3.

Comment 2 Benjamin Schweizer 2005-05-13 13:37:41 UTC
I've found another bug report
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152162) that describes LVM
problems in combination with md RAID1. Chris finally installed kernel-2.6.9-1.667,
which solved his problems. When I booted into 2.6.9-1.667-i686, lvcreate
segfaulted (no kernel panic) and unmounting failed on shutdown. However, the
effect in this scenario was quite similar: the volumes got corrupted and further
reboots ended in a kernel oops.

Comment 3 Alasdair Kergon 2005-05-13 19:02:01 UTC
Please try to capture the full details of the oops in readable form.

Also confirm the setup:
  No software raid - just lvm?
  The snapshot is of a data volume, not a system partition like / or /var ?

Comment 4 Alasdair Kergon 2005-05-13 19:03:22 UTC
And how much memory is in the machine, what architecture is it, how big is the filesystem, etc.?

Comment 5 Benjamin Schweizer 2005-05-18 16:14:21 UTC
Hello Alasdair,

I've uploaded another screenshot of better quality to
http://www.dsb.net/~schweizer/kernel26-lvm2-snapshot.jpg
For this I installed the latest kernel, kernel.i686 2.6.11-1.1312_FC4.
The bug is similar; however, I've found that this is not a total freeze: I
can still type as long as no disk access is needed.

The software setup is as follows: hda1=/boot, hda2=lvm (a single VG called
'storage', no softraid/md/crypto/whatever). The volume group is split into the
logical volumes /root, /usr, /var, /home, /srv and swap. /srv was snapshotted with an
initial size of 4 GB and a snapshot size of 2 GB. The volume usage was less
than 5 percent and no measurable I/O occurred. The oops appeared immediately.
I should mention again that the volume group was created as LVM1 and then converted
to LVM2 with vgconvert.

I've reproduced this on two separate machines:
- Pentium III w/ 500 MHz, 256 MB RAM and 250 GB ATA disk
(this is the machine I took the screenshot from)
- Pentium 4 w/ 3 GHz, 1 GB RAM and 120 GB ATA disk
Both machines have been tested and work well, so I would rule out hardware faults.

Comment 6 Benjamin Schweizer 2005-07-15 07:12:19 UTC
Hmm, still no response. Do I have to reopen this bug report against another release?

Comment 7 Alasdair Kergon 2005-07-29 20:33:15 UTC
From looking at the oops, this might (a) be something fixed in 2.6.13-rc4
upstream or (b) another manifestation of bug 132057.

So try 2.6.13-rc4; if it still fails you'll have to wait for the larger-scale
snapshot changes to be made, I'm afraid.

Comment 8 Benjamin Schweizer 2005-09-22 12:02:08 UTC
Well, I've done some testing on SLES9. This distro suffers from the same
problem. I've also tested Linux 2.6.13.2, with no change so far.

Comment 9 Benjamin Schweizer 2005-09-23 15:52:40 UTC
For the archive:
Finally, I've found that the crash occurs on all tested kernels
(2.6.5-sles9, 2.6.12-fc4, 2.6.11-fc4-betas and 2.6.13.2-vanilla) whenever I
use a chunk size of 1024k (which is the maximum). I've successfully created snapshots
on 2.6.5-sles9 using a chunk size of 512k and using the default chunk size. I suspect
an out-of-bounds error.
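Given these results, one stopgap on affected kernels is simply to avoid 1024k chunks. A minimal sketch, assuming a hypothetical wrapper `snapshot_cmd` (not an LVM2 tool) that only prints the lvcreate command from the steps to reproduce, substituting 512k when 1024k is requested, since 512k and the default chunk size both worked on 2.6.5-sles9. It matches only the literal string `1024k` and never runs lvcreate itself.

```shell
#!/bin/sh
# Hypothetical wrapper: build (but do not execute) an lvcreate snapshot command.
# Arguments: origin LV, snapshot name, snapshot size, optional chunk size.
# A requested chunk size of exactly "1024k" is replaced with 512k, because
# 1024k oopsed on every kernel tested in this report.
snapshot_cmd() {
    origin=$1 name=$2 size=$3 chunk=${4:-}
    if [ "$chunk" = "1024k" ]; then
        echo "warning: 1024k chunks oopsed on the tested kernels; using 512k" >&2
        chunk=512k
    fi
    if [ -n "$chunk" ]; then
        echo "lvcreate -L $size -c $chunk -s -n $name $origin"
    else
        # No -c given: leave the chunk size to lvcreate's default.
        echo "lvcreate -L $size -s -n $name $origin"
    fi
}
```

For example, `snapshot_cmd /dev/storage/srv srv-snapshot 1024M 1024k` prints the step 2 command with `-c 512k` and writes the warning to stderr.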

Comment 10 Christian Iseli 2007-01-22 10:56:53 UTC
This report targets the FC3 or FC4 products, which have now been EOL'd.

Could you please check that it still applies to a current Fedora release, and
either update the target product or close it ?

Thanks.

Comment 11 Matthew Miller 2007-04-06 15:25:53 UTC
Fedora Core 3 and Fedora Core 4 are no longer supported. If you could retest
this issue on a current release or on the latest development / test version, we
would appreciate that. Otherwise, this bug will be marked as CANTFIX one month
from now. Thanks for your help and for your patience.


Comment 12 petrosyan 2008-02-16 02:51:21 UTC
Fedora Core 4 is no longer maintained.

Setting status to "INSUFFICIENT_DATA". If you can reproduce this bug in the
current Fedora release, please reopen this bug and assign it to the
corresponding Fedora version.

Comment 13 Benjamin Schweizer 2008-02-27 07:16:27 UTC
This bug was fixed in newer kernels; agk provided a patch that addressed this
issue, and it is available at
ftp://ftp.sickos.org/pub/linux/linux2.6.14rc2-dm-snapshot-chunksize-fix.patch