Red Hat Bugzilla – Bug 157629
kernel: dm snapshot oops when using 1024k chunksize
Last modified: 2008-03-17 12:18:50 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Description of problem:
I created my volumes with lvm1 and converted them to lvm2 metadata (I rebooted after that). When I try to create a snapshot of any filesystem (root and others), the kernel oopses. This behaviour is reproducible.
I've a (bad) screenshot at http://www.flickr.com/photos/moongate_moblog/13583066/
Version-Release number of selected component (if applicable):
kernel-2.6.11-1.1290_FC4, lvm2-2.01.08-2.1, device-mapper-1.01.01-1.0
Steps to Reproduce:
1. Set up lvm1 with kernel 2.4, convert the metadata to lvm2 and reboot (I assume that this also happens with a plain lvm2 setup, but that has not been tested so far)
2. Create a snapshot: lvcreate -L 1024M -c 1024k -s -n srv-snapshot /dev/storage/srv
Actual Results: The kernel oopses (see http://www.flickr.com/photos/moongate_moblog/13583066/) and the lvm metadata gets corrupted. lvm now crashes on every boot and causes another kernel panic.
Expected Results: A snapshot should be created or some error should be reported.
(I can reproduce this, so I can post a better/hi-res screenshot on demand)
There seems to be a similar bug reported on Kerneltrap
(http://kerneltrap.org/node/4458). However, the filesystem in that report is XFS,
whereas I'm using ext3.
I've found another bug report
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152162) that describes lvm
problems in combination with md raid1. Chris finally installed kernel-2.6.9-1.667,
which solved his problems. After I booted into 2.6.9-1.667-i686, lvcreate
segfaulted (no kernel panic) and unmounting failed on shutdown. However, the
effect in this scenario was quite similar: the volumes got corrupted and further
reboots end with a kernel oops.
Please try to capture the full details of the oops in readable form.
Also confirm the setup:
No software raid - just lvm?
The snapshot is of a data volume, not a system partition like / or /var ?
And how much memory is in the machine, what architecture, how big is the filesystem, etc.?
I've uploaded another screenshot of better quality to
For this I installed the latest kernel, which is kernel.i686 2.6.11-1.1312_FC4.
The bug is similar; however, I've figured out that this is not a total freeze. I
can type until something needs disk access.
The software setup is as follows: hda1=/boot, hda2=lvm (one single vg called
'storage', no softraid/md/crypto/whatever). The volume group is split into the
logical volumes /root, /usr, /var, /home, /srv and swap. /srv was snapshotted with an
initial size of 4 gigs and a snapshot size of 2 gigs. The volume usage was less
than 5 percent and no measurable I/O occurred. The oops appeared immediately.
I should mention again that the lvm was created as lvm1 and then converted to
lvm2 with vgconvert.
I've done this on two separate machines.
- Pentium III w/ 500 MHz, 256megs RAM and 250gig ATA disk
(this is the machine that I took the screenshot from)
- Pentium 4 w/ 3 GHz, 1gig RAM and 120gig ATA disk
Both machines are tested and operate well. I would exclude any hardware faults.
Hmm, still no response. Do I have to reopen this bug report against another release?
From looking at the oops, this might (a) be something fixed in 2.6.13-rc4
upstream or (b) another manifestation of bug 132057.
So try 2.6.13-rc4; if it still fails you'll have to wait for the larger-scale
snapshot changes to be made, I'm afraid.
Well, I've done some testing on SLES9. This distro suffers from the same
problems. Furthermore, I've tested against Linux 126.96.36.199 - no change so far.
For the archive:
I've finally found out that the crash occurs on all tested kernels
(2.6.5-sles9, 2.6.12-fc4, 2.6.11-fc4-betas and 188.8.131.52-vanilla) as long as I
use a chunk size of 1024k (which is the maximum). I've successfully created a snapshot
on 2.6.5-sles9 using a chunk size of 512k and using the default chunk size. I assume
an out-of-bounds error.
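For anyone hitting this on an affected kernel, the workaround implied above can be sketched as a small wrapper. This is only an illustration, not a tested fix: snapshot_cmd is a hypothetical helper name, the 512k default mirrors the chunk size that worked on 2.6.5-sles9 in this report, and the function only prints the lvcreate command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of lvm2): print an lvcreate snapshot command,
# refusing the 1024k chunk size that triggered the oops in this report.
# Arguments: origin device, snapshot name, snapshot size, chunk size in KiB.
snapshot_cmd() {
    origin=$1
    name=$2
    size=$3
    chunk_kib=${4:-512}   # assumption: 512k, the largest size reported to work

    if [ "$chunk_kib" -ge 1024 ]; then
        echo "refusing chunk size ${chunk_kib}k: 1024k caused the oops described here" >&2
        return 1
    fi

    # Dry run: only print the command so it can be reviewed before use.
    echo "lvcreate -L $size -c ${chunk_kib}k -s -n $name $origin"
}

snapshot_cmd /dev/storage/srv srv-snapshot 1024M 512
```

The printed command is the same as the one in the steps to reproduce, except for the -c value. Whether 512k is safe on kernels other than 2.6.5-sles9 was not verified in this report.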
This report targets the FC3 or FC4 products, which have now been EOL'd.
Could you please check that it still applies to a current Fedora release, and
either update the target product or close it?
Fedora Core 3 and Fedora Core 4 are no longer supported. If you could retest
this issue on a current release or on the latest development / test version, we
would appreciate that. Otherwise, this bug will be marked as CANTFIX one month
from now. Thanks for your help and for your patience.
Fedora Core 4 is no longer maintained.
Setting status to "INSUFFICIENT_DATA". If you can reproduce this bug in the
current Fedora release, please reopen this bug and assign it to the
corresponding Fedora version.
This bug was fixed in newer kernels; agk provided a patch that addressed this
issue. It is available at