Red Hat Bugzilla – Bug 111735
Kernel oops (maybe during LVM snapshot creation)
Last modified: 2007-04-18 13:00:02 EDT
Last week, I updated my systems with the do_brk() errata kernels. One
system crashed last night.
I did rebuild the kernel RPM with the patch from bugzilla #97843 added
and LVM_VFS_ENHANCEMENT defined, and I _think_ that the crash may have
happened when a snapshot was being made (but I'm not sure).
The system is a Penguin Computing dual PIII with 1G RAM, an AMI
MegaRAID adapter with multiple drives striped and mirrored (presented
as one logical drive to the system). All filesystems are ext3, and
all except /boot are on LVM. The system runs sendmail (with a custom
cf file), OpenLDAP, and cucipop (with some local mods); it is a
"sidelined spam" server (stores suspected spam for users to check if
they want) that receives about 5-6G of spam a day. The volume with
the spam storage and sendmail queue is not snapshotted (and not backed
up), but all other volumes are backed up over the network - a script
is called that creates a snapshot, mounts it (as ext2), and when the
backup is done another script is called to unmount and remove the
snapshot.
I will attach the decoded kernel oops. Please let me know if there is
anything I can do to help.
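For reference, the snapshot backup cycle described above can be sketched roughly as follows. This is only an illustration, not the actual scripts in use on these systems: the function name, the snapshot name, the snapshot size, the mount point, and the read-only mount option are all assumptions. The function only builds the command lines; it does not run them.

```python
from typing import List, Tuple

Command = List[str]


def snapshot_backup_commands(
    vg: str,
    origin: str,
    snap_name: str = "backupsnap",   # hypothetical snapshot LV name
    snap_size: str = "1G",           # hypothetical copy-on-write space
    mount_point: str = "/mnt/snap",  # hypothetical mount point
) -> Tuple[List[Command], List[Command]]:
    """Return (commands to run before the backup, commands to run after)."""
    origin_dev = f"/dev/{vg}/{origin}"
    snap_dev = f"/dev/{vg}/{snap_name}"

    before = [
        # Create a snapshot of the origin logical volume.
        ["lvcreate", "-s", "-L", snap_size, "-n", snap_name, origin_dev],
        # Mount the snapshot as ext2; read-only is an assumption here.
        ["mount", "-t", "ext2", "-o", "ro", snap_dev, mount_point],
    ]
    after = [
        # Once the network backup has finished, unmount the snapshot...
        ["umount", mount_point],
        # ...and remove it so it stops consuming copy-on-write space.
        ["lvremove", "-f", snap_dev],
    ]
    return before, after


if __name__ == "__main__":
    pre, post = snapshot_backup_commands("vg00", "lvhome")
    for cmd in pre + post:
        print(" ".join(cmd))
```

The network backup itself would run between the two command lists; that step is site-specific and is left out here.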
Created attachment 96423
Hmm. There's definitely no footprint associated with LVM in this
oops. It's a panic walking a list where, historically, we usually
only see problems if there is hardware memory corruption going on. A
memtest86 run might be useful; but for a corrupt list seen on a
home-built kernel, I'm not sure that there's any useful debugging we
can do without more information.
When I looked at the logs, it looked like this oops happened shortly
after an LVM snapshot was created, but that may just have been
coincidence; I didn't look at the order of events (the oops came 2
seconds _before_ an LVM snapshot was mounted, but then the system froze).
At this point, I don't think it is hardware. We've got two Penguin
Computing Relion servers (dual PIII 1.13GHz; one has 1G RAM and the
other 2G RAM) still running RHL 8.0. Since upgrading to 2.4.20-24.8
(with the LVM patch from RH Bugzilla added), one (where this oops came
from) has crashed once, and the other has crashed a couple of times
(but no oops logged; we're working on getting a serial console server
so we can capture any oops on the console). Both systems have run for
a long time before that with no trouble (one for about 6-8 months, one
for about a year and a half).
If there's any more information I can provide, I'll be happy to. The
only reason I'm running a home-built kernel is because the patch
necessary for LVM snapshots to be useful with ext3 has not made its
way into the errata kernels, and I need that patch to make good
backups. These systems will eventually be migrated to RHEL ES (I
thought about building a kernel from the RHEL 3 ES errata SRPM, but I
don't know if that would work right on 8.0 since the kernel has NPTL
patches and such), but I can't do that right this minute.
We have had the exact same problem with LVM and snapshots since Red
Hat kernel 2.4.18-27. Our machine has 8GB of RAM and the only way I
could get snapshots to work was to artificially limit the system RAM
with mem=6000M on the kernel command line.
Looking at various Google searches, many people think this problem is
related to a bug in the kernel VMM. I'm attaching a kernel BUG()
output that I receive when trying to use snapshots with 2.4.18-27
which points to vmalloc.c.
If I use 2.4.20-28, I no longer get a kernel BUG(), but now I receive:
lvcreate -- ERROR "Cannot allocate memory" creating VGDA for
"/dev/vg00/lvsnap" in kernel
Probably because __vmalloc() in mm/vmalloc.c no longer triggers BUG()
as it did in 2.4.18-27, but now just returns NULL.
Here are the specs for my machine:
2x2.6GHz Xeon with HT
I really need 8GB of RAM, not 6000M.
Created attachment 97350
Kernel BUG when creating snapshots with 2.4.18-27
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/