Bug 111735 - Kernel oops (maybe during LVM snapshot creation)
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 8.0
Hardware: i686 Linux
Priority: medium, Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2003-12-09 10:24 EST by Chris Adams
Modified: 2007-04-18 13:00 EDT
CC List: 4 users

Doc Type: Bug Fix
Last Closed: 2004-09-30 11:41:44 EDT


Attachments
ksymoops output (3.71 KB, text/plain)
2003-12-09 10:24 EST, Chris Adams
Kernel BUG when creating snapshots with 2.4.18-27 (2.32 KB, text/plain)
2004-01-29 16:34 EST, Andrew Rechenberg

Description Chris Adams 2003-12-09 10:24:17 EST
Last week, I updated my systems with the do_brk() errata kernels.  One
system crashed last night.

I did rebuild the kernel RPM with the patch from Bugzilla #97843 added
and LVM_VFS_ENHANCEMENT defined, and I _think_ that the crash may have
happened when a snapshot was being made (but I'm not sure).

The system is a Penguin Computing dual PIII with 1G RAM and an AMI
MegaRAID adapter with multiple drives striped and mirrored (presented
as one logical drive to the system).  All filesystems are ext3, and
all except /boot are on LVM.  The system runs sendmail (with a custom
cf file), OpenLDAP, and cucipop (with some local mods); it is a
"sidelined spam" server (stores suspected spam for users to check if
they want) that receives about 5-6G of spam a day.  The volume with
the spam storage and sendmail queue is not snapshotted (and not backed
up), but all other volumes are backed up over the network: a script
is called that creates a snapshot and mounts it (as ext2), and when the
backup is done another script is called to unmount and remove the
snapshot.
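
A minimal sketch of the snapshot backup cycle described above, written as a Python dry run that only prints the commands. The volume group, LV names, snapshot size and mount point are hypothetical (the report does not give them), and the exact lvcreate/mount/umount/lvremove invocations, including the read-only mount option, are an assumption about what such scripts typically run, not the reporter's actual scripts.

```python
import subprocess
from typing import List

# Hypothetical names and sizes: the report does not give the real ones.
VG = "vg00"
ORIGIN_LV = "home"
SNAP_LV = "home-snap"
SNAP_SIZE = "512M"
MOUNT_POINT = "/mnt/snap"


def pre_backup_commands() -> List[List[str]]:
    """Create the snapshot, then mount it as ext2 (read-only is an assumption)."""
    snap_dev = f"/dev/{VG}/{SNAP_LV}"
    return [
        ["lvcreate", "-s", "-L", SNAP_SIZE, "-n", SNAP_LV, f"/dev/{VG}/{ORIGIN_LV}"],
        ["mount", "-t", "ext2", "-o", "ro", snap_dev, MOUNT_POINT],
    ]


def post_backup_commands() -> List[List[str]]:
    """Unmount the snapshot and remove it once the backup has finished."""
    return [
        ["umount", MOUNT_POINT],
        ["lvremove", "-f", f"/dev/{VG}/{SNAP_LV}"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(pre_backup_commands())
    # ... the network backup of MOUNT_POINT would run here ...
    run(post_backup_commands())
```

Keeping creation and removal in separate steps mirrors the two-script setup in the report, so a failed backup can still be followed by the cleanup step.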

I will attach the decoded kernel oops.  Please let me know if there is
anything I can do to help.
Comment 1 Chris Adams 2003-12-09 10:24:55 EST
Created attachment 96423
ksymoops output
Comment 2 Stephen Tweedie 2003-12-11 10:49:24 EST
Hmm.  There's definitely no footprint associated with LVM in this
oops.  It's a panic walking a list where, historically, we usually
only see problems if there is hardware memory corruption going on.  A
memtest86 run might be useful; but for a corrupt list seen on a
home-built kernel, I'm not sure that there's any useful debugging we
can do without more information.
Comment 3 Chris Adams 2003-12-19 11:04:00 EST
When I looked at the logs, it looked like this oops happened shortly
after an LVM snapshot was created, but that may just have been
coincidence; I didn't look at the order of events (the oops came 2
seconds _before_ an LVM snapshot was mounted, but then the system froze).

At this point, I don't think it is hardware.  We've got two Penguin
Computing Relion servers (dual PIII 1.13GHz; one has 1G RAM and the
other 2G RAM) still running RHL 8.0.  Since upgrading to 2.4.20-24.8
(with the LVM patch from RH Bugzilla added), one (where this oops came
from) has crashed once, and the other has crashed a couple of times
(but no oops logged; we're working on getting a serial console server
so we can capture any oops on the console).  Both systems have run for
a long time before that with no trouble (one for about 6-8 months, one
for about a year and a half).

If there's any more information I can provide, I'll be happy to.  The
only reason I'm running a home-built kernel is that the patch
necessary for LVM snapshots to be useful with ext3 has not made its
way into the errata kernels, and I need that patch to make good
backups.  These systems will eventually be migrated to RHEL ES (I
thought about building a kernel from the RHEL 3 ES errata SRPM, but I
don't know if that would work right on 8.0 since the kernel has NPTL
patches and such), but I can't do that right this minute.
Comment 4 Andrew Rechenberg 2004-01-29 16:33:14 EST
We have had the exact same problem with LVM and snapshots since Red
Hat kernel 2.4.18-27.  Our machine has 8GB of RAM and the only way I
could get snapshots to work was to artificially limit the system RAM
with mem=6000M on the kernel command line.  

From various Google searches, it seems many people think this problem
is related to a bug in the kernel VMM.  I'm attaching the kernel BUG()
output that I get when trying to use snapshots with 2.4.18-27; it
points to vmalloc.c.

If I use 2.4.20-28, I no longer get a kernel BUG(), but now I receive:

lvcreate -- ERROR "Cannot allocate memory" creating VGDA for
"/dev/vg00/lvsnap" in kernel

Probably because __vmalloc() in mm/vmalloc.c no longer hits BUG() on
failure as it did in 2.4.18-27; now it just returns NULL.
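
The reasoning above (the allocation now fails gracefully instead of crashing, and the failure surfaces in user space as ENOMEM) can be modelled in a few lines of Python. This is a user-space illustration only, not kernel code: create_vgda, exhausted_vmalloc and working_vmalloc are hypothetical stand-ins for the LVM ioctl path and for __vmalloc() failing or succeeding; only the usual kernel convention of returning 0 or a negative errno is carried over.

```python
import errno
import os
from typing import Callable, Optional

# Hypothetical allocator type: returns a buffer, or None when the allocation
# cannot be satisfied (the NULL-returning behaviour described above).
Allocator = Callable[[int], Optional[bytearray]]


def exhausted_vmalloc(size: int) -> Optional[bytearray]:
    """Simulates an allocator with no address space left: always fails."""
    return None


def working_vmalloc(size: int) -> Optional[bytearray]:
    """Simulates a successful allocation."""
    return bytearray(size)


def create_vgda(alloc: Allocator, size: int) -> int:
    """Hypothetical handler: 0 on success, negative errno on failure."""
    buf = alloc(size)
    if buf is None:
        # Report the failure to the caller instead of crashing.
        return -errno.ENOMEM
    return 0


if __name__ == "__main__":
    rc = create_vgda(exhausted_vmalloc, 1 << 20)
    # User space turns the negative return into errno and prints its text.
    print(f'lvcreate -- ERROR "{os.strerror(-rc)}"')
```

On a glibc-based Linux system os.strerror(errno.ENOMEM) is "Cannot allocate memory", the same text lvcreate printed.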

Here are the specs for my machine:

Dell PE4600
2x2.6GHz Xeon with HT
8GB RAM
RH 7.3
Kernel 2.4.20-28

I really need all 8GB of RAM, not 6000M.
Comment 5 Andrew Rechenberg 2004-01-29 16:34:28 EST
Created attachment 97350
Kernel BUG when creating snapshots with 2.4.18-27
Comment 6 Bugzilla owner 2004-09-30 11:41:44 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
