288931 – Kernel panic in list_debug.c:31

Summary: Kernel panic in list_debug.c:31

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Larry Woodman
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-09-13 07:47 UTC by Cristian Seres
Modified:	2007-12-18 15:09 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-12-18 15:09:14 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
backtrace from crash utility (923 bytes, text/plain) 2007-09-13 07:47 UTC, Cristian Seres	no flags
Lsmod output (1.60 KB, text/plain) 2007-09-13 07:48 UTC, Cristian Seres	no flags
ps from crash utility (2.30 KB, text/plain) 2007-09-13 07:50 UTC, Cristian Seres	no flags
sys output from crash utility (537 bytes, text/plain) 2007-09-13 07:55 UTC, Cristian Seres	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	215214	0	medium	CLOSED	kernel: kernel BUG at lib/list_debug.c:31!	2021-02-22 00:41:40 UTC

Description Cristian Seres 2007-09-13 07:47:14 UTC

Description of problem:
Kernel will panic frequently on a system with Adaptec 2410SA RAID controller.

Version-Release number of selected component (if applicable):
2.6.18-8.1.8.el5

How reproducible:
Crash will occur about once a day, but under high load usually within a couple
of hours. All hardware excluding hard drives, Adaptec 2410SA RAID controller and
power supply have been replaced.

Steps to Reproduce:
1. No special requirements, normal operation
  
Actual results:
Kernel panic sooner or later. It was possible to obtain vmcore using kdump via ssh.

Expected results:
No kernel panic.

Additional info:
The system is actually CentOS 5. I can send vmcore (640 MB).

Comment 1 Cristian Seres 2007-09-13 07:47:14 UTC

Created attachment 194361
backtrace from crash utility

Comment 2 Cristian Seres 2007-09-13 07:48:34 UTC

Created attachment 194371
Lsmod output

Comment 3 Cristian Seres 2007-09-13 07:50:24 UTC

Created attachment 194381
ps from crash utility

Comment 4 Cristian Seres 2007-09-13 07:55:07 UTC

Created attachment 194391
sys output from crash utility

Comment 6 Larry Woodman 2007-11-26 17:11:23 UTC

For some reason the anon_vma's list_head is corrupt.  If someone attaches a core
file or sends the location of one, I'd like to look at it.  At this point I can't
reproduce the problem, so please help!

Larry Woodman

Comment 7 Cristian Seres 2007-12-07 13:44:11 UTC

Vmcore available at http://c.seres.fi/vmcore.gz .

The machine has been running without a crash for a couple of months now.

Comment 8 Larry Woodman 2007-12-13 15:32:17 UTC

The problem is that anon_vma->head is a list of related vmas which are
linked together via vma->anon_vma_node.  The last vma (tail) of that list
does not point back to anon_vma->head.  Instead, vma->anon_vma_node.prev
contains 0x48, which is some sort of list corruption, and that triggers this
BUG():
--------------------------------------------------------------------------
        if (unlikely(prev->next != next)) {
                printk(KERN_ERR "list_add corruption. prev->next should be %p, but was %p\n",
                        next, prev->next);
                BUG();
        }
---------------------------------------------------------------------------
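
For readers who want to see the check fire outside the kernel, here is a
minimal user-space sketch of the same sanity test. It is an illustration only,
not the kernel source: the names init_list_head, checked_list_add,
list_add_tail_checked and demo_corrupt_tail are hypothetical, the function
returns -1 where the kernel would call BUG(), and placing the stray value 0x48
in the tail's next pointer is an assumption made so that the prev->next check
quoted above is the one that trips.

```c
#include <stdint.h>
#include <stdio.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty circular list points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Insert new_node between prev and next, but only after verifying that the
 * two neighbours still point at each other. Returns 0 on success and -1 on
 * corruption (the kernel's debug version calls BUG() instead).
 */
static int checked_list_add(struct list_head *new_node,
                            struct list_head *prev,
                            struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be %p, but was %p\n",
                (void *)prev, (void *)next->prev);
        return -1;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be %p, but was %p\n",
                (void *)next, (void *)prev->next);
        return -1;
    }
    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    return 0;
}

/* Appending at the tail means inserting between head->prev and head. */
static int list_add_tail_checked(struct list_head *new_node,
                                 struct list_head *head)
{
    return checked_list_add(new_node, head->prev, head);
}

/*
 * Hypothetical scenario: build a two-entry list, overwrite the tail's next
 * pointer with 0x48, then try to append a third entry. The stray pointer is
 * only compared, never dereferenced.
 */
int demo_corrupt_tail(void)
{
    struct list_head head, vma_a, vma_b, vma_c;

    init_list_head(&head);
    if (list_add_tail_checked(&vma_a, &head) != 0)
        return 1;
    if (list_add_tail_checked(&vma_b, &head) != 0)
        return 1;

    /* Simulated corruption: the tail no longer points back to the head. */
    vma_b.next = (struct list_head *)(uintptr_t)0x48;

    return list_add_tail_checked(&vma_c, &head);
}
```

Calling demo_corrupt_tail() from a main() should report the corruption on
stderr and return -1, while the two earlier appends succeed. The check only
detects that a neighbour pointer was overwritten at the time of the next
insertion; it says nothing about what overwrote it.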

What has changed over the past couple months since you have seen a crash,
anything???

Larry Woodman

Comment 9 Cristian Seres 2007-12-14 12:27:40 UTC

Here is an incomplete list of changes that have been made since 2007-09-13 when
I submitted the bug (timing may be inaccurate, some of these may be earlier
changes).

- Installed memtest and bonnie++, ran tests
- Disabled and uninstalled stor_agent (from Adaptec StorMan-2.12-00.src.rpm)
- Temporarily disabled services apmd, dkms_autoinstaller, kudzu, lm_sensors,
smartd, gpm and auditd. They have been enabled without problems.
- Configured ntpd to use internal time servers instead of unreachable public
servers (the machine is in isolated network without access to Internet)
- Bacula MySQL tables have been checked and repaired
- Bacula configuration has been changed many times, but no changes have been
made before the system became stable again, I think.

File /etc/sysconfig/hwconf has modification date 2007-09-18, about five days
after the bug report.

Comment 10 Larry Woodman 2007-12-14 16:43:21 UTC

So is this still a problem or has the panic stopped happening?  I have never
seen a similar panic from any other customers or partners; I'm just trying to
get an idea of just how widespread this is.

Larry Woodman

Comment 11 Cristian Seres 2007-12-17 15:00:44 UTC

No panic for a couple of months. Perhaps it was a hardware problem after all.

Comment 12 Larry Woodman 2007-12-18 15:09:14 UTC

OK, I'm going to close the bug due to INSUFFICIENT_DATA.  Please reopen this BZ
if you ever see the problem again.

Larry Woodman
