Bug 288931 - Kernel panic in list_debug.c:31
Summary: Kernel panic in list_debug.c:31
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Larry Woodman
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-09-13 07:47 UTC by Cristian Seres
Modified: 2007-12-18 15:09 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-12-18 15:09:14 UTC
Target Upstream Version:
Embargoed:


Attachments
backtrace from crash utility (923 bytes, text/plain)
2007-09-13 07:47 UTC, Cristian Seres
Lsmod output (1.60 KB, text/plain)
2007-09-13 07:48 UTC, Cristian Seres
ps from crash utility (2.30 KB, text/plain)
2007-09-13 07:50 UTC, Cristian Seres
sys output from crash utility (537 bytes, text/plain)
2007-09-13 07:55 UTC, Cristian Seres


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 215214 0 medium CLOSED kernel: kernel BUG at lib/list_debug.c:31! 2021-02-22 00:41:40 UTC

Description Cristian Seres 2007-09-13 07:47:14 UTC
Description of problem:
The kernel panics frequently on a system with an Adaptec 2410SA RAID controller.

Version-Release number of selected component (if applicable):
2.6.18-8.1.8.el5

How reproducible:
The crash occurs about once a day, and under high load usually within a couple
of hours. All hardware except the hard drives, the Adaptec 2410SA RAID
controller and the power supply has been replaced.

Steps to Reproduce:
1. No special requirements, normal operation
  
Actual results:
Kernel panic sooner or later. It was possible to obtain a vmcore using kdump via ssh.

Expected results:
No kernel panic.

Additional info:
The system is actually CentOS 5. I can send vmcore (640 MB).

Comment 1 Cristian Seres 2007-09-13 07:47:14 UTC
Created attachment 194361
backtrace from crash utility

Comment 2 Cristian Seres 2007-09-13 07:48:34 UTC
Created attachment 194371
Lsmod output

Comment 3 Cristian Seres 2007-09-13 07:50:24 UTC
Created attachment 194381
ps from crash utility

Comment 4 Cristian Seres 2007-09-13 07:55:07 UTC
Created attachment 194391
sys output from crash utility

Comment 6 Larry Woodman 2007-11-26 17:11:23 UTC
For some reason the anon_vma's list_head is corrupt.  If someone attaches a core
file or sends the location of one I'd like to look at it.  At this point I can't
reproduce the problem so please help!!!

Larry Woodman


Comment 7 Cristian Seres 2007-12-07 13:44:11 UTC
Vmcore available at http://c.seres.fi/vmcore.gz .

The machine has been running without a crash for a couple of months now.

Comment 8 Larry Woodman 2007-12-13 15:32:17 UTC
The problem is that the anon_vma->head is a list of related vmas which are
linked together via the vma->anon_vma_node.  The last vma (tail) of that list
does not point back to the anon_vma->head.  Instead, the
vma->anon_vma_node.prev contains 0x48, which is some sort of list corruption
that triggers this BUG():
--------------------------------------------------------------------------
        if (unlikely(prev->next != next)) {
                printk(KERN_ERR "list_add corruption. prev->next should be %p, but was %p\n",
                        next, prev->next);
                BUG();
        }
---------------------------------------------------------------------------
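
For anyone who wants to see how that check behaves outside the kernel, here is
a minimal userspace sketch. It is not the kernel source: struct list_head is
re-declared locally, and checked_list_add(), checked_list_add_tail() and
demo_corrupt_tail() are hypothetical names that return -1 where the kernel
would call BUG(). The demo overwrites the tail's next pointer with 0x48 purely
as an assumption, because next is the field the quoted check reads; it does
not claim to reproduce the exact state found in the vmcore.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Insert new_node between prev and next, after verifying that the two
 * neighbours still point at each other. Returns 0 on success and -1
 * where the debug kernel code would call BUG().
 */
static int checked_list_add(struct list_head *new_node,
                            struct list_head *prev,
                            struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be %p, but was %p\n",
                (void *)prev, (void *)next->prev);
        return -1;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be %p, but was %p\n",
                (void *)next, (void *)prev->next);
        return -1;
    }
    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    return 0;
}

/* Append at the tail: the neighbours are the current tail and the head. */
static int checked_list_add_tail(struct list_head *new_node,
                                 struct list_head *head)
{
    return checked_list_add(new_node, head->prev, head);
}

/* Hypothetical scenario: damage the tail, then try to append once more. */
static int demo_corrupt_tail(void)
{
    struct list_head head, a, b, c;

    init_list_head(&head);
    if (checked_list_add_tail(&a, &head) != 0 ||
        checked_list_add_tail(&b, &head) != 0)
        return 1; /* unexpected: a healthy list was rejected */
    assert(head.prev == &b); /* b is now the tail */

    /* Assumed corruption: the tail no longer points back to the head. */
    b.next = (struct list_head *)(uintptr_t)0x48;

    return checked_list_add_tail(&c, &head);
}
```

Calling demo_corrupt_tail() from a main() prints the "prev->next should be"
message and returns -1. The bogus pointer is only compared, never
dereferenced, which is why the check fires before anything faults on it.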

What has changed over the past couple months since you have seen a crash,
anything???

Larry Woodman


Comment 9 Cristian Seres 2007-12-14 12:27:40 UTC
Here is an incomplete list of changes that have been made since 2007-09-13 when
I submitted the bug (timing may be inaccurate, some of these may be earlier
changes).

- Installed memtest and bonnie++, ran tests
- Disabled and uninstalled stor_agent (from Adaptec StorMan-2.12-00.src.rpm)
- Temporarily disabled services apmd, dkms_autoinstaller, kudzu, lm_sensors,
smartd, gpm and auditd. They have been enabled without problems.
- Configured ntpd to use internal time servers instead of unreachable public
servers (the machine is in isolated network without access to Internet)
- Bacula MySQL tables have been checked and repaired
- Bacula configuration has been changed many times, but no changes have been
made before the system became stable again, I think.

File /etc/sysconfig/hwconf has modification date 2007-09-18, about five days
after the bug report.

Comment 10 Larry Woodman 2007-12-14 16:43:21 UTC
So is this still a problem or has the panic stopped happening?  I have never
seen a similar panic from any other customers or partners; I'm just trying to
get an idea of just how prolific this is.

Larry Woodman


Comment 11 Cristian Seres 2007-12-17 15:00:44 UTC
No panic for a couple of months. Perhaps it was a hardware problem after all.

Comment 12 Larry Woodman 2007-12-18 15:09:14 UTC
OK, I'm going to close the bug due to INSUFFICIENT_DATA.  Please reopen this BZ
if you ever see the problem again.

Larry Woodman

