Bug 288931 - Kernel panic in list_debug.c:31
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Larry Woodman
QA Contact: Martin Jenner
Reported: 2007-09-13 03:47 EDT by Cristian Seres
Modified: 2007-12-18 10:09 EST

Doc Type: Bug Fix
Last Closed: 2007-12-18 10:09:14 EST


Attachments
backtrace from crash utility (923 bytes, text/plain)
2007-09-13 03:47 EDT, Cristian Seres
Lsmod output (1.60 KB, text/plain)
2007-09-13 03:48 EDT, Cristian Seres
ps from crash utility (2.30 KB, text/plain)
2007-09-13 03:50 EDT, Cristian Seres
sys output from crash utility (537 bytes, text/plain)
2007-09-13 03:55 EDT, Cristian Seres


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Bugzilla 215214 None None None Never

Description Cristian Seres 2007-09-13 03:47:14 EDT
Description of problem:
The kernel panics frequently on a system with an Adaptec 2410SA RAID controller.

Version-Release number of selected component (if applicable):
2.6.18-8.1.8.el5

How reproducible:
A crash occurs about once a day, but under high load usually within a couple
of hours. All hardware except the hard drives, the Adaptec 2410SA RAID
controller, and the power supply has been replaced.

Steps to Reproduce:
1. No special requirements, normal operation
  
Actual results:
Kernel panic sooner or later. It was possible to obtain a vmcore using kdump over SSH.

Expected results:
No kernel panic.

Additional info:
The system is actually CentOS 5. I can send vmcore (640 MB).
Comment 1 Cristian Seres 2007-09-13 03:47:14 EDT
Created attachment 194361
backtrace from crash utility
Comment 2 Cristian Seres 2007-09-13 03:48:34 EDT
Created attachment 194371
Lsmod output
Comment 3 Cristian Seres 2007-09-13 03:50:24 EDT
Created attachment 194381
ps from crash utility
Comment 4 Cristian Seres 2007-09-13 03:55:07 EDT
Created attachment 194391
sys output from crash utility
Comment 6 Larry Woodman 2007-11-26 12:11:23 EST
For some reason the anon_vma's list_head is corrupt.  If someone attaches a core
file or sends the location of one, I'd like to look at it.  At this point I can't
reproduce the problem, so please help!

Larry Woodman
Comment 7 Cristian Seres 2007-12-07 08:44:11 EST
Vmcore available at http://c.seres.fi/vmcore.gz .

The machine has been running without a crash for a couple of months now.
Comment 8 Larry Woodman 2007-12-13 10:32:17 EST
The anon_vma->head is a list of related vmas which are linked together via
vma->anon_vma_node.  The problem is that the last vma (tail) of that list
does not point back to anon_vma->head.  Instead, vma->anon_vma_node.prev
contains 0x48, which is some sort of list corruption and triggers this BUG():
--------------------------------------------------------------------------
        if (unlikely(prev->next != next)) {
                printk(KERN_ERR "list_add corruption. prev->next should be %p, but was %p\n",
                        next, prev->next);
                BUG();
        }
---------------------------------------------------------------------------
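
As an illustration of how this check catches a damaged tail, here is a minimal
userspace sketch.  It is not the kernel source: struct list_head is re-declared
locally, the helper names (checked_list_add, checked_list_add_tail,
demo_list_corruption) are hypothetical, and the check returns -1 where the
kernel would call BUG().  The demo assumes the tail's next pointer is the field
overwritten with 0x48 so that the prev->next test quoted above fires; it makes
no claim about which field was actually damaged in the vmcore.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points to itself in both directions. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/*
 * Insert `new` between `prev` and `next`, first verifying that the two
 * neighbours still point at each other.  Returns 0 on success and -1 on
 * corruption (the kernel's debug version would BUG() instead).
 */
static int checked_list_add(struct list_head *new,
                            struct list_head *prev,
                            struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be %p, but was %p\n",
                (void *)prev, (void *)next->prev);
        return -1;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be %p, but was %p\n",
                (void *)next, (void *)prev->next);
        return -1;
    }
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
    return 0;
}

/* Appending at the tail means inserting between head->prev and head. */
static int checked_list_add_tail(struct list_head *new, struct list_head *head)
{
    return checked_list_add(new, head->prev, head);
}

/* Build a two-entry list, damage the tail, then try to append again. */
int demo_list_corruption(void)
{
    struct list_head head, a, b, c;
    int rc;

    init_list_head(&head);
    rc = checked_list_add_tail(&a, &head);
    assert(rc == 0);
    rc = checked_list_add_tail(&b, &head);
    assert(rc == 0);

    /* Assumed corruption: the tail no longer points back to the head. */
    b.next = (struct list_head *)(uintptr_t)0x48;

    /* The bogus pointer is only compared, never dereferenced. */
    return checked_list_add_tail(&c, &head);
}
```

Calling demo_list_corruption() from a small main() reports the mismatch on
stderr and returns -1, because the tail's next pointer no longer equals the
head.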

Has anything changed over the past couple of months, since you last saw a
crash?

Larry Woodman
Comment 9 Cristian Seres 2007-12-14 07:27:40 EST
Here is an incomplete list of changes that have been made since 2007-09-13 when
I submitted the bug (timing may be inaccurate, some of these may be earlier
changes).

- Installed memtest and bonnie++, ran tests
- Disabled and uninstalled stor_agent (from Adaptec StorMan-2.12-00.src.rpm)
- Temporarily disabled services apmd, dkms_autoinstaller, kudzu, lm_sensors,
smartd, gpm and auditd. They have since been re-enabled without problems.
- Configured ntpd to use internal time servers instead of unreachable public
servers (the machine is on an isolated network without Internet access)
- Bacula MySQL tables have been checked and repaired
- Bacula configuration has been changed many times, but I think no changes were
made before the system became stable again.

File /etc/sysconfig/hwconf has modification date 2007-09-18, about five days
after the bug report.
Comment 10 Larry Woodman 2007-12-14 11:43:21 EST
So is this still a problem, or has the panic stopped happening?  I have never
seen a similar panic from any other customers or partners; I'm just trying to
get an idea of how widespread this is.

Larry Woodman
Comment 11 Cristian Seres 2007-12-17 10:00:44 EST
No panic for a couple of months. Perhaps it was a hardware problem after all.
Comment 12 Larry Woodman 2007-12-18 10:09:14 EST
OK, I'm going to close the bug as INSUFFICIENT_DATA.  Please reopen this BZ
if you ever see the problem again.

Larry Woodman
