Description of problem:
The kernel panics frequently on a system with an Adaptec 2410SA RAID controller.

Version-Release number of selected component (if applicable):
2.6.18-8.1.8.el5

How reproducible:
A crash occurs about once a day, but under high load usually within a couple of hours. All hardware except the hard drives, the Adaptec 2410SA RAID controller and the power supply has been replaced.

Steps to Reproduce:
1. No special requirements, normal operation

Actual results:
Kernel panic sooner or later. It was possible to obtain a vmcore using kdump via ssh.

Expected results:
No kernel panic.

Additional info:
The system is actually CentOS 5. I can send the vmcore (640 MB).
Created attachment 194361: backtrace from crash utility
Created attachment 194371: lsmod output
Created attachment 194381: ps from crash utility
Created attachment 194391: sys output from crash utility
For some reason the anon_vma's list_head is corrupt. If someone attaches a core file or sends the location of one, I'd like to look at it. At this point I can't reproduce the problem, so please help!

Larry Woodman
Vmcore available at http://c.seres.fi/vmcore.gz . The machine has been running without a crash for a couple of months now.
The anon_vma->head is a list of related vmas that are linked together via vma->anon_vma_node. The problem is that the last vma (the tail) of that list does not point back to anon_vma->head. Instead, vma->anon_vma_node->prev contains 0x48, which is some sort of list corruption and triggers this BUG():

--------------------------------------------------------------------------
        if (unlikely(prev->next != next)) {
                printk(KERN_ERR "list_add corruption. prev->next should be %p, but was %p\n",
                        next, prev->next);
                BUG();
        }
--------------------------------------------------------------------------

Has anything changed over the past couple of months, since you last saw a crash?

Larry Woodman
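To illustrate the kind of check that fires here, the following is a minimal userspace sketch, not the kernel source. It assumes a simplified stand-in for struct list_head, and the helper names list_add_checked, list_add_tail_checked and list_is_consistent are hypothetical. Where the kernel calls BUG(), the sketch just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Hypothetical helper: insert new_node between prev and next, but only if
 * the two neighbours still point at each other. The second condition is the
 * same test as the prev->next check quoted above.
 */
static bool list_add_checked(struct list_head *new_node,
                             struct list_head *prev,
                             struct list_head *next)
{
    if (next->prev != prev || prev->next != next)
        return false; /* the kernel would printk() and BUG() here */

    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    return true;
}

/* Hypothetical helper: append at the tail, i.e. between head->prev and head. */
static bool list_add_tail_checked(struct list_head *new_node,
                                  struct list_head *head)
{
    return list_add_checked(new_node, head->prev, head);
}

/*
 * Hypothetical helper: walk the ring from head and verify that every forward
 * link is mirrored by a backward link. The walk is bounded by max_nodes so a
 * damaged ring cannot loop forever. It only detects corruption whose pointers
 * are still dereferenceable; a wild value such as 0x48 would fault instead.
 */
static bool list_is_consistent(const struct list_head *head, size_t max_nodes)
{
    const struct list_head *pos = head;

    assert(head != NULL);
    for (size_t i = 0; i <= max_nodes; i++) {
        if (pos->next == NULL || pos->next->prev != pos)
            return false;
        pos = pos->next;
        if (pos == head)
            return true;
    }
    return false;
}
```

In an intact list the tail's next pointer leads back to the head, so both helpers succeed. Once the tail's link is overwritten, the next attempt to append a node fails the neighbour check, which is the situation the quoted BUG() reports.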
Here is an incomplete list of changes that have been made since 2007-09-13, when I submitted the bug (timing may be inaccurate; some of these may be earlier changes).

- Installed memtest and bonnie++, ran tests
- Disabled and uninstalled stor_agent (from Adaptec StorMan-2.12-00.src.rpm)
- Temporarily disabled services apmd, dkms_autoinstaller, kudzu, lm_sensors, smartd, gpm and auditd. They have since been re-enabled without problems.
- Configured ntpd to use internal time servers instead of unreachable public servers (the machine is in an isolated network without access to the Internet)
- Bacula MySQL tables have been checked and repaired
- Bacula configuration has been changed many times, but I think no changes were made before the system became stable again.

File /etc/sysconfig/hwconf has modification date 2007-09-18, about five days after the bug report.
So is this still a problem, or has the panic stopped happening? I have never seen a similar panic from any other customers or partners. I'm just trying to get an idea of how widespread this is.

Larry Woodman
No panic for a couple of months. Perhaps it was a hardware problem after all.
OK, I'm going to close the bug due to INSUFFICIENT_DATA. Please reopen this BZ if you ever see the problem again.

Larry Woodman