Red Hat Bugzilla – Bug 112436
aacraid + kernel2.4.21-4.0.1.ELsmp + x86_64 == crash
Last modified: 2007-11-30 17:06:59 EST
Description of problem:
When using an Adaptec 2200s RAID card with an IBM e325 dual Opteron
system, building a filesystem on the disk array causes the machine
to lock up with:
Northbridge Machine Check exception b40000000005001b 0
Northbridge status b40000000005001b
GART error 11
Lost an northbridge error
NB error address 00000000eff60000
MCE at EIP ffffffff8010de3e ESP ffffffff80633fc8
CPU 0: Machine Check Exception: 0000000000000000
Kernel panic: Unable to continue
In idle task - not syncing
I have tried both the stock smp kernel with driver 1.1.2 and a
custom kernel with driver 1.1.4 (from Adaptec's sources), with the
same result.
Version-Release number of selected component (if applicable):
IBM e325, dual Opteron, 5 GB memory. Adaptec 2200s card, firmware
4.0-4. The aacraid driver in the Red Hat-supplied 2.4.21-4.0.1.ELsmp
kernel is 1.1.2. Adaptec has driver source on their web site for
1.1.4, which behaves better than 1.1.2 but still has this problem.
Steps to Reproduce:
1. Build an external RAID group (mine is 550GB) using the Adaptec
bios utility. Allow the array initialization to complete.
2. Use fdisk to create a partition of the entire device
3. Run mkfs -t ext3 /dev/sdb1
5. Note system crash on console
My raid is 9 x 73G Seagate drives split across 2 channels
built as raid5.
I had also opened service request 277987 on this. A tech there
responded suggesting that I add "nomce" to the kernel command line.
This is effective in eliminating the crash, and the systems appear
stable. I am not familiar with the details of the machine check
exception, but if this is a valid fix then please close this ticket.
My customer has also reported a similar case since updating the kernel
to 2.4.21-20.EL.x86_64 on an Opteron system.
Sep 7 04:02:23 opteron kernel: Northbridge status a60000010005001b
Sep 7 04:02:23 opteron kernel: GART error 11
Sep 7 04:02:23 opteron kernel: Lost an northbridge error
Sep 7 04:02:23 opteron kernel: NB status: unrecoverable
Sep 7 04:02:23 opteron kernel: NB error address 00000000fbf61258
Sep 7 04:02:23 opteron kernel: Error uncorrected
Documentation for AMD Opteron MCE architecture may be found at
This appears to decode as a GART TLB error with a valid error
address of 00000000fbf61258.
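
For anyone who wants to repeat the decode by hand, here is a rough sketch in Python. The flag bit positions are assumed from the generic x86 machine check status layout, and the extended error field (bits 19:16, where 5 is taken to mean GART error) and the TLB error-code pattern are assumed from AMD's Opteron documentation, so check them against the vendor manual before relying on this. The function name decode_nb_status is made up for this example.

```python
# Sketch only: bit positions below are assumptions based on the generic x86
# MCA status register layout and AMD's K8 northbridge documentation.
FLAG_BITS = {
    "VAL": 63,    # status register holds a valid error
    "OVER": 62,   # a previous error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # miscellaneous register is valid
    "ADDRV": 58,  # error address register is valid
    "PCC": 57,    # processor context may be corrupt
}

def decode_nb_status(status):
    """Split a 64-bit northbridge status value into its main fields."""
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    ext_error = (status >> 16) & 0xF   # assumed: 0x5 means GART error
    error_code = status & 0xFFFF
    # Assumed TLB error-code pattern: 0000 0000 0001 TTLL
    tlb_error = (error_code & 0xFFF0) == 0x0010
    return {
        "flags": flags,
        "ext_error": ext_error,
        "error_code": error_code,
        "tlb_error": tlb_error,
    }

# Example with the first status value reported in this bug:
# decode_nb_status(0xb40000000005001b)
```

Under these assumptions both status values in this bug have the extended error field set to 5 and an error code matching the TLB pattern, with ADDRV set, which is consistent with the reading above.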
Given the address (it's very near where I would expect mmio space
would be allocated) I would take a look in /proc/iomem and see if the
controller in question has memory near this address.
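
If it helps, the /proc/iomem check can be scripted. The following is a minimal sketch that assumes the usual "start-end : name" line format with hexadecimal addresses. The helper name find_iomem_ranges is hypothetical, and the device names it returns depend entirely on the machine it runs on.

```python
def find_iomem_ranges(iomem_text, address):
    """Return (start, end, name) for every /proc/iomem range containing address."""
    matches = []
    for line in iomem_text.splitlines():
        span, sep, name = line.partition(" : ")
        if not sep:
            continue  # not a "range : name" line
        start_s, _, end_s = span.strip().partition("-")
        try:
            start, end = int(start_s, 16), int(end_s, 16)
        except ValueError:
            continue  # skip anything that does not parse as hex
        if start <= address <= end:
            matches.append((start, end, name.strip()))
    return matches

# Usage on the affected machine (the address is the one from the log above):
#   with open("/proc/iomem") as f:
#       for start, end, name in find_iomem_ranges(f.read(), 0xfbf61258):
#           print("%08x-%08x : %s" % (start, end, name))
```

Nested entries are reported along with their parents, so a match on both a PCI bus window and a device inside it is expected.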
Thanks for the information, David.
Please check /proc/iomem to see if the Adaptec 2200 has memory at the
address shown in the machine check. Also please check with Adaptec and
make sure you have the latest firmware for that board.
Since we have not received the feedback we requested, we will assume the problem
was not reproducible or has been fixed in a later update for this product.
Users who have experienced this problem are encouraged to upgrade to the latest
update release. If this issue is still reproducible, please visit the Red
Hat Global Support Services page on our website for technical support options:
If you have a telephone based support contract, you may contact Red Hat at
1-888-GO-REDHAT for technical support for the problem you are experiencing.