Bug 56603 - DAC960 panics during 2.4.9-13smp boot
Summary: DAC960 panics during 2.4.9-13smp boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-11-21 18:48 UTC by Josh Neal
Modified: 2007-04-18 16:38 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2002-02-15 21:57:21 UTC
Embargoed:



Description Josh Neal 2001-11-21 18:48:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; Linux 2.2.18pre11-va2.9-servers1 i686)

Description of problem:
Successfully installed 7.2 stock on Intel SRKA4 (quad Xeon 4U) with Mylex
1164 RAID controller as primary boot device. After installing, upgraded
kernel from 2.4.7-10smp to 2.4.9-13smp. DAC960 detects controller properly,
but when init starts, the kernel panics, printing this to the console:
Kernel panic: DAC960: SegmentNumber != SegmentCount

Have confirmed that this behavior exists when using a Mylex 2000 RAID
controller as well. Have confirmed this behavior on several similar
SRKA4s.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install 7.2 stock on Intel SRKA4 (quad Xeon 4U) with Mylex 1164 or 2000
RAID controller as primary boot device.
2. Upgrade kernel to 2.4.9-13smp, reboot.

Actual Results:  During boot, DAC960 detects controller and drives, but
when init starts, kernel panics, printing this to the console:
Kernel panic: DAC960: SegmentNumber != SegmentCount

Expected Results:  System should boot normally, as it does when using
2.4.7-10smp.

Additional info:

Am able to reproduce this behavior on multiple SRKA4s with similar
configurations that are supported by 2.4.7-10smp. 

Am booting these machines with LILO, not GRUB. Have confirmed that LILO is
configured to use the correct initrd.

Comment 1 Josh Neal 2001-11-21 19:34:14 UTC
Tested similar software configuration on machine with Intel L440GX motherboard
and Mylex 2000 as primary boot device.

2.4.9-13smp generates same kernel panic (DAC960: SegmentNumber != SegmentCount),
although this error occurs later in the boot process: it occurs just after local
filesystems are mounted. (By comparison, the SRKA4s kernel panic immediately
after init starts.)

To do: will test configuration with non-Intel motherboard. Will also try rolling
back the DAC960 driver from July rev to February rev and building new initrd.

Comment 2 Arjan van de Ven 2001-11-21 19:37:11 UTC
It would be interesting to boot with "mem=800M" to rule out highmem issues....

Comment 3 Josh Neal 2001-11-21 19:53:30 UTC
Test SRKA4s have 2GB - 4GB physical RAM; test L440GX has 1GB RAM.

Using "mem=800M" during 2.4.9-13smp boot on both SRKA4 and L440GX prevents
DAC960 kernel panic. 

Good call. What's our next step?

Comment 4 Arjan van de Ven 2001-11-21 19:54:49 UTC
My next step is to go look at the DAC960 source code ;(
(and the diff with older kernels)

Comment 5 David Tseng 2001-11-23 17:11:57 UTC
I have the exact same DAC960 kernel panic with my AccelRAID 150 on RH
7.1 with the kernel updates 2.4.9-6smp and 2.4.9-12smp. It occurs
occasionally during heavy disk activity and during boot-up (max uptime
was about 10 days). 2.4.3-12 did not have this problem, and I have since
reverted to it.

Perhaps I'll try the "mem=800M" as suggested.


Comment 6 Dennis Edmonds 2001-11-28 16:00:21 UTC
Could this be related to my problem (bug #56596)? I experience a failed init
with the DAC960 driver on a single-processor system. Instead of a panic, I only
have to deal with a failed fsck (sig 11). Everything seems fine when not using
big memory (>896MB).
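
The highmem observations in this thread come down to simple arithmetic. A minimal sketch, assuming the usual 32-bit x86 layout in which roughly 896MB is directly mapped low memory (the figure quoted above). The function name is hypothetical, and the RAM sizes are the ones reported for the test machines in this bug:

```python
LOWMEM_LIMIT_MB = 896  # assumed lowmem boundary on i686, per the ">896MB" figure above


def highmem_mb(physical_mb, mem_cap_mb=None):
    """Return how many MB of RAM would fall above the lowmem boundary.

    physical_mb: installed RAM in MB.
    mem_cap_mb:  value passed as mem=<N>M on the kernel command line, if any.
    """
    usable = physical_mb if mem_cap_mb is None else min(physical_mb, mem_cap_mb)
    return max(0, usable - LOWMEM_LIMIT_MB)


if __name__ == "__main__":
    # Machines mentioned in this report: L440GX with 1GB, SRKA4s with 2GB - 4GB.
    for name, ram in [("L440GX", 1024), ("SRKA4", 2048), ("SRKA4", 4096)]:
        print(name, ram, "MB:",
              highmem_mb(ram), "MB highmem by default,",
              highmem_mb(ram, 800), "MB highmem with mem=800M")
```

Under these assumptions, every machine reported here has some highmem by default and none once capped with "mem=800M", which is consistent with the panic disappearing under that option.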



Comment 7 Arjan van de Ven 2001-11-28 16:02:04 UTC
We're testing a fix right now, but since it has some core block layer changes I'd
rather test it well before asking others to test it...

Comment 8 Mihai RUSU 2002-02-15 04:46:35 UTC
I am having the same issue here. We are running 2.4.9-13SGI_XFS_1.0.2, compiled
from source with egcs 2.91.66, on a base Slackware 8.0 installation. I know
it does not seem Red Hat related, but I thought that posting here might help
narrow this problem down to something specific to this kernel. I get random
crashes, and on rare occasions it reports that DAC960 message at boot. The
hardware is a dual-processor SMP machine with a Mylex 170 (RAID 5) and 1GB RAM.
I have disabled highmem and still got some crashes. I could not tell whether it
is still this problem (it didn't show any message at boot after I disabled
highmem). I can see the last post here is 3 months old. I know Red Hat has
released an errata kernel (2.4.9-21) with a DAC960 driver upgrade. Does this
solve the problem? Can someone who has this problem upgrade to that kernel and
test it?

Comment 9 Arjan van de Ven 2002-02-15 08:46:17 UTC
The 2.4.9-21 kernel is supposed to fix this (it does in our lab).
dizzy: you have bigger problems since egcs will miscompile the DAC960
driver..... the MINIMUM compiler for 2.4 kernels is gcc 2.95.3 (as per
Documentation/Changes); egcs just miscompiles too much code.
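
The compiler requirement above can be checked mechanically. A minimal sketch, assuming version strings are plain dotted integers. `MIN_GCC` reflects the 2.95.3 minimum stated above, and `meets_minimum` is a hypothetical helper, not part of any kernel tooling:

```python
MIN_GCC = (2, 95, 3)  # minimum compiler for 2.4 kernels, as stated in the comment above


def parse_version(text):
    """Turn a dotted version string such as '2.91.66' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(version_text, minimum=MIN_GCC):
    """True if the given compiler version is numerically at least the minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    # egcs 2.91.66 is the compiler reported in Comment 8.
    for version in ("2.91.66", "2.95.3"):
        print(version, "ok" if meets_minimum(version) else "too old")
```

This is a numeric comparison only. It does not say whether a particular compiler build is known to compile the DAC960 driver correctly.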

Comment 10 Mihai RUSU 2002-02-15 21:57:16 UTC
arjanv: :) You are right. It is just that SGI recommends using egcs to compile
their kernel; even the binary kernel they provide, which went through QA
testing, was compiled with egcs (AFAIK). Anyway, don't you think the errata
notes for that kernel should list this bug ID among the fixed bugs? :)

