Bug 56603 - DAC960 panics during 2.4.9-13smp boot
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brock Organ
Reported: 2001-11-21 13:48 EST by Josh Neal
Modified: 2007-04-18 12:38 EDT (History)
Doc Type: Bug Fix
Last Closed: 2002-02-15 16:57:21 EST

Description Josh Neal 2001-11-21 13:48:33 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.75 [en] (X11; U; Linux 2.2.18pre11-va2.9-servers1 i686)

Description of problem:
Successfully installed 7.2 stock on Intel SRKA4 (quad Xeon 4U) with Mylex
1164 RAID controller as primary boot device. After installing, upgraded
kernel from 2.4.7-10smp to 2.4.9-13smp. DAC960 detects controller properly,
but when init starts, the kernel panics, printing this to the console:
Kernel panic: DAC960: SegmentNumber != SegmentCount

Have confirmed that this behavior exists when using a Mylex 2000 RAID
controller as well. Have confirmed this behavior on several similar
SRKA4's. 

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install 7.2 stock on Intel SRKA4 (quad Xeon 4U) with Mylex 1164 or 2000
RAID controller as primary boot device.
2. Upgrade kernel to 2.4.9-13smp, reboot.


	

Actual Results:  During boot, DAC960 detects controller and drives, but
when init starts, kernel panics, printing this to the console:
Kernel panic: DAC960: SegmentNumber != SegmentCount

Expected Results:  System should boot normally, as it does when using
2.4.7-10smp.

Additional info:

Am able to reproduce this behavior on multiple SRKA4s with similar
configurations that are supported by 2.4.7-10smp. 

Am booting these machines with LILO, not GRUB. Have confirmed that LILO is
configured to use the correct initrd.
Comment 1 Josh Neal 2001-11-21 14:34:14 EST
Tested similar software configuration on machine with Intel L440GX motherboard
and Mylex 2000 as primary boot device.

2.4.9-13smp generates same kernel panic (DAC960: SegmentNumber != SegmentCount),
although this error occurs later in the boot process: it occurs just after local
filesystems are mounted. (By comparison, the SRKA4s kernel panic immediately
after init starts.)

To do: will test configuration with non-Intel motherboard. Will also try rolling
back the DAC960 driver from July rev to February rev and building new initrd.
Comment 2 Arjan van de Ven 2001-11-21 14:37:11 EST
It would be interesting to boot with "mem=800M" to rule out highmem issues....
Comment 3 Josh Neal 2001-11-21 14:53:30 EST
Test SRKA4s have 2GB - 4GB physical RAM; test L440GX has 1GB RAM.

Using "mem=800M" during 2.4.9-13smp boot on both SRKA4 and L440GX prevents
DAC960 kernel panic. 

Good call. What's our next step?
Comment 4 Arjan van de Ven 2001-11-21 14:54:49 EST
My next step is to go look at the DAC960 source code ;(
(and the diff with older kernels)
Comment 5 David Tseng 2001-11-23 12:11:57 EST
I have the exact same DAC960 kernel panic with my AccelRAID 150 on RH 
7.1 with the kernel updates 2.4.9-6smp and 2.4.9-12smp. It occurs 
occasionally during heavy disk activity and during boot up (max uptime 
was about 10 days).  2.4.3-12 did not have this problem and I have since 
reverted back to it.

Perhaps I'll try the "mem=800M" as suggested.
Comment 6 Dennis Edmonds 2001-11-28 11:00:21 EST
Could this be related to my problem (bug #56596)?  I experience a failed init
with the DAC960 driver on a single-processor system.  Instead of a panic, I only
have to deal with a failed fsck (sig 11).  Everything seems fine when not using
big memory (>896MB).
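
The memory pattern reported so far can be checked with a few lines of Python. This is only a rough sketch, not anything taken from the kernel: it assumes the usual 32-bit x86 layout in which roughly the first 896MB of RAM is directly mapped and anything above it is highmem, and it models "mem=" as a simple cap on usable RAM. The names LOWMEM_LIMIT_MB, usable_ram_mb and uses_highmem are made up for illustration; the RAM sizes are the ones given in Comment 3.

```python
# Assumption: default 32-bit x86 split where about 896 MB is lowmem
# and everything above that boundary is highmem.
LOWMEM_LIMIT_MB = 896


def usable_ram_mb(physical_mb, mem_limit_mb=None):
    """RAM the kernel would use, honoring an optional mem= cap (hypothetical model)."""
    if mem_limit_mb is None:
        return physical_mb
    return min(physical_mb, mem_limit_mb)


def uses_highmem(physical_mb, mem_limit_mb=None):
    """True if any usable RAM lies above the assumed lowmem boundary."""
    return usable_ram_mb(physical_mb, mem_limit_mb) > LOWMEM_LIMIT_MB


# Machines from Comment 3: L440GX with 1GB, SRKA4s with 2GB to 4GB.
for name, ram_mb in [("L440GX", 1024), ("SRKA4", 2048), ("SRKA4", 4096)]:
    print(name, ram_mb, "MB:",
          "highmem" if uses_highmem(ram_mb) else "lowmem only",
          "/ with mem=800M:",
          "highmem" if uses_highmem(ram_mb, 800) else "lowmem only")
```

Under these assumptions every affected machine has highmem pages in play by default and none do with "mem=800M", which matches the observation that the cap avoids the panic.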

Comment 7 Arjan van de Ven 2001-11-28 11:02:04 EST
We're testing a fix right now, but since it has some core block layer changes I
would rather test it thoroughly before asking others to test it...
Comment 8 Mihai RUSU 2002-02-14 23:46:35 EST
I am having the same issue here. We are using a 2.4.9-13SGI_XFS_1.0.2 kernel
compiled from source with egcs 2.91.66, on a base Slackware 8.0 installation. I
know this is not Red Hat related, but I thought that posting here could help
narrow the problem down to something specific to this kernel. I get random
crashes, and only rarely does it report that DAC960 message when it boots. The
hardware is a dual-CPU SMP machine with a Mylex 170 RAID5 and 1GB RAM. I have
disabled highmem and still got some crashes, but I could not tell whether it is
still this problem (it didn't show any message on boot once I had disabled
highmem). I can see the last post here is 3 months old. I know Red Hat has
released an errata kernel (2.4.9-21) with a DAC960 driver upgrade. Does this
solve the problem? Can someone who has this problem upgrade to that kernel and
test it?
Comment 9 Arjan van de Ven 2002-02-15 03:46:17 EST
The 2.4.9-21 kernel is supposed to fix this (it does in our lab).
dizzy@roedu.net: you have bigger problems since egcs will miscompile the DAC960
driver..... the MINIMUM compiler for 2.4 kernels is gcc 2.95.3 (as per
Documentation/Changes); egcs just miscompiles too much code.
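
A quick way to sanity-check a compiler version against that minimum, as a rough Python sketch. It assumes versions are plain dotted numbers and that the egcs version can be compared numerically with gcc release numbers; parse_version and compiler_ok are hypothetical helper names, not part of any kernel tooling.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.95.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Minimum compiler for 2.4 kernels as stated above (per Documentation/Changes).
MIN_GCC = parse_version("2.95.3")


def compiler_ok(version):
    """True if the given compiler version is at least the stated minimum."""
    return parse_version(version) >= MIN_GCC


print("egcs 2.91.66 ok?", compiler_ok("2.91.66"))
print("gcc 2.95.3 ok?", compiler_ok("2.95.3"))
```

Tuple comparison goes component by component, so 2.91.66 falls below 2.95.3 on the second field regardless of the third.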
Comment 10 Mihai RUSU 2002-02-15 16:57:16 EST
arjanv: :) You are right. It is just that SGI recommends using egcs for
compiling their kernel; even the binary kernel they provide, which went through
QA, was compiled with egcs (AFAIK). Anyway, don't you think the errata notes for
that kernel should list this bug ID among the fixed bugs? :)
