Bug 50593 - (SCSI IPS)Netfinity 4500R, ServeRAID 4L, firmware 4.70, kernel-2.4.3-12 hangs after a day
Product: Red Hat Linux
Classification: Retired
Component: kernel
Hardware: i386 Linux
Priority: medium, Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brock Organ
Reported: 2001-08-01 09:41 EDT by rosa
Modified: 2007-04-18 12:35 EDT (History)
Last Closed: 2003-06-19 03:26:31 EDT

Attachments
Summary of the history of this case. (2.76 KB, text/plain)
2001-08-01 09:45 EDT, rosa

Description rosa 2001-08-01 09:41:14 EDT
Description of problem:
Every Netfinity 4500R with a ServeRAID 4L (firmware 4.70.17) running
Red Hat 7.1 + updates hangs after about a day.

No users, no network, no serial, no cronjobs other than the standard 
ones shipped with a Red Hat *minimal install*, such as updatedb and logrotate.
Only a display attached.

How reproducible:

Steps to Reproduce:
1. Take a new Netfinity 4500R + ServeRAID 4L card + three 18.4 GB disks
2. Install (insert CD and reboot, rest is automatic)
a) IBM UpdateExpress  CD as per IBM website 
b) IBM ServeRAID 4.70 CD as per IBM website 
   http://www.pc.ibm.com/qtechinfo/MIGR-4X7R6P.html, iso is at
c) Redhat 7.1 Linux   CD + updates (+ kickstart)

Actual Results:  After about a day the machine will crash.

Expected Results:  The machine should have stayed up!

Additional info:

1) Several `(ips0) Resetting controller' entries in kernel log
2) `device events' counters continuously increasing for the SCSI 
   disk devices
3) On several occasions, after a cold boot, the machine rebooted 
   immediately after displaying the SCSI messages *). So far the
   second boot has always succeeded.
4) After one to two days, left unattended, the machine hangs.

*) right after this:
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
       <Adaptec AIC-7899 Ultra 160/m SCSI host adapter>
scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
       <Adaptec AIC-7899 Ultra 160/m SCSI host adapter>
(Normally it would show ... Loading ips module)
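
To track how often the controller resets from item 1 occur, the kernel log can be scanned for that message. The following is only a minimal sketch: it assumes nothing about the log line beyond the quoted text `(ips0) Resetting controller`, and the function name `count_controller_resets` is hypothetical.

```python
import re
from typing import Dict, Iterable

# Matches the message quoted in this report, e.g. "(ips0) Resetting controller".
# No assumption is made about the syslog prefix (timestamp, host name).
RESET_RE = re.compile(r"\(ips(\d+)\) Resetting controller")


def count_controller_resets(lines: Iterable[str]) -> Dict[int, int]:
    """Return {controller index: number of reset messages seen}."""
    counts: Dict[int, int] = {}
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            index = int(match.group(1))
            counts[index] = counts.get(index, 0) + 1
    return counts
```

It could be fed an open log file, for instance `count_controller_resets(open(path))`, where `path` is wherever the kernel messages are logged on the machine in question.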

Here's an overview of the different Netfinity machines:

 server        device events    firmware  longest     redhat       kernel
           (disk1, disk2, disk3) version  uptime      version      version
 number1        0,  0,  0       4.00.06   225 days   6.2+updates   2.2.16-3smp
 number2        0,  0,  0       4.50.05   >100 days  6.2+updates   2.2.16-3smp
 number3        0,  1,  0       4.50.05   >100 days  6.2+updates   2.2.16-3smp

 replacement   10, 16, 31      4.70.17   1 day      7.1+updates   2.4.3-12
 senttothelab  14, 19, 45      4.70.17   2 days     7.1           2.4.2-2

All machines are 2-CPU Netfinity 4500R with ServeRAID 4L and 512MB or 1GB main
memory, except for number1, which only has ServeRAID 3L.
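
The pattern in the overview can be made explicit by summing the device events per firmware version. This is just an illustrative sketch using the rows copied from the table; the names `SERVERS` and `events_by_firmware` are made up for the example. Note that the firmware, Red Hat release, and kernel all change together in these rows, so the sums alone cannot say which of them is responsible.

```python
# Rows copied from the overview table in this report:
# (server, device events for disk1..disk3, firmware version, kernel version)
SERVERS = [
    ("number1", (0, 0, 0), "4.00.06", "2.2.16-3smp"),
    ("number2", (0, 0, 0), "4.50.05", "2.2.16-3smp"),
    ("number3", (0, 1, 0), "4.50.05", "2.2.16-3smp"),
    ("replacement", (10, 16, 31), "4.70.17", "2.4.3-12"),
    ("senttothelab", (14, 19, 45), "4.70.17", "2.4.2-2"),
]


def events_by_firmware(servers):
    """Sum the per-disk device events for each firmware version."""
    totals = {}
    for _name, events, firmware, _kernel in servers:
        totals[firmware] = totals.get(firmware, 0) + sum(events)
    return totals


for firmware, total in sorted(events_by_firmware(SERVERS).items()):
    print(firmware, total)
```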

Below I'll attach a history of what happened prior to this.
Previous report (way too much detail, piled up as we went along) is at 

A quote from there:
`There are lots of folks now using Red Hat 7.1 and are not seeing this'

Could anybody who has been running this same configuration *) for longer
than a week please, please send me a note!

*) I.e. Netfinity 4500R (aka xSeries 340 eServer), ServeRAID 4L,
   firmware 4.70.17, Red Hat 7.1 

Comment 1 rosa 2001-08-01 09:45:43 EDT
Created attachment 25772 [details]
Summary of the history of this case.
Comment 2 rosa 2003-06-18 20:02:56 EDT
The issue has been resolved since the end of 2001. Around that time IBM released 
bugfix RAID firmware 4.80.26.
The machine has since been running without a problem under a load of 2-3 for 
well over a year now. Initially 2.4.9 would cause it to swap aggressively,
which sometimes made it crawl, but it never went down. That swap problem
went away with an upgrade to 2.4.18.

Sorry for not updating! Only when Alan changed the subject on Tue, 10 Jun 2003 
did I notice that it was still open.
