Bug 43461

Summary: Netfinity 4500R + serveRAID4L + 2.4.2:(ips0) Controller reset failed
Product: [Retired] Red Hat Linux
Reporter: rosa
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG
QA Contact: Brock Organ <borgan>
Severity: high
Priority: high
Version: 7.1
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-06-06 13:11:34 UTC
Attachments (description, flags):
  for the record, see next attachment for the rest of the story (flags: none)
  To me this looks like the proof that it's bad hardware that caused all this. (flags: none)

Description rosa 2001-06-05 03:42:03 UTC
After two days uptime, just typing at the shell, rebuilding some small rpm,
suddenly:
(ips0) timeout waiting for post.
(ips0) Controller reset failed - controller now offline.

Followed (of course, I would say) by loads of
I/O error: dev 08:05, sector 27000872 etc.
Even /sbin/init 6 and shutdown gave I/O errors
I could `cat' /var/log/kernelmessages though, here it started:
Jun 5 01:25:52 server3 kernel: (ips0) Resetting controller.
Jun 5 01:26:37 server3 kernel: (ips0) timeout waiting for post.
Jun 5 01:27:02 server3 kernel: (ips0) timeout waiting for post.
Jun 5 01:27:02 server3 kernel: (ips0) Controller reset failed - controller now offline.
Jun 5 01:48:12 server3 kernel: scsi: device set offline - command error recover failed: host 2 channel 0 id 0 lun 0:
Jun 5 01:48:19 server3 kernel: 4> I/O error: dev 08:05, sector 134160
Jun 5 01:48:19 server3 kernel: I/O error: dev 08:05, sector 134208
etc. etc.

The machine is a 4500R (I believe this is nowadays being called eServer xSeries
340?) bought new twelve days ago. The same thing happened after two days, after
which we upgraded the BIOS to the latest revision and the ServeRAID firmware to
version 4.70.17 (latest available from the IBM website) on the advice of IBM
support.

The kernel is the stock redhat 7.1 which has ips.c with version 4.71.00

Reproduce by booting the machine and leaving it running for two days
or so with little activity.

We have other machines that were running fine for half a year with 2.2.16 !

We would hate to do this because we need 2.4 for bind9 and a host of other
things, but is it possible to downgrade to 2.2.16 although the
raidcard firmware is now 4.70 ??

I am willing to test anything as long it gets us a stable machine !
IBM support guy said he wasn't able to take the case any higher because
rh 7.0 is highest supported (as mentioned on the site).
I cannot believe this ! Of all the hardware I have at home
I never had anything fail this badly on me ... and most everything was
cheaper than the 4500R :) What good is RAID when the controller
stops ?

One other thing: the machine sits 60 miles away so it would be really
helpful if alt sysrq over serial line would work so we could reboot
it without calling someone to hard-reset it ! This used to work in
2.2 but now it doesn't ..

Comment 1 rosa 2001-06-05 08:23:39 UTC
Just two things I wanted to mention:

At the first incident the controller went offline at 04:03, 
which is right after cron.daily .. this time it was after a 
rpm --rebuild imap-whatever.src.rpm so although definitely not
under heavy load, I would suspect some extra disk activity
in both cases. I've seen some discussion on the kernel mailing list
about a Wiseman System Management adapter, and although there
definitely is an `ASMA Advanced System Management Adapter ' in the
system I do not know if it is a `Wiseman' 
(see threads at http://uwsg.iu.edu/hypermail/linux/kernel/0101.1/0918.html
and http://www.uwsg.indiana.edu/hypermail/linux/kernel/0103.1/0681.html )
Although I'm not yet sure whether the symptoms are identical: the machine 
does not lock up completely ..

otoh ..

The other thing I realised is that if you look at the messages
you'll see there are about twenty minutes between
Jun 5 01:27:02 server3 kernel: (ips0) Controller reset failed - controller now
offline.
and:
Jun 5 01:48:12 server3 kernel: scsi: device set offline - command error

During this time it did not respond at all, not even with character echo, while
I was typing at the serial console. Then suddenly (could be after twenty minutes)
I got back the shell prompt, along with loads of messages like
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode block
and
I/O error: dev 08:05, sector 32243712
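
The gap between the two quoted log lines can be checked with a few lines of Python. This is only a sketch: the helper name `syslog_gap` is made up for illustration, and the year 2001 is an assumption taken from the comment dates, since syslog timestamps omit the year.

```python
from datetime import datetime, timedelta


def syslog_gap(first: str, second: str, year: int = 2001) -> timedelta:
    """Return the time between two syslog-style stamps such as 'Jun 5 01:27:02'.

    The year is not part of a syslog timestamp, so it is supplied separately
    (assumed to be 2001 here).
    """
    fmt = "%Y %b %d %H:%M:%S"
    t1 = datetime.strptime(f"{year} {first}", fmt)
    t2 = datetime.strptime(f"{year} {second}", fmt)
    return t2 - t1


# "Controller reset failed" versus "device set offline", from the log above
gap = syslog_gap("Jun 5 01:27:02", "Jun 5 01:48:12")
print(gap)  # 0:21:10
```

So the console was unresponsive for a little over 21 minutes between the failed reset and the SCSI layer taking the device offline.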

HTH,
Harold.

Comment 2 Arjan van de Ven 2001-06-05 15:26:45 UTC
I've asked IBM and they seem to think it's a problem with your hardware.
They have run days and days of stresstests with the driver/firmware without 
any problems at all..

Comment 3 rosa 2001-06-05 16:24:31 UTC
Which driver/kernel and which firmware version ?
I took the liberty to ask ipslinux.com (from ips.c)
if it was still an option to downgrade to 2.2 (given that we upgraded the
firmware). They replied:

> The 4.7x ServeRAID release ( firmware / BIOS / drivers ) has been tested
> against the older releases.  There is no reason to down-level it to use a
> 2.2.x kernel.   It should be backwards compatible.
>
> It sounds to me like you may have a bad controller.   Have you tried a
> different one in the failing machine ?

I replied
` I read everything I could find on the net and added the `nmi_watchdog=0' 
  to the boot options. I'm not sure I completely understand what it does'

I suspect that as a side effect it disables some buggy code...
Right now the machine is running three simultaneous compiles of a linux kernel
and three of the gcc compiler while making tar backups of / (except /tmp)
with a loadaverage: 12.35, 11.76, 9.14


I am willing to believe it is faulty hardware, but what am I going to tell
IBM support ? We bought the new machine twelve days ago.. and the first time I
called the answer was `only redhat 7.0 is supported' 
(see: http://www.pc.ibm.com/us/compat/nos/linux.html )

This is all a bit inconvenient as we bought the machine to ease the
transfer of our primary nameserver to another colocation (other netblock as 
well) and frankly right now I don't dare running anything important on this
machine.  It's an expensive kernel compile farm for the next two weeks :)



Comment 4 Arjan van de Ven 2001-06-05 16:28:22 UTC
nmi_watchdog=0 should be the default in our kernel.

You can use a 2.2 kernel with RHL 7.1, except that you lose some of the
USB hotplug features, but for a nameserver I suspect that isn't an issue.


Comment 5 Need Real Name 2001-06-11 11:17:30 UTC
Since Harold is on holiday, I am taking over some of his activities. This case is one
of those.

We followed your advice to upgrade the BIOS to 4.70. We even did the
nmi_watchdog=0-trick. We scheduled some kernel-build jobs so that there was some
load on the machine. Not more than two concurrent build jobs, and most of the
time idle.

Today, after five days, the server stopped responding normally and showed
lots of IO-errors, like this:

I/O error: dev 08:05, sector 3670088
EXT2-fs error (device sd(8,5)): ext2_read_inode: unable to read inode
block - inode=229068, block=458761
green.betterbe.com login: I/O error: dev 08:05, sector 3670088
EXT2-fs error (device sd(8,5)): ext2_write_inode: unable to read inode
block - inode=229068, block=458761
I/O error: dev 08:05, sector 0

I could not even log in to the server.

After power off/on and manually repairing the filesystem, we found these lines in
the logfile again:

Jun  5 17:07:33 green kernel: (ips0) Resetting controller.
Jun  5 20:37:16 green kernel: (ips0) Resetting controller.
Jun  6 19:27:58 green kernel: (ips0) Resetting controller.
Jun  9 09:39:23 green kernel: (ips0) Resetting controller.
Jun  9 10:47:35 green kernel: (ips0) Resetting controller.
Jun  9 15:47:32 green kernel: (ips0) Resetting controller.
Jun 10 10:47:33 green kernel: (ips0) Resetting controller.
Jun 10 13:38:25 green kernel: (ips0) Resetting controller.

The line reporting that the controller is offline was not there. (due to the
nmi_watchdog=0 setting?).

In the lost+found directory I found 8 files of Jun 10, 15:17, all these files
are kernel-build-related.

The last reboot was Jun 5 12:16

So, almost 5 days uptime (longest so far :-/).

We did what we could so far. What should we do now ?


Comment 6 Need Real Name 2001-06-12 07:02:51 UTC
I saw that the linux driver (ips.c), released together with firmware/bios 4.70
(latest official version), has version 4.70.13. While the ips.c version of the
driver that comes with RH7.1 has version 4.71.

Could it be that version 4.71 is "too new" or "beta" ?

In what circumstances is a

(ips0) Resetting controller

generated? Is it regular?

What could we do to pinpoint the cause?

Please advise.

Comment 7 Arjan van de Ven 2001-06-12 09:01:23 UTC
4.71 has a fix for 2.4 kernels issuing large requests and should otherwise
be identical to 4.70

I don't have such hardware, and IBM thinks it's a hardware bug 


Comment 8 Need Real Name 2001-06-12 14:51:41 UTC
reaction from ipslinux.com:
----------------------------------

I wish I could help more, but I don't know what your problem could be. It
does appear to me that it could be ServeRAID related.   It could be
hardware related also.  I'm at a loss, however, because we have run stress
tests for much longer than this ( before it was released ) and did not see
failures like this.   When a ServeRAID reset occurs, it usually means that
the driver thinks that the controller is not responding anymore.

You might try the latest driver ( 4.72 can be downloaded from
http://www.developer.ibm.com/xseries/serveraid.html ), but I can't say that
it will fix this ( since 4.71 should be OK in the first place ).

There are lots of folks now using Red Hat 7.1 and are not seeing this, so I
don't have a good answer at this time.

Good Luck ...   Let me know if there's anything else I can help with.


Comment 9 Need Real Name 2001-06-19 06:12:58 UTC
Another reaction from ipslinux.com:

A couple more items I thought of that are worth considering ....

1.) The device driver does not ever reset the controller.  That request
    must come from Linux ( usually as a response to a series of
    failed and/or timed-out commands )

2.) In your bugzilla report I saw:

     (ips0) timeout waiting for post.
     (ips0) Controller reset failed - controller now offline.

    This concerns me a lot.  That indicates that the reset attempt failed.
    Reset is fairly simple and I don't know how it would fail unless
    there is a very serious problem in the server or adapter that is
    keeping the PCI Reset from occurring.

3.) Have you checked your ServeRAID adapter error logs for errors ?




Comment 10 Need Real Name 2001-06-19 06:19:15 UTC
Last Friday the ServeRAID card was replaced. But the "(ips0) Resetting
controller" messages did not stop!!!

We found out that the new card did not have the most recent firmware/BIOS.
Yesterday (Monday June 18th) I upgraded to 4.70 (was 4.50). Still, the messages
did not stop.

Today I will ask to replace the whole server.

Comment 11 Need Real Name 2001-06-19 06:26:58 UTC
It looks like there is a pattern.... There are 2 cronjobs scheduled:

0 */2 * * * rpm --rebuild /home/build/IN/kernel-2.4.3-7.src.rpm > /home/build/rebuild.$$ 2>&1

0 */3 * * * rpm --rebuild /home/rebuild/IN/kernel-2.4.3-7.src.rpm > /home/rebuild/rebuild.$$ 2>&1

So, user "build" does a kernel compile every 2 hours and user "rebuild" does it
every 3 hours. It looks like a controller reset is done at the end of such a
build process, with a chance of about 30%.
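
As a rough illustration of how the two schedules interact, the following sketch lists the start hours implied by the `*/2` and `*/3` hour fields. The helper name `cron_step_hours` is hypothetical, and it models only the hour field of these two entries, not a full cron parser.

```python
def cron_step_hours(step: int) -> list[int]:
    """Hours (0-23) matched by a cron hour field of the form '*/step'."""
    return list(range(0, 24, step))


build_hours = cron_step_hours(2)    # user "build":   0 */2 * * *
rebuild_hours = cron_step_hours(3)  # user "rebuild": 0 */3 * * *

# Hours at which both jobs start in the same minute
both = sorted(set(build_hours) & set(rebuild_hours))
print(both)  # [0, 6, 12, 18]
```

At those four hours both kernel builds start together, so the machine runs two concurrent builds; whether that concurrency matters for the resets is not established in this report.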

Here are the logs and the timestamps of the logfiles of the build processes:

Jun 15 17:27:07 green kernel: ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
Jun 15 20:42:28 green kernel: (ips0) Resetting controller.
Jun 16 07:19:36 green kernel: (ips0) Resetting controller.
Jun 16 08:42:37 green kernel: (ips0) Resetting controller.
Jun 16 10:42:23 green kernel: (ips0) Resetting controller.
Jun 16 13:18:54 green kernel: (ips0) Resetting controller.
Jun 16 14:36:12 green kernel: (ips0) Resetting controller.
Jun 16 14:42:34 green kernel: (ips0) Resetting controller.
Jun 16 19:18:47 green kernel: (ips0) Resetting controller.
Jun 17 01:18:44 green kernel: (ips0) Resetting controller.
Jun 17 03:42:28 green kernel: (ips0) Resetting controller.
Jun 17 04:43:43 green kernel: (ips0) Resetting controller.
Jun 17 07:18:46 green kernel: (ips0) Resetting controller.
Jun 17 10:42:19 green kernel: (ips0) Resetting controller.
Jun 17 15:42:34 green kernel: (ips0) Resetting controller.
Jun 17 19:18:26 green kernel: (ips0) Resetting controller.

-rw-r--r--    1 build    build     2221453 Jun 15 17:12 rebuild.30268
-rw-r--r--    1 build    build     2221945 Jun 15 19:18 rebuild.1056
-rw-r--r--    1 build    build     2221944 Jun 15 20:41 rebuild.30102
-rw-r--r--    1 build    build     2221945 Jun 15 22:41 rebuild.26748
-rw-r--r--    1 build    build     2221945 Jun 16 01:18 rebuild.25063
-rw-r--r--    1 build    build     2221946 Jun 16 02:41 rebuild.21678
-rw-r--r--    1 build    build     2221946 Jun 16 04:42 rebuild.18158
-rw-r--r--    1 build    build     2221946 Jun 16 07:19 rebuild.16614
-rw-r--r--    1 build    build     2221944 Jun 16 08:41 rebuild.13306
-rw-r--r--    1 build    build     2221942 Jun 16 10:41 rebuild.9715
-rw-r--r--    1 build    build     2221944 Jun 16 13:18 rebuild.7977
-rw-r--r--    1 build    build     2221946 Jun 16 14:41 rebuild.4607
-rw-r--r--    1 build    build     2221946 Jun 16 16:41 rebuild.1118
-rw-r--r--    1 build    build     2221946 Jun 16 19:16 rebuild.31761
-rw-r--r--    1 build    build     2221945 Jun 16 20:41 rebuild.28376
-rw-r--r--    1 build    build     2221946 Jun 16 22:41 rebuild.24761
-rw-r--r--    1 build    build     2221945 Jun 17 01:16 rebuild.22985
-rw-r--r--    1 build    build     2221945 Jun 17 02:41 rebuild.19588
-rw-r--r--    1 build    build     2221946 Jun 17 04:42 rebuild.15977
-rw-r--r--    1 build    build     2221945 Jun 17 07:17 rebuild.19901
-rw-r--r--    1 build    build     2221946 Jun 17 08:41 rebuild.16497
-rw-r--r--    1 build    build     2221945 Jun 17 10:41 rebuild.13083
-rw-r--r--    1 build    build     2221946 Jun 17 13:18 rebuild.11289
-rw-r--r--    1 build    build     2221943 Jun 17 14:41 rebuild.7889
-rw-r--r--    1 build    build     2221946 Jun 17 16:41 rebuild.4359
-rw-r--r--    1 build    build     2221946 Jun 17 19:15 rebuild.2675
-rw-r--r--    1 build    build     2221946 Jun 17 20:41 rebuild.31766
-rw-r--r--    1 build    build     2221945 Jun 17 22:41 rebuild.28308

-rw-r--r--    1 rebuild  rebuild   2243177 Jun 15 17:13 rebuild.30269
-rw-r--r--    1 rebuild  rebuild   2243670 Jun 15 19:17 rebuild.1058
-rw-r--r--    1 rebuild  rebuild   2243670 Jun 15 21:41 rebuild.28397
-rw-r--r--    1 rebuild  rebuild   2243668 Jun 16 01:18 rebuild.25065
-rw-r--r--    1 rebuild  rebuild   2243668 Jun 16 03:41 rebuild.19846
-rw-r--r--    1 rebuild  rebuild   2243670 Jun 16 07:18 rebuild.16616
-rw-r--r--    1 rebuild  rebuild   2243669 Jun 16 09:41 rebuild.11527
-rw-r--r--    1 rebuild  rebuild   2243669 Jun 16 13:18 rebuild.7979
-rw-r--r--    1 rebuild  rebuild   2243670 Jun 16 15:41 rebuild.2894
-rw-r--r--    1 rebuild  rebuild   2243669 Jun 16 19:17 rebuild.31763
-rw-r--r--    1 rebuild  rebuild   2243667 Jun 16 21:41 rebuild.26584
-rw-r--r--    1 rebuild  rebuild   2243669 Jun 17 01:17 rebuild.22987
-rw-r--r--    1 rebuild  rebuild   2243670 Jun 17 03:41 rebuild.17798
-rw-r--r--    1 rebuild  rebuild   2243670 Jun 17 07:19 rebuild.19903
-rw-r--r--    1 rebuild  rebuild   2243670 Jun 17 09:41 rebuild.14791
-rw-r--r--    1 rebuild  rebuild   2243668 Jun 17 13:18 rebuild.11291
-rw-r--r--    1 rebuild  rebuild   2243669 Jun 17 15:41 rebuild.6175
-rw-r--r--    1 rebuild  rebuild   2243670 Jun 17 19:17 rebuild.2677
-rw-r--r--    1 rebuild  rebuild   2243670 Jun 17 21:41 rebuild.30002
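
The suspected pattern can be checked mechanically by comparing each reset with the nearest build-log timestamp. The sketch below does that for Jun 16 only, using values copied from the log and listings above. It rests on several assumptions: the year is taken to be 2001, the mtime of each rebuild.* file is treated as the moment that build finished, and the two-minute window is an arbitrary tolerance chosen because `ls` shows minutes only. The function names are made up for illustration.

```python
from datetime import datetime, timedelta

YEAR = 2001  # assumption: neither syslog nor ls prints the year

# "(ips0) Resetting controller." timestamps for Jun 16, from the log above
RESETS = [
    "Jun 16 07:19:36", "Jun 16 08:42:37", "Jun 16 10:42:23",
    "Jun 16 13:18:54", "Jun 16 14:36:12", "Jun 16 14:42:34",
    "Jun 16 19:18:47",
]

# mtimes of the rebuild.* logs for Jun 16 (users build and rebuild)
FINISHED = [
    # user build
    "Jun 16 01:18", "Jun 16 02:41", "Jun 16 04:42", "Jun 16 07:19",
    "Jun 16 08:41", "Jun 16 10:41", "Jun 16 13:18", "Jun 16 14:41",
    "Jun 16 16:41", "Jun 16 19:16", "Jun 16 20:41", "Jun 16 22:41",
    # user rebuild
    "Jun 16 01:18", "Jun 16 03:41", "Jun 16 07:18", "Jun 16 09:41",
    "Jun 16 13:18", "Jun 16 15:41", "Jun 16 19:17", "Jun 16 21:41",
]


def parse_stamp(stamp: str) -> datetime:
    """Parse 'Jun 16 07:19:36' or the minute-resolution 'Jun 16 07:19'."""
    fmt = "%Y %b %d %H:%M:%S" if stamp.count(":") == 2 else "%Y %b %d %H:%M"
    return datetime.strptime(f"{YEAR} {stamp}", fmt)


def nearest_gap(reset: datetime, completions: list[datetime]) -> timedelta:
    """Distance from a reset to the closest build-log mtime."""
    return min(abs(reset - done) for done in completions)


completions = [parse_stamp(s) for s in FINISHED]
window = timedelta(minutes=2)  # arbitrary tolerance, see the note above
near = [r for r in RESETS if nearest_gap(parse_stamp(r), completions) <= window]
far = [r for r in RESETS if r not in near]

print(f"{len(near)} of {len(RESETS)} resets within {window} of a finished build")
print("not near a build end:", far)
```

With this sample, six of the seven Jun 16 resets fall within two minutes of a build log's mtime; only the 14:36:12 reset does not. That is consistent with the observation that resets cluster around the end of a build, but it says nothing about the cause.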


Comment 12 Need Real Name 2001-06-20 05:59:17 UTC
Yesterday, someone from CSG (a company that does hardware support) asked me to
retrieve some information with ipssend (a command-line ServeRAID utility). After
seeing this, he thought that one disk is causing the trouble:

Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
 BIOS version                   : 4.70.17
 Firmware version               : 4.70.17
Device event table:
 |Channel|SCSI ID|Parity |Soft   |Hard   |PFA    |Misc   |
 |-------|-------|-------|-------|-------|-------|-------|
 | 1     | 0     | 0     | 0     | 0     | No    | 3     |
 | 1     | 1     | 0     | 0     | 0     | No    | 3     |
 | 1     | 2     | 0     | 0     | 0     | No    | 11    |
 | 1     | 3     | 0     | 0     | 0     | No    | 0     |
 | 1     | 4     | 0     | 0     | 0     | No    | 0     |
 | 1     | 5     | 0     | 0     | 0     | No    | 0     |
 | 1     | 6     | 0     | 0     | 0     | No    | 0     |
 | 1     | 7     | 0     | 0     | 0     | No    | 0     |
 | 1     | 8     | 0     | 0     | 0     | No    | 0     |
 | 1     | 9     | 0     | 0     | 0     | No    | 0     |
 | 1     | 10    | 0     | 0     | 0     | No    | 0     |
 | 1     | 11    | 0     | 0     | 0     | No    | 0     |
 | 1     | 12    | 0     | 0     | 0     | No    | 0     |
 | 1     | 13    | 0     | 0     | 0     | No    | 0     |
 | 1     | 14    | 0     | 0     | 0     | No    | 0     |
 | 1     | 15    | 0     | 0     | 0     | No    | 0     |
Command completed successfully.

The disk will be replaced.
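
For anyone who wants to compare these tables between runs, here is a minimal sketch that parses the device event table as printed above and picks out the device with the highest "Misc" count. The function name and the dictionary keys are made up for illustration, the sample is the first four data rows of the output above, and what the "Misc" counter actually measures is not explained in this report; treating the highest value as the suspect simply mirrors the support technician's reading.

```python
SAMPLE = """\
 |Channel|SCSI ID|Parity |Soft   |Hard   |PFA    |Misc   |
 |-------|-------|-------|-------|-------|-------|-------|
 | 1     | 0     | 0     | 0     | 0     | No    | 3     |
 | 1     | 1     | 0     | 0     | 0     | No    | 3     |
 | 1     | 2     | 0     | 0     | 0     | No    | 11    |
 | 1     | 3     | 0     | 0     | 0     | No    | 0     |
"""


def parse_event_table(text: str) -> list[dict]:
    """Turn the '|'-separated device event table into a list of dicts."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header, the separator and any non-table lines
        if len(cells) != 7 or not cells[0].isdigit():
            continue
        channel, scsi_id, parity, soft, hard, pfa, misc = cells
        rows.append({
            "channel": int(channel),
            "scsi_id": int(scsi_id),
            "parity": int(parity),
            "soft": int(soft),
            "hard": int(hard),
            "pfa": pfa == "Yes",
            "misc": int(misc),
        })
    return rows


rows = parse_event_table(SAMPLE)
worst = max(rows, key=lambda r: r["misc"])
print(f"highest Misc count: SCSI ID {worst['scsi_id']} ({worst['misc']})")
```

On this sample the device at SCSI ID 2 stands out with a Misc count of 11, against 3 for IDs 0 and 1.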




Comment 13 rosa 2001-06-25 13:52:08 UTC
Since replacing the raidcontroller (this was done while I was on holiday) 
didn't fix the problem, IBM support decided to replace the backplane,
which they did early this afternoon (Jun 25 13:30)

After booting the machine I started one job
[rebuild@green ]$ rpm --rebuild IN/gcc-2.96-85.src.rpm >&  gcc.rebuild.$$ &
[1] 29045
to get some disk activity going and soon got the reset message again:

Jun 25 14:02:36 green kernel: ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
Jun 25 14:17:52 green kernel: (ips0) Resetting controller.

To give a detailed report I decided to go through the status commands that IBM
support asked us to go through last time.
Notice there is one successful `ipssend getevent 1 device' below, but
the second time it hung the machine...


[root@green /root]# ipssend devinfo 1 1 2

Found 1 IBM ServeRAID controller(s).
Device information has been initiated for controller 1...
         Device is a Hard disk
         Channel                  : 1
         SCSI ID                  : 2
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 17357/35548048
         Device ID                : IBM-PSG DDYS-T18S9HA5EM0Z268
Command completed successfully.
[root@green /root]# ipssend devinfo 1 1 1

Found 1 IBM ServeRAID controller(s).
Device information has been initiated for controller 1...
         Device is a Hard disk
         Channel                  : 1
         SCSI ID                  : 1
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 17357/35548048
         Device ID                : IBM-PSG DDYS-T18S9HA5ELLL611
Command completed successfully.
[root@green /root]# ipssend devinfo 1 1 0

Found 1 IBM ServeRAID controller(s).
Device information has been initiated for controller 1...
         Device is a Hard disk
         Channel                  : 1
         SCSI ID                  : 0
         PFA (Yes/No)             : No
         State                    : Online (ONL)
         Size (in MB)/(in sectors): 17357/35548048
         Device ID                : IBM-PSG DDYS-T18S9HA5ELLH601
Command completed successfully.
[root@green /root]# ipssend getevent 1 device

Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
   BIOS version                   : 4.70.17
   Firmware version               : 4.70.17
Device event table:
   |Channel|SCSI ID|Parity |Soft   |Hard   |PFA    |Misc   |
   |-------|-------|-------|-------|-------|-------|-------|
   | 1     | 0     | 0     | 0     | 0     | No    | 14    |
   | 1     | 1     | 0     | 0     | 0     | No    | 19    |
   | 1     | 2     | 0     | 0     | 0     | No    | 45    |
   | 1     | 3     | 0     | 0     | 0     | No    | 0     |
   | 1     | 4     | 0     | 0     | 0     | No    | 0     |
   | 1     | 5     | 0     | 0     | 0     | No    | 0     |
   | 1     | 6     | 0     | 0     | 0     | No    | 0     |
   | 1     | 7     | 0     | 0     | 0     | No    | 0     |
   | 1     | 8     | 0     | 0     | 0     | No    | 0     |
   | 1     | 9     | 0     | 0     | 0     | No    | 0     |
   | 1     | 10    | 0     | 0     | 0     | No    | 0     |
   | 1     | 11    | 0     | 0     | 0     | No    | 0     |
   | 1     | 12    | 0     | 0     | 0     | No    | 0     |
   | 1     | 13    | 0     | 0     | 0     | No    | 0     |
   | 1     | 14    | 0     | 0     | 0     | No    | 0     |
   | 1     | 15    | 0     | 0     | 0     | No    | 0     |
Command completed successfully.

[root@green /root]# ipssend getevent 1 soft

Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
   BIOS version                   : 4.70.17
   Firmware version               : 4.70.17
Controller soft event log (1023 entries):
10020102
8001D0CD
10050202
100000F0
8001D0CD
10F62902
10070101
8001D0CD
10050201
100000F0
8001D0CD
10F62902
100A0200
8001D0CE
10050200
100000F0
8001D0CE
0172000E
01030100
01041B00
0166BABE
8001D535
100503A7
100000F0
8001D535
010C000E
01830100
01041A00
0110BABE
8001D565
1005030C
100000F0
8001D565
010B000E
01830100
01041A00
0169BABE
8001D5B5
1005030B
100000F0
8001D5B5
011A000E
01030100
01041A00
010FBABE
8001D5BD
1005031A
100000F0
8001D5BD
0123000E
01830100
01021800
0121BABE
8001D5BD
0223FFFF
02010918
8001D5BD
0128000E
01030100
01041A00
0168BABE
8001D5C1
10050328
100000F0
8001D5C1
012C000E
01830100
01041A00
014EBABE
8001D5C1
1005032C
100000F0
8001D5C1
012D000E
01830100
01041A00
012EBABE
8001D5C1
1005032D
100000F0
8001D5C1
0138000E
01830100
01021800
0131BABE
8001D5C5
0238FFFF
02010918
8001D5C5
0147000E
01830100
01021800
016FBABE
8001D5C9
0247FFFF
02010918
8001D5C9
014E000E
01830100
01041A00
0107BABE
8001D5CD
1005034E
100000F0
8001D5CD
0150000E
01830100
01041B00
010CBABE
8001D5CD
100503AB
100000F0
8001D5CD
0167000E
01030100
01041A00
013EBABE
8001D5D1
10050367
100000F0
8001D5D1
0171000E
01830100
01041A00
017FBABE
8001D5D5
10050371
100000F0
8001D5D5
0169000E
01830100
01041A00
0116BABE
8001D5D5
10050369
100000F0
8001D5D5
0177000E
01030100
01021800
0152BABE
8001D5D9
0277FFFF
02010918
8001D5D9
0103000E
01830100
01041B00
0179BABE
8001D605
100503B8
100000F0
8001D605
0104000E
01030100
01041A00
0129BABE
8001D605
10050304
100000F0
8001D605
0106000E
01830100
01041A00
0160BABE
8001D605
10050306
100000F0
8001D605
010D000E
01030100
01021800
0134BABE
8001D609
020DFFFF
02010918
8001D609
010F000E
01030100
01041A00
010ABABE
8001D609
1005030F
100000F0
8001D609
0110000E
01830100
01041B00
017ABABE
8001D609
10050396
100000F0
8001D609
0116000E
01830100
01041A00
015DBABE
8001D609
10050316
100000F0
8001D609
0119000E
01020100
01021000
0136BABE
8001D609
0219FFFF
02010910
8001D609
10050319
100000F0
8001D609
011B000E
01830100
01041B00
0142BABE
8001D60D
100503CB
100000F0
8001D60D
0120000E
01030100
01041B00
010BBABE
8001D60D
10050384
100000F0
8001D60D
0122000E
01830100
01041A00
0177BABE
8001D60D
10050322
100000F0
8001D60D
0124000E
01830100
01041A00
0102BABE
8001D60D
10050324
100000F0
8001D60D
0127000E
01030100
01041B00
014ABABE
8001D611
10050390
100000F0
8001D611
0129000E
01830100
01021800
0171BABE
8001D611
0229FFFF
02010918
8001D611
012E000E
01830100
01041B00
0151BABE
8001D611
10050389
100000F0
8001D611
012F000E
01830100
01021800
0139BABE
8001D611
022FFFFF
02010918
8001D611
0130000E
01030100
01041B00
0124BABE
8001D611
100503DD
100000F0
8001D611
0131000E
01830100
01041B00
0156BABE
8001D611
100503AC
100000F0
8001D611
0132000E
01030100
01041B00
0158BABE
8001D611
100503ED
100000F0
8001D611
0133000E
01830100
01041A00
0140BABE
8001D612
10050333
100000F0
8001D612
013E000E
01830100
01041A00
011CBABE
8001D615
1005033E
100000F0
8001D615
0140000E
01030100
01041A00
0122BABE
8001D615
10050340
100000F0
8001D615
0134000E
01830100
01041A00
0138BABE
8001D615
10050334
100000F0
8001D615
0136000E
01830100
01041A00
011DBABE
8001D615
10050336
100000F0
8001D615
0137000E
01830100
01041B00
012DBABE
8001D615
100503B2
100000F0
8001D615
0139000E
01830100
01041B00
0128BABE
8001D615
100503AD
100000F0
8001D615
013B000E
01030100
01041A00
0137BABE
8001D616
1005033B
100000F0
8001D616
013C000E
01030100
01041B00
0103BABE
8001D616
1005039E
100000F0
8001D616
0142000E
01030100
01021800
015ABABE
8001D619
0242FFFF
02010918
8001D619
0143000E
01830100
01023700
0143BABE
8001D619
0243FFFF
02010937
8001D619
0144000E
01830100
01041A00
010EBABE
8001D619
10050344
100000F0
8001D619
0145000E
01830100
01021400
0104BABE
8001D619
0146000E
01830100
01041B00
0153BABE
8001D619
100503E4
100000F0
8001D619
0149000E
01830100
01041B00
016DBABE
8001D619
10050393
100000F0
8001D619
014A000E
01830100
01041B00
014BBABE
8001D619
100503D1
100000F0
8001D619
014C000E
01030100
01041B00
0105BABE
8001D619
100503F5
100000F0
8001D619
014D000E
01830100
01023700
0174BABE
8001D619
024DFFFF
02010937
8001D619
0141000E
01830100
01041B00
0167BABE
8001D61A
100503D2
100000F0
8001D61A
014F000E
01830100
01041A00
0155BABE
8001D61D
1005034F
100000F0
8001D61D
0151000E
01830100
01021400
0133BABE
8001D61D
0154000E
01030100
01041B00
012ABABE
8001D61D
10050397
100000F0
8001D61D
0155000E
01830100
01041B00
0113BABE
8001D61D
100503F4
100000F0
8001D61D
0156000E
01030100
01041B00
016EBABE
8001D61D
10050388
100000F0
8001D61D
0157000E
01830100
01041B00
0111BABE
8001D61D
100503F2
100000F0
8001D61D
0159000E
01830100
01041B00
0178BABE
8001D61E
100503BF
100000F0
8001D61E
015A000E
01030100
01041B00
0165BABE
8001D61E
100503A5
100000F0
8001D61E
015B000E
01830100
01041A00
0170BABE
8001D621
1005035B
100000F0
8001D621
015C000E
01830100
01041B00
0144BABE
8001D621
100503EF
100000F0
8001D621
015D000E
01030100
01041A00
0175BABE
8001D621
1005035D
100000F0
8001D621
015E000E
01830100
01021400
0162BABE
8001D621
015F000E
01830100
01041A00
0176BABE
8001D621
1005035F
100000F0
8001D621
0160000E
01830100
01023700
0164BABE
8001D621
0260FFFF
02010937
8001D621
0162000E
01830100
01041B00
011ABABE
8001D621
100503F0
100000F0
8001D621
0163000E
01830100
01021400
0173BABE
8001D621
0164000E
01830100
01021400
013ABABE
8001D621
0165000E
01030100
01041B00
0141BABE
8001D621
100503A9
100000F0
8001D621
0166000E
01830100
01041A00
013CBABE
8001D621
10050366
100000F0
8001D621
0168000E
01830100
01021400
0154BABE
8001D625
016A000E
01830100
01041A00
0149BABE
8001D625
1005036A
100000F0
8001D625
016B000E
01830100
01021400
017BBABE
8001D625
016C000E
01830100
01041B00
016BBABE
8001D625
100503C0
100000F0
8001D625
016D000E
01830100
01021800
0163BABE
8001D625
026DFFFF
02010918
8001D625
016E000E
01830100
01041B00
0145BABE
8001D625
1005038F
100000F0
8001D625
016F000E
01830100
01041A00
015CBABE
8001D625
1005036F
100000F0
8001D625
0170000E
01030100
01021800
012CBABE
8001D625
0270FFFF
02010918
8001D625
0173000E
01830100
01021400
0146BABE
8001D625
0174000E
01830100
01041B00
017CBABE
8001D625
10050380
100000F0
8001D625
0175000E
01830100
01041B00
014DBABE
8001D629
1005038C
100000F0
8001D629
0176000E
01830100
01041B00
017DBABE
8001D629
100503B6
100000F0
8001D629
0179000E
01830100
01041A00
013FBABE
8001D629
10050379
100000F0
8001D629
017A000E
01830100
01041A00
013BBABE
8001D629
1005037A
100000F0
8001D629
017B000E
01830100
01021400
016ABABE
8001D629
017C000E
01830100
01041A00
0114BABE
8001D629
1005037C
100000F0
8001D629
017D000E
01830100
01041B00
0108BABE
8001D629
100503E8
100000F0
8001D629
017E000E
01830100
01041B00
015FBABE
8001D629
100503B9
100000F0
8001D629
0100000E
01830100
01041B00
011EBABE
8001D655
100503DF
100000F0
8001D655
0105000E
01030100
01021400
0150BABE
8001D655
0109000E
01830100
01021400
0112BABE
8001D655
010E000E
01030100
01041B00
0123BABE
8001D659
100503B1
100000F0
8001D659
0111000E
01830100
01021400
015BBABE
8001D659
0112000E
01830100
01041B00
014CBABE
8001D659
10050394
100000F0
8001D659
0114000E
01830100
01021400
0126BABE
8001D659
0115000E
01830100
01021400
012BBABE
8001D659
0117000E
01830100
01025F00
0157BABE
8001D659
0217FFFF
0201095F
8001D659
0118000E
01830100
01021800
017EBABE
8001D659
0218FFFF
02010918
8001D659
011E000E
01030100
01041B00
0148BABE
8001D65D
1005038A
100000F0
8001D65D
0121000E
01830100
01021400
0109BABE
8001D65D
012B000E
01830100
01025F00
0172BABE
8001D661
022BFFFF
0201095F
8001D661
013A000E
01830100
01021400
0130BABE
8001D665
013D000E
01030100
01025F00
016CBABE
8001D665
023DFFFF
0201095F
8001D665
013F000E
01830100
01021400
0100BABE
8001D665
0148000E
01830100
01021400
013DBABE
8001D669
0152000E
01030100
01021400
0118BABE
8001D66D
0153000E
01830100
01021400
011BBABE
8001D66D
0101000E
01830000
01021800
014FBABE
8001D6A5
0201FFFF
02000918
8001D6A5
10F62900
10850008
8001D6A5
0108000E
01830000
01021800
0106BABE
8001D6A7
0208FFFF
02000918
8001D6A7
0113000E
01830100
01025F00
010DBABE
8001D6A9
0213FFFF
0201095F
8001D6A9
011C000E
01030100
01021400
012FBABE
8001D6AD
011D000E
01830000
01041B00
0147BABE
8001D6AD
100503C9
100000F0
8001D6AD
10050001
10C90302
1000002A
8001D6AD
10050100
100000F0
8001D6AD
011F000E
01830100
01021400
0159BABE
8001D711
0125000E
01830100
01021400
0115BABE
8001D711
0126000E
01830000
01041B00
0135BABE
8001D711
100503E7
100000F0
8001D711
012A000E
01830000
01351600
0120BABE
8001D711
3F2A041A
8001D712
1005039D
100000F0
8001D712
0135000E
01830000
01021800
0161BABE
8001D712
0235FFFF
02000918
8001D712
014B000E
01830100
01021400
0119BABE
8001D712
0158000E
01830000
01041B00
0125BABE
8001D712
100503C4
100000F0
8001D712
0178000E
01830000
01041B00
015EBABE
8001D712
1005038B
100000F0
8001D712
10F62902
10060002
8001D7E9
10050202
100000F0
8001D7E9
10F62902
10060100
8001D7EA
10050200
100000F0
8001D7EA
04DDD001
8001D7F1
10F62900
108B0008
8001D7F1
04DDD001
8001D7F3
04DDD001
8001D7F3
04DDD001
8001D7F4
04DDD001
8001D7F4
04DDD001
8001D7F6
04DDD001
8001D7F6
04DDD001
8001D7F7
04DDD001
8001D7F7
04DDD001
8001D7F8
04DDD001
8001D7F8
04DDD001
8001D7F8
04DDD001
8001D7F8
04DDD001
8001D7F8
04DDD001
8001D7F9
04DDD001
8001D7FA
04DDD001
8001D7FB
04DDD001
8001D7FB
04DDD001
8001D7FC
04DDD001
8001D7FC
04DDD001
8001D7FD
04DDD001
8001D826
04DDD001
8001D82C
0172000E
01030000
01021800
0166BABE
8001DC15
0272FFFF
02000918
8001DC15
010D000E
01030000
01021800
0134BABE
8001DCE9
020DFFFF
02000918
8001DCE9
70108002
800003FB
Command completed successfully.

[root@green /root]# ipssend getevent 1 device

Found 1 IBM ServeRAID controller(s).

And then the machine went dead again, no response anymore not even on the
console ...
Phew, this sure is becoming one expensive server by now ..



Comment 14 rosa 2001-06-25 13:54:40 UTC
 Since replacing the raidcontroller (this was done while I was on holiday)
didn't fix the problem, IBM support decided to replace the backplane,
which they did early this afternoon (Jun 25 13:30)

After booting the machine I started one job
[rebuild@green ]$ rpm --rebuild IN/gcc-2.96-85.src.rpm >& gcc.rebuild.$$ &
[1] 29045
to get some disk activity going and soon got the reset message again:

Jun 25 14:02:36 green kernel: ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
Jun 25 14:17:52 green kernel: (ips0) Resetting controller.

To give a detailed report I decided to go through the status commands that IBM
support asked us to go through last time.
Notice there is one successful `ipssend getevent 1 device' below, but
the second time it hung the machine...


[root@green /root]# ipssend devinfo 1 1 2

Found 1 IBM ServeRAID controller(s).
Device information has been initiated for controller 1...
Device is a Hard disk
Channel : 1
SCSI ID : 2
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 17357/35548048
Device ID : IBM-PSG DDYS-T18S9HA5EM0Z268
Command completed successfully.
[root@green /root]# ipssend devinfo 1 1 1

Found 1 IBM ServeRAID controller(s).
Device information has been initiated for controller 1...
Device is a Hard disk
Channel : 1
SCSI ID : 1
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 17357/35548048
Device ID : IBM-PSG DDYS-T18S9HA5ELLL611
Command completed successfully.
[root@green /root]# ipssend devinfo 1 1 0

Found 1 IBM ServeRAID controller(s).
Device information has been initiated for controller 1...
Device is a Hard disk
Channel : 1
SCSI ID : 0
PFA (Yes/No) : No
State : Online (ONL)
Size (in MB)/(in sectors): 17357/35548048
Device ID : IBM-PSG DDYS-T18S9HA5ELLH601
Command completed successfully.
[root@green /root]# ipssend getevent 1 device

Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
BIOS version : 4.70.17
Firmware version : 4.70.17
Device event table:
|Channel|SCSI ID|Parity |Soft   |Hard   |PFA    |Misc   |
|-------|-------|-------|-------|-------|-------|-------|
|   1   |   0   |   0   |   0   |   0   |  No   |  14   |
|   1   |   1   |   0   |   0   |   0   |  No   |  19   |
|   1   |   2   |   0   |   0   |   0   |  No   |  45   |
|   1   |   3   |   0   |   0   |   0   |  No   |   0   |
|   1   |   4   |   0   |   0   |   0   |  No   |   0   |
|   1   |   5   |   0   |   0   |   0   |  No   |   0   |
|   1   |   6   |   0   |   0   |   0   |  No   |   0   |
|   1   |   7   |   0   |   0   |   0   |  No   |   0   |
|   1   |   8   |   0   |   0   |   0   |  No   |   0   |
|   1   |   9   |   0   |   0   |   0   |  No   |   0   |
|   1   |  10   |   0   |   0   |   0   |  No   |   0   |
|   1   |  11   |   0   |   0   |   0   |  No   |   0   |
|   1   |  12   |   0   |   0   |   0   |  No   |   0   |
|   1   |  13   |   0   |   0   |   0   |  No   |   0   |
|   1   |  14   |   0   |   0   |   0   |  No   |   0   |
|   1   |  15   |   0   |   0   |   0   |  No   |   0   |
Command completed successfully.
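
(Aside for anyone comparing this table between runs: the sketch below is not part of ipssend. It is a minimal Python example, assuming the table has been saved as text in the pipe-separated layout shown above. The function names `parse_device_events` and `nonzero_rows` are made up for illustration. It makes no claim about what the Misc counter means; it only lists the rows whose counters are non-zero.)

```python
def parse_device_events(table_text):
    """Parse the pipe-separated device event table into a list of dicts."""
    rows = []
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows have 7 cells and start with a numeric channel;
        # this skips the header, the separator, and any other output lines.
        if len(cells) != 7 or not cells[0].isdigit():
            continue
        channel, scsi_id, parity, soft, hard, pfa, misc = cells
        rows.append({
            "channel": int(channel),
            "scsi_id": int(scsi_id),
            "parity": int(parity),
            "soft": int(soft),
            "hard": int(hard),
            "pfa": pfa,
            "misc": int(misc),
        })
    return rows


def nonzero_rows(rows):
    """Return rows where any counter is non-zero or PFA is not 'No'."""
    return [
        r for r in rows
        if r["parity"] or r["soft"] or r["hard"] or r["misc"] or r["pfa"] != "No"
    ]


# A few rows copied from the table above, plus the trailing status line.
sample = """\
|Channel|SCSI ID|Parity |Soft |Hard |PFA |Misc |
|-------|-------|-------|-------|-------|-------|-------|
| 1 | 0 | 0 | 0 | 0 | No | 14 |
| 1 | 1 | 0 | 0 | 0 | No | 19 |
| 1 | 2 | 0 | 0 | 0 | No | 45 |
| 1 | 3 | 0 | 0 | 0 | No | 0 |
Command completed successfully.
"""

events = parse_device_events(sample)
flagged = nonzero_rows(events)
for row in flagged:
    print(row["channel"], row["scsi_id"], row["misc"])
```

On the table above, the only rows with non-zero counters are SCSI IDs 0, 1 and 2, the same three IDs that devinfo reports as hard disks.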

[root@green /root]# ipssend getevent 1 soft

Found 1 IBM ServeRAID controller(s).
Get event table has been initiated for controller 1...
BIOS version : 4.70.17
Firmware version : 4.70.17
Controller soft event log (1023 entries):
10020102
8001D0CD
10050202
100000F0
8001D0CD
10F62902
10070101
8001D0CD
10050201
100000F0
8001D0CD
10F62902
100A0200
8001D0CE
10050200
100000F0
8001D0CE
0172000E
01030100
01041B00
0166BABE
8001D535
100503A7
100000F0
8001D535
010C000E
01830100
01041A00
0110BABE
8001D565
1005030C
100000F0
8001D565
010B000E
01830100
01041A00
0169BABE
8001D5B5
1005030B
100000F0
8001D5B5
011A000E
01030100
01041A00
010FBABE
8001D5BD
1005031A
100000F0
8001D5BD
0123000E
01830100
01021800
0121BABE
8001D5BD
0223FFFF
02010918
8001D5BD
0128000E
01030100
01041A00
0168BABE
8001D5C1
10050328
100000F0
8001D5C1
012C000E
01830100
01041A00
014EBABE
8001D5C1
1005032C
100000F0
8001D5C1
012D000E
01830100
01041A00
012EBABE
8001D5C1
1005032D
100000F0
8001D5C1
0138000E
01830100
01021800
0131BABE
8001D5C5
0238FFFF
02010918
8001D5C5
0147000E
01830100
01021800
016FBABE
8001D5C9
0247FFFF
02010918
8001D5C9
014E000E
01830100
01041A00
0107BABE
8001D5CD
1005034E
100000F0
8001D5CD
0150000E
01830100
01041B00
010CBABE
8001D5CD
100503AB
100000F0
8001D5CD
0167000E
01030100
01041A00
013EBABE
8001D5D1
10050367
100000F0
8001D5D1
0171000E
01830100
01041A00
017FBABE
8001D5D5
10050371
100000F0
8001D5D5
0169000E
01830100
01041A00
0116BABE
8001D5D5
10050369
100000F0
8001D5D5
0177000E
01030100
01021800
0152BABE
8001D5D9
0277FFFF
02010918
8001D5D9
0103000E
01830100
01041B00
0179BABE
8001D605
100503B8
100000F0
8001D605
0104000E
01030100
01041A00
0129BABE
8001D605
10050304
100000F0
8001D605
0106000E
01830100
01041A00
0160BABE
8001D605
10050306
100000F0
8001D605
010D000E
01030100
01021800
0134BABE
8001D609
020DFFFF
02010918
8001D609
010F000E
01030100
01041A00
010ABABE
8001D609
1005030F
100000F0
8001D609
0110000E
01830100
01041B00
017ABABE
8001D609
10050396
100000F0
8001D609
0116000E
01830100
01041A00
015DBABE
8001D609
10050316
100000F0
8001D609
0119000E
01020100
01021000
0136BABE
8001D609
0219FFFF
02010910
8001D609
10050319
100000F0
8001D609
011B000E
01830100
01041B00
0142BABE
8001D60D
100503CB
100000F0
8001D60D
0120000E
01030100
01041B00
010BBABE
8001D60D
10050384
100000F0
8001D60D
0122000E
01830100
01041A00
0177BABE
8001D60D
10050322
100000F0
8001D60D
0124000E
01830100
01041A00
0102BABE
8001D60D
10050324
100000F0
8001D60D
0127000E
01030100
01041B00
014ABABE
8001D611
10050390
100000F0
8001D611
0129000E
01830100
01021800
0171BABE
8001D611
0229FFFF
02010918
8001D611
012E000E
01830100
01041B00
0151BABE
8001D611
10050389
100000F0
8001D611
012F000E
01830100
01021800
0139BABE
8001D611
022FFFFF
02010918
8001D611
0130000E
01030100
01041B00
0124BABE
8001D611
100503DD
100000F0
8001D611
0131000E
01830100
01041B00
0156BABE
8001D611
100503AC
100000F0
8001D611
0132000E
01030100
01041B00
0158BABE
8001D611
100503ED
100000F0
8001D611
0133000E
01830100
01041A00
0140BABE
8001D612
10050333
100000F0
8001D612
013E000E
01830100
01041A00
011CBABE
8001D615
1005033E
100000F0
8001D615
0140000E
01030100
01041A00
0122BABE
8001D615
10050340
100000F0
8001D615
0134000E
01830100
01041A00
0138BABE
8001D615
10050334
100000F0
8001D615
0136000E
01830100
01041A00
011DBABE
8001D615
10050336
100000F0
8001D615
0137000E
01830100
01041B00
012DBABE
8001D615
100503B2
100000F0
8001D615
0139000E
01830100
01041B00
0128BABE
8001D615
100503AD
100000F0
8001D615
013B000E
01030100
01041A00
0137BABE
8001D616
1005033B
100000F0
8001D616
013C000E
01030100
01041B00
0103BABE
8001D616
1005039E
100000F0
8001D616
0142000E
01030100
01021800
015ABABE
8001D619
0242FFFF
02010918
8001D619
0143000E
01830100
01023700
0143BABE
8001D619
0243FFFF
02010937
8001D619
0144000E
01830100
01041A00
010EBABE
8001D619
10050344
100000F0
8001D619
0145000E
01830100
01021400
0104BABE
8001D619
0146000E
01830100
01041B00
0153BABE
8001D619
100503E4
100000F0
8001D619
0149000E
01830100
01041B00
016DBABE
8001D619
10050393
100000F0
8001D619
014A000E
01830100
01041B00
014BBABE
8001D619
100503D1
100000F0
8001D619
014C000E
01030100
01041B00
0105BABE
8001D619
100503F5
100000F0
8001D619
014D000E
01830100
01023700
0174BABE
8001D619
024DFFFF
02010937
8001D619
0141000E
01830100
01041B00
0167BABE
8001D61A
100503D2
100000F0
8001D61A
014F000E
01830100
01041A00
0155BABE
8001D61D
1005034F
100000F0
8001D61D
0151000E
01830100
01021400
0133BABE
8001D61D
0154000E
01030100
01041B00
012ABABE
8001D61D
10050397
100000F0
8001D61D
0155000E
01830100
01041B00
0113BABE
8001D61D
100503F4
100000F0
8001D61D
0156000E
01030100
01041B00
016EBABE
8001D61D
10050388
100000F0
8001D61D
0157000E
01830100
01041B00
0111BABE
8001D61D
100503F2
100000F0
8001D61D
0159000E
01830100
01041B00
0178BABE
8001D61E
100503BF
100000F0
8001D61E
015A000E
01030100
01041B00
0165BABE
8001D61E
100503A5
100000F0
8001D61E
015B000E
01830100
01041A00
0170BABE
8001D621
1005035B
100000F0
8001D621
015C000E
01830100
01041B00
0144BABE
8001D621
100503EF
100000F0
8001D621
015D000E
01030100
01041A00
0175BABE
8001D621
1005035D
100000F0
8001D621
015E000E
01830100
01021400
0162BABE
8001D621
015F000E
01830100
01041A00
0176BABE
8001D621
1005035F
100000F0
8001D621
0160000E
01830100
01023700
0164BABE
8001D621
0260FFFF
02010937
8001D621
0162000E
01830100
01041B00
011ABABE
8001D621
100503F0
100000F0
8001D621
0163000E
01830100
01021400
0173BABE
8001D621
0164000E
01830100
01021400
013ABABE
8001D621
0165000E
01030100
01041B00
0141BABE
8001D621
100503A9
100000F0
8001D621
0166000E
01830100
01041A00
013CBABE
8001D621
10050366
100000F0
8001D621
0168000E
01830100
01021400
0154BABE
8001D625
016A000E
01830100
01041A00
0149BABE
8001D625
1005036A
100000F0
8001D625
016B000E
01830100
01021400
017BBABE
8001D625
016C000E
01830100
01041B00
016BBABE
8001D625
100503C0
100000F0
8001D625
016D000E
01830100
01021800
0163BABE
8001D625
026DFFFF
02010918
8001D625
016E000E
01830100
01041B00
0145BABE
8001D625
1005038F
100000F0
8001D625
016F000E
01830100
01041A00
015CBABE
8001D625
1005036F
100000F0
8001D625
0170000E
01030100
01021800
012CBABE
8001D625
0270FFFF
02010918
8001D625
0173000E
01830100
01021400
0146BABE
8001D625
0174000E
01830100
01041B00
017CBABE
8001D625
10050380
100000F0
8001D625
0175000E
01830100
01041B00
014DBABE
8001D629
1005038C
100000F0
8001D629
0176000E
01830100
01041B00
017DBABE
8001D629
100503B6
100000F0
8001D629
0179000E
01830100
01041A00
013FBABE
8001D629
10050379
100000F0
8001D629
017A000E
01830100
01041A00
013BBABE
8001D629
1005037A
100000F0
8001D629
017B000E
01830100
01021400
016ABABE
8001D629
017C000E
01830100
01041A00
0114BABE
8001D629
1005037C
100000F0
8001D629
017D000E
01830100
01041B00
0108BABE
8001D629
100503E8
100000F0
8001D629
017E000E
01830100
01041B00
015FBABE
8001D629
100503B9
100000F0
8001D629
0100000E
01830100
01041B00
011EBABE
8001D655
100503DF
100000F0
8001D655
0105000E
01030100
01021400
0150BABE
8001D655
0109000E
01830100
01021400
0112BABE
8001D655
010E000E
01030100
01041B00
0123BABE
8001D659
100503B1
100000F0
8001D659
0111000E
01830100
01021400
015BBABE
8001D659
0112000E
01830100
01041B00
014CBABE
8001D659
10050394
100000F0
8001D659
0114000E
01830100
01021400
0126BABE
8001D659
0115000E
01830100
01021400
012BBABE
8001D659
0117000E
01830100
01025F00
0157BABE
8001D659
0217FFFF
0201095F
8001D659
0118000E
01830100
01021800
017EBABE
8001D659
0218FFFF
02010918
8001D659
011E000E
01030100
01041B00
0148BABE
8001D65D
1005038A
100000F0
8001D65D
0121000E
01830100
01021400
0109BABE
8001D65D
012B000E
01830100
01025F00
0172BABE
8001D661
022BFFFF
0201095F
8001D661
013A000E
01830100
01021400
0130BABE
8001D665
013D000E
01030100
01025F00
016CBABE
8001D665
023DFFFF
0201095F
8001D665
013F000E
01830100
01021400
0100BABE
8001D665
0148000E
01830100
01021400
013DBABE
8001D669
0152000E
01030100
01021400
0118BABE
8001D66D
0153000E
01830100
01021400
011BBABE
8001D66D
0101000E
01830000
01021800
014FBABE
8001D6A5
0201FFFF
02000918
8001D6A5
10F62900
10850008
8001D6A5
0108000E
01830000
01021800
0106BABE
8001D6A7
0208FFFF
02000918
8001D6A7
0113000E
01830100
01025F00
010DBABE
8001D6A9
0213FFFF
0201095F
8001D6A9
011C000E
01030100
01021400
012FBABE
8001D6AD
011D000E
01830000
01041B00
0147BABE
8001D6AD
100503C9
100000F0
8001D6AD
10050001
10C90302
1000002A
8001D6AD
10050100
100000F0
8001D6AD
011F000E
01830100
01021400
0159BABE
8001D711
0125000E
01830100
01021400
0115BABE
8001D711
0126000E
01830000
01041B00
0135BABE
8001D711
100503E7
100000F0
8001D711
012A000E
01830000
01351600
0120BABE
8001D711
3F2A041A
8001D712
1005039D
100000F0
8001D712
0135000E
01830000
01021800
0161BABE
8001D712
0235FFFF
02000918
8001D712
014B000E
01830100
01021400
0119BABE
8001D712
0158000E
01830000
01041B00
0125BABE
8001D712
100503C4
100000F0
8001D712
0178000E
01830000
01041B00
015EBABE
8001D712
1005038B
100000F0
8001D712
10F62902
10060002
8001D7E9
10050202
100000F0
8001D7E9
10F62902
10060100
8001D7EA
10050200
100000F0
8001D7EA
04DDD001
8001D7F1
10F62900
108B0008
8001D7F1
04DDD001
8001D7F3
04DDD001
8001D7F3
04DDD001
8001D7F4
04DDD001
8001D7F4
04DDD001
8001D7F6
04DDD001
8001D7F6
04DDD001
8001D7F7
04DDD001
8001D7F7
04DDD001
8001D7F8
04DDD001
8001D7F8
04DDD001
8001D7F8
04DDD001
8001D7F8
04DDD001
8001D7F8
04DDD001
8001D7F9
04DDD001
8001D7FA
04DDD001
8001D7FB
04DDD001
8001D7FB
04DDD001
8001D7FC
04DDD001
8001D7FC
04DDD001
8001D7FD
04DDD001
8001D826
04DDD001
8001D82C
0172000E
01030000
01021800
0166BABE
8001DC15
0272FFFF
02000918
8001DC15
010D000E
01030000
01021800
0134BABE
8001DCE9
020DFFFF
02000918
8001DCE9
70108002
800003FB
Command completed successfully.
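
(Aside: with 1023 entries the dump above is hard to read by eye. The sketch below is a minimal Python example, not an IBM tool, and it does not try to decode the entries, since the meaning of the individual words is not documented in this report. It only counts how often each 8-digit hex word occurs, which is enough to spot a word that repeats many times. The name `tally_words` is made up for illustration.)

```python
import re
from collections import Counter

# One log entry per line: exactly eight hex digits.
HEX_WORD = re.compile(r"^[0-9A-Fa-f]{8}$")


def tally_words(log_text):
    """Count how often each 8-digit hex word appears in a pasted dump."""
    words = (line.strip().upper() for line in log_text.splitlines())
    # Lines that are not hex words (prompts, status messages) are ignored.
    return Counter(w for w in words if HEX_WORD.match(w))


# A short excerpt copied from the dump above.
sample = """\
10F62902
10060100
8001D7EA
10050200
100000F0
8001D7EA
04DDD001
8001D7F1
10F62900
108B0008
8001D7F1
04DDD001
8001D7F3
04DDD001
8001D7F3
Command completed successfully.
"""

counts = tally_words(sample)
for word, n in counts.most_common(3):
    print(word, n)
```

Running the same function over the full output of `ipssend getevent 1 soft` from two different days would show whether the same words keep recurring before each hang.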

[root@green /root]# ipssend getevent 1 device

Found 1 IBM ServeRAID controller(s).

And then the machine went dead again: no response at all, not even on the
console.
Phew, this sure is becoming one expensive server by now.



Comment 15 rosa 2001-07-02 02:13:41 UTC
Created attachment 22387 [details]
for the record, see next attachmnt. for the rest of the story

Comment 16 rosa 2001-07-02 02:22:37 UTC
Created attachment 22388 [details]
To me this looks like the proof that it's bad hardware that caused all this.