Red Hat Bugzilla – Bug 174768
System crashes after several minutes (15-20) after normal bootup
Last modified: 2013-08-05 21:42:12 EDT
From Bugzilla Helper:
User-Agent: Opera/8.5 (X11; Linux i686; U; en)
Description of problem:
The kernel crashes with this message (there were two other routines further down
the stack, related to the register display):
sp = a0000001006cba90
bsp = a0000001006c5018
The system can still be pinged, but processes no longer work.
Version-Release number of selected component (if applicable):
Update 2 is installed
Steps to Reproduce:
1. Boot machine
2. Wait several minutes (15-20)
The system is an IBM x455 server consisting of two boxes connected with an
expansion cable. NUMA support is working correctly. RHEL3 works OK on this machine.
Today I tried installing the same Red Hat release on another, stand-alone x455
box. That system works OK.
The hardware difference between machine 1 (buggy) and machine 2 (normal):
1. machine 1 has the following additional HW components:
1.1. Topspin HCA card (drivers not installed)
1.2. QLogic Fibre Channel adapter (detected and configured properly)
1.3. machine 1 is in fact two x455 nodes interconnected with IBM scalability
2. machine 1 has 32+32 GB of physical RAM distributed across two nodes, machine 2
has 16 GB of RAM
Thanks for your attention to this problem.
If you need additional information to resolve this, just ask.
Please re-test it with the latest RHEL 4. I'm closing this bug since I'm __not__
aware of any problem of this kind recently. Please re-open the bug if you can
reproduce it with a recent RHEL 4, RHEL 5, or upstream kernel.