Bug 123307

Summary: HP Proliant system freezes, tg3 driver
Product: Red Hat Enterprise Linux 3
Reporter: Ole Holm Nielsen <ole.h.nielsen>
Component: kernel
Assignee: Chris Williams <cww>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 3.0
CC: crn1, petrides, riel, tao
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-10-19 19:26:26 UTC

Description Ole Holm Nielsen 2004-05-16 20:29:36 UTC

Description of problem:
I upgraded an HP Proliant ML350G3 to Red Hat Enterprise Linux AS
release 3 (Taroon Update 2). The bcm5700 Ethernet driver was then
replaced by the tg3 driver. The system has frozen twice, about
20 hours apart: no network traffic is possible, and login on the
(non-graphical) console hangs after the username is entered. There
is no response to Ctrl-Alt-Del. Power cycling was the only way to
bring the system back on-line.
We suspect the new tg3 driver is the cause. Frequent system
freezes are of course totally unacceptable.

We reinstalled HP's recommended bcm5700 driver from the HP website.
The system has been up for 12 hours so far without problems.

Version-Release number of selected component (if applicable):
kernel-2.4.21-15.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL 3.0 Update 2 on HP Proliant with Broadcom Gigabit
Ethernet.
2. The tg3 driver replaces bcm5700 upon reboot.
3. With moderate traffic, the system froze twice in 2 days.
    

Actual Results:  No network traffic is possible, and login on the
(non-graphical) console hangs after the username is entered. There is
no response to Ctrl-Alt-Del.

Expected Results:  The system should on no account freeze.

Additional info:

Nothing is printed in the syslog around the time of the system freeze.

Comment 1 Ole Holm Nielsen 2004-05-19 08:06:42 UTC
For the past 3 days our HP Proliant system had been running stably with
the bcm5700 network driver. But this morning the system froze once
more, so the problem is not necessarily related to the tg3 driver.
IMHO, we cannot trust kernel-2.4.21-15.ELsmp to run stably on
this HP Proliant hardware. We have reverted to the previous kernel,
2.4.21-9.0.3ELsmp, which has never caused us any problems. We're
hoping for a resolution of the present problem.

Comment 2 Ole Holm Nielsen 2004-05-28 10:36:59 UTC
We also had a hang with kernel 2.4.21-9.0.3ELsmp, which had been reliable
previously. We then stopped the HP hardware monitoring software
(Proliant Support Pack) "hpasm" daemons, and we haven't had any
further problems for 8 days now!
It appears that the upgrades in Update 2 interact badly with the
hpasm daemons. HP is very slow in releasing new versions of the
Proliant Support Pack as Linux kernel versions change :-(
It is still an open question how the hpasm daemons can cause the
kernel to freeze, so perhaps the bug should be left open.
Perhaps HP people have some insight into the problem?

Additional info: Our SCSI controller is an HP SmartArray 641 using
the cciss driver. We don't know whether this is relevant to the bug.

Comment 3 Rik van Riel 2004-05-28 12:09:22 UTC
Reassigning to our HP contact, since the problem seems to be with hpasm.

Comment 4 Cesar B 2004-06-25 14:40:31 UTC
Hi, unfortunately I have a similar case: a Compaq Proliant ML-350 G3
with a Smart Array 64xx, running Linux Enterprise Server 3.0. Our
system freezes at intervals of 2, 8, ? days for no apparent
reason.
I upgraded to kernel 2.4.21-15.0.2.ELsmp, the latest release at
http://www.redhat.com/security/, and upgraded the ROM Flash components;
nevertheless, the problem persists (please excuse my bad English).
I have read your comments. I do not run the "hpasm" daemons, so
unfortunately stopping these daemons is not a solution for us.
At the moment we are in contact with Red Hat looking for a solution.

I would be grateful for any comments you can offer.

Thanks, César.

Comment 5 Andre ten Bohmer 2004-11-29 14:35:32 UTC
Hi, we have lockups on several HP servers with Red Hat EL AS 2.1 and 3:
- dl-380-g2 and g3
- dl-360-g3
- dl-145 (Red Hat EL AS3 AMD64)
Some have hpasm, some don't. Some have the latest firmware, others
don't. Similarities:
- all are dual-processor systems (Intel P-III, Intel Xeon and Opteron)
- no syslog entry pointing at a possible problem
- when we are able to observe the problem directly (sometimes ASR kicks
  in during the night): no network connection, a black console so no
  login is possible, and the hard disk LEDs show continuous activity.
We modified the BIOS setting on 2 servers: the MPS table was changed
from Auto to Full APIC, as advised on an HP Linux list. Now we'll see
whether those 2 servers stay stable...
Cheers,
Andre


Comment 6 Andre ten Bohmer 2004-12-10 09:03:21 UTC
One server (DL-360-G3, Red Hat AS 2.1, HP bcm5700 NIC driver) was
reset (again) via ASR yesterday evening after three weeks of uptime since
the last ASR, despite the latest firmware versions and the MPS table set
from Auto to Full APIC. The combination of HP hardware and Red Hat Linux
is not acceptable with this kind of instability, so we have contacted HP
for assistance.
Andre

Comment 7 Larry Troan 2005-09-12 17:15:51 UTC
Last comment here was 2004-12-10. 

Prior to that, comment #3 indicates Red Hat believes this to be in hpasm. 
Chris, assigning this to you. Suggest you ping Jim Kam or someone else in HP
Support to determine what action, if any, is still required on this issue.

Comment 8 RHEL Program Management 2007-10-19 19:26:26 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.