Bug 202130 - Tyan system hangs and keyboard lights blink even when not out of resources
Summary: Tyan system hangs and keyboard lights blink even when not out of resources
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: other
OS: Linux
medium
high
Target Milestone: ---
: ---
Assignee: Brian Maly
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 202131
Depends On:
Blocks:
Reported: 2006-08-10 21:44 UTC by Kathy Whyte
Modified: 2021-11-08 19:27 UTC (History)

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-08 03:13:52 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to match upstream kernel (320 bytes, patch)
2006-08-12 07:16 UTC, Dan Carpenter


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2007:0304 | 0 | normal | SHIPPED_LIVE | Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5 | 2007-04-28 18:58:50 UTC

Description Kathy Whyte 2006-08-10 21:44:40 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)

Description of problem:
Processor/s: Dual Processor - Dual AMD Opteron 285 2.6
GHz 64-Bit w/ Dual Core Technology
Motherboard: Tyan® Thunder K8WE Motherboard w/ SLI Support
8 GB RAM, 8 GB swap 

The machine seems to run fine when not loaded, but when
performing a CFD calculation that uses about 4 GB of RAM
it will lock up at random times where it does not respond
and the caps lock and scroll lock lights will flash on and
off together about every second. 


System will run longer with two iterations than with four.
Tried disabling dual core; no improvement.

Sometimes when the system locks up the keyboard lights do not flash.

Dual-boot Windows and Linux system.




Version-Release number of selected component (if applicable):
kernel 2.6.9-34.0.2ELsmp

How reproducible:
Always


Steps to Reproduce:
1. Boot computer.
2. Run job(s) up to about 4.6G RAM usage.
3. Computer hangs, with lights sometimes blinking and sometimes not.

Actual Results:
Computer hangs, with lights sometimes blinking and sometimes not.

Expected Results:
The job(s) should run through to completion.

Additional info:
The same system runs a job fine under Windows.

Comment 1 Jason Baron 2006-08-11 15:22:58 UTC
*** Bug 202131 has been marked as a duplicate of this bug. ***

Comment 2 Jason Baron 2006-08-11 15:26:04 UTC
Can you please post any messages from /var/log/messages that appear at the time
of the crash? When the machine locks up, can you do alt-sysrq-t? This will
hopefully dump the state of all the processes. Thanks.
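
For pulling out just the messages around the time of a lockup, a minimal Python sketch follows. It assumes the traditional syslog line format ("Aug 10 21:44:40 host ..."), which carries no year, so the year is taken from the time being searched for. The function name lines_near and the sample lines are hypothetical and are not taken from the reporter's machine.

```python
from datetime import datetime, timedelta

def lines_near(lines, when, minutes=10, year=None):
    """Return syslog lines stamped within `minutes` of `when`."""
    year = year or when.year
    window = timedelta(minutes=minutes)
    kept = []
    for line in lines:
        # Traditional syslog stamps look like "Aug 10 21:44:40" and omit the year.
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line, skip it
        if abs(stamp - when) <= window:
            kept.append(line)
    return kept

# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    "Aug 10 21:00:00 host kernel: hypothetical message A",
    "Aug 10 21:44:40 host kernel: hypothetical message B",
    "Aug 11 09:00:00 host kernel: hypothetical message C",
]

if __name__ == "__main__":
    for line in lines_near(SAMPLE_LOG, datetime(2006, 8, 10, 21, 40)):
        print(line)
```

On a real system the input would be the lines of /var/log/messages (reading it normally requires root), with `when` set to the approximate time of the hang.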

Comment 3 Dan Carpenter 2006-08-11 16:30:01 UTC
For some reason every BIOS after 1.01 on this board sets up the interrupts as
edge triggered instead of level triggered.  Only the 1.01 BIOS will work under
RHEL4.

Tyan says that if you use a kernel later than 2.6.14 the interrupts are set up
correctly.  I haven't seen anything that important change between 2.6.13 and
2.6.14, so I couldn't swear that they didn't just change something in the
.config file...

Anyway, I'm working with their BIOS team and hopefully they'll get this fixed soon.

Comment 4 Dan Carpenter 2006-08-12 07:16:26 UTC
Created attachment 134075
patch to match upstream kernel

I think I've got it...  This is the same as the upstream kernel.

The PCI devices in /proc/interrupts are level triggered now.
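
One way to check this is to scan /proc/interrupts for the trigger mode of each IRQ. The Python sketch below assumes the 2.6-era layout, where the interrupt controller column reads "IO-APIC-edge" or "IO-APIC-level"; other kernels may print this column differently. The function name irq_trigger_modes and the sample text are hypothetical, not output from the affected machine.

```python
import re

def irq_trigger_modes(text):
    """Map each IRQ label to 'edge' or 'level' based on the IO-APIC column."""
    modes = {}
    for line in text.splitlines():
        # IRQ lines begin with a label such as " 16:"; the CPU header has none.
        label = re.match(r"\s*(\w+):", line)
        if not label:
            continue
        # Assumed column format: "IO-APIC-edge" or "IO-APIC-level".
        trigger = re.search(r"IO-APIC-(edge|level)", line)
        if trigger:
            modes[label.group(1)] = trigger.group(1)
    return modes

# Hypothetical sample, for illustration only.
SAMPLE = """\
           CPU0       CPU1
  0:     123456          0    IO-APIC-edge  timer
 16:       4321          0   IO-APIC-level  eth0
 17:         99          0    IO-APIC-edge  libata
NMI:          0          0
"""

if __name__ == "__main__":
    for irq, mode in irq_trigger_modes(SAMPLE).items():
        print(irq, mode)
```

On a live system the input would be the contents of /proc/interrupts; a PCI device that shows up as "edge" where "level" is expected would match the symptom described in Comment 3.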

Comment 5 Kathy Whyte 2006-08-15 18:44:22 UTC
I have updated the kernel to version 2.6.9-42.ELsmp, which
was released last week.  The machine still exhibited the
lockup issue with the new kernel. It still has BIOS
version 1.03, but by default the HT-LDT Frequency is set
to auto, and when the machine boots it reports the HT-LDT
Frequency as 1000MHz. I changed the setting in the BIOS
from auto to 800MHz and it seems to run now, as my
calculations ran all night last night and today with no
problems.  What is this HT-LDT frequency?  Thanks
Ron Morton


Comment 6 Jim Paradis 2006-08-16 20:13:38 UTC
The patch listed in Comment #4 is similar to something I saw in RHEL3.  I'll
test it out and post it.


Comment 7 Jim Paradis 2006-08-16 21:08:48 UTC
Patch tested and posted to rhkernel-list


Comment 8 Kathy Whyte 2006-08-18 13:07:39 UTC
I thought I had the problem resolved, but
as soon as I said something, it locked up again. When I
returned from lunch, the caps lock and scroll lock lights
were flashing and the machine had locked up after running
for ~2 days with no issues!
Ron Morton

Comment 9 Kathy Whyte 2006-08-21 18:10:52 UTC
Interestingly, the machine locked up over the weekend with
BIOS version 1.01. It did run for about 1.5 days before
it locked up, and as before the scroll lock and caps lock
lights on the keyboard were flashing.

Ron

Comment 10 Kathy Whyte 2006-08-21 18:13:56 UTC
Ron is my user whom I am trying to assist with this issue.
Where do we need to go next with this?
I've not done a kernel recompile in years, and on Linux I have only added a
module that was already available, and that was also years ago.
My user needs resolution as soon as possible.

Thanks for your attention and assistance to date and in the future.

Kathy Whyte

Comment 11 Jim Paradis 2006-08-21 18:17:34 UTC
The patch listed in Comment #4 is not part of the 2.6.9-42.ELsmp kernel.  It has
been proposed for possible inclusion in a future kernel release.


Comment 12 Jason Baron 2006-08-29 18:33:31 UTC
Committed in stream U5, build 42.3. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 13 Kathy Whyte 2006-10-03 16:52:05 UTC
My current kernel is 2.6.9-42.14.ELsmp and BIOS is
1.04.2895. The motherboard is a Tyan Thunder K8WE Model
S2895 running BIOS version 1.04.

The above system still locks up...

I have two machines with IWILL motherboards and dual single
core Opteron 248s that run fine.

Comment 14 RHEL Program Management 2007-02-07 23:25:01 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 15 Jay Turner 2007-02-08 13:34:50 UTC
QE ack for RHEL4.5.

Comment 17 Red Hat Bugzilla 2007-03-18 22:40:58 UTC
User jparadis's account has been closed

Comment 18 Mike Gahagan 2007-04-03 19:24:53 UTC
Patch is in, and it has reportedly resolved at least one customer
issue.


Comment 21 Red Hat Bugzilla 2007-05-08 03:13:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html

