From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)

Description of problem:
Processor/s: Dual Processor - Dual AMD Opteron 285 2.6 GHz 64-Bit w/ Dual Core Technology
Motherboard: Tyan® Thunder K8WE Motherboard w/ SLI Support
8 GB RAM, 8 GB swap

The machine seems to run fine when not loaded, but when performing a CFD calculation that uses about 4 GB of RAM it locks up at random times: it stops responding, and the caps lock and scroll lock lights flash on and off together about once a second. The system runs longer with two iterations than with four. Disabling dual core was tried, with no improvement. Sometimes when the system locks up the keyboard lights do not flash. The system dual-boots Windows and Linux.

Version-Release number of selected component (if applicable):
kernel 2.6.9-34.0.2ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Boot computer
2. Run job(s) to about 4.6G RAM usage
3. Computer hangs, with lights sometimes blinking, sometimes not.

Actual Results:
Computer hangs, with lights sometimes blinking, sometimes not.

Expected Results:
Should be able to run through the job(s) properly.

Additional info:
Runs a job fine on the same system under Windows.
*** Bug 202131 has been marked as a duplicate of this bug. ***
Can you please post any messages from /var/log/messages that appear at the time of the crash? When the machine locks up, can you press Alt-SysRq-T? This will hopefully dump the state of all the processes. Thanks.
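Note that Alt-SysRq-T only produces output if the magic SysRq key is enabled, and on some installations it is off by default. A minimal sketch for checking this before the next lockup, assuming the setting is exposed at /proc/sys/kernel/sysrq and that any non-zero value permits the request (the helper names are made up for illustration):

```python
def sysrq_enabled(raw):
    """Interpret the contents of the sysrq sysctl file.

    Assumption: 0 means disabled, any non-zero value means at least
    some SysRq functions are allowed.
    """
    return int(raw.strip()) != 0


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting from the running system."""
    with open(path) as f:
        return sysrq_enabled(f.read())


if __name__ == "__main__":
    state = "enabled" if read_sysrq() else "disabled"
    print("SysRq is " + state)
```

If it reports disabled, writing 1 to that file as root should turn it on until the next reboot.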
For some reason every BIOS after 1.01 on this board sets up the interrupts as edge triggered instead of level triggered. Only the 1.01 BIOS will work under RHEL4. Tyan says that if you use a kernel later than 2.6.14 the interrupts are set up correctly. I haven't seen anything that important change between 2.6.13 and 2.6.14, so I couldn't swear that they didn't just change something in the .config file... Anyway, I'm working with their BIOS team and hopefully they'll get this fixed soon.
Created attachment 134075: patch to match upstream kernel
I think I've got it... This is the same as the upstream kernel. The PCI devices in /proc/interrupts are level triggered now.
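For anyone who wants to verify the trigger mode on their own machine, here is a rough sketch that pulls the edge/level setting for each IRQ out of /proc/interrupts. It assumes the older "IO-APIC-edge" / "IO-APIC-level" column format; other kernels print this column differently and would need a different pattern. The function name and the sample text are made up for illustration and are not output from the affected system.

```python
import re

# "<irq>: <counts...> IO-APIC-<mode> <device names>"
_LINE = re.compile(r"^\s*(\w+):\s+(.*)$")
_MODE = re.compile(r"IO-APIC-(edge|level)\s+(.*)$")


def trigger_modes(text):
    """Map IRQ label -> (trigger mode, device names) for IO-APIC lines."""
    modes = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # header line or blank line
        t = _MODE.search(m.group(2))
        if t:
            modes[m.group(1)] = (t.group(1), t.group(2).strip())
    return modes


# Invented sample in the assumed format, NOT taken from the reporter's machine.
SAMPLE = """\
           CPU0       CPU1
  0:    1000000          0    IO-APIC-edge  timer
169:      52341          0   IO-APIC-level  eth0
NMI:          0          0
"""

if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        for irq, (mode, dev) in sorted(trigger_modes(f.read()).items()):
            print(irq, mode, dev)
```

With the patch applied, the PCI devices should show up as level in this listing.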
I have updated the kernel to version 2.6.9-42.ELsmp that was released last week. The machine still exhibited the lockup issue with the new kernel. It still has BIOS version 1.03, but by default the HT-LDT Frequency is set to auto, and when the machine boots it reports the HT-LDT Frequency as 1000MHz. I changed the setting in the BIOS from auto to 800MHz and it seems to run now, as my calculations ran all night last night and today with no problems. What is this HT-LDT frequency? Thanks Ron Morton
The patch listed in Comment #4 is similar to something I saw in RHEL3. I'll test it out and post it.
Patch tested and posted to rhkernel-list
I thought I had the problem resolved, but as soon as I said something, it locked up again. When I returned from lunch, the caps lock and scroll lock lights were flashing and the machine had locked up after running for ~2 days with no issues! Ron Morton
Interestingly, the machine locked up over the weekend with BIOS version 1.01. It did run for about 1.5 days before it locked up, and as before the scroll lock and caps lock lights on the keyboard were flashing. Ron
Ron is my user whom I am trying to assist with this issue. Where do we need to go next with this? I've not done a kernel recompile in years and on linux only added a module that was already available and that was also years ago. My user needs resolution as soon as possible. Thanks for your attention and assistance to date and in the future. Kathy Whyte
The patch listed in Comment #4 is not part of the 2.6.9-42.ELsmp kernel. It has been proposed for possible inclusion in a future kernel release.
committed in stream U5 build 42.3. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
My current kernel is 2.6.9-42.14.ELsmp and BIOS is 1.04.2895. The motherboard is a Tyan Thunder K8WE Model S2895 running BIOS version 1.04. The above system still locks up... I have two machines with IWILL motherboards and dual single-core Opteron 248s that run fine.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
QE ack for RHEL4.5.
User jparadis's account has been closed
The patch is in and has reportedly resolved at least one customer issue.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html