Bug 510284

Summary: Dell T7400 dual Xeon quad core (E5405) system crash/freeze (X86_64)
Product: Red Hat Enterprise Linux 5
Reporter: Ian Dickens <ian>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: low
Version: 5.3
CC: ian
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-06-02 13:03:12 UTC

Description Ian Dickens 2009-07-08 15:12:35 UTC
Description of problem:

Background: Dell T7400 with two quad-core Xeon processors (E5405) and 32 GB of RAM running RHEL 5.3. We also have the auditing system configured, running the NISPOM ruleset.

We are attempting to use condor on these systems (we have 2 of them), allowing condor to use 6 of the 8 cores. When condor is told to use more than 2 cores, the system freezes beyond recovery. The machines do not crash when using only 2 cores. I read up on the apci threads and tried RH kernel 2.6.18-156, and was able to reproduce the freeze there as well. Once frozen, the system does not respond to any network or console probing. It's dead...

The condor tools run as a non-privileged user...

Version-Release number of selected component (if applicable):

RHEL 5.3 using either kernel 2.6.18-128 or 2.6.18-156 for x86_64 platforms.


How reproducible:

So far... Configure condor to assign slots to more than 2 of the 8 cores. The crash is almost immediate once the test jobs are submitted. However, if you configure condor to use only 2 cores, the jobs complete as expected. I am guessing this might be a kernel bug since the whole machine dies...

To be fair: I am not 100% sure condor is not to blame, but since it runs as a non-root user, the kernel should not crash; condor should...
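
One way to narrow down whether condor itself is involved would be to load the same number of cores without condor at all. The following is only a minimal sketch of that idea, not something that was run for this report: the function name run_load and the busy-loop workload are hypothetical stand-ins, and the real condor test jobs may stress the machine quite differently (memory, I/O, audited syscalls).

```python
import subprocess
import sys

# CPU-bound busy loop executed by each child process; it spins until the
# deadline given on its command line has passed. This is an assumed
# stand-in for the real condor test jobs.
BURN_SNIPPET = (
    "import sys, time\n"
    "deadline = time.time() + float(sys.argv[1])\n"
    "x = 0\n"
    "while time.time() < deadline:\n"
    "    x += 1\n"
)


def run_load(n_workers, seconds):
    """Start n_workers busy-loop processes and return their exit codes."""
    procs = [
        subprocess.Popen([sys.executable, "-c", BURN_SNIPPET, str(seconds)])
        for _ in range(n_workers)
    ]
    # Wait for every child; an exit code of 0 means the loop ran to its deadline.
    return [p.wait() for p in procs]
```

Run as the same non-privileged user, a call such as run_load(6, 60) would keep 6 processes busy for about a minute. If the machine also freezes under this load, condor is less likely to be the cause; if it survives, that alone does not clear the kernel, since the workloads differ.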

Comment 1 RHEL Program Management 2014-03-07 12:40:09 UTC
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 2 RHEL Program Management 2014-06-02 13:03:12 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

Comment 3 Red Hat Bugzilla 2023-09-14 01:17:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days