Bug 510284 - Dell T7400 dual Xeon quad core (E5405) system crash/freeze (X86_64) [NEEDINFO]
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: x86_64   OS: Linux
Priority: low   Severity: high
Assigned To: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
Depends On:
Reported: 2009-07-08 11:12 EDT by Ian Dickens
Modified: 2014-06-02 09:03 EDT
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2014-06-02 09:03:12 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
pm-rhel: needinfo? (ian)

Attachments: None
Description Ian Dickens 2009-07-08 11:12:35 EDT
Description of problem:

Background: Dell T7400 with two quad-core Xeon processors (E5405) and 32 GB of RAM running RHEL 5.3. We also have the audit system configured, running the NISPOM ruleset.

We are attempting to use Condor on these systems (we have two of them), allowing Condor to use 6 of the 8 cores. When Condor is told to use more than 2 cores, the system freezes beyond recovery; the machines do not crash when using only 2 cores. I read up on the ACPI threads, tried Red Hat kernel 2.6.18-156, and was able to reproduce the freeze with it as well. As you would expect, once frozen the system does not respond to any network or console probing. It's dead...

The Condor tools run as a non-privileged user.

Version-Release number of selected component (if applicable):

RHEL 5.3 using either kernel 2.6.18-128 or 2.6.18-156 on the x86_64 platform.

How reproducible:

So far: configure Condor to assign slots to more than 2 of the 8 cores. The crash is almost immediate once the test jobs are submitted. However, if Condor is configured to use only 2 cores, the jobs complete as expected. I am guessing this might be a kernel bug, since the whole machine dies.

To be fair, I am not 100% sure Condor is not to blame, but since it runs as a non-root user, the kernel should not crash; Condor should.
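One way to separate Condor from the kernel is to generate the same per-core CPU load without Condor. The sketch below is hypothetical: the script name `burn.py` and the functions `burn_cpu` and `run_load` are illustrative, not Condor tools, and it assumes Python 3.3+ on Linux, which is newer than the Python shipped with RHEL 5.3. Run as an ordinary user, it forks one busy-looping process per requested core and, where `os.sched_setaffinity` is available, pins each child to a separate CPU.

```python
import os
import time


def burn_cpu(seconds):
    """Busy-loop on integer arithmetic for roughly `seconds` of wall-clock time."""
    deadline = time.monotonic() + seconds
    x = 0
    while time.monotonic() < deadline:
        x = (x * 31 + 7) % 1000003
    return x


def run_load(num_workers, seconds, pin=True):
    """Fork `num_workers` CPU-bound children and return their exit codes.

    Hypothetical stand-in for Condor running one job per slot: each child
    spins on its own CPU (pinned with sched_setaffinity where available).
    """
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = []
    pids = []
    for i in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: optionally pin to one allowed CPU, burn, then always
            # leave via os._exit so it never runs the parent's code path.
            status = 1
            try:
                if pin and cpus and hasattr(os, "sched_setaffinity"):
                    os.sched_setaffinity(0, {cpus[i % len(cpus)]})
                burn_cpu(seconds)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)
    return codes


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 3:
        print("usage: burn.py NUM_WORKERS SECONDS")
    else:
        print(run_load(int(sys.argv[1]), float(sys.argv[2])))
```

For example, `python3 burn.py 6 600` loads six cores for ten minutes, and `python3 burn.py 2 600` mirrors the 2-core case. If plain CPU load alone froze the machine, that would point away from Condor; if it did not, whatever else the Condor jobs do (memory, I/O, audited system calls) would remain candidates.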
Comment 1 RHEL Product and Program Management 2014-03-07 07:40:09 EST
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.
Comment 2 RHEL Product and Program Management 2014-06-02 09:03:12 EDT
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).
