Red Hat Bugzilla – Bug 510284
Dell T7400 dual Xeon quad core (E5405) system crash/freeze (X86_64)
Last modified: 2014-06-02 09:03:12 EDT
Description of problem:
Background: Dell T7400 with 2 quad-core Xeon processors (E5405) and 32G of RAM running RHEL 5.3. We also have the auditing system configured, running the NISPOM ruleset.
We are attempting to use condor on these systems (we have 2 of them), allowing condor to use 6 of the 8 cores. When condor is told to use more than 2 cores, the system freezes beyond recovery. The machines do not crash when using only 2 cores. I read up on the ACPI threads, tried RH kernel 2.6.18-156, and was able to reproduce the freeze with it as well. Once frozen, the system does not respond to any network or console probing, as you would expect. It's dead...
The condor tools run as a non-privileged user...
Version-Release number of selected component (if applicable):
RHEL 5.3 using either kernels 2.6.18-128 or 188.8.131.52 for X86_64 platforms.
So far... Configure condor to assign slots to more than 2 of the 8 cores. The crash is almost immediate once the test jobs are submitted. However, if condor is configured to use only 2 cores, the jobs complete as expected. I am guessing this might be a kernel bug, since the whole machine dies...
To be fair: I am not 100% sure condor is not to blame. But since it runs as a non-root user, the kernel should not crash; at worst, condor itself should....
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of the RHEL5.11 development phase, Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).