Bug 192760
Summary: | RHAS4 U3 x86_64 largesmp kernels don't support >8 cores on AMD64 | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Nakul Saraiya <nakul>
Component: | kernel | Assignee: | Brian Maly <bmaly>
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock>
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | 4.0 | CC: | bnagendr, cseshadri, jbaron, konradr
Target Milestone: | --- | Keywords: | Reopened
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | RHBA-2007-0304 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2007-05-08 01:34:19 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Nakul Saraiya
2006-05-22 19:33:17 UTC
RHEL4 already supports >8 CPUs in Update 3 using clustered APIC mode. This has been verified in-house with an 8-way dual-core system. AMD64 does *not* support clustered APIC mode for >8 cores; Intel EM64T does. The boot hangs on our AMD Opteron systems, probably because of lost interrupts. See the kernel.org source under arch/x86_64/kernel/genapic.c and .../genapic_flat.c and .../mpparse.c. I have patches against both RHEL4 U1 (in use) and RHEL4 U3 (in test) that make this work.
Created attachment 129895 [details]
Boot hang #1
Created attachment 129896 [details]
Boot hang #2
Created attachment 129897 [details]
Working patch for RHAS4 U1 16-core AMD64
This is slightly non-optimal (should use DM_FIXED delivery mode), and fixes
some other small annoyances on AMD64. It has been tested and is in use. It is
against RHEL4 U1 and was submitted to your Eng team through our Business
Development contacts. It also required that the config be changed to support
16 CPUs (not in patch.)
Created attachment 129898 [details]
PROPOSED patch for RHAS4 U3 for 16-core AMD64
This has been compiled but not yet tested. I will update the bug report
tomorrow with test results (boot log.) I also changed the largesmp config to
use 16 CPUs, since that is all that AMD64 supports today, unlike Intel EM64T.
Comment on attachment 129898 [details]
PROPOSED patch for RHAS4 U3 for 16-core AMD64
Oops, I changed the 'flat' APIC mode to fixed, rather than 'physical flat'.
Will submit an updated patch tomorrow.
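For context on the 'flat' versus 'physical flat' distinction above: in xAPIC logical flat mode each CPU is identified by one bit of an 8-bit logical destination mask, which is why flat mode cannot address more than 8 CPUs. The following is a minimal user-space sketch of that limit, not code from any of the attached patches; the function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Logical destination IDs in xAPIC flat mode are 8 bits wide. */
#define FLAT_LOGICAL_ID_BITS 8u

/*
 * Hypothetical helper: the one-hot logical ID a CPU would get in flat mode.
 * For cpu >= 8 the bit falls outside the 8-bit field and the result is 0,
 * i.e. the CPU cannot be targeted at all.
 */
uint8_t flat_logical_id(unsigned int cpu)
{
    assert(cpu < 32u); /* keep the shift well-defined */
    return (uint8_t)(1u << cpu);
}

/* Hypothetical helper: can this CPU number be addressed in flat mode? */
int flat_mode_can_address(unsigned int cpu)
{
    return cpu < FLAT_LOGICAL_ID_BITS;
}
```

Physical flat mode avoids the limit by addressing each CPU by its physical APIC ID instead of a bitmask, which is why it is the target mode for a 16-core system.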
Nakul, the bug was changed to "CLOSED WONTFIX" - is that the correct state?

Bugzilla didn't allow me to reopen the bug, so I just flagged it as best I could. I'm currently testing a patch and should have results for you later today.

Nakul, reopening the bug.

Quick update - I have managed to get our system working but it seems to go into clustered mode (both RHAS4U1 with patch and RHAS4U3 with patch.) While the right answer is to use physical-flat mode, I believe that I may have been wrong about AMD64 supporting clustered mode. I will check with AMD as to the impact of this. In the meantime, I'll see how to coax the system into physflat mode.

An update while I get to the root cause.
1. For some reason, boot_cpu_data.cpu_vendor is not being set by the time that clustered_apic_check() is called - my stock Tyan AMD64 system thinks it is an Intel EM64T system with your kernel. So do my systems.
2. The reason for the original hang on our systems was the missing call to clustered_apic_check() in mpparse.c. We don't (currently) supply ACPI info, but rely on mptables. So this local change fixes the hang, but still puts us into clustered APIC mode due to #1.

Created attachment 130052 [details]
Patch against RHEL4U3 tree to put AMD64 systems into physflat mode
This successfully boots on a Tyan 2-socket and our 16-socket and puts the
16-socket into physical flat mode. SQA will test this more thoroughly over the
next week or two.
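The mode selection discussed in this thread can be sketched as a small user-space model. This is not the attached patch and not the kernel's clustered_apic_check(); the enum and function names are hypothetical, and the 8-CPU threshold is an assumption taken from the ">8 cores" limit in the summary. It only illustrates why an unset vendor field sends an AMD64 system into clustered mode.

```c
#include <assert.h>

/* Hypothetical stand-ins for the vendor field and the three APIC modes. */
enum cpu_vendor { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD };
enum apic_mode { APIC_FLAT, APIC_CLUSTER, APIC_PHYSFLAT };

/* Assumed threshold: flat mode is used for up to 8 CPUs. */
#define FLAT_MODE_MAX_CPUS 8u

/*
 * Pick an APIC mode from the vendor and CPU count.
 * - Up to 8 CPUs: flat mode for every vendor.
 * - More than 8 CPUs on AMD: physical flat mode.
 * - More than 8 CPUs otherwise: clustered mode. An unset (unknown)
 *   vendor also lands here, which models item 1 above.
 */
enum apic_mode select_apic_mode(enum cpu_vendor vendor, unsigned int num_cpus)
{
    assert(num_cpus > 0u);
    if (num_cpus <= FLAT_MODE_MAX_CPUS)
        return APIC_FLAT;
    if (vendor == VENDOR_AMD)
        return APIC_PHYSFLAT;
    return APIC_CLUSTER;
}

/* Human-readable name, e.g. for a boot-log style message. */
const char *apic_mode_name(enum apic_mode mode)
{
    switch (mode) {
    case APIC_FLAT:     return "flat";
    case APIC_CLUSTER:  return "cluster";
    case APIC_PHYSFLAT: return "physical flat";
    }
    return "unknown";
}
```

In this model, select_apic_mode(VENDOR_AMD, 16) yields physical flat mode, while select_apic_mode(VENDOR_UNKNOWN, 16) yields clustered mode, matching the behaviour reported when the vendor had not yet been set at the time of the check.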
Committed in stream U5 build 42.18. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

QE ack for 4.5.

User jparadis's account has been closed

Is there a patch available for this bug for RHAS4U4?

Patch is in the -52 kernel, already working for two customers.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2007-0304.html