Bug 141699
Summary: | FEAT: RHEL 4 U3: ia64 needs hint@pause in spinloop | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Tony Luck <tony.luck> | ||||
Component: | kernel | Assignee: | Geoff Gustafson <grgustaf> | ||||
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 4.0 | CC: | davej, fenghua.yu, fhirtz, jbaron, jturner, rohit.seth, tao, tburke, wwlinuxengineering | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | ia64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | RHSA-2006-0132 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2006-03-07 18:33:27 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 168429 | ||||||
Attachments: |
|
Description
Tony Luck
2004-12-02 23:52:54 UTC
We're looking to understand the impact of this. What is the need that this patch addresses? Also, have these patches been ported and tested against RHEL4?

This is a major performance impact for Montecito: a spinning thread without the relax can seriously impact the progress of another thread running on the same core. Impact for non-Montecito is negligible (performance of spinloops is by definition irrelevant :-)

Posted to IT 57252:

Tony: (As stated in BZ 141699): We're looking to understand the impact of this. What is the need that this patch addresses? Also, have these patches been ported and tested against RHEL4? We see that on 2/9, you did provide a business justification for this (quoting you here from the BZ): "This is a major performance impact for Montecito, a spinning thread without the relax can seriously impact the progress of another thread running on the same core. Impact for non-Montecito is negligible (performance of spinloops is by definition irrelevant :-)" Unfortunately, given that the U1 kernel freeze is in 2 days (2/17), there's virtually no chance that this will be incorporated in U1. In order to get this feature considered for U2, you'll need to port the patches against the final (GA) U1 kernel version (2.6.9.?). Geoff will be able to help ensure you get this kernel and will also help you test.

Created attachment 114488 [details]
Missing patch in EL4-U1-RC
The majority of the patches have actually been accepted into RHEL4-U1-RC now, but a small part is still missing from RHEL4-U1-RC. I attached the missing patch (see comment #10), which applies on top of RHEL4-U1-RC (kernel-2.6.9-9.EL). Please apply this missing patch for the next release.

Is this a duplicate of bug #158336?

PM ACK for U2.

*** Bug 158851 has been marked as a duplicate of this bug. ***

*** Bug 141851 has been marked as a duplicate of this bug. ***

Tim asked, "Is this a duplicate of bug #158336?" No: 158336 addresses the missing glibc patch, while this bug (141699) covers a missing patch for the issue where tight kernel loops should use hint@pause. I'm examining this now...

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-420.html

Please be aware that I can no longer see PRIVATE comments.

From rhkernel-list:

Prarit wrote:
>> I'm offering the patch here with the above note that this is a kabi
>> violation

The kABI violation makes it a NAK for me. udelay() violations in driver accesses could render parts of the system highly unstable.

>> , however, given that this bug is a STOPSHIP I think it should be
>> included in the kernel.

There _is_ a compromise possible; ugly, but I think it's the best we can do. Don't change the existing udelay calibration code, so existing modules continue to work. Add a *new* calibration that stores its results in a new (per-cpu) variable. Change the udelay macro to (a) do the cpu_relax(), and (b) read the new calibration variable. Now, existing modules will continue to read the old calibration data, which is correct for their compiled-in cpu_relax()-less udelay loops.
Newly-compiled code will pick up the new udelay macro, will use the newer loop calibration data, and will perform the cpu_relax() as required. This is more engineering work, but it fixes all code in the kernel itself without breaking external modules.

Whups, I should have indicated that the writer above was Stephen Tweedie.

This feature request will not be incorporated into RHEL4 U2. Reasons being:
1) kABI impact, as noted in the comments above
2) lack of an engineer to cover it at this late point in development, especially given that the kABI impact increases the work for someone to pursue an alternative workaround.

Since Bug 141851 [spin loops on both ia32 and ia32e need cpu_relax] was closed as a DUP of this bug per comment #15 above, I'm changing the summary of this bug to reflect ia32 as well as ia64.

Providing PM ACK on this for U3. Already on U3Proposed list.

Status check on the patch Fenghua Yu attached to this bugzilla back in comment #10 on May 17th. This patch didn't make it into RHEL4 U2; it would be nice to see it in U3.

There are no kABI implications of changing udelay() in this way on ia64. The udelay spinloop is simply waiting for the ar.itc cycle counter to reach the value that has been calculated for the end-point of the requested delay. It makes no difference to the accuracy of the delay whether we do nothing in the body of the loop or execute the hint@pause instruction. [In either case it is possible that Montecito will switch to the other thread during the spin loop; with the hint@pause it is just much more likely that we will switch. But since the granularity of udelay() is microseconds, we will switch back in time to do the end-point check accurately enough.]

Crossed wires here. The implementations on i386/x86_64 and ia64 are different, so 141851 should not have been dup'd to this bug. Please move discussion on i386/x86_64 there.
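The kABI disagreement above comes down to how udelay() measures time. The following toy C simulation is not kernel code and rests on stated assumptions. A simulated counter (`sim_cycles`) stands in for ar.itc. The per-iteration costs and the helper names (`calibrate`, `udelay_loops`, `udelay_counter`, `time_loops`, `time_counter`) are hypothetical.

The simulation contrasts two designs:
- A loop-count udelay(). Its calibration data is only correct for the loop body it was measured with, which is the reason Stephen Tweedie proposed a second calibration variable.
- A counter-polling udelay(), like the ia64 one Tony describes. Adding cpu_relax() to its loop body does not move the end point.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not kernel code. sim_cycles stands in for a free-running
 * cycle counter such as ia64's ar.itc; all costs below are hypothetical. */
static uint64_t sim_cycles;

enum {
    CYCLES_PER_USEC = 100, /* hypothetical counter rate */
    LOOP_COST = 1,         /* hypothetical cost of one bare loop iteration */
    RELAX_COST = 4         /* hypothetical extra cost of one cpu_relax() */
};

static uint64_t read_counter(void) { return sim_cycles; }

/* One pass through a delay loop, optionally with a simulated cpu_relax(). */
static void loop_body(int relax)
{
    sim_cycles += LOOP_COST;
    if (relax)
        sim_cycles += RELAX_COST; /* simulated cpu_relax() / hint@pause */
}

/* Loop-count calibration: iterations per microsecond for a given loop body.
 * Adding cpu_relax() makes each iteration slower, so the result changes. */
static uint64_t calibrate(int relax)
{
    uint64_t per_iter = LOOP_COST + (relax ? RELAX_COST : 0);
    assert(per_iter <= CYCLES_PER_USEC); /* need at least 1 iteration per usec */
    return CYCLES_PER_USEC / per_iter;
}

/* Loop-count udelay: the loop body is fixed when the caller is compiled,
 * but the iteration count comes from calibration data read at run time. */
static void udelay_loops(unsigned usecs, uint64_t loops_per_usec, int relax)
{
    uint64_t n = (uint64_t)usecs * loops_per_usec;
    for (uint64_t i = 0; i < n; i++)
        loop_body(relax);
}

/* Counter-polling udelay: spin until the counter reaches a computed end
 * point; a slower loop body can only overshoot by less than one iteration. */
static void udelay_counter(unsigned usecs, int relax)
{
    uint64_t end = read_counter() + (uint64_t)usecs * CYCLES_PER_USEC;
    while (read_counter() < end)
        loop_body(relax);
}

/* Elapsed simulated cycles for each variant. */
static uint64_t time_loops(unsigned usecs, uint64_t loops_per_usec, int relax)
{
    uint64_t start = read_counter();
    udelay_loops(usecs, loops_per_usec, relax);
    return read_counter() - start;
}

static uint64_t time_counter(unsigned usecs, int relax)
{
    uint64_t start = read_counter();
    udelay_counter(usecs, relax);
    return read_counter() - start;
}
```

With these made-up costs, the cases come out as follows:
- A 10-microsecond delay takes 1000 simulated cycles when a loop body matches the calibration it reads. This holds for both the old pair and the new pair.
- A cpu_relax()-less loop that reads the new calibration finishes after only 200 cycles. That is the kind of too-short delay behind the kABI objection.
- The counter-polling version takes 1000 cycles with or without the relax, matching Tony's point about ar.itc.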
Tony argues (and in IRC Arjan says, "tony's answer is enough for me") that on ia64 udelay() is implemented differently and does NOT cause this kABI problem. So taking this patch is reasonable now, whereas more creativity will be required for x86. Reverting the summary to be ia64-specific again.

Since this bug is now clearly addressing only the IA64 issue, we can move it back to the U3 proposed list. Also, since Intel did the work (i.e. Tony, Geoff), can you take on this bug instead? Thanks.

Assigning it to myself.

Please confirm the issue is resolved with the 2.6.9-27.EL or latest kernel. Thanks.

Yes. Fixed in 2.6.9-27.EL.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html
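For reference, the kind of spinloop change this bug asked for is sketched below in C11. It is a minimal test-and-test-and-set spinlock that executes a relax hint while it waits. It is not the RHEL4 kernel code, and the type and function names are hypothetical. The cpu_relax() bodies assume GCC-style inline assembly: `hint @pause` on ia64, `rep; nop` (the x86 pause instruction) on i386/x86_64, and a plain compiler barrier elsewhere.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Busy-wait hint (assumed GCC inline asm): hint@pause on ia64, pause on x86,
 * otherwise just a compiler barrier so the wait loop re-reads memory. */
static inline void cpu_relax(void)
{
#if defined(__ia64__)
    __asm__ __volatile__("hint @pause" ::: "memory");
#elif defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical lock type for this sketch. */
typedef struct { atomic_bool locked; } spinlock_sketch_t;

static void spin_init(spinlock_sketch_t *l) { atomic_init(&l->locked, false); }

/* Returns true if the lock was free and is now held by the caller. */
static bool spin_trylock(spinlock_sketch_t *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void spin_lock(spinlock_sketch_t *l)
{
    while (!spin_trylock(l)) {
        /* Spin on plain loads until the lock looks free, relaxing on every
         * pass so a sibling hardware thread on the same core can progress. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static void spin_unlock(spinlock_sketch_t *l)
{
    /* Unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The relax hint sits only in the inner read-only wait loop, and the atomic exchange is retried only once the lock appears free. On an SMT core such as Montecito, the hint tells the processor that this thread is just waiting, which is what makes a thread switch more likely.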