Bug 141699

Summary: FEAT: RHEL 4 U3: ia64 needs hint@pause in spinloop
Product: Red Hat Enterprise Linux 4 Reporter: Tony Luck <tony.luck>
Component: kernel Assignee: Geoff Gustafson <grgustaf>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: davej, fenghua.yu, fhirtz, jbaron, jturner, rohit.seth, tao, tburke, wwlinuxengineering
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0132 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-03-07 18:33:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 168429    
Attachments:
Description Flags
Missing patch in EL4-U1-RC none

Description Tony Luck 2004-12-02 23:52:54 UTC
Description of problem: ia64 kernel should use the "hint@pause" 
instructions inside tight loops.  A few places have not done this.

Additional info:

Patch went into Linus' tree in two parts:

http://linux.bkbits.net:8080/linux-2.5/cset@4193aa1axAj4iBkvQTc4RO0f9CdPow

and

http://linux.bkbits.net:8080/linux-2.5/cset@41aca8b7JX2EN7uVERxvLcBs_1mjdQ

Comment 4 Frank Hirtz 2005-01-24 22:51:45 UTC
We're looking to understand the impact of this. What is the need that
this patch addresses? Also, have these patches been ported and tested
against RHEL4? 

Comment 5 Tony Luck 2005-02-09 19:07:02 UTC
This is a major performance impact for Montecito, a spinning thread without the 
relax can seriously impact the progress of another thread running on the same 
core.

Impact for non-Montecito is negligible (performance of spinloops is by 
definition irrelevant :-)
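
To illustrate the "relax" mentioned above, here is a minimal user-space sketch of a spin-wait loop that issues a pause hint on every failed attempt. The names spinlock_sketch_t, spin_lock_sketch, spin_trylock_sketch, spin_unlock_sketch and cpu_relax_sketch are hypothetical and are not taken from the patch; the kernel has its own cpu_relax() and spinlock code. The sketch assumes GCC or Clang inline assembly and C11 atomics.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's cpu_relax(). */
static inline void cpu_relax_sketch(void)
{
#if defined(__ia64__)
    __asm__ __volatile__("hint @pause" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory"); /* compiler barrier only */
#endif
}

/* Hypothetical test-and-set lock, for illustration only. */
typedef struct {
    atomic_flag locked;
} spinlock_sketch_t;

/* Returns 1 if the lock was taken, 0 if it was already held. */
int spin_trylock_sketch(spinlock_sketch_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

void spin_lock_sketch(spinlock_sketch_t *l)
{
    /* Tight loop: hint to the core on every failed attempt so a
     * sibling hardware thread can make progress while we wait. */
    while (!spin_trylock_sketch(l))
        cpu_relax_sketch();
}

void spin_unlock_sketch(spinlock_sketch_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The hint does not change what the loop computes; it only tells the processor that the current thread is waiting.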

Comment 6 Susan Denham 2005-02-15 22:27:55 UTC
Posted to IT 57252:

Tony:

(As stated in BZ 141699):  We're looking to understand the impact of
this. What is the need that this patch addresses? Also, have these
patches been ported and tested against RHEL4?


We see that on 2/9, you did provide a business justification for this
(quoting you here from the BZ):
"This is a major performance impact for Montecito, a spinning thread
without the relax can seriously impact the progress of another thread
running on the same core.

Impact for non-Montecito is negligible (performance of spinloops is by
definition irrelevant :-) "

Unfortunately, given that U1 kernel freeze is in 2 days (2/17),
there's virtually no chance that this will be incorporated in U1.  In
order to get this feature considered for U2, you'll need to port the
patches against the final (GA) U1 kernel version (2.6.9.?)   Geoff will
be able to help ensure you get this kernel and will also help you test

Comment 10 Fenghua Yu 2005-05-17 22:39:30 UTC
Created attachment 114488 [details]
Missing patch in EL4-U1-RC

Comment 11 Fenghua Yu 2005-05-17 22:42:14 UTC
The majority of the patches have actually been accepted into RHEL4-U1-RC now.

But there is a small missing part in RHEL4-U1-RC. I attached the missing patch 
(see comment #10), which is on top of RHEL4-U1-RC (kernel-2.6.9-9.EL). 
Please apply this missing patch for the next release.


Comment 12 Tim Burke 2005-05-25 20:51:50 UTC
Is this a duplicate of bug #158336?

Comment 13 Marty Wesley 2005-05-26 06:59:09 UTC
PM ACK for U2.

Comment 14 Dave Jones 2005-05-27 00:56:23 UTC
*** Bug 158851 has been marked as a duplicate of this bug. ***

Comment 15 Dave Jones 2005-05-27 00:57:13 UTC
*** Bug 141851 has been marked as a duplicate of this bug. ***

Comment 16 Prarit Bhargava 2005-05-31 15:17:49 UTC
Tim asked,

Is this a duplicate of bug #158336?

No -- 158336 addresses the missing glibc patch, while this bug (141699) is 
a missing patch addressing the issue where tight kernel loops should use 
hint@pause.  

I'm examining this now...

Comment 17 Tim Powers 2005-06-08 15:13:04 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-420.html


Comment 25 Prarit Bhargava 2005-06-29 15:43:18 UTC
Please be aware that I can no longer see PRIVATE comments. 

Comment 26 Prarit Bhargava 2005-06-30 11:44:38 UTC
From rhkernel-list: 
 
Prarit wrote: 
>>I'm offering the patch here with the above note that this is a kabi 
>> violation 
 
 
The kABI violation makes it a NAK for me.  udelay() violations in driver 
accesses could render parts of the system highly unstable. 
 
 
>> , however, given that this bug is a STOPSHIP I think it should be 
>> included in the kernel. 
 
 
There _is_ a compromise possible; ugly, but I think it's the best we can 
do. 
 
Don't change the existing udelay calibration code.  Existing modules 
continue to work. 
 
Add a *new* calibration that stores its results in a new (per-cpu) 
variable.  Change the udelay macro to (a) do the cpu_relax(), and (b) 
read the new calibration variable. 
 
Now, existing modules will continue to read the old calibration data, 
which is correct for their compiled-in cpu_relax()-less udelay loops.  
Newly-compiled code will pick up the new udelay macro, will use the 
newer loop calibration data and will perform the cpu_relax() as 
required. 
 
This is more engineering work but fixes all code in the kernel itself 
without breaking external modules. 
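
The compromise described above could be sketched roughly as follows. This is a user-space illustration under stated assumptions, not the code that was discussed on rhkernel-list: NR_CPUS_SKETCH, both calibration arrays and both delay functions are hypothetical names, plain arrays stand in for per-cpu variables, and the functions return the number of iterations executed so the behaviour can be checked. It assumes GCC or Clang inline assembly.

```c
#define NR_CPUS_SKETCH 4 /* hypothetical CPU count */

/* Old calibration: left untouched, so modules that were compiled
 * with the relax-less loop keep reading a value that matches it. */
unsigned long legacy_loops_per_usec[NR_CPUS_SKETCH];

/* New calibration: measured against the loop that includes the
 * relax hint, which may run at a different rate per iteration. */
unsigned long relaxed_loops_per_usec[NR_CPUS_SKETCH];

/* Hypothetical stand-in for the kernel's cpu_relax(). */
static inline void cpu_relax_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory"); /* compiler barrier only */
#endif
}

/* What already-built modules effectively have compiled in. */
unsigned long udelay_legacy_sketch(int cpu, unsigned long usecs)
{
    unsigned long loops = usecs * legacy_loops_per_usec[cpu];
    unsigned long done;

    for (done = 0; done < loops; done++)
        __asm__ __volatile__("" ::: "memory"); /* keep the loop alive */
    return done;
}

/* What newly compiled code would pick up: (a) relax in the loop,
 * (b) iteration count taken from the new calibration variable. */
unsigned long udelay_relaxed_sketch(int cpu, unsigned long usecs)
{
    unsigned long loops = usecs * relaxed_loops_per_usec[cpu];
    unsigned long done;

    for (done = 0; done < loops; done++)
        cpu_relax_sketch();
    return done;
}
```

Because each loop reads only the calibration value that was measured for it, neither variant depends on the other's timing.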

Comment 27 Prarit Bhargava 2005-06-30 18:11:26 UTC
Whups -- I should have indicated that the writer above was Stephen Tweedie. 

Comment 28 Tim Burke 2005-07-07 20:13:22 UTC
This feature request will not be incorporated into RHEL4 U2.  Reasons being:
1) kabi impact as noted in comments above
2) lack of an engineer to cover it at this late point in the development -
especially given the kabi impact increases the work to have someone pursue an
alternative workaround.



Comment 32 Larry Troan 2005-09-27 11:45:04 UTC
Since Bug 141851 [spin loops on both ia32 and ia32e need cpu_relax] was closed
as a DUP of this bug per comment #15 above, I'm changing the summary of this bug
to reflect ia32 as well as ia64.

Comment 34 Larry Troan 2005-10-07 20:22:08 UTC
Providing PM ACK on this for U3. Already on U3Proposed list.

Comment 36 Tony Luck 2005-10-24 21:37:25 UTC
Status check on the patch Fenghua Yu attached to this bugzilla back in comment 
#10 on May 17th.  This patch didn't make it into RHEL4 U2; it would be nice to 
see it in U3.

There are no kABI implications of changing udelay() in this way on ia64 ... the 
udelay spinloop is simply waiting for the ar.itc cycle counter to reach the 
value that has been calculated for the end-point of the requested delay. It 
makes no difference to the accuracy of the delay whether we do nothing in the 
body of the loop, or we execute the hint@pause instruction [in either case it 
is possible the Montecito will switch to the other thread during the spin loop, 
with the hint@pause it is just much more likely that we will switch ... but 
since the granularity of udelay() is microseconds, we will switch back in time 
to do the end point check accurately enough]
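
The argument above can be sketched in user space as follows. This is an illustration under stated assumptions rather than the kernel's ia64 code: read_fake_itc is a hypothetical simulated counter standing in for ar.itc (it advances 7 "cycles" per read), and udelay_itc_sketch, counter_fn and cpu_relax_sketch are likewise hypothetical names. It assumes GCC or Clang inline assembly.

```c
#include <stdint.h>

typedef uint64_t (*counter_fn)(void);

/* Simulated free-running counter standing in for ar.itc. */
uint64_t fake_itc;

uint64_t read_fake_itc(void)
{
    fake_itc += 7; /* arbitrary step per read, for the sketch */
    return fake_itc;
}

/* Hypothetical stand-in for the kernel's cpu_relax(). */
static inline void cpu_relax_sketch(void)
{
#if defined(__ia64__)
    __asm__ __volatile__("hint @pause" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory"); /* compiler barrier only */
#endif
}

/* Spin until the counter reaches the precomputed end point.
 * Returns the counter value observed when the loop exits. */
uint64_t udelay_itc_sketch(unsigned long usecs, uint64_t cyc_per_usec,
                           counter_fn read_itc, int relax)
{
    uint64_t end = read_itc() + (uint64_t)usecs * cyc_per_usec;
    uint64_t now;

    /* Signed difference tolerates counter wrap-around. */
    while ((int64_t)((now = read_itc()) - end) < 0) {
        if (relax)
            cpu_relax_sketch();
    }
    return now;
}
```

The exit condition depends only on the counter value, not on how many times the loop body ran, so the same end point is reached with or without the hint; no calibrated iteration count is involved.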

Comment 39 Geoff Gustafson 2005-10-25 19:50:26 UTC
Crossed wires here. The implementations on i386/x86_64 and ia64 are different so
141851 should not have been dup'd to this bug. Please move discussion on
i386/x86_64 there.

Tony argues (and in irc Arjan says, "tony's answer is enough for me"), that on
ia64 udelay() is implemented differently and does NOT cause this kABI problem.
So taking this patch is reasonable now, where more creativity will be required
for x86.

Reverting summary to be ia64-specific again.


Comment 40 Linda Wang 2005-11-04 15:05:11 UTC
Since this bug is now clearly addressing only the IA64 issue, we can move this back to
the U3 proposed list. Also, since Intel (i.e. Tony) did the work, Geoff, can you take
on this bug instead?  Thanks.

Comment 41 Geoff Gustafson 2005-11-07 15:54:17 UTC
Assigning it to myself.


Comment 47 Jay Turner 2006-01-03 19:36:12 UTC
Please confirm the issue is resolved with the 2.6.9-27.EL or latest kernel.  Thanks.

Comment 48 Tony Luck 2006-01-16 18:33:44 UTC
Yes.  Fixed in 2.6.9-27.EL.

Comment 51 Red Hat Bugzilla 2006-03-07 18:33:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html