Bug 110170 - [PATCH] LTC5381- rhel 3 will need to pick up the cyclone-lpj-fix patch
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: i386 Linux
Priority: medium Severity: medium
Assigned To: Doug Ledford
Duplicates: 108595
Depends On:
Blocks: 107562
Reported: 2003-11-15 16:25 EST by IBM Bug Proxy
Modified: 2007-11-30 17:06 EST
Doc Type: Bug Fix
Last Closed: 2004-05-11 21:07:47 EDT

Attachments
linux-2.4.23-pre9_cyclone-lpj-fix_A0.patch (906 bytes, text/plain)
2003-11-15 16:28 EST, IBM Bug Proxy
Description IBM Bug Proxy 2003-11-15 16:25:20 EST
The following has been reported by IBM LTC:  
RHEL 3 will need to pick up the cyclone-lpj-fix patch
Hardware Environment:  
x440 and x445  
Software Environment:  
Distros based on kernels < 2.4.23  
Steps to Reproduce:  
1. Boot RHEL   
2. Observe the BogoMIPS rating given to each cpu 
Actual Results:  
Occasionally we'll see something like: 
	Calibrating delay loop... 3.27 BogoMIPS 
Expected Results:  
Always seeing something like: 
	Calibrating delay loop... 199.47 BogoMIPS 
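A minimal sketch of how step 2 could be automated, assuming the boot messages have been saved as text (for example from dmesg). The function name find_suspect_bogomips and the 0.5 ratio are illustrative assumptions, not part of the report or the patch. Note that comparing against the highest rating will not flag a boot in which every cpu was miscalibrated.

```python
import re

# Matches the calibration line the kernel prints once per cpu at boot.
BOGOMIPS_RE = re.compile(
    r"Calibrating delay loop\.\.\.\s*([0-9]+(?:\.[0-9]+)?) BogoMIPS"
)


def find_suspect_bogomips(log_text, ratio=0.5):
    """Return (values, suspects) parsed from boot log text.

    values   -- every BogoMIPS rating found, in log order
    suspects -- ratings below ratio * max(values); the ratio is an
                arbitrary threshold chosen for illustration
    """
    values = [float(m.group(1)) for m in BOGOMIPS_RE.finditer(log_text)]
    if not values:
        return [], []
    threshold = ratio * max(values)
    suspects = [v for v in values if v < threshold]
    return values, suspects


if __name__ == "__main__":
    # The two sample lines are the ones quoted in this report.
    sample = (
        "Calibrating delay loop... 199.47 BogoMIPS\n"
        "Calibrating delay loop... 3.27 BogoMIPS\n"
    )
    values, suspects = find_suspect_bogomips(sample)
    print("ratings:", values)
    print("suspect:", suspects)
```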
Additional Information: 
This issue is caused by the lost-tick compensation code in the 2.4 kernel 
using loops_per_jiffy before that value is calculated. This can cause 
loops_per_jiffy to be miscalculated, which may cause SCSI hangs at boot, 
occasional keyboard and mouse hangs in X as well as other unseen issues. 
Normally the problem only occurs if the last cpu booted mis-calculates 
loops_per_jiffy, so it seems to show up rarely.  
The fix is to apply the patch submitted to lkml (now in 2.4.23-rc1)
seen here: 

Here is the patch included in 2.4.23-rc1.

As a side note, this problem was found while testing SLES8, so it has
been fixed there and does not affect SuSE.

Glen/Greg - please submit the patch above to Red Hat for RHEL3.  Thanks.

I'll bring this patch up during my telecon today with Greg Kelleher.
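To make the link between loops_per_jiffy and the printed rating concrete, here is a small sketch that mirrors the integer arithmetic the 2.4 calibration code uses when it prints BogoMIPS. It is not the attached patch. The default HZ of 100 is an assumption (the stock 2.4 i386 value; vendor kernels may differ), the loops_per_jiffy inputs are back-computed from the two ratings quoted in this report, and loops_for_udelay is a deliberately simplified, hypothetical model of a busy-wait delay rather than the kernel's actual implementation.

```python
def bogomips_string(loops_per_jiffy, hz=100):
    """Format a rating the way the boot message does: whole part and
    two decimals, both derived with integer division.
    hz=100 is an assumed default, not a value taken from this report.
    """
    whole = loops_per_jiffy // (500000 // hz)
    frac = (loops_per_jiffy // (5000 // hz)) % 100
    return f"{whole}.{frac:02d}"


def loops_for_udelay(usecs, loops_per_jiffy, hz=100):
    """Simplified model: number of delay-loop iterations spent for a
    requested delay of usecs microseconds. Hypothetical helper."""
    return usecs * loops_per_jiffy * hz // 1_000_000


if __name__ == "__main__":
    good_lpj = 997350  # back-computed from the 199.47 example
    bad_lpj = 16350    # back-computed from the 3.27 example
    for lpj in (good_lpj, bad_lpj):
        print(lpj, bogomips_string(lpj), loops_for_udelay(10, lpj))
```

Under this model, a loops_per_jiffy that is far too small makes every calibrated delay spin for proportionally fewer iterations, which is one plausible way a miscalculation could surface as the driver timing problems mentioned above.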
Comment 1 IBM Bug Proxy 2003-11-15 16:28:43 EST
Created attachment 96003 [details]
Comment 2 mark wisner 2003-11-25 14:30:30 EST
------ Additional Comments From khoa@us.ibm.com  2003-25-11 14:29 -------
Just put this bug report in the correct state/ownership.... 
Comment 3 Bob Johnson 2003-12-05 14:36:01 EST
Chris, adding you to this bug. I thought all was good with the timer
fixes for x440 and x445?
Comment 4 john stultz 2003-12-08 13:59:08 EST
Bob: Not quite. There is a subtle race in the lost-ticks 
compensation code. It appears to only bite us with certain 
combinations of cpu counts, frequencies and SMT (e.g. 1 cpu @ 2GHz 
w/ HT, 8 cpus @ 2.8GHz w/o HT). It was not seen in testing RHEL 3.0, 
but was discovered by Andrea Arcangeli while testing for SLES8 SP3 
(after RHEL 3.0 had gone gold). The patch (now accepted into 
2.4.23-rc1) was ported and submitted as soon as possible after the 
issue was found.  
Let me know if you have any further questions.  
Comment 5 john stultz 2003-12-17 15:25:45 EST
RHEL 2.1 Update 4 should also take this fix.  
Comment 6 keith mannth 2004-01-15 14:20:40 EST
I still see BogoMIPS of 1.5 on an x440 with 2.4.21-7.ELsmp.  Please
take this patch. 
Comment 7 Chris Kloiber 2004-01-29 21:03:50 EST
We have a customer in the field seeing this issue with x445s. They
report that with HyperThreading disabled they see the issue (and the
clock runs so fast they can't log in), but the issue is not apparent
when HyperThreading is enabled. This customer wishes to run with
HyperThreading off for their application. (CRM #282865)
Comment 9 Jim Richard 2004-02-03 20:12:31 EST
Please see bugs #108595 and #110999 for apparently related side 
effects. We are seeing both issues (SCSI hangs on the on-board disk 
array, and a fast system clock) on 2 xSeries 445s, one with dual 2.5 
GHz and one with quad 2.8 GHz processors. We are currently running 
the RHEL AS 3.0 Update 1 kernel (2.4.21-9.ELsmp). We have also seen both 
problems with the hugemem kernel. I've attached dmesg and other details to 
both bugs.  We also see both problems with HyperThreading on and with 
HyperThreading off when set from the boot parameters; we have yet to test 
with HyperThreading off via the BIOS. 
Comment 10 john stultz 2004-02-05 18:10:24 EST
I opened bug #115061 to track this issue under RHEL 2.1 
Comment 12 Ernie Petrides 2004-02-21 04:54:44 EST
The patch in comment #1 has been committed to the RHEL 3 U2 patch
pool tonight, and it will first be available internally in the
Red Hat Engineering build of kernel version 2.4.21-9.11.EL.
Comment 13 Bernd Schmidt 2004-02-23 11:13:31 EST
Adding to the blocker to keep track of it.
Comment 14 john stultz 2004-03-26 16:26:11 EST
Tested the 2.4.21-11.ELsmp kernel and the issue looks to be resolved. 
BogoMIPS looks correct for all cpus and the earlier attached patch is 
present in the kernel src directory. 
I believe this bug can be closed. Bug #108595 should also be closable 
after the reporters have verified it works for them.  
Comment 15 IBM Bug Proxy 2004-04-12 19:01:14 EDT
----- Additional Comments From jstultz@us.ibm.com(prefers email via johnstul@us.ibm.com)  2004-04-12 19:03 -------
I'm closing the LTC bug, as this issue is resolved. 
Comment 16 Sebastian Wenner 2004-04-28 17:38:24 EDT
*** Bug 110999 has been marked as a duplicate of this bug. ***
Comment 17 John Flanagan 2004-05-11 21:07:47 EDT
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

Comment 18 Ernie Petrides 2005-10-03 20:07:54 EDT
*** Bug 108595 has been marked as a duplicate of this bug. ***
Comment 19 Emmanuel Chevreau 2006-03-30 08:14:02 EST
I have an HP Proliant DL360 with
Internal SCSI Card (SmartArray 6i) ==> System Disk
Additional SCSI Card 1 (LSI Logic 53c1030) ==> SCSI Disk Array
Additional SCSI Card 2 (LSI Logic 53c1030) ==> Tape Drive HP Ultrium 215
System is RHEL3 U4 ES with kernel 2.4.21-37 (I also tried with an updated kernel).
I see a similar bug when using the LTO tape drive.
/var/log/messages :
Mar 30 14:55:44 galibier kernel: scsi : aborting command due to timeout : pid
313, scsi1, channel 0, id 5, lun 0 Log Sense 00 7e 00 00 00 00 00 ff 40
Mar 30 14:55:44 galibier kernel: mptscsih: ioc1: id=5 OldAbort: scheduling ABORT
SCSI IO (sc=c3588600)
Mar 30 14:55:45 galibier kernel: SCSI host 1 abort (pid 313) timed out - resetting
Mar 30 14:55:45 galibier kernel: SCSI bus is being reset for host 1 channel 0.
Mar 30 14:55:45 galibier kernel: mptscsih: ioc1: id=5 OldReset: scheduling
BUS_RESET SCSI IO (sc=c3588600)
Mar 30 14:55:45 galibier kernel: mptbase: ioc1: WARNING - IOCStatus(0x0048):
SCSI Task Terminated

I think it is the same bug, so I ask that it be reopened.
Comment 20 Chris McDermott 2006-04-06 00:17:31 EDT
Emmanuel, while your particular problem may be a result of missed/lost 
interrupts due to bugs in the lost tick compensation code (as described in 
this bug), it won't be addressed by the patch attached to this bugzilla. This 
bug report and the attached patch apply only to specific IBM xSeries hardware 
utilizing the IBM Summit chipset.
