Bug 110170 - [PATCH] LTC5381- rhel 3 will need to pick up the cyclone-lpj-fix patch
Summary: [PATCH] LTC5381- rhel 3 will need to pick up the cyclone-lpj-fix patch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Doug Ledford
QA Contact:
URL:
Whiteboard:
Duplicates: 108595
Depends On:
Blocks: 107562
 
Reported: 2003-11-15 21:25 UTC by IBM Bug Proxy
Modified: 2007-11-30 22:06 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-05-12 01:07:47 UTC
Target Upstream Version:
Embargoed:


Attachments
linux-2.4.23-pre9_cyclone-lpj-fix_A0.patch (906 bytes, text/plain)
2003-11-15 21:28 UTC, IBM Bug Proxy


Links
System: Red Hat Product Errata
ID: RHSA-2004:188
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 2
Last Updated: 2004-05-11 04:00:00 UTC

Description IBM Bug Proxy 2003-11-15 21:25:20 UTC
The following has been reported by IBM LTC:
RHEL 3 will need to pick up the cyclone-lpj-fix patch
Hardware Environment:  
x440 and x445  
  
Software Environment:  
Distros based on kernels < 2.4.23  
  
Steps to Reproduce:  
1. Boot RHEL   
2. Observe the BogoMIPS rating given to each cpu 
 
  
Actual Results:  
Occasionally we'll see something like: 
	Calibrating delay loop... 3.27 BogoMIPS 
Expected Results:  
Always seeing something like: 
	Calibrating delay loop... 199.47 BogoMIPS 
 
Additional Information: 
This issue is caused by the lost-tick compensation code in the 2.4 kernel 
using loops_per_jiffy before that value is calculated. This can cause 
loops_per_jiffy to be miscalculated, which may cause SCSI hangs at boot, 
occasional keyboard and mouse hangs in X as well as other unseen issues. 
Normally the problem only occurs if the last cpu booted mis-calculates 
loops_per_jiffy, so it seems to show up rarely.  
 
The fix is to apply the patch submitted to lkml (now in 2.4.23-rc1)
seen here: 
http://www.ussg.iu.edu/hypermail/linux/kernel/0311.0/0329.html


Here is the patch included in 2.4.23-rc1.

As a side note, this problem was found while testing SLES8, so it has been
fixed there and does not affect SuSE. Glen/Greg - please submit the patch
above to Red Hat for RHEL3. Thanks.

I'll bring this patch up during my telecon today with Greg Kelleher.

Comment 1 IBM Bug Proxy 2003-11-15 21:28:43 UTC
Created attachment 96003
linux-2.4.23-pre9_cyclone-lpj-fix_A0.patch

Comment 2 mark wisner 2003-11-25 19:30:30 UTC
------ Additional Comments From khoa.com  2003-25-11 14:29 -------
Just put this bug report in the correct state/ownership.... 

Comment 3 Bob Johnson 2003-12-05 19:36:01 UTC
Chris, adding you to this bug. I thought all was good with the timer fixes
for x440 and x445?

Comment 4 john stultz 2003-12-08 18:59:08 UTC
Bob: Not quite. There is a subtle race in the lost-ticks 
compensation code. It appears to only bite us with certain 
combinations of CPU count, frequency and SMT (e.g. 1 cpu @ 2GHz 
w/ HT, 8 cpus @ 2.8GHz w/o HT). It was not seen in testing RHEL 3.0, 
but was discovered by Andrea Arcangeli while testing for SLES8 SP3 
(after RHEL 3.0 had gone gold). The patch (now accepted into 
2.4.23-rc1) was ported and submitted as soon as possible after the 
issue was found.  
 
Let me know if you have any further questions.  

Comment 5 john stultz 2003-12-17 20:25:45 UTC
RHEL 2.1 Update 4 should also take this fix.  

Comment 6 keith mannth 2004-01-15 19:20:40 UTC
I still see a BogoMIPS rating of 1.5 on an x440 with 2.4.21-7.ELsmp.  Please
take this patch. 

Comment 7 Chris Kloiber 2004-01-30 02:03:50 UTC
We have a customer in the field seeing this issue with x445s. They
report that running with HyperThreading disabled they see the issue
(and the clock runs so fast they can't log in) but the issue is not
apparent when HyperThreading is enabled. This customer wishes to run
with HyperThreading off for their application. (CRM #282865)

Comment 9 Jim Richard 2004-02-04 01:12:31 UTC
Please see bugs #108595 and #110999 for apparently related side 
effects. We are seeing both issues (SCSI hangs on the on-board disk 
array, and a fast system clock) on 2 xSeries 445s, one with dual 2.5 
GHz and one with quad 2.8 GHz processors. We are currently running 
the RHEL AS 3.0 Update 1 kernel (2.4.21-9.ELsmp). We have also seen both 
problems with the hugemem kernel. I've attached dmesg and other details to 
both bugs. We also see both problems with HyperThreading on and 
off when set from the boot parameters; we have yet to test 
with HyperThreading turned off in the BIOS. 

Comment 10 john stultz 2004-02-05 23:10:24 UTC
I opened bug #115061 to track this issue under RHEL 2.1 

Comment 12 Ernie Petrides 2004-02-21 09:54:44 UTC
The patch in comment #1 has been committed to the RHEL 3 U2 patch
pool tonight, and it will first be available internally in the
Red Hat Engineering build of kernel version 2.4.21-9.11.EL.


Comment 13 Bernd Schmidt 2004-02-23 16:13:31 UTC
Adding to the blocker to keep track of it.

Comment 14 john stultz 2004-03-26 21:26:11 UTC
Tested the 2.4.21-11.ELsmp kernel and the issue looks to be resolved. 
BogoMIPS looks correct for all cpus and the earlier attached patch is 
present in the kernel src directory. 
 
I believe this bug can be closed. Bug #108595 should also be closable 
after the reporters have verified it works for them.  

Comment 15 IBM Bug Proxy 2004-04-12 23:01:14 UTC
----- Additional Comments From jstultz.com(prefers email via johnstul.com)  2004-04-12 19:03 -------
I'm closing the LTC bug, as this issue is resolved. 

Comment 16 Sebastian Wenner 2004-04-28 21:38:24 UTC
*** Bug 110999 has been marked as a duplicate of this bug. ***

Comment 17 John Flanagan 2004-05-12 01:07:47 UTC
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-188.html


Comment 18 Ernie Petrides 2005-10-04 00:07:54 UTC
*** Bug 108595 has been marked as a duplicate of this bug. ***

Comment 19 Emmanuel Chevreau 2006-03-30 13:14:02 UTC
I have an HP ProLiant DL360 with
Internal SCSI Card (SmartArray 6i) ==> System Disk
 
Additional SCSI Card 1 (LSI Logic 53c1030) ==> SCSI Disk Array
 
Additional SCSI Card 2 (LSI Logic 53c1030) ==> Tape Drive HP Ultrium 215
 
System is RHEL3 U4 ES with kernel 2.4.21-37 (I also tried the updated kernel
2.4.21-40)
 
I have a similar bug when using the LTO tape drive
 
/var/log/messages :
Mar 30 14:55:44 galibier kernel: scsi : aborting command due to timeout : pid 313, scsi1, channel 0, id 5, lun 0 Log Sense 00 7e 00 00 00 00 00 ff 40
Mar 30 14:55:44 galibier kernel: mptscsih: ioc1: id=5 OldAbort: scheduling ABORT SCSI IO (sc=c3588600)
Mar 30 14:55:45 galibier kernel: SCSI host 1 abort (pid 313) timed out - resetting
Mar 30 14:55:45 galibier kernel: SCSI bus is being reset for host 1 channel 0.
Mar 30 14:55:45 galibier kernel: mptscsih: ioc1: id=5 OldReset: scheduling BUS_RESET SCSI IO (sc=c3588600)
Mar 30 14:55:45 galibier kernel: mptbase: ioc1: WARNING - IOCStatus(0x0048): SCSI Task Terminated


I think it is the same bug, so I ask that it be reopened.
Thanks


Comment 20 Chris McDermott 2006-04-06 04:17:31 UTC
Emmanuel, while your particular problem may be a result of missed/lost 
interrupts due to bugs in the lost tick compensation code (as described in 
this bug), it won't be addressed by the patch attached to this bugzilla. This 
bug report and attached patch apply only to specific IBM xSeries hardware 
utilizing the IBM Summit chipset.

