The following has been reported by IBM LTC:
RHEL 3 will need to pick up the cyclone-lpj-fix patch
Hardware: x440 and x445
Software: distros based on kernels < 2.4.23
Steps to Reproduce:
1. Boot RHEL
2. Observe the BogoMIPS rating given to each cpu
Occasionally we'll see something like:
Calibrating delay loop... 3.27 BogoMIPS
The expected result is to always see something like:
Calibrating delay loop... 199.47 BogoMIPS
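Because the bad rating only shows up occasionally, it can help to pull the calibration lines out of a saved boot log and flag any CPU whose rating is far below the others. The following is a minimal Python sketch and not part of the original report; the function names and the 50% cutoff are assumptions chosen for illustration.

```python
import re
from statistics import median

# Matches boot-log lines such as "Calibrating delay loop... 3.27 BogoMIPS".
CALIBRATION_RE = re.compile(
    r"Calibrating delay loop\.\.\.\s+([0-9]+(?:\.[0-9]+)?)\s+BogoMIPS"
)


def bogomips_from_log(log_text):
    """Return the BogoMIPS values in the order they appear in the log."""
    return [float(m.group(1)) for m in CALIBRATION_RE.finditer(log_text)]


def suspicious_ratings(values, threshold=0.5):
    """Return (index, value) pairs that fall below threshold * median.

    The threshold is an arbitrary illustrative cutoff, not a kernel rule.
    """
    if not values:
        return []
    typical = median(values)
    return [(i, v) for i, v in enumerate(values) if v < threshold * typical]
```

The log text could come from `dmesg` output captured after boot. The index is the order in which the calibration lines appear, which on these systems should follow the order in which the cpus were booted. With only one calibration line there is nothing to compare against, so nothing is flagged.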
This issue is caused by the lost-tick compensation code in the 2.4 kernel
using loops_per_jiffy before that value is calculated. This can cause
loops_per_jiffy to be miscalculated, which may cause SCSI hangs at boot,
occasional keyboard and mouse hangs in X as well as other unseen issues.
Normally the problem only occurs if the last cpu booted mis-calculates
loops_per_jiffy, so it seems to show up rarely.
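For reference, the printed BogoMIPS rating is derived directly from loops_per_jiffy, so a bad calibration is visible in the boot message. The Python sketch below mirrors, as far as I understand it, the integer arithmetic the 2.4 kernel uses when printing the value, and gives a simplified view of how a microsecond delay scales with loops_per_jiffy. HZ=100 is an assumption (the tick rate is a build-time constant), the function names are hypothetical, and the real udelay() uses architecture-specific fixed-point math rather than this plain division.

```python
def format_bogomips(loops_per_jiffy, hz=100):
    """Format loops_per_jiffy the way the boot message prints it.

    hz is an assumed tick rate; the kernel uses its compile-time HZ.
    """
    whole = loops_per_jiffy // (500000 // hz)
    frac = (loops_per_jiffy // (5000 // hz)) % 100
    return f"{whole}.{frac:02d}"


def approx_delay_loops(usecs, loops_per_jiffy, hz=100):
    """Approximate number of delay-loop iterations for a udelay(usecs).

    Simplified model: loops per second is loops_per_jiffy * hz.
    """
    return (usecs * loops_per_jiffy * hz) // 1_000_000
```

Under these assumptions a loops_per_jiffy of 997350 prints as 199.47 and one of 16350 prints as 3.27. A 10 microsecond delay would spin for roughly 997 loops in the first case but only about 16 in the second, which illustrates why a miscalculated value can upset timing-sensitive drivers.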
The fix is to apply the patch submitted to lkml (now in 2.4.23-rc1).
Here is the patch included in 2.4.23-rc1.
As a side note, this problem was found while testing SLES8, so it has been
fixed there and does not affect SuSE. Glen/Greg - please submit the patch
above to Red Hat for RHEL3. Thanks.
I'll bring this patch up during my telecon today with Greg Kelleher.
Created attachment 96003
------ Additional Comments From firstname.lastname@example.org 2003-25-11 14:29 -------
Just put this bug report in the correct state/ownership....
Chris, adding you to this bug. I thought all was good with the timer fixes
for x440 and x445?
Bob: Not quite. There is a subtle race in the lost-ticks
compensation code. It appears to only bite us with certain
combinations of cpu numbers, frequencies and SMT (e.g., 1 cpu @ 2GHz
w/ HT, 8 cpus @ 2.8GHz w/o HT). It was not seen in testing RHEL 3.0,
but was discovered by Andrea Arcangeli while testing for SLES8 SP3
(after RHEL 3.0 had gone gold). The patch (now accepted into
2.4.23-rc1) was ported and submitted as soon as possible after the
issue was found.
Let me know if you have any further questions.
RHEL 2.1 Update 4 should also take this fix.
I still see BogoMIPS of 1.5 on an x440 with 2.4.21-7.ELsmp. Please
take this patch.
We have a customer in the field seeing this issue with x445s. They
report that with HyperThreading disabled they see the issue (and the
clock runs so fast they can't log in), but the issue is not apparent
when HyperThreading is enabled. This customer wishes to run with
HyperThreading off for their application. (CRM #282865)
Please see bugs #108595 and #110999 for apparently related side
effects. We are seeing both issues (SCSI hangs on the on-board disk
array, and a fast system clock) on 2 xSeries 445s, one with dual 2.5
GHz and one with quad 2.8 GHz processors. We are currently running
the RHEL AS 3.0 Update 1 kernel (2.4.21-9.ELsmp). We have also seen both
problems with the hugemem kernel. I've attached dmesg and other details to
both bugs. We also see both problems with HyperThreading on and with
HyperThreading off when set from the boot parameters. We have yet to
test with HyperThreading turned off via the BIOS.
I opened bug #115061 to track this issue under RHEL 2.1.
The patch in comment #1 has been committed to the RHEL 3 U2 patch
pool tonight, and it will first be available internally in the
Red Hat Engineering build of kernel version 2.4.21-9.11.EL.
Adding to the blocker to keep track of it.
Tested the 2.4.21-11.ELsmp kernel and the issue looks to be resolved.
BogoMIPS looks correct for all cpus and the earlier attached patch is
present in the kernel src directory.
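A check of this kind can also be scripted. The Python sketch below assumes the x86 /proc/cpuinfo layout with "processor" and "bogomips" fields and assumes all cpus in the system run at the same speed; the helper names and the 5% tolerance are hypothetical and were not part of the testing described here.

```python
def parse_cpuinfo_bogomips(cpuinfo_text):
    """Map processor number -> bogomips from /proc/cpuinfo-style text."""
    ratings = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip().lower()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "bogomips" and current is not None:
            ratings[current] = float(value)
    return ratings


def ratings_consistent(ratings, tolerance=0.05):
    """True if every cpu is within `tolerance` of the highest rating."""
    if not ratings:
        return False
    top = max(ratings.values())
    return all(v >= top * (1.0 - tolerance) for v in ratings.values())
```

On a running system the input would be the contents of /proc/cpuinfo, for example `ratings_consistent(parse_cpuinfo_bogomips(open("/proc/cpuinfo").read()))`.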
I believe this bug can be closed. Bug #108595 should also be closable
after the reporters have verified it works for them.
----- Additional Comments From email@example.com(prefers email via firstname.lastname@example.org) 2004-04-12 19:03 -------
I'm closing the LTC bug, as this issue is resolved.
*** Bug 110999 has been marked as a duplicate of this bug. ***
An erratum has been issued which should help the problem described in this bug report.
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below.
You may reopen this bug report if the solution does not work for you.
*** Bug 108595 has been marked as a duplicate of this bug. ***
I have an HP ProLiant DL360 with:
Internal SCSI Card (SmartArray 6i) ==> System Disk
Additional SCSI Card 1 (LSI Logic 53c1030) ==> SCSI Disk Array
Additional SCSI Card 2 (LSI Logic 53c1030) ==> Tape Drive HP Ultrium 215
System is RHEL3 U4 ES with kernel 2.4.21-37 (I also tried with an updated kernel).
I have a similar bug when using the LTO tape drive:
Mar 30 14:55:44 galibier kernel: scsi : aborting command due to timeout : pid 313, scsi1, channel 0, id 5, lun 0 Log Sense 00 7e 00 00 00 00 00 ff 40
Mar 30 14:55:44 galibier kernel: mptscsih: ioc1: id=5 OldAbort: scheduling ABORT SCSI IO (sc=c3588600)
Mar 30 14:55:45 galibier kernel: SCSI host 1 abort (pid 313) timed out - resetting
Mar 30 14:55:45 galibier kernel: SCSI bus is being reset for host 1 channel 0.
Mar 30 14:55:45 galibier kernel: mptscsih: ioc1: id=5 OldReset: scheduling BUS_RESET SCSI IO (sc=c3588600)
Mar 30 14:55:45 galibier kernel: mptbase: ioc1: WARNING - IOCStatus(0x0048): SCSI Task Terminated
I think it is the same bug, so I ask that it be reopened.
Emmanuel, while your particular problem may be a result of missed/lost
interrupts due to bugs in the lost tick compensation code (as described in
this bug), it won't be addressed by the patch attached to this bugzilla. This
bug report and attached patch apply only to specific IBM xSeries hardware
utilizing the IBM Summit chipset.