The following has been reported by IBM LTC:

RHEL 3 will need to pick up the cyclone-lpj-fix patch

Hardware Environment: x440 and x445
Software Environment: Distros based on kernels < 2.4.23

Steps to Reproduce:
1. Boot RHEL
2. Observe the BogoMIPS rating given to each cpu

Actual Results: Occasionally we'll see something like:
Calibrating delay loop... 3.27 BogoMIPS

Expected Results: Always seeing something like:
Calibrating delay loop... 199.47 BogoMIPS

Additional Information: This issue is caused by the lost-tick compensation code in the 2.4 kernel using loops_per_jiffy before that value is calculated. This can cause loops_per_jiffy to be miscalculated, which may cause SCSI hangs at boot, occasional keyboard and mouse hangs in X, as well as other unseen issues. Normally the problem only occurs if the last cpu booted miscalculates loops_per_jiffy, so it seems to show up rarely.

The fix is to apply the patch submitted to lkml (now in 2.4.23-rc1) seen here: http://www.ussg.iu.edu/hypermail/linux/kernel/0311.0/0329.html

Here is the patch included in 2.4.23-rc1.

As a side note, this problem was found while testing SLES8, so it has been fixed and does not affect SuSE.

Glen/Greg - please submit the patch above to Red Hat for RHEL3. Thanks.

I'll bring this patch up during my telecon today with Greg Kelleher.
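For anyone checking whether a given boot hit this, a minimal sketch follows. It scans boot-log text for the "Calibrating delay loop..." lines quoted above and flags any CPU whose BogoMIPS value is far below the highest one seen. The function name `find_suspect_bogomips` and the 0.5 ratio are assumptions made for illustration, not part of the patch or of any Red Hat tool, and the heuristic will not catch a boot where every CPU was miscalibrated.

```python
import re

# Matches boot-log lines such as "Calibrating delay loop... 199.47 BogoMIPS".
BOGOMIPS_RE = re.compile(
    r"Calibrating delay loop\.\.\.\s*([0-9]+(?:\.[0-9]+)?)\s+BogoMIPS"
)


def find_suspect_bogomips(log_text, ratio=0.5):
    """Return (values, suspects) for the BogoMIPS readings in log_text.

    values   -- every reading found, in log order
    suspects -- readings below ratio * max(values); the ratio is an
                arbitrary illustrative threshold, not a kernel constant
    """
    values = [float(m.group(1)) for m in BOGOMIPS_RE.finditer(log_text)]
    if not values:
        return [], []
    threshold = ratio * max(values)
    suspects = [v for v in values if v < threshold]
    return values, suspects


if __name__ == "__main__":
    # Sample lines taken from the report above.
    sample = (
        "Calibrating delay loop... 199.47 BogoMIPS\n"
        "Calibrating delay loop... 3.27 BogoMIPS\n"
    )
    values, suspects = find_suspect_bogomips(sample)
    print("readings:", values)
    print("suspect readings:", suspects)
```

To use it on a live system, the text passed in would be the saved output of dmesg from a fresh boot. Because the problem is intermittent, a clean result on one boot does not rule it out.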
Created attachment 96003 [details] linux-2.4.23-pre9_cyclone-lpj-fix_A0.patch
------ Additional Comments From khoa.com 2003-25-11 14:29 ------- Just put this bug report in the correct state/ownership....
Chris, adding you to this bug. I thought all was good with the timer fixes for x440 and x445?
Bob: Not quite. There is a subtle race in the lost-ticks compensation code. It appears to only bite us with certain combinations of cpu numbers, frequencies and SMT (e.g. 1 cpu @ 2 GHz w/ HT, 8 cpus @ 2.8 GHz w/o HT). It was not seen in testing RHEL 3.0, but was discovered by Andrea Arcangeli while testing for SLES8 SP3 (after RHEL 3.0 had gone gold). The patch (now accepted into 2.4.23-rc1) was ported and submitted as soon as possible after the issue was found. Let me know if you have any further questions.
RHEL 2.1 Update 4 should also take this fix.
I still see BogoMIPS of 1.5 on an x440 with 2.4.21-7.ELsmp. Please take this patch.
We have a customer in the field seeing this issue with x445s. They report that with HyperThreading disabled they see the issue (and the clock runs so fast they can't log in), but the issue is not apparent when HyperThreading is enabled. This customer wishes to run with HyperThreading off for their application. (CRM #282865)
Please see bugs #108595 and #110999 for apparently related side effects. We are seeing both issues (SCSI hangs on the on-board disk array, and a fast system clock) on 2 xSeries 445s, one with dual 2.5 GHz and one with quad 2.8 GHz processors. We are currently running the RHEL AS 3.0 Update 1 kernel (2.4.21-9.ELsmp). We have also seen both problems with the hugemem kernel. I've attached dmesg and other details to both bugs. We also see both problems with HyperThreading on and with HyperThreading off when set from the boot parameters; we have yet to test with HyperThreading off via the BIOS.
I opened bug #115061 to track this issue under RHEL 2.1
The patch in comment #1 was committed to the RHEL 3 U2 patch pool tonight, and it will first be available internally in the Red Hat Engineering build of kernel version 2.4.21-9.11.EL.
Adding to the blocker to keep track of it.
Tested the 2.4.21-11.ELsmp kernel and the issue looks to be resolved. BogoMIPS looks correct for all cpus and the earlier attached patch is present in the kernel src directory. I believe this bug can be closed. Bug #108595 should also be closable after the reporters have verified it works for them.
----- Additional Comments From jstultz.com(prefers email via johnstul.com) 2004-04-12 19:03 ------- I'm closing the LTC bug, as this issue is resolved.
*** Bug 110999 has been marked as a duplicate of this bug. ***
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-188.html
*** Bug 108595 has been marked as a duplicate of this bug. ***
I have an HP ProLiant DL360 with:
Internal SCSI card (SmartArray 6i) ==> System disk
Additional SCSI card 1 (LSI Logic 53c1030) ==> SCSI disk array
Additional SCSI card 2 (LSI Logic 53c1030) ==> Tape drive HP Ultrium 215

The system is RHEL3 U4 ES with kernel 2.4.21-37 (I also tried the updated kernel 2.4.21-40).

I see a similar bug when using the LTO tape drive. From /var/log/messages:

Mar 30 14:55:44 galibier kernel: scsi : aborting command due to timeout : pid 313, scsi1, channel 0, id 5, lun 0 Log Sense 00 7e 00 00 00 00 00 ff 40
Mar 30 14:55:44 galibier kernel: mptscsih: ioc1: id=5 OldAbort: scheduling ABORT SCSI IO (sc=c3588600)
Mar 30 14:55:45 galibier kernel: SCSI host 1 abort (pid 313) timed out - resetting
Mar 30 14:55:45 galibier kernel: SCSI bus is being reset for host 1 channel 0.
Mar 30 14:55:45 galibier kernel: mptscsih: ioc1: id=5 OldReset: scheduling BUS_RESET SCSI IO (sc=c3588600)
Mar 30 14:55:45 galibier kernel: mptbase: ioc1: WARNING - IOCStatus(0x0048): SCSI Task Terminated

I think it is the same bug, so I ask that it be reopened. Thanks
Emmanuel, while your particular problem may be a result of missed/lost interrupts due to bugs in the lost tick compensation code (as described in this bug), it won't be addressed by the patch attached to this bugzilla. This bug report and the attached patch apply only to specific IBM xSeries hardware utilizing the IBM Summit chipset.