Bug 152630
Description
wingc
2005-03-30 21:16:41 UTC
Created attachment 112479 [details]
dmesg from 2.4.21-27.0.2.EL (failure case- clock runs at double speed)
This is the boot log from an unpatched 2.4.21-27.0.2.EL
(kernel-kernel-2.4.21-27.EL.x86_64.rpm)
The clock runs at double normal speed.
Created attachment 112480 [details]
dmesg output from patched 2.4.21-27.0.2.EL (clock runs properly)
This is the boot log from 2.4.21-27.0.2.EL with my patch.
The clock runs at the correct rate, and NTP can synchronize properly.
Created attachment 112481 [details]
[PATCH 1/4] [ACPI] enhance intr-src-override parsing to handle ES7000
Apply this patch first to RHEL3 2.4.21-27.0.2.EL kernel.
This is based on the changeset:
[ACPI] enhance intr-src-override parsing to handle ES7000
http://linux.bkbits.net:8080/linux-2.4/cset@4085c7237X3p0GB-qUMTF6YhNnxSTA
Created attachment 112483 [details]
[PATCH 2/4] [ACPI] handle SCI override to nth IOAPIC
Apply this patch second to 2.4.21-27.0.2.EL after applying patch #1.
This is based on the linux-2.4 changeset:
[ACPI] handle SCI override to nth IOAPIC
http://linux.bkbits.net:8080/linux-2.4/cset@40d27b61Ia-EhvtZw9wtHiKwJl6krQ
Created attachment 112485 [details]
[PATCH 3/4] [PATCH] i386 and x86_64 ACPI mpparse timer bug
Apply this patch third to 2.4.21-27.0.2.EL, after applying patches #1 and #2.
This patch is based on the linux-2.4 changeset:
[PATCH] i386 and x86_64 ACPI mpparse timer bug
http://linux.bkbits.net:8080/linux-2.4/cset@40d6d35ddJnSnjsuIq54YJLwar1vhA
Created attachment 112486 [details]
[PATCH 4/4] revert part of changeset #1 (arch/x86_64/kernel/acpi.c)
Finally, apply this patch to 2.4.21-27.0.2.EL after applying #1, #2, #3.
This reverts the change that the first changeset:
[ACPI] enhance intr-src-override parsing to handle ES7000
http://linux.bkbits.net:8080/linux-2.4/cset@4085c7237X3p0GB-qUMTF6YhNnxSTA
made to 'arch/x86_64/kernel/acpi.c'.
If this change is not reverted, the clock still runs at the incorrect rate
(double normal speed).
Created attachment 112487 [details]
output of 'lspci' on my ATI Radeon Xpress 200 motherboard
PCI devices on my motherboard.
The problem still exists in the most recent RHEL3 U5 beta (kernel 2.4.21-31.EL); not that I'd expect it to be fixed, based on the changes in U5 so far.
Created attachment 112515 [details]
dmesg from 2.4.21-31.EL (RHEL3 U5 beta)
boot log from U5 beta kernel (2.4.21-31.EL).
Nothing significant has changed; the 'Setting APIC routing to flat' message has
appeared but this shouldn't make a difference on single CPU, PC style machines
anyway.
The clock still runs at double speed.
More information.
It looks like my patch changes the routing of the timer interrupt, although according to /proc/interrupts the number of timer interrupts received per second is the same with or without the patch. Something is different with the local APIC timer interrupt with the patches.

On a bad kernel (2.4.21-27.0.2.EL), where the clock runs at double speed:
% cat /proc/interrupts; sleep 10; cat /proc/interrupts
  0:      15247   IO-APIC-edge  timer
LOC:       7596
  0:      16251   IO-APIC-edge  timer
LOC:       8097
(100 timer ints/second, but only 50 local APIC timer ints/second?)

On a kernel with my patches (2.4.21-27.0.2.EL):
% cat /proc/interrupts; sleep 10; cat /proc/interrupts
  0:      55115   XT-PIC  timer
LOC:      55067
  0:      56116   XT-PIC  timer
LOC:      56068
(100 timer ints/second, and 100 local APIC timer ints/second)

The U5 beta kernel (2.4.21-31.EL) acts the same way as the unpatched 2.4.21-27.0.2.EL (the timer interrupt shows up as IO-APIC-edge, 100 timer ints/sec, 50 LOC ints/sec).
My patch causes the timer interrupt to be handled via 'XT-PIC', and the clock works properly. I don't understand what's going on well enough to know what this means...

Also fails with the SMP kernel (2.4.21-27.0.2.ELsmp). The clock runs twice as fast.
Behavior of /proc/interrupts on 2.4.21-27.0.2.ELsmp:
% cat /proc/interrupts; sleep 10; cat /proc/interrupts
  0:      47872   IO-APIC-edge  timer
LOC:      23913
  0:      48874   IO-APIC-edge  timer
LOC:      24413
(100 timer ints/sec, 50 local APIC timer ints/sec)

I just realized that I was an idiot when it came to interpreting the results of cat /proc/interrupts. Since the clock is running at double the normal rate, 'sleep 10' completes in 5 seconds, not 10. So the correct analysis should have been:
On a broken kernel: 200 timer ints/second, 100 local APIC timer ints/second.
On a working kernel: 100 timer ints/second, 100 local APIC timer ints/second.
So, to confirm, the machine receives twice as many timer interrupts per second as it should, and this is why the clock runs twice as fast.
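For anyone re-checking the arithmetic in the corrected analysis, here is a minimal sketch in Python. The function name `true_rate` and the `clock_speedup` parameter are made up for illustration; the counter values are the ones quoted in the comments above. It assumes the system clock runs at exactly 2x on the broken kernel, so 'sleep 10' lasts about 5 real seconds.

```python
def true_rate(before, after, apparent_seconds, clock_speedup=1.0):
    """Interrupts per real second between two /proc/interrupts samples.

    apparent_seconds is the interval as measured by the (possibly fast)
    system clock; clock_speedup is how much faster than real time that
    clock runs (assumed, not measured, in this sketch).
    """
    real_seconds = apparent_seconds / clock_speedup
    return (after - before) / real_seconds


# Unpatched 2.4.21-27.0.2.EL: clock assumed to run at 2x.
broken_timer = true_rate(15247, 16251, 10, clock_speedup=2.0)
broken_loc = true_rate(7596, 8097, 10, clock_speedup=2.0)

# Patched 2.4.21-27.0.2.EL: clock runs at the correct rate.
patched_timer = true_rate(55115, 56116, 10)
patched_loc = true_rate(55067, 56068, 10)

print(f"broken:  timer {broken_timer:.1f}/s, LOC {broken_loc:.1f}/s")
print(f"patched: timer {patched_timer:.1f}/s, LOC {patched_loc:.1f}/s")
```

With the 2x correction the broken kernel comes out at roughly 200 timer and 100 LOC interrupts per real second, against roughly 100 and 100 on the patched kernel, which matches the corrected analysis.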
Confirmed that the bug still exists in RHEL4 (in the initial release kernel 2.6.9-5.EL):
$ cat /proc/interrupts; sleep 10; cat /proc/interrupts
  0:    2728975   IO-APIC-edge  timer
LOC:    1364203
  0:    2738985   IO-APIC-edge  timer
LOC:    1369207
(The 'sleep 10' completes in 5 seconds, so there are 2000 timer ints/sec and 1000 local APIC ints/sec.)

Bug 163347 describes a similar problem.

Could someone with an affected system possibly test this kernel:
http://people.redhat.com/bmaly/linux-2.4.21-ATIfixes.tar.gz
This kernel has a patch already applied, which is a backport of a 2.6 kernel patch. The 2.6 patch resolves this issue on all AMD64 systems with ATI chipsets and will hopefully have positive results on a 2.4 kernel. "disable_timer_pin_1" may need to be passed in as a boot arg on some systems.

Is this bug affecting RHEL4 as well? Bug 173236 is the exact same issue (affecting ATI chipsets) but for RHEL4. Does the ServerWorks chipset also behave badly on RHEL4? If so, I can add a check for ServerWorks and re-post the RHEL4 patch as well.

Created attachment 127024 [details]
patch to disable IRQ 0
A fix for this problem has just been committed to the RHEL3 U8 patch pool this evening (in kernel version 2.4.21-40.6.EL).

This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 3.8 release. Engineering resources have been assigned and barring unforeseen circumstances, Red Hat intends to include this item in the 3.8 release.

A kernel has been released that contains a patch for this problem. Please verify if your problem is fixed with the latest available kernel from the RHEL3 public beta channel at rhn.redhat.com and post your results to this bugzilla.

Sorry, I no longer have the hardware in question for which this bug was originally reported. I am unable to test.

Reverting to ON_QA.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2006-0437.html