Bug 448449
Summary: | CLOCK_MONOTONIC can exhibit "time warps" by jumping back in time | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | David Holmes <David.Holmes> | ||||||
Component: | realtime-kernel | Assignee: | Luis Claudio R. Goncalves <lgoncalv> | ||||||
Status: | CLOSED WONTFIX | QA Contact: | |||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | low | ||||||||
Version: | beta | CC: | bhu, lgoncalv, mfuruta, williams | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | i686 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2015-04-08 17:33:05 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
David Holmes
2008-05-26 23:39:45 UTC
Could you please attach here: - A better description of the hardware were you found the failures - lspci -v - /proc/cpuinfo I have the impression the tests ran in a 32bits kernel. Is it a safe assumption? Created attachment 306861 [details] Dump of http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh I've attached the output of http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh which was requested by Ingo Molnar last year. The system is running the 32-bit kernel. Hardware information is as follows: These are Supermicro machines. Each of them consists of 1U Supermicro chassis [1], Supermicro motherboard [2], Intel CPU, memory, HDD, and CDROM drive all put together by a local retailer. [1] http://www.supermicro.com/products/chassis/1U/512/SC512F-260.cfm [2] http://www.supermicro.com/products/motherboard/Xeon3000/3000/PDSMi+.cfm As requested: [root@vmsqe-core2q-02 ~]# lspci -v 00:00.0 Host bridge: Intel Corporation E7230/3000/3010 Memory Controller Hub (rev c0) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, fast devsel, latency 0 Capabilities: [e0] Vendor Specific Information 00:01.0 PCI bridge: Intel Corporation E7230/3000/3010 PCI Express Root Port (rev c0) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 Capabilities: [88] #0d [0000] Capabilities: [80] Power Management version 2 Capabilities: [90] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Capabilities: [a0] Express Root Port (Slot+) IRQ 0 Capabilities: [100] Virtual Channel Capabilities: [140] Unknown (5) 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=09, subordinate=0a, sec-latency=0 Memory behind bridge: e8100000-e81fffff Capabilities: [40] Express Root Port (Slot+) IRQ 0 Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Capabilities: [90] #0d [0000] Capabilities: [a0] Power Management version 2 Capabilities: [100] Virtual Channel Capabilities: [180] Unknown (5) 00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=0d, subordinate=0d, sec-latency=0 I/O behind bridge: 00004000-00004fff Memory behind bridge: e8000000-e80fffff Capabilities: [40] Express Root Port (Slot+) IRQ 0 Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Capabilities: [90] #0d [0000] Capabilities: [a0] Power Management version 2 Capabilities: [100] Virtual Channel Capabilities: [180] Unknown (5) 00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0 I/O behind bridge: 00005000-00005fff Memory behind bridge: e8200000-e82fffff Capabilities: [40] Express Root Port (Slot+) IRQ 0 Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Capabilities: [90] #0d [0000] Capabilities: [a0] Power Management version 2 Capabilities: [100] Virtual Channel Capabilities: [180] Unknown (5) 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI]) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, medium devsel, latency 0, IRQ 18 I/O ports at 3000 [size=32] 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI]) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, medium devsel, latency 0, IRQ 19 I/O ports at 3020 [size=32] 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI]) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, medium devsel, latency 0, IRQ 20 I/O ports at 3040 [size=32] 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01) (prog-if 00 [UHCI]) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, medium devsel, latency 0, IRQ 16 I/O ports at 3060 [size=32] 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI]) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, medium devsel, latency 0, IRQ 18 Memory at e8600000 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=0f, subordinate=0f, sec-latency=32 I/O behind bridge: 00006000-00006fff Memory behind bridge: e8300000-e83fffff Prefetchable memory behind bridge: 00000000e0000000-00000000e7f00000 Capabilities: [50] #0d [0000] 00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, medium devsel, latency 0 Capabilities: [e0] Vendor Specific Information 00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01) (prog-if 80 [Master]) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19 I/O ports at 01f0 [size=8] I/O ports at 03f4 [size=1] I/O ports at 0170 [size=8] I/O ports at 0374 [size=1] I/O ports at 30b0 [size=16] Capabilities: [70] Power Management version 2 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: medium devsel, IRQ 19 I/O ports at 1100 [size=32] 09:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI Bridge A (rev 09) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=09, secondary=0a, subordinate=0a, sec-latency=64 Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0 Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- Capabilities: [6c] Power Management version 2 Capabilities: [d8] PCI-X bridge device Capabilities: [100] Advanced Error Reporting Capabilities: [300] Power Budgeting 09:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09) (prog-if 20 [IO(X)-APIC]) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, fast devsel, latency 0 Memory at e8100000 (32-bit, non-prefetchable) [size=4K] Capabilities: [44] Express Endpoint IRQ 0 Capabilities: [6c] Power Management version 2 0d:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03) Subsystem: Super Micro Computer Inc Unknown device 108c Flags: bus master, fast devsel, latency 0, IRQ 219 Memory at e8000000 (32-bit, non-prefetchable) [size=128K] I/O ports at 4000 [size=32] Capabilities: [c8] Power Management version 2 Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+ Capabilities: [e0] Express Endpoint IRQ 0 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number 80-50-90-ff-ff-48-30-00 0e:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller Subsystem: Super Micro Computer Inc Unknown device 109a Flags: bus master, fast devsel, latency 0, IRQ 17 Memory at e8200000 (32-bit, non-prefetchable) [size=128K] I/O ports at 5000 [size=32] Capabilities: [c8] Power Management version 2 Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- Capabilities: [e0] Express Endpoint IRQ 0 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number 81-50-90-ff-ff-48-30-00 0f:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02) (prog-if 00 [VGA]) Subsystem: Super Micro Computer Inc Unknown device 7980 Flags: bus master, stepping, fast Back2Back, medium devsel, latency 66, IRQ 11 Memory at e0000000 (32-bit, prefetchable) [size=128M] I/O ports at 6000 [size=256] Memory at e8300000 (32-bit, non-prefetchable) [size=64K] [virtual] Expansion ROM at e8320000 [disabled] [size=128K] Capabilities: [50] Power Management version 2 [root@vmsqe-core2q-02 ~]# cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Quad CPU @ 2.40GHz stepping : 7 cpu MHz : 2394.041 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm bogomips : 4790.70 clflush size : 64 processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Quad CPU @ 2.40GHz stepping : 7 cpu MHz : 2394.041 cache size : 4096 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 4 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm bogomips : 4785.49 clflush size : 64 processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Quad CPU @ 2.40GHz stepping : 7 cpu MHz : 2394.041 cache size : 4096 KB physical id : 0 siblings : 4 core id : 2 cpu cores : 4 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm bogomips : 4788.01 clflush size : 64 processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Quad CPU @ 2.40GHz stepping : 7 cpu MHz : 2394.041 cache size : 4096 KB physical id : 0 siblings : 4 core id : 3 cpu cores : 4 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm bogomips : 4788.02 clflush size : 64 We have tested two Supermicro boards with Intel processors but had no success on reproducing the bug. Did you have any success reproducing this bug on other similar machines? As per the original description: - All the failing machines are Supermicro systems with the same basic hardware, differing only in CPU (ie number of cores). Other hardware has not exhibited the same failures running the same OS version. - The four machines running RHEL-RT 5.1 (kernel 2.6.21) all exhibited time-warps. - The one machine running Ubuntu 7.10 (kernel 2.6.22) did not - Rebooting one of the failing machines into an earlier RHEL version (kernel 2.6.18-47.el5PAE) did not experience any CLOCK_MONOTONIC time-warps over a 24 hour period. But even on the failing machines this bug can disappear for long periods. It's almost as if the value of the time affects whether or not the bug appears. A few questions: - Have you been able to reproduce this issue with MRG GA kernel, 2.6.24.7-65.el5rt? - Have you been able to reproduce this issue in any other machines (except the two ones cited in the report)? - Are both failing machines quad-core? - Would that be possible running the cfs-debug-info.sh script against 2.6.24.7-65.el5rt? - Are you using NTP? Have you niticed anything different on the logs? This issue is taking long to be solved but I really haven't been able to reproduce the issue. If the answer to the questions above doesn't give us any clues, I will instrument a kernel to check on clocksources, ntp and GTOD events. Or prepare a script to run with the kernel-rt-trace kernel. - None of the machines have been tested with the latest kernel as yet (most recent was 2.6.24.4-30.el5rt) - This was reproduced on 4 machines running RH, and failed to reproduce on a similar machine running Ubuntu - as per original description. - Failing machines have been both dual-core and quad-core - Will need to get access to machines to gather requested info, once they have been updated to latest kernel. - Need to check ntp status once I can access the machines again. No info on anything in logs. I reproduced a time-warp with 2.6.24.7-65: vmsqe-core2q-02 > ./time-warp-test-rh-notsc 4 CPUs, running 4 parallel test-tasks. checking for time-warps via: - gettimeofday (TOD) syscall (usec resolution) - clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution) - dummy monotonic counter for testing new TOD-warp maximum: -5842 usecs, 0004515713524496 -> 0004515713522dc4 new TOD-warp maximum: -30834 usecs, 0004515715b50034 -> 0004515715b487c2 new TOD-warp maximum: -75572 usecs, 000451574c8bc581 -> 000451574c8a9e4d new TOD-warp maximum: -119428 usecs, 000451579d8d2210 -> 000451579d8b4f8c new TOD-warp maximum: -142421 usecs, 000451595902dfe9 -> 000451595900b394 new TOD-warp maximum: -868400 usecs, 0004516d93818c3f -> 0004516d93744c0f new TOD-warp maximum: -1004857 usecs, 0004517239390eba -> 000451723929b981 new CLOCK-warp maximum: -4294955983 nsecs, 000278fdfa570f39 -> 000278fcfa573b6a | 5.31 us, TOD-warps:3251 | 5.31 us, CLOCK-warps:1 | 5.23 us, COUNTER-warps:0 \ This is a quad-core machine. The warp happened somewhere between 24 and 36 hours of testing. ntpd is not running. Output from cfs-debug-info.sh will be attached. Created attachment 311211 [details]
cfs-debug-info output
Thanks David. I will walk through the data and try to find a similar quad-core machine here to reproduce the issue. I have been running the test for 8+ days with no single error message in a Supermicro MB with an Intel processor. This is just a FYI. Which processor exactly? We've seen issues with the Core2 systems but not the Xeon "Woodcrest". Thanks. There has been no recent activity on this issue. Is this still a problem? I'm running the time-warp test on one of the original machines to see if this still reproduces, but as noted in the previous comments this problem sometimes disappears for long periods of time. We do know that even with Solaris we sometimes get strange timing issues on these kinds of machines. I haven't seen a time warp in quite some time... This issue has not been updated in a while and is using an older, unsupported kernel. This BZ is being closed WONTFIX. If you believe this is still an issue on our most recent MRG-2.5 3.10 Realtime kernel, please file a new issue for further investigation. |