Bug 502893 - TSC synchronisation fails on Nehalem
Summary: TSC synchronisation fails on Nehalem
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: 1.1.5
Target Release: ---
Assignee: Red Hat Real Time Maintenance
QA Contact: David Sommerseth
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-05-27 16:20 UTC by Andrew Gilligan
Modified: 2018-10-20 00:10 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-14 19:11:56 UTC
Target Upstream Version:
Embargoed:


Attachments
dmesg output from 2.6.24.7-117 (54.77 KB, text/plain)
2009-06-11 11:24 UTC, Andrew Gilligan


Links
System: Red Hat Product Errata
ID: RHSA-2009:1157
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel-rt security and bug fix update
Last Updated: 2009-07-14 19:11:05 UTC

Description Andrew Gilligan 2009-05-27 16:20:53 UTC
Description of problem:
Kernel messages show that TSC synchronisation fails between cores on Nehalem-based systems.

A notable side effect is that gettimeofday() calls take approximately 20 times longer to return.
A simple C program calling gettimeofday() 10 million times should take about 0.3 seconds; on the affected kernels it takes 5.8 seconds.
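
The reporter's test program is not attached to this bug. A minimal sketch of such a timing loop, assuming a POSIX system, might look like the following; the function name time_gettimeofday_calls is hypothetical and not taken from the report.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical helper: call gettimeofday() `calls` times and return the
 * elapsed wall-clock time in seconds, or -1.0 if gettimeofday() fails.
 */
double time_gettimeofday_calls(long calls)
{
    struct timeval start, end, tv;

    if (gettimeofday(&start, NULL) != 0)
        return -1.0;

    /* gettimeofday() is an external call, so the loop is not optimised away. */
    for (long i = 0; i < calls; i++) {
        if (gettimeofday(&tv, NULL) != 0)
            return -1.0;
    }

    if (gettimeofday(&end, NULL) != 0)
        return -1.0;

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}
```

Calling time_gettimeofday_calls(10000000L) from main() and printing the result would give a figure comparable to the 0.3 second and 5.8 second timings quoted above, which are the reporter's own measurements.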


Version-Release number of selected component (if applicable):
kernel-rt-2.6.24.7-101
kernel-rt-2.6.24.7-108
kernel-rt-2.6.24.7-111

How reproducible:
always

Steps to Reproduce:
Boot into any one of 2.6.24.7-101, 108 or 111 on a Nehalem machine.
 
Actual results:
kernel: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz stepping 05
kernel: checking TSC synchronization [CPU#0 -> CPU#1]:
kernel: Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
kernel: Marking TSC unstable due to check_tsc_sync_source failed


Expected results:
kernel: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz stepping 05
kernel: checking TSC synchronization [CPU#0 -> CPU#1]: passed.


Additional info:
Kernels 2.6.24.7-93 (and earlier) do not exhibit this behaviour.

Comment 1 Chris Van Hoof 2009-06-02 17:12:04 UTC
Andrew -- Can you attach a fresh dmesg from this host, and the output of:

head /sys/devices/system/clocksource/clocksource0/*

--chris

Comment 12 Andrew Gilligan 2009-06-11 11:24:59 UTC
Created attachment 347384
dmesg output from 2.6.24.7-117

Comment 13 Andrew Gilligan 2009-06-11 11:25:58 UTC
# head /sys/devices/system/clocksource/clocksource0/*
==> /sys/devices/system/clocksource/clocksource0/available_clocksource <==
hpet acpi_pm jiffies tsc 

==> /sys/devices/system/clocksource/clocksource0/current_clocksource <==
hpet
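
The check requested in comment 1 can also be done from a program. The sketch below assumes the sysfs path shown above and reads the active clocksource name; read_clocksource and clocksource_is_tsc are hypothetical helper names. In the output above the active clocksource is hpet rather than tsc.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a clocksource sysfs file
 * (for example .../clocksource0/current_clocksource) into `out`, stripping
 * the trailing newline. `len` must be at least 1.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_clocksource(const char *path, char *out, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    if (fgets(out, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Returns 1 if the given clocksource name is the TSC, 0 otherwise. */
int clocksource_is_tsc(const char *name)
{
    return strcmp(name, "tsc") == 0;
}
```

On a machine showing this bug, passing /sys/devices/system/clocksource/clocksource0/current_clocksource to read_clocksource would be expected to yield a name for which clocksource_is_tsc returns 0.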

Comment 16 Paul Batkowski 2009-06-23 15:06:57 UTC
Andrew,

Issue is resolved in 2.6.24.7-119 and will be going into MRG 1.2. 

Paul

Comment 19 David Sommerseth 2009-07-09 15:14:53 UTC
Verified in mrg-rt.git as commit 2ff40aa32a4fbdff147ed64f4bac8f1b5425adf1 and found in kernel-rt-2.6.24.7-126 SRPM.

Comment 21 errata-xmlrpc 2009-07-14 19:11:56 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1157.html

