Bug 502893

Summary: TSC synchronisation fails on Nehalem
Product: Red Hat Enterprise MRG
Reporter: Andrew Gilligan <agilligan>
Component: realtime-kernel
Assignee: Red Hat Real Time Maintenance <rt-maint>
Status: CLOSED ERRATA
QA Contact: David Sommerseth <davids>
Severity: high
Priority: low
Version: 1.1
CC: bhu, lgoncalv, ovasik, pbatkowski, tao, vanhoof
Target Milestone: 1.1.5   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-07-14 19:11:56 UTC
Attachments:
dmesg output from 2.6.24.7-117 (flags: none)

Description Andrew Gilligan 2009-05-27 16:20:53 UTC
Description of problem:
Kernel messages show that TSC synchronisation fails between cores on Nehalem-based systems.

A notable side effect is that gettimeofday() calls take approximately 20 times longer to return.
A simple C program calling gettimeofday() 10 million times should take 0.3 seconds; on the affected kernels it takes 5.8 seconds.


Version-Release number of selected component (if applicable):
kernel-rt-2.6.24.7-101
kernel-rt-2.6.24.7-108
kernel-rt-2.6.24.7-111

How reproducible:
always

Steps to Reproduce:
Boot into any one of 2.6.24.7-101, 108 or 111 on a Nehalem machine.
 
Actual results:
kernel: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz stepping 05
kernel: checking TSC synchronization [CPU#0 -> CPU#1]:
kernel: Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
kernel: Marking TSC unstable due to check_tsc_sync_source failed


Expected results:
kernel: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz stepping 05
kernel: checking TSC synchronization [CPU#0 -> CPU#1]: passed.


Additional info:
Kernels 2.6.24.7-93 (and earlier) do not exhibit this behaviour.

Comment 1 Chris Van Hoof 2009-06-02 17:12:04 UTC
Andrew -- Can you attach a fresh dmesg from this host, and the output of:

head /sys/devices/system/clocksource/clocksource0/*

--chris

Comment 12 Andrew Gilligan 2009-06-11 11:24:59 UTC
Created attachment 347384 [details]
dmesg output from 2.6.24.7-117

Comment 13 Andrew Gilligan 2009-06-11 11:25:58 UTC
# head /sys/devices/system/clocksource/clocksource0/*
==> /sys/devices/system/clocksource/clocksource0/available_clocksource <==
hpet acpi_pm jiffies tsc 

==> /sys/devices/system/clocksource/clocksource0/current_clocksource <==
hpet

Comment 16 Paul Batkowski 2009-06-23 15:06:57 UTC
Andrew,

Issue is resolved in 2.6.24.7-119 and will be going into MRG 1.2. 

Paul

Comment 19 David Sommerseth 2009-07-09 15:14:53 UTC
Verified in mrg-rt.git as commit 2ff40aa32a4fbdff147ed64f4bac8f1b5425adf1 and found in kernel-rt-2.6.24.7-126 SRPM.

Comment 21 errata-xmlrpc 2009-07-14 19:11:56 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1157.html