Bug 462853

Summary: Time fails to pass on nanosleep()
Product: Red Hat Enterprise Linux 5
Component: kernel-xen
Version: 5.2
Hardware: All
OS: Linux
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Target Milestone: rc
Reporter: Martin Poole <mpoole>
Assignee: Rik van Riel <riel>
QA Contact: Martin Jenner <mjenner>
CC: akarlsso, akunysz, dzickus, gasmith, tao, xen-maint
Doc Type: Bug Fix
Last Closed: 2009-05-27 19:35:08 UTC
Bug Blocks: 492568
Attachments:
Attachment 317183: nanosleep() test program (flags: none)
Attachment 317208: backport of upstream patch that rounds up sleep (flags: none)

Description Martin Poole 2008-09-19 11:44:16 UTC
Description of problem:

Customer is experiencing application problems because time is failing to advance despite calling nanosleep() and receiving no error.


Version-Release number of selected component (if applicable):

kernel-xen-2.6.18-92.1.10.el5

Also seen with earlier kernels, -67 and -92.


How reproducible:

Always on customer system using test program

Steps to Reproduce:
1. Run the attached nanotest program.

Actual results:

Results vary widely across multiple runs:

backwards = 0
no_time = 0
short_sleep = 0
long_sleep = 22
Elapsed: 240
--
backwards = 0
no_time = 5262
short_sleep = 20
long_sleep = 50
Elapsed: 156
--
backwards = 0
no_time = 9864
short_sleep = 37
long_sleep = 99
Elapsed: 99
--
backwards = 0
no_time = 9863
short_sleep = 35
long_sleep = 102
Elapsed: 102
--
backwards = 0
no_time = 6627
short_sleep = 22
long_sleep = 71
Elapsed: 146

Expected results:

backwards = 0
no_time = 0
short_sleep = 0
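
For readers without access to the attachment, here is a minimal sketch of how a test of this kind might classify each sleep. This is not the attached nanotest program. Only the counter names come from the output above; the function names, the use of CLOCK_MONOTONIC, the 10x threshold for a "long" sleep, and the iteration count and sleep length in main() are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical outcome classes, named after the counters in the report. */
enum sleep_result {
    SLEEP_OK, SLEEP_BACKWARDS, SLEEP_NO_TIME, SLEEP_SHORT, SLEEP_LONG,
    SLEEP_RESULT_COUNT
};

/* Classify one sleep from clock readings (in ns) taken before and after it.
 * ASSUMPTION: "long" means more than long_factor times the request. */
enum sleep_result classify_sleep(int64_t before_ns, int64_t after_ns,
                                 int64_t requested_ns, int64_t long_factor)
{
    int64_t elapsed = after_ns - before_ns;

    if (elapsed < 0)
        return SLEEP_BACKWARDS;            /* clock went backwards */
    if (elapsed == 0)
        return SLEEP_NO_TIME;              /* no time passed at all */
    if (elapsed < requested_ns)
        return SLEEP_SHORT;                /* woke up too early */
    if (elapsed > requested_ns * long_factor)
        return SLEEP_LONG;                 /* overslept by a wide margin */
    return SLEEP_OK;
}

/* Current time in ns. ASSUMPTION: the original may have used another clock.
 * Older glibc versions need -lrt for clock_gettime(). */
int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Sleep `iterations` times for requested_ns each and tally the outcomes. */
void run_sleep_test(int iterations, int64_t requested_ns,
                    long counts[SLEEP_RESULT_COUNT])
{
    struct timespec req = {
        .tv_sec = requested_ns / 1000000000LL,
        .tv_nsec = requested_ns % 1000000000LL,
    };

    assert(requested_ns >= 0);
    for (int i = 0; i < iterations; i++) {
        int64_t before = now_ns();

        if (nanosleep(&req, NULL) != 0)
            continue;  /* interrupted or failed: not the case under test */
        counts[classify_sleep(before, now_ns(), requested_ns, 10)]++;
    }
}

#ifdef SLEEP_TEST_MAIN  /* build with -DSLEEP_TEST_MAIN for a standalone run */
int main(void)
{
    long counts[SLEEP_RESULT_COUNT] = { 0 };

    run_sleep_test(10000, 10000, counts);  /* hypothetical: 10000 sleeps of 10 us */
    printf("backwards = %ld\nno_time = %ld\nshort_sleep = %ld\nlong_sleep = %ld\n",
           counts[SLEEP_BACKWARDS], counts[SLEEP_NO_TIME],
           counts[SLEEP_SHORT], counts[SLEEP_LONG]);
    return 0;
}
#endif
```

In this sketch, a nanosleep() call that returns 0 while the clock reading is unchanged is what would be counted under no_time, which is the symptom reported here.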


Additional info:

System is a "PRIMERGY RX200 S3" with one socket populated:

Handle 0x0004, DMI type 4, 35 bytes.
Processor Information
        Socket Designation: CPU 1
        Type: Central Processor
        Family: Xeon
        Manufacturer: Intel
        ID: FB 06 00 00 FF FB EB BF
        Signature: Type 0, Family 6, Model 15, Stepping 11
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                DS (Debug store)
                ACPI (ACPI supported)
                MMX (MMX technology supported)
                FXSR (Fast floating-point save and restore)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                SS (Self-snoop)
                HTT (Hyper-threading technology)
                TM (Thermal monitor supported)
                PBE (Pending break enabled)
        Version: Intel(R) Xeon(R) CPU 5148 @ 
        Voltage: 1.5 V
        External Clock: 1333 MHz
        Max Speed: 2333 MHz
        Current Speed: 2333 MHz
        Status: Populated, Enabled
        Upgrade: ZIF Socket
        L1 Cache Handle: 0x0006
        L2 Cache Handle: 0x0007
        L3 Cache Handle: 0x0008
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified

Comment 1 Martin Poole 2008-09-19 11:45:45 UTC
Created attachment 317183
nanosleep() test program

Comment 2 Martin Poole 2008-09-19 11:55:30 UTC
Latest tests were run with the notsc kernel boot parameter.

Similar results are seen in the RHEL5 DomU.

Comment 4 Rik van Riel 2008-09-19 16:01:00 UTC
Created attachment 317208
backport of upstream patch that rounds up sleep

I am building a test kernel with this backported patch now and will test the nanosleep() test program with and without this patch.
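
The patch itself is not reproduced in this report. As a rough illustration of why rounding direction can matter, assume a timer that only fires on whole ticks: truncating a request shorter than one tick yields zero ticks, so the sleep could return without any time passing, while rounding up always waits at least one tick. The helper names and the tick length used below are hypothetical, and whether this is the exact mechanism behind this bug is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: convert a requested sleep
 * to whole timer ticks by truncating. Requests shorter than one tick
 * become zero ticks. */
int64_t ns_to_ticks_truncate(int64_t requested_ns, int64_t tick_ns)
{
    assert(requested_ns >= 0 && tick_ns > 0);
    return requested_ns / tick_ns;
}

/* Hypothetical helper, not taken from the patch: convert a requested sleep
 * to whole timer ticks by rounding up, so any non-zero request waits at
 * least one full tick. */
int64_t ns_to_ticks_round_up(int64_t requested_ns, int64_t tick_ns)
{
    assert(requested_ns >= 0 && tick_ns > 0);
    return (requested_ns + tick_ns - 1) / tick_ns;
}
```

With an assumed tick of 1,000,000 ns, a 10,000 ns request maps to 0 ticks when truncated and to 1 tick when rounded up.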

Comment 5 Rik van Riel 2008-09-19 22:09:59 UTC
I cannot seem to reproduce the short sleep problem on my hardware, even though I do think I understand why it can happen.

Do we have any hardware in-house on which the bug is reproducible?

Comment 6 Rik van Riel 2008-10-10 18:33:11 UTC
Martin (Poole),

If I get you a test kernel with the patch, could you have it tested at the customer site?  I have not found any hardware here that reproduces the bug, but the patch is low-risk enough that testing at the customer site should be sufficient to get it approved for merging into a RHEL update.

Comment 8 Issue Tracker 2008-10-13 11:28:16 UTC
Customer is willing to test an experimental kernel and can install it today (October 13th) or tomorrow (14th) if we provide it now.

Internal Status set to 'Waiting on Engineering'
Status set to: Waiting on Tech

This event sent from IssueTracker by akunysz, issue 173294

Comment 9 Rik van Riel 2008-10-13 17:47:58 UTC
I have made test kernels available at

http://people.redhat.com/riel/.bz462853/

Please let me know whether the test kernel resolves the issue.

Comment 10 Issue Tracker 2008-10-14 07:21:17 UTC
Thank you. Customer has been given test kernel. Waiting for feedback.

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by akunysz, issue 173294

Comment 13 Rik van Riel 2008-10-15 19:25:46 UTC
Since the patch is upstream, safe, obviously correct, and greatly improves the test results for the customer, I will submit it for inclusion in a RHEL update.

There may be other unrelated time bugs that caused the issue to show up on one of the domUs.

Comment 14 Rik van Riel 2008-10-15 19:35:10 UTC
Posted the patch for internal review.

Comment 42 Rik van Riel 2009-05-27 19:35:08 UTC
The bug happens on only one specific system and cannot be reproduced on other systems of the same model.  Putting in a workaround for one specific system entails too much risk for next to no gain, so CLOSED WORKSFORME.

Please reopen if the bug can be triggered on multiple systems.