462853 – Time fails to pass on nanosleep()

Summary: Time fails to pass on nanosleep()

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel-xen
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Rik van Riel
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	492568

Reported:	2008-09-19 11:44 UTC by Martin Poole
Modified:	2018-10-20 01:08 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-05-27 19:35:08 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
nanosleep() test program (3.35 KB, text/plain) 2008-09-19 11:45 UTC, Martin Poole	no flags
backport of upstream patch that rounds up sleep (1.05 KB, patch) 2008-09-19 16:01 UTC, Rik van Riel	no flags

Description Martin Poole 2008-09-19 11:44:16 UTC

Description of problem:

Customer is experiencing application problems because time is failing to advance despite calling nanosleep() and receiving no error.


Version-Release number of selected component (if applicable):

kernel-xen-2.6.18-92.1.10.el5

Also seen with earlier kernels, -67 and -92.


How reproducible:

Always on customer system using test program

Steps to Reproduce:
1. Run the attached nanotest program.
  
Actual results:

Results vary widely across multiple runs:

backwards = 0
no_time = 0
short_sleep = 0
long_sleep = 22
Elapsed: 240
--
backwards = 0
no_time = 5262
short_sleep = 20
long_sleep = 50
Elapsed: 156
--
backwards = 0
no_time = 9864
short_sleep = 37
long_sleep = 99
Elapsed: 99
--
backwards = 0
no_time = 9863
short_sleep = 35
long_sleep = 102
Elapsed: 102
--
backwards = 0
no_time = 6627
short_sleep = 22
long_sleep = 71
Elapsed: 146

Expected results:

backwards = 0
no_time = 0
short_sleep = 0


Additional info:

System is "PRIMERGY RX200 S3"

with one socket filled,

Handle 0x0004, DMI type 4, 35 bytes.
Processor Information
        Socket Designation: CPU 1
        Type: Central Processor
        Family: Xeon
        Manufacturer: Intel
        ID: FB 06 00 00 FF FB EB BF
        Signature: Type 0, Family 6, Model 15, Stepping 11
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                DS (Debug store)
                ACPI (ACPI supported)
                MMX (MMX technology supported)
                FXSR (Fast floating-point save and restore)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                SS (Self-snoop)
                HTT (Hyper-threading technology)
                TM (Thermal monitor supported)
                PBE (Pending break enabled)
        Version: Intel(R) Xeon(R) CPU 5148 @ 
        Voltage: 1.5 V
        External Clock: 1333 MHz
        Max Speed: 2333 MHz
        Current Speed: 2333 MHz
        Status: Populated, Enabled
        Upgrade: ZIF Socket
        L1 Cache Handle: 0x0006
        L2 Cache Handle: 0x0007
        L3 Cache Handle: 0x0008
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified

Comment 1 Martin Poole 2008-09-19 11:45:45 UTC

Created attachment 317183
nanosleep() test program
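
The attachment itself is not reproduced in this report. As a rough sketch of the kind of check that could produce the counters shown in the description, the C below reads the clock before and after each nanosleep() call and classifies the difference. The counter names are taken from the output above; everything else (the use of CLOCK_REALTIME, the "more than twice the request" threshold for long_sleep, and the function names) is an assumption for illustration and may differ from the real nanotest program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Counters named after the output in this report. The classification
 * rules are assumptions, not taken from the attached program. */
struct sleep_stats {
    long backwards;   /* clock went backwards across the sleep */
    long no_time;     /* clock did not advance at all */
    long short_sleep; /* clock advanced, but by less than requested */
    long long_sleep;  /* clock advanced by more than 2x the request (assumed threshold) */
};

/* Convert a timespec to a signed nanosecond count. */
int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Classify one measured interval against the requested sleep length. */
void classify(struct sleep_stats *s, int64_t elapsed_ns, int64_t req_ns)
{
    if (elapsed_ns < 0)
        s->backwards++;
    else if (elapsed_ns == 0)
        s->no_time++;
    else if (elapsed_ns < req_ns)
        s->short_sleep++;
    else if (elapsed_ns > 2 * req_ns)
        s->long_sleep++;
}

/* Sleep `iterations` times for req_ns nanoseconds each and update the
 * counters. Returns 0 on success, -1 if a clock read or sleep fails
 * (including a sleep interrupted by a signal). */
int run_sleep_test(struct sleep_stats *s, long iterations, long req_ns)
{
    assert(s != NULL && req_ns >= 0);

    struct timespec req = {
        .tv_sec = req_ns / 1000000000L,
        .tv_nsec = req_ns % 1000000000L,
    };
    struct timespec before, after;

    for (long i = 0; i < iterations; i++) {
        if (clock_gettime(CLOCK_REALTIME, &before) != 0)
            return -1;
        if (nanosleep(&req, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_REALTIME, &after) != 0)
            return -1;
        classify(s, ts_to_ns(&after) - ts_to_ns(&before), req_ns);
    }
    return 0;
}
```

Calling run_sleep_test() from a main() with a zeroed struct sleep_stats and printing the four counters would give output of the same shape as the runs in the description. The iteration count and sleep length to pass are placeholders; the values used by the attached program are not stated here.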

Comment 2 Martin Poole 2008-09-19 11:55:30 UTC

Latest tests were run with notsc.

Similar results are seen in the RHEL5 DomU.

Comment 4 Rik van Riel 2008-09-19 16:01:00 UTC

Created attachment 317208
backport of upstream patch that rounds up sleep

I am building a test kernel with this backported patch now and will test the nanosleep() test program with and without this patch.
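
The patch is not reproduced in this report. As a general illustration of what rounding a sleep up means in a tick-based timer, the sketch below converts a requested duration to whole timer ticks in two ways. With truncation, a request shorter than one tick becomes zero ticks, so a sleep could return without any time having passed; rounding up guarantees at least one tick for any nonzero request. The function names and the 4 ms tick mentioned afterwards are assumptions for illustration, and this is not claimed to be the diagnosed cause of this bug or the content of the attached patch.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating conversion: a request shorter than one tick becomes 0 ticks. */
uint64_t ns_to_ticks_truncate(uint64_t ns, uint64_t tick_ns)
{
    assert(tick_ns > 0); /* guard against division by zero */
    return ns / tick_ns;
}

/* Round-up conversion: any nonzero request waits at least one full tick. */
uint64_t ns_to_ticks_round_up(uint64_t ns, uint64_t tick_ns)
{
    assert(tick_ns > 0); /* guard against division by zero */
    return ns / tick_ns + (ns % tick_ns != 0 ? 1 : 0);
}
```

For example, with an illustrative tick of 4,000,000 ns, a 500,000 ns request truncates to 0 ticks but rounds up to 1 tick, while an exact multiple such as 8,000,000 ns gives 2 ticks either way.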

Comment 5 Rik van Riel 2008-09-19 22:09:59 UTC

I cannot seem to reproduce the short sleep problem on my hardware, even though I do think I understand why it can happen.

Do we have any hardware in-house on which the bug is reproducible?

Comment 6 Rik van Riel 2008-10-10 18:33:11 UTC

Martin (Poole),

If I get you a test kernel with the patch, could you get it tested at the customer site?  I have not found any hardware here that reproduces the bug, but the patch is low-risk enough that testing at the customer site should be sufficient to get it approved for merging in a RHEL update.

Comment 8 Issue Tracker 2008-10-13 11:28:16 UTC

Customer is willing to test an experimental kernel and will even be able to
install it today (October 13th) or tomorrow (14th) if we give it to him
now.

Internal Status set to 'Waiting on Engineering'
Status set to: Waiting on Tech

This event sent from IssueTracker by akunysz 
 issue 173294

Comment 9 Rik van Riel 2008-10-13 17:47:58 UTC

I have made test kernels available at

http://people.redhat.com/riel/.bz462853/

Please let me know whether the test kernel resolves the issue.

Comment 10 Issue Tracker 2008-10-14 07:21:17 UTC

Thank you. Customer has been given test kernel. Waiting for feedback.

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by akunysz 
 issue 173294

Comment 13 Rik van Riel 2008-10-15 19:25:46 UTC

Since the patch is upstream, safe, obviously correct and greatly improves the test case for the customer, I will submit it for inclusion in a RHEL update.

There may be other unrelated time bugs that caused the issue to show up on one of the domUs.

Comment 14 Rik van Riel 2008-10-15 19:35:10 UTC

Posted the patch for internal review.

Comment 42 Rik van Riel 2009-05-27 19:35:08 UTC

The bug happens on only one specific system and cannot be reproduced on other systems of the same model.  Putting in a workaround for one specific system entails too much risk for next to no gain, so CLOSED WORKSFORME.

Please reopen if the bug can be triggered on multiple systems.
