Bug 591495

Summary: gtod backwards when running rhts test /kernel/syscalls/gettimeofday on ppc64 machine
Product: Red Hat Enterprise Linux 6
Reporter: Zhang Kexin <kzhang>
Component: kernel
Assignee: Steve Best <sbest>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: high
Version: 6.0
CC: brueckner, peterm, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-11-11 15:59:33 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 575278, 599016    
Attachments:
Description                                                 Flags
reproducer                                                  none
Simple fix to stop gettimeofday() going backwards on ppc64  none
Proper fix for time going backwards                         none

Description Zhang Kexin 2010-05-12 12:17:57 UTC
Description of problem:
When running the rhts test /kernel/syscalls/gettimeofday on the rhel6 ppc64 kernel,
gettimeofday() (gtod) sometimes goes backwards. The machine is ibm-js22-vios-01-lp1.rhts.eng.bos.redhat.com

Version-Release number of selected component (if applicable):
2.6.32-25

How reproducible:
always

Steps to Reproduce:
1. install rhts test /kernel/syscalls/gettimeofday
2. change into the test directory, and change the gtod_backwards loop count to
200.
3. make run
  
Actual results:
***** Start gtod_backwards *****
***** Start loop number 1 *****
Test start time = 1273663133.038369s
Test end time = 1273663134.564814s
***** Done loop number 1 *****
***** Start loop number 2 *****
Test start time = 1273663134.567711s
start time = 1273663135.168261 
end time = 1273663135.168260 
FAIL: time went backwards -1000 nsec (-1.999999 )
***** Done loop number 2 *****
***** Start loop number 3 *****
Test start time = 1273663135.171130s
Test end time = 1273663136.692549s
***** Done loop number 3 *****
***** Start loop number 4 *****
Test start time = 1273663136.695444s
Test end time = 1273663138.217518s
***** Done loop number 4 *****
***** Start loop number 5 *****
Test start time = 1273663138.220447s
start time = 1273663139.168263 
end time = 1273663139.168262 
FAIL: time went backwards -1000 nsec (-1.999999 )
***** Done loop number 5 *****


Expected results:
no backwards
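
The check behind the FAIL lines above can be sketched as follows. This is only an illustration of the comparison, not the attached reproducer (whose source is not shown in this report); the function names are hypothetical, and the timestamps are the ones printed for loop number 2.

```python
import time


def backward_delta_usec(start, end):
    """Return end - start in microseconds for (sec, usec) pairs.

    A negative result means the clock appeared to go backwards.
    """
    start_us = start[0] * 1_000_000 + start[1]
    end_us = end[0] * 1_000_000 + end[1]
    return end_us - start_us


def count_backward_steps(read_usec, iterations):
    """Call read_usec() repeatedly and count readings lower than the previous one.

    read_usec is any callable returning an integer microsecond timestamp
    (hypothetical interface, used here so the loop can be tested offline).
    """
    failures = 0
    prev = read_usec()
    for _ in range(iterations):
        now = read_usec()
        if now < prev:
            failures += 1
        prev = now
    return failures


def wall_clock_usec():
    """Current wall-clock time in whole microseconds."""
    return time.time_ns() // 1000


# Timestamps copied from loop number 2 of the log above.
delta = backward_delta_usec((1273663135, 168261), (1273663135, 168260))
print(f"delta = {delta} usec ({delta * 1000} nsec)")
```

On a live system, `count_backward_steps(wall_clock_usec, n)` would run the same comparison against the real clock.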

Additional info:
There is a similar bug on s390x machines, but it is fixed in 2.6.32-25; please see https://bugzilla.redhat.com/show_bug.cgi?id=575728

Comment 2 RHEL Program Management 2010-05-12 14:36:05 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 IBM Bug Proxy 2010-05-12 16:30:55 UTC
------- Comment From edpollar.com 2010-05-12 11:09 EDT-------
Reverse mirror of 591495 - gtod backwards when running rhts test /kernel/syscalls/gettimeofday on ppc64 machine

Comment 4 IBM Bug Proxy 2010-05-13 00:50:53 UTC
------- Comment From tonyb.com 2010-05-12 20:44 EDT-------
Can we get the source for /kernel/syscalls/gettimeofday  to aid in local reproduction?

Comment 5 Zhang Kexin 2010-05-13 05:29:42 UTC
Created attachment 413632
reproducer

Comment 6 IBM Bug Proxy 2010-06-16 13:01:26 UTC
Created attachment 424434
Simple fix to stop gettimeofday() going backwards on ppc64


------- Comment on attachment From paulus.com 2010-06-16 08:51 EDT-------


So, what's happening on ppc64 is that integer truncation in computing the 'stamp_xsec' value used in the VDSO gettimeofday implementation is causing the gettimeofday result to go backwards 1 microsecond occasionally when the kernel updates the VDSO data (which it does every tick).

The stamp_xsec value is the time as of the update in units of 1/2^20 seconds, approximately 0.954 microseconds, known as xsecs.  With CONFIG_HZ = 100, the tick is 10000 microseconds long, or 10485.76 xsecs.  That means that on successive ticks the stamp_xsec value will increment by either 10485 or 10486.  If userspace does gettimeofday right near the end of a tick and the rounding happens just right (or wrong :) it can get a value that is 10486 xsec since the last update.  If the kernel updates the vdso data then and stamp_xsec advances by 10485, and userspace then immediately does another gettimeofday(), it can get a value that is only 10485 past the previous update.  Under the right conditions, this can give a microseconds value that is 1 less than the previous value.
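
The arithmetic in the paragraph above can be checked with a small model. This is a sketch under simplifying assumptions, not kernel or VDSO code: it only models the truncation of stamp_xsec and the truncating conversion to microseconds, and all function names are hypothetical.

```python
from fractions import Fraction

XSEC_PER_SEC = 2**20   # one xsec is 1/2^20 of a second
HZ = 100               # CONFIG_HZ value used in the explanation above

# Exact length of one tick in xsecs (10485.76 for HZ = 100).
tick_xsec = Fraction(XSEC_PER_SEC, HZ)


def stamp_increments(n_ticks):
    """Differences between successive truncated stamp values over n_ticks."""
    stamps = [int(tick_xsec * i) for i in range(n_ticks + 1)]
    return [b - a for a, b in zip(stamps, stamps[1:])]


def xsec_to_usec(xsec):
    """Truncating conversion from xsecs to microseconds."""
    return xsec * 1_000_000 // XSEC_PER_SEC


def backward_step_usec(stamp):
    """Model the scenario described above for a given stamp value.

    Userspace first reads stamp + 10486 late in the old tick, then the
    kernel advances the stamp by only 10485 and userspace reads again
    with no measurable time elapsed since the update.
    """
    before = xsec_to_usec(stamp + 10486)
    after = xsec_to_usec(stamp + 10485)
    return after - before


increments = stamp_increments(100)
print(float(tick_xsec), sorted(set(increments)))
print(backward_step_usec(0))
```

Because one xsec is slightly shorter than a microsecond, a one-xsec step backwards usually, but not always, shows up as a one-microsecond step backwards.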

There are two ways to fix this.  The first is very simple but slightly less than ideal -- it involves reducing the 'tb_to_xs' value that userspace uses to convert timebase counts to xsecs by a small amount (0.005% for CONFIG_HZ=100) so that the time computed by userspace will run very slightly slower during the tick and end up about 0.5 microseconds slow by the end of the tick.  That's enough to avoid time going backwards due to integer truncation in computing stamp_xsec.  Its disadvantage is that the time computed by gettimeofday() is very slightly inaccurate and may be behind what the kernel computes and uses internally by up to around 500 nanoseconds.
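
As a rough check of the figures quoted for the first approach, the lag accumulated over one tick follows directly from the slowdown factor. The sketch below only reproduces that arithmetic; it does not model how the patch actually adjusts tb_to_xs, and the function name is hypothetical.

```python
from fractions import Fraction

NSEC_PER_SEC = 1_000_000_000


def end_of_tick_lag_nsec(hz, slowdown):
    """Lag, in nanoseconds, accumulated over one tick when the userspace
    conversion factor is reduced by the fraction `slowdown`."""
    tick_nsec = Fraction(NSEC_PER_SEC, hz)
    return tick_nsec * slowdown


# 0.005 % slowdown with CONFIG_HZ = 100, as quoted above.
lag = end_of_tick_lag_nsec(100, Fraction(5, 100_000))
print(f"lag at end of tick: {float(lag)} ns")
```

A lag of this size is a little more than half an xsec, which is consistent with the "about 0.5 microseconds" figure given above.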

The second way involves more code change and needs a new field in the vdso data page structure, but gives a more accurate result.  It involves changes to the code in the VDSO to use a different method, not involving stamp_xsec, to convert the timebase to the time of day.  We can't actually remove the stamp_xsec field since the structure is exposed in /proc/ppc64/systemcfg and is part of the user/kernel ABI now.

The patch attached here implements the first alternative.  I am still working on a patch for the second alternative, which I will send upstream, but it will be a more invasive patch.

Comment 7 IBM Bug Proxy 2010-06-17 13:11:27 UTC
Created attachment 424806
Proper fix for time going backwards


------- Comment on attachment From paulus.com 2010-06-17 09:01 EDT-------


This is the alternative patch, which fixes the problem properly by modifying the VDSO code to not use the stamp_xsec field and instead use a new field in the vdso_data that stores the nanoseconds as a 0.32 binary fraction.  This is the patch which I will be sending upstream shortly to fix the problem in the mainline Linux kernel.
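
For readers unfamiliar with the notation, a 0.32 binary fraction represents a value in [0, 1) as an unsigned 32-bit integer in units of 2^-32. The sketch below shows one plausible encoding of the sub-second nanoseconds and a conversion back to microseconds. It is an assumption for illustration only: the patch's actual field name and conversion code are not shown in this report, and the helper names here are hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000


def nsec_to_frac32(nsec):
    """Encode 0 <= nsec < 1e9 as a 0.32 binary fraction of a second."""
    if not 0 <= nsec < NSEC_PER_SEC:
        raise ValueError("nsec must be in [0, 1e9)")
    return (nsec << 32) // NSEC_PER_SEC


def frac32_to_usec(frac):
    """Convert a 0.32 binary fraction of a second to whole microseconds."""
    return (frac * 1_000_000) >> 32


half_second = nsec_to_frac32(500_000_000)
print(hex(half_second), frac32_to_usec(half_second))
```

With 32 fractional bits the resolution is roughly 0.23 ns, much finer than the roughly 954 ns resolution of an xsec.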

The advantages of this patch are that it makes gettimeofday() and clock_gettime() slightly faster (gettimeofday() in a 64-bit process takes 32.2ns on a POWER7 with the patch compared to 37.4ns without) and it fixes the main problem without losing accuracy.  The disadvantage of this patch compared to the other one is that it is larger and more invasive, so may present more risk, though I have checked and tested it thoroughly.

Note that you should not apply both patches; apply one or the other but not both.

One point in the patch is worth mentioning -- in testing I found instances where update_vsyscall() got called with xtime.tv_nsec = 1000000000 or 1000000001.  That may indicate a bug in generic code.

Comment 9 Steve Best 2010-06-22 12:43:55 UTC
posted to rh-kernel mailing list
http://post-office.corp.redhat.com/archives/rhkernel-list/2010-June/msg01297.html

Comment 10 IBM Bug Proxy 2010-06-25 15:31:50 UTC
------- Comment From tpnoonan.com 2010-06-25 11:22 EDT-------
per rh planned for ss7

Comment 11 Aristeu Rozanski 2010-07-01 16:24:39 UTC
Patch(es) available on kernel-2.6.32-42.el6

Comment 14 Zhang Kexin 2010-07-05 09:19:04 UTC
Tested on ibm-js22-vios-01-lp1.rhts.eng.bos.redhat.com with kernel version 2.6.32-42.el6.ppc64; ran /kernel/power-management/clock_gettime and /kernel/power-management/gettimeofday, and time did not go backwards.
Setting it as verified.

Comment 15 releng-rhel@redhat.com 2010-11-11 15:59:33 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.