Bug 591495
Summary: | gtod backwards when running rhts test /kernel/syscalls/gettimeofday on ppc64 machine | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Zhang Kexin <kzhang> |
Component: | kernel | Assignee: | Steve Best <sbest> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 6.0 | CC: | brueckner, peterm, syeghiay |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | ppc64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2010-11-11 15:59:33 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 575278, 599016 | | |
Attachments: | | | |
Description
Zhang Kexin
2010-05-12 12:17:57 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.
------- Comment From edpollar.com 2010-05-12 11:09 EDT-------
Reverse mirror of 591495 - gtod backwards when running rhts test /kernel/syscalls/gettimeofday on ppc64 machine
------- Comment From tonyb.com 2010-05-12 20:44 EDT-------
Can we get the source for /kernel/syscalls/gettimeofday to aid in local reproduction?
Created attachment 413632 [details]
reproducer
Created attachment 424434 [details]
Simple fix to stop gettimeofday() going backwards on ppc64
------- Comment on attachment From paulus.com 2010-06-16 08:51 EDT-------
So, what's happening on ppc64 is that integer truncation in computing the 'stamp_xsec' value used in the VDSO gettimeofday implementation is causing the gettimeofday result to go backwards 1 microsecond occasionally when the kernel updates the VDSO data (which it does every tick).
The stamp_xsec value is the time as of the update in units of 1/2^20 seconds, approximately 0.954 microseconds, known as xsecs. With CONFIG_HZ = 100, the tick is 10000 microseconds long, or 10485.76 xsecs. That means that on successive ticks the stamp_xsec value will increment by either 10485 or 10486. If userspace does gettimeofday right near the end of a tick and the rounding happens just right (or wrong :) it can get a value that is 10486 xsec since the last update. If the kernel updates the vdso data then and stamp_xsec advances by 10485, and userspace then immediately does another gettimeofday(), it can get a value that is only 10485 past the previous update. Under the right conditions, this can give a microseconds value that is 1 less than the previous value.
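To make the arithmetic above concrete, here is a minimal sketch of the truncation effect. It is a simplified model, not the VDSO code: it works in nanoseconds instead of timebase counts and tb_to_xs, and it assumes (hypothetically) that the kernel's update for a tick lands 300 ns after the tick boundary, so userspace can read once just before the update and once just after. The function names are made up for the illustration.

```python
XSEC_PER_SEC = 1 << 20          # one xsec is 1/2^20 s, about 0.954 us
NSEC_PER_SEC = 10**9
USEC_PER_SEC = 10**6
TICK_NS = NSEC_PER_SEC // 100   # CONFIG_HZ = 100, so a 10000 us tick


def stamp_xsec(tick):
    """Kernel side: time of tick number `tick`, truncated to whole xsecs."""
    return (tick * TICK_NS * XSEC_PER_SEC) // NSEC_PER_SEC


def gettimeofday_usec(last_tick, now_ns):
    """Userspace side: stamp plus truncated elapsed xsecs, in microseconds.

    `last_tick` is the tick whose data userspace currently sees.
    """
    elapsed_ns = now_ns - last_tick * TICK_NS
    elapsed_xsec = (elapsed_ns * XSEC_PER_SEC) // NSEC_PER_SEC
    xsec = stamp_xsec(last_tick) + elapsed_xsec
    return (xsec * USEC_PER_SEC) >> 20


# Assumed scenario: both reads happen 300 ns after the tick-1 boundary.
now_ns = TICK_NS + 300
before = gettimeofday_usec(0, now_ns)  # data from tick 0: 0 + 10486 xsec
after = gettimeofday_usec(1, now_ns)   # data from tick 1: 10485 + 0 xsec

print("before update:", before, "us")
print("after update: ", after, "us")
```

In this model the first read sees 10486 xsec since the last update, the update advances stamp_xsec by only 10485, and the second read comes out one microsecond lower than the first.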
There are two ways to fix this. The first is very simple but slightly less than ideal -- it involves reducing the 'tb_to_xs' value that userspace uses to convert timebase counts to xsecs by a small amount (0.005% for CONFIG_HZ=100) so that the time computed by userspace will run very slightly slower during the tick and end up about 0.5 microseconds slow by the end of the tick. That's enough to avoid time going backwards due to integer truncation in computing stamp_xsec. Its disadvantage is that the time computed by gettimeofday() is very slightly inaccurate and may be behind what the kernel computes and uses internally by up to around 500 nanoseconds.
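The figures quoted for the first alternative can be checked with a few lines of exact arithmetic. This only reproduces the numbers in the comment (0.005% over one tick at CONFIG_HZ=100); it does not model the patch itself, and the function name is hypothetical.

```python
from fractions import Fraction


def end_of_tick_lag_us(hz, slowdown):
    """Lag in microseconds accumulated over one tick when userspace
    time runs slow by the given fraction."""
    tick_us = Fraction(1_000_000, hz)
    return tick_us * slowdown


slowdown = Fraction(5, 100_000)              # 0.005 %
lag_us = end_of_tick_lag_us(100, slowdown)   # CONFIG_HZ = 100
lag_ns = lag_us * 1000
lag_xsec = lag_us * Fraction(1 << 20, 10**6)

print("lag at end of tick:", float(lag_us), "us =", float(lag_ns), "ns")
print("same lag in xsecs: ", float(lag_xsec))
```

The lag comes out at half a microsecond, which is about half an xsec, matching the "up to around 500 nanoseconds" stated above.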
The second way involves more code change and needs a new field in the vdso data page structure, but gives a more accurate result. It involves changes to the code in the VDSO to use a different method, not involving stamp_xsec, to convert the timebase to the time of day. We can't actually remove the stamp_xsec field since the structure is exposed in /proc/ppc64/systemcfg and is part of the user/kernel ABI now.
The patch attached here implements the first alternative. I am still working on a patch for the second alternative, which I will send upstream, but it will be a more invasive patch.
Created attachment 424806 [details]
Proper fix for time going backwards
------- Comment on attachment From paulus.com 2010-06-17 09:01 EDT-------
This is the alternative patch which fixes the problem properly by modifying the VDSO code to not use the stamp_xsec field, and instead use a new field in the vdso_data which stores the nanoseconds as a 0.32 binary fraction. This is the patch which I will be sending upstream shortly to fix the problem in the mainline Linux kernel.
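For readers unfamiliar with the notation, a 0.32 binary fraction represents a value in [0, 1) as an unsigned 32-bit integer divided by 2^32. The sketch below compares the truncation loss of that representation with the loss of whole xsecs for the same nanosecond value. It is an illustration of the number format only, assuming the fraction is taken of one second; the helper names are hypothetical and the code is not taken from the patch.

```python
NSEC_PER_SEC = 10**9


def nsec_to_frac32(nsec):
    """Nanoseconds within the second -> 0.32 binary fraction (truncating)."""
    if not 0 <= nsec < NSEC_PER_SEC:
        raise ValueError("nsec must be within one second")
    return (nsec << 32) // NSEC_PER_SEC


def frac32_to_nsec(frac):
    """0.32 binary fraction -> nanoseconds (truncating)."""
    return (frac * NSEC_PER_SEC) >> 32


def nsec_to_xsec(nsec):
    """Nanoseconds -> whole xsecs (units of 1/2^20 s, truncating)."""
    return (nsec << 20) // NSEC_PER_SEC


def xsec_to_nsec(xsec):
    """Whole xsecs -> nanoseconds (truncating)."""
    return (xsec * NSEC_PER_SEC) >> 20


tick_ns = 10_000_000  # one tick at CONFIG_HZ = 100
via_frac32 = frac32_to_nsec(nsec_to_frac32(tick_ns))
via_xsec = xsec_to_nsec(nsec_to_xsec(tick_ns))

print("round trip via 0.32 fraction loses", tick_ns - via_frac32, "ns")
print("round trip via xsecs loses        ", tick_ns - via_xsec, "ns")
```

One unit of a 0.32 fraction is about 0.233 ns, so the round trip loses at most a nanosecond, whereas whole xsecs can lose most of a microsecond.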
The advantages of this patch are that it makes gettimeofday() and clock_gettime() slightly faster (gettimeofday() in a 64-bit process takes 32.2ns on a POWER7 with the patch compared to 37.4ns without) and it fixes the main problem without losing accuracy. The disadvantage of this patch compared to the other one is that it is larger and more invasive, so it may present more risk, though I have checked and tested it thoroughly.
Note that you should not apply both patches; apply one or the other but not both.
One point in the patch is worth mentioning -- in testing I found instances where update_vsyscall() got called with xtime.tv_nsec = 1000000000 or 1000000001. That may indicate a bug in generic code.
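A guess at why such values matter, offered as an assumption and not as a description of the patch: if the nanoseconds are turned into a fraction of a second that must fit in 32 bits, a tv_nsec of 1000000000 or more no longer fits. The sketch below shows the overflow and one way to carry the excess into the seconds. Both helpers are hypothetical.

```python
NSEC_PER_SEC = 10**9


def frac32(nsec):
    """Nanoseconds as a fraction of a second scaled by 2^32 (truncating).

    The result only fits in 32 bits when nsec < NSEC_PER_SEC.
    """
    return (nsec << 32) // NSEC_PER_SEC


def normalize(sec, nsec):
    """Carry whole seconds out of nsec so that 0 <= nsec < NSEC_PER_SEC."""
    sec += nsec // NSEC_PER_SEC
    nsec %= NSEC_PER_SEC
    return sec, nsec


for bad_nsec in (1_000_000_000, 1_000_000_001):
    overflowed = frac32(bad_nsec) >= (1 << 32)
    print(bad_nsec, "overflows 32 bits:", overflowed,
          "-> normalized:", normalize(5, bad_nsec))
```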
Posted to rh-kernel mailing list: http://post-office.corp.redhat.com/archives/rhkernel-list/2010-June/msg01297.html
------- Comment From tpnoonan.com 2010-06-25 11:22 EDT-------
per rh planned for ss7
Patch(es) available on kernel-2.6.32-42.el6
Tested on ibm-js22-vios-01-lp1.rhts.eng.bos.redhat.com with kernel version 2.6.32-42.el6.ppc64. Ran /kernel/power-management/clock_gettime and /kernel/power-management/gettimeofday; time did not go backwards. Setting as verified.
Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.