Bug 703505
| Summary: | 300 seconds time shift in vdso version of clock_gettime() | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Vasily Averin <vvs> | ||||||
| Component: | kernel | Assignee: | Prarit Bhargava <prarit> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Xiaochen Wang <xiawang> | ||||||
| Severity: | unspecified | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 5.6 | CC: | avagin, dZhu, khorenko, kzhang, prarit, xiawang | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | |||||||||
| : | 975640 (view as bug list) | Environment: | |||||||
| Last Closed: | 2012-02-21 03:47:47 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 975640 | ||||||||
| Attachments: | 
 | ||||||||
| 
        
          Description
        
        
          Vasily Averin
        
        
        
        
        
          2011-05-10 14:41:58 UTC
        
       >"last" field of vdso_vxtime have "signed int" type. Casting to "long" type >causes sign-extend. As result (vread - cycle_last) overflows in case of >"negative" values of counter and typically is near of 300 seconds period of >HPET timer. > >Proposed fix: to cast vdso_vxtime->last to "unsigned int" before cycle_last >initialization, please see attached patch Hmmm ... I wonder if (u32) is correct? Should that really be (unsigned long long) to match the vread size? P. FWIW ... upstream has the equivalent "last" as a cycles_t (which is an unsigned long long). Could you try casting to a cycles_t and retest? Thanks, P. Dear Prarit,
I do not understand the goal of proposed experiments, and I believe I was not well explained the essence of the problem.
Therefore let me describe this problem again in more details.
Please take look at sources, to arch/x86_64/kernel/time.c
HPET timer works in 32-bit mode, and vread_hpet() returns 32-bit values.
On each timer interrupt value of HPET timer saved to 32-bit variable vxtime.last
In vgetns() we calculate nanoseconds correction of time, by using difference of current value of HPET timer taken via vread_hpet() and value of HPET timer captured at the time of the last timer interrupt and saved to vxtime.last variable.
The problem is that this calculation get us incorrect results when value saved in vxtime.last become 32-bit "negative" values, i.e. when 32 bit is set to 1. 
For example, let's image that on last timer interrupt we got from HPET timer value  0x80000000 and saved it to vxtime.last
Now, we're entered to vgetns()
 vread = vread_hpet(); 
      << let's image we got here 0x0000000080000001
 cycle_last = vdso_vxtime->last;  
      << here we have type conversion with sign extension and instead of
         original 0x80000000 we'll assign 0xFFFFFFFF80000000 to cycle_last
And as result (vread - cycle_last) will get us overflow, 0x100000001 instead of correct 0x000000001
To fix this issue we need to prevent sign extension.
We can reach this goal by using clear type conversion from signed int to unsigned int. 
However I believe cast from "signed int" to "unsigned long long" does not help us to avoid sign extension
$ cat test.c 
#include <stdio.h>
 
int main() 
{
    int i_value   = 0x80000000;
    printf("The integer is: 0x%x\n", i_value);
    printf("The unsigned long long is: 0x%llx\n", (unsigned long long)i_value);
}
$ gcc test.c 
$ ./a.out 
The integer is: 0x80000000
The unsigned long long is: 0xffffffff80000000
I believe my patch is best solution of this problem.
Theoretically we can change type of last field in vxtime_data structure from int to some other, however this change will affect some other code in arch/x86_64/kernel/time.c.
btw. I've noticed similar problem in do_gettimeoffset_hpet() too
 
 unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;
and nobody didn't noticed it before because of following spike 
 min(counter,hpet_tick_real)
I've found that my patch is not fully correct. It gives incorrect result in case of timer overflow: i.e if vxtime.last = 0xffffffff and new counters value is 0. in this case we get 0x0000000100000001. To resolve this issue correctly we need either: - use 32-bit variables to calculate the difference (vread - cycle_last), like it is done in monotonic_clock() or do_timer_account_lost_ticks() in your kernel - or to use any variables but then use "& mask", like it is done in upstreams vgetns(): v = (vread() - gtod->clock.cycle_last) & gtod->clock.mask; (In reply to comment #3) > btw. I've noticed similar problem in do_gettimeoffset_hpet() too > > unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last; I was wrong, because hpet_readl return 32-bit values. So this issue affects vdso code only. Created attachment 498738 [details]
diff-vdso-hpet.20110513
32bit mask for HPET timer values allows to avoid an overflow in calculation of nanosecond correction for vdso version of clock-gettime().
(In reply to comment #6) > Created attachment 498738 [details] > diff-vdso-hpet.20110513 > > 32bit mask for HPET timer values allows to avoid an overflow in calculation of > nanosecond correction for vdso version of clock-gettime(). Hmm ... Vasily, let me think about this over the weekend. The patch looks good but I'd like to think about it before I submit it... Thanks :) It's always great when customers fix their own bugs :) P. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. Patch(es) available in kernel-2.6.18-282.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0150.html |