Created attachment 498031 [details] fix for incorrect type conversion in vgetns() Description of problem: Vasily Averin from Parallels OpenVZ linux kernel team discovered reason of following problem: Time (output of /bin/date) shifts inside containers (or chroot environments) if they use new glibc version with vDSO support. Every 150 seconds time is shifted forwards by 300 seconds, then it returns back to normal for 150 seconds, then shifts forward and back again and again. Version-Release number of selected component (if applicable): this issue affects nodes that uses HPET timer and x86_64 RHEL5.6 kernels, for example 2.6.18-238.1.1.el5 How reproducible: 10/10 Steps to Reproduce: 1. take hardware node that uses HPET timer 2. install RHEL5.6 x86_64 kernel 3. prepare chroot environment that uses glibc with vDSO support, for example like it was described in https://bugzilla.redhat.com/show_bug.cgi?id=673616#c3 4. switch to chroot, monitor output of data command. Actual results: time is shifted forwards by 300 seconds, then it returns back to normal, then it shifts and returns back again, again, and again. Expected results: time is correct Additional info: Investigation shows that /bin/date uses clock_gettime GLIBC call that returns incorrect value. gdb tracing helps to found overflow in vgetns() function, and finally we have found that issue caused by incorrect calculations of nanoseconds correction for vdso version of clock_gettime() / gettimeofday() Please take a look at: arch/x86_64/vdso/vclock_gettime.c ... static inline long vgetns(void) { cycles_t vread; long cycle_last, mult; if (vdso_vxtime->mode == VXTIME_HPET) { vread = vread_hpet(); cycle_last = vdso_vxtime->last; <<<< incorrect type conversion mult = vdso_vxtime->quot; } [some extra strings skipped] return ((vread - cycle_last) * mult) >> <<<<< overflow here NS_SCALE; } "last" field of vdso_vxtime have "signed int" type. Casting to "long" type causes sign-extend. As result (vread - cycle_last) overflows in case of "negative" values of counter and typically is near of 300 seconds period of HPET timer. Proposed fix: to cast vdso_vxtime->last to "unsigned int" before cycle_last initialization, please see attached patch
>"last" field of vdso_vxtime have "signed int" type. Casting to "long" type >causes sign-extend. As result (vread - cycle_last) overflows in case of >"negative" values of counter and typically is near of 300 seconds period of >HPET timer. > >Proposed fix: to cast vdso_vxtime->last to "unsigned int" before cycle_last >initialization, please see attached patch Hmmm ... I wonder if (u32) is correct? Should that really be (unsigned long long) to match the vread size? P.
FWIW ... upstream has the equivalent "last" as a cycles_t (which is an unsigned long long). Could you try casting to a cycles_t and retest? Thanks, P.
Dear Prarit, I do not understand the goal of proposed experiments, and I believe I was not well explained the essence of the problem. Therefore let me describe this problem again in more details. Please take look at sources, to arch/x86_64/kernel/time.c HPET timer works in 32-bit mode, and vread_hpet() returns 32-bit values. On each timer interrupt value of HPET timer saved to 32-bit variable vxtime.last In vgetns() we calculate nanoseconds correction of time, by using difference of current value of HPET timer taken via vread_hpet() and value of HPET timer captured at the time of the last timer interrupt and saved to vxtime.last variable. The problem is that this calculation get us incorrect results when value saved in vxtime.last become 32-bit "negative" values, i.e. when 32 bit is set to 1. For example, let's image that on last timer interrupt we got from HPET timer value 0x80000000 and saved it to vxtime.last Now, we're entered to vgetns() vread = vread_hpet(); << let's image we got here 0x0000000080000001 cycle_last = vdso_vxtime->last; << here we have type conversion with sign extension and instead of original 0x80000000 we'll assign 0xFFFFFFFF80000000 to cycle_last And as result (vread - cycle_last) will get us overflow, 0x100000001 instead of correct 0x000000001 To fix this issue we need to prevent sign extension. We can reach this goal by using clear type conversion from signed int to unsigned int. However I believe cast from "signed int" to "unsigned long long" does not help us to avoid sign extension $ cat test.c #include <stdio.h> int main() { int i_value = 0x80000000; printf("The integer is: 0x%x\n", i_value); printf("The unsigned long long is: 0x%llx\n", (unsigned long long)i_value); } $ gcc test.c $ ./a.out The integer is: 0x80000000 The unsigned long long is: 0xffffffff80000000 I believe my patch is best solution of this problem. Theoretically we can change type of last field in vxtime_data structure from int to some other, however this change will affect some other code in arch/x86_64/kernel/time.c. btw. I've noticed similar problem in do_gettimeoffset_hpet() too unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last; and nobody didn't noticed it before because of following spike min(counter,hpet_tick_real)
I've found that my patch is not fully correct. It gives incorrect result in case of timer overflow: i.e if vxtime.last = 0xffffffff and new counters value is 0. in this case we get 0x0000000100000001. To resolve this issue correctly we need either: - use 32-bit variables to calculate the difference (vread - cycle_last), like it is done in monotonic_clock() or do_timer_account_lost_ticks() in your kernel - or to use any variables but then use "& mask", like it is done in upstreams vgetns(): v = (vread() - gtod->clock.cycle_last) & gtod->clock.mask;
(In reply to comment #3) > btw. I've noticed similar problem in do_gettimeoffset_hpet() too > > unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last; I was wrong, because hpet_readl return 32-bit values. So this issue affects vdso code only.
Created attachment 498738 [details] diff-vdso-hpet.20110513 32bit mask for HPET timer values allows to avoid an overflow in calculation of nanosecond correction for vdso version of clock-gettime().
(In reply to comment #6) > Created attachment 498738 [details] > diff-vdso-hpet.20110513 > > 32bit mask for HPET timer values allows to avoid an overflow in calculation of > nanosecond correction for vdso version of clock-gettime(). Hmm ... Vasily, let me think about this over the weekend. The patch looks good but I'd like to think about it before I submit it... Thanks :) It's always great when customers fix their own bugs :) P.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available in kernel-2.6.18-282.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0150.html