Bug 703505

Summary: 300 seconds time shift in vdso version of clock_gettime()
Product: Red Hat Enterprise Linux 5 Reporter: Vasily Averin <vvs>
Component: kernelAssignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA QA Contact: Xiaochen Wang <xiawang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 5.6CC: avagin, dZhu, khorenko, kzhang, prarit, xiawang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 975640 (view as bug list) Environment:
Last Closed: 2012-02-21 03:47:47 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 975640    
Attachments:
Description Flags
fix for incorrect type conversion in vgetns()
none
diff-vdso-hpet.20110513 none

Description Vasily Averin 2011-05-10 14:41:58 UTC
Created attachment 498031 [details]
fix for incorrect type conversion in vgetns()

Description of problem:
Vasily Averin from Parallels OpenVZ linux kernel team discovered reason of following problem:

Time (output of /bin/date) shifts inside containers (or chroot environments) if they use new glibc version with vDSO support.

Every 150 seconds time is shifted forwards by 300 seconds, then it returns back to normal for 150 seconds, then shifts forward and back again and again.

Version-Release number of selected component (if applicable):
this issue affects nodes that uses HPET timer and x86_64 RHEL5.6 kernels,
for example 2.6.18-238.1.1.el5

How reproducible:
10/10

Steps to Reproduce:
1. take hardware node that uses HPET timer
2. install RHEL5.6 x86_64 kernel
3. prepare chroot environment that uses glibc with vDSO support, for example like it was described in https://bugzilla.redhat.com/show_bug.cgi?id=673616#c3
4. switch to chroot, monitor output of data command.
  
Actual results:
time is shifted forwards by 300 seconds, then it returns back to normal, then it shifts and returns back again, again, and again.

Expected results:
time is correct

Additional info:
Investigation shows that /bin/date uses clock_gettime GLIBC call that returns incorrect value. gdb tracing helps to found overflow in  vgetns() function, and finally we have found that issue caused by incorrect calculations of nanoseconds correction for vdso version of clock_gettime() / gettimeofday()

Please take a look at:
arch/x86_64/vdso/vclock_gettime.c
...
static inline long vgetns(void)
{
        cycles_t vread;
        long cycle_last, mult;

        if (vdso_vxtime->mode == VXTIME_HPET) {
                vread = vread_hpet();
                cycle_last = vdso_vxtime->last; <<<< incorrect type conversion
                mult = vdso_vxtime->quot;
        }
        [some extra strings skipped]
        return ((vread - cycle_last) * mult) >>     <<<<< overflow here
                        NS_SCALE;
}

"last" field of vdso_vxtime have "signed int" type. Casting to "long" type causes sign-extend. As result (vread - cycle_last) overflows in case of "negative" values of counter and typically is near of 300 seconds period of HPET timer.

Proposed fix: to cast vdso_vxtime->last to "unsigned int" before cycle_last initialization, please see attached patch

Comment 1 Prarit Bhargava 2011-05-11 13:10:00 UTC
>"last" field of vdso_vxtime have "signed int" type. Casting to "long" type
>causes sign-extend. As result (vread - cycle_last) overflows in case of
>"negative" values of counter and typically is near of 300 seconds period of
>HPET timer.
>
>Proposed fix: to cast vdso_vxtime->last to "unsigned int" before cycle_last
>initialization, please see attached patch

Hmmm ... I wonder if (u32) is correct?  Should that really be (unsigned long long) to match the vread size?

P.

Comment 2 Prarit Bhargava 2011-05-11 13:20:11 UTC
FWIW ... upstream has the equivalent "last" as a cycles_t (which is an unsigned long long).

Could you try casting to a cycles_t and retest?

Thanks,

P.

Comment 3 Vasily Averin 2011-05-12 21:45:38 UTC
Dear Prarit,

I do not understand the goal of proposed experiments, and I believe I was not well explained the essence of the problem.

Therefore let me describe this problem again in more details.

Please take look at sources, to arch/x86_64/kernel/time.c
HPET timer works in 32-bit mode, and vread_hpet() returns 32-bit values.
On each timer interrupt value of HPET timer saved to 32-bit variable vxtime.last
In vgetns() we calculate nanoseconds correction of time, by using difference of current value of HPET timer taken via vread_hpet() and value of HPET timer captured at the time of the last timer interrupt and saved to vxtime.last variable.

The problem is that this calculation get us incorrect results when value saved in vxtime.last become 32-bit "negative" values, i.e. when 32 bit is set to 1. 

For example, let's image that on last timer interrupt we got from HPET timer value  0x80000000 and saved it to vxtime.last

Now, we're entered to vgetns()

 vread = vread_hpet(); 
      << let's image we got here 0x0000000080000001
 cycle_last = vdso_vxtime->last;  
      << here we have type conversion with sign extension and instead of
         original 0x80000000 we'll assign 0xFFFFFFFF80000000 to cycle_last

And as result (vread - cycle_last) will get us overflow, 0x100000001 instead of correct 0x000000001

To fix this issue we need to prevent sign extension.

We can reach this goal by using clear type conversion from signed int to unsigned int. 

However I believe cast from "signed int" to "unsigned long long" does not help us to avoid sign extension

$ cat test.c 
#include <stdio.h>
 
int main() 
{
    int i_value   = 0x80000000;
    printf("The integer is: 0x%x\n", i_value);
    printf("The unsigned long long is: 0x%llx\n", (unsigned long long)i_value);
}
$ gcc test.c 
$ ./a.out 
The integer is: 0x80000000
The unsigned long long is: 0xffffffff80000000


I believe my patch is best solution of this problem.
Theoretically we can change type of last field in vxtime_data structure from int to some other, however this change will affect some other code in arch/x86_64/kernel/time.c.

btw. I've noticed similar problem in do_gettimeoffset_hpet() too
 
 unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;

and nobody didn't noticed it before because of following spike 
 min(counter,hpet_tick_real)

Comment 4 Vasily Averin 2011-05-13 06:44:20 UTC
I've found that my patch is not fully correct.
It gives incorrect result in case of timer overflow:
i.e if vxtime.last = 0xffffffff and new counters value is 0.
in this case we get 0x0000000100000001.

To resolve this issue correctly we need either:
- use 32-bit variables to calculate the difference (vread - cycle_last), like it is done in monotonic_clock() or do_timer_account_lost_ticks() in your kernel
- or to use any variables but then use "& mask", like it is done in upstreams vgetns():
   v = (vread() - gtod->clock.cycle_last) & gtod->clock.mask;

Comment 5 Vasily Averin 2011-05-13 09:16:07 UTC
(In reply to comment #3)
> btw. I've noticed similar problem in do_gettimeoffset_hpet() too
> 
>  unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;
I was wrong, because hpet_readl return 32-bit values.
So this issue affects vdso code only.

Comment 6 Vasily Averin 2011-05-13 09:23:51 UTC
Created attachment 498738 [details]
diff-vdso-hpet.20110513

32bit mask for HPET timer values allows to avoid an overflow in calculation of nanosecond correction for vdso version of clock-gettime().

Comment 7 Prarit Bhargava 2011-05-13 12:26:25 UTC
(In reply to comment #6)
> Created attachment 498738 [details]
> diff-vdso-hpet.20110513
> 
> 32bit mask for HPET timer values allows to avoid an overflow in calculation of
> nanosecond correction for vdso version of clock-gettime().

Hmm ... Vasily, let me think about this over the weekend.  The patch looks good but I'd like to think about it before I submit it...

Thanks :)  It's always great when customers fix their own bugs :)

P.

Comment 8 RHEL Program Management 2011-08-04 04:13:54 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Jarod Wilson 2011-08-23 14:01:57 UTC
Patch(es) available in kernel-2.6.18-282.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 13 errata-xmlrpc 2012-02-21 03:47:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0150.html