Bug 640608

Summary: Wrong value after counting two Time values
Product: Red Hat Enterprise Linux 5 Reporter: Michal Fojtik <mfojtik>
Component: kernelAssignee: Prarit Bhargava <prarit>
Status: CLOSED WONTFIX QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: low    
Version: 5.5CC: drjones, jwilson, vgrinco, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-04 16:51:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
strace output none

Description Michal Fojtik 2010-10-06 13:08:03 UTC
Description of problem:

Counting with time is broken somehow on XEN guest in Ruby, but it could be a system issue.

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Server release 5.5 (Tikanga)
ruby 1.8.7 (2009-12-24 patchlevel 248) [x86_64-linux]

How reproducible:

Run a simple Ruby script on Xen guest with RHEL5

Steps to Reproduce:

1. Install Ruby
2. Launch this Ruby script:

current_time = Time.now
sleep(1)
actual_time = Time.now
# Time should be >1
puts actual_time-current_time

3.
  
Actual results:

0.994902

Expected results:

>1

Comment 1 Jim Meyering 2010-10-07 08:30:58 UTC
Thanks for the report.
There is definitely a bug in there somewhere.
I wonder if this is ruby-specific.  I suspect Xen, not ruby.

What does this print for you:

  perl '-MTime::HiRes "time"' -le '$t=time; sleep 1; print time - $t'

If that shows the same problem, then please reassign the bug.
If it does not, please run the following and attach the resulting "log" file:

  strace -o log ruby -e 't=Time.now; sleep 1; puts Time.now - t'

Comment 2 Michal Fojtik 2010-10-07 08:57:01 UTC
(In reply to comment #1)
> Thanks for the report.
> There is definitely a bug in there somewhere.
> I wonder if this is ruby-specific.  I suspect Xen, not ruby.

That's right, but weird is, that Perl or C doesn't suffer with this kind of problem.

> What does this print for you:
> 
>   perl '-MTime::HiRes "time"' -le '$t=time; sleep 1; print time - $t'

This one works fine:

perl '-MTime::HiRes "time"' -le '$t=time; sleep 1; print time - $t'
1.00364589691162

> 
> If that shows the same problem, then please reassign the bug.
> If it does not, please run the following and attach the resulting "log" file:
> 
>   strace -o log ruby -e 't=Time.now; sleep 1; puts Time.now - t'

Log file attached.

Comment 3 Michal Fojtik 2010-10-07 09:01:51 UTC
Created attachment 452057 [details]
strace output

Comment 4 Jim Meyering 2010-10-07 10:24:23 UTC
Thanks for the info.
Note that Perl does not use gettimeofday, but rather the superior "nanosleep" function.

Contrast your output:

  gettimeofday({1286441633, 652023}, NULL) = 0
  time(NULL)                              = 1286441633
  select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
  time(NULL)                              = 1286441634
  gettimeofday({1286441634, 651329}, NULL) = 0

with what I see on RHEL6:

  rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
  select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)

It may be that RHEL5's gettimeofday isn't working quite right.

This discussion suggests that your problem could be due to how your virtual system is configured:

   http://thread.gmane.org/gmane.comp.emulators.xen.user/55086/focus=55117

Here is a related bug.  Make sure that you have that fix:

   https://bugzilla.redhat.com/show_bug.cgi?id=427588

Comment 5 Michal Fojtik 2010-10-07 12:30:43 UTC
(In reply to comment #4)
> Thanks for the info.
> Note that Perl does not use gettimeofday, but rather the superior "nanosleep"
> function.
> 
> Contrast your output:
> 
>   gettimeofday({1286441633, 652023}, NULL) = 0
>   time(NULL)                              = 1286441633
>   select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>   time(NULL)                              = 1286441634
>   gettimeofday({1286441634, 651329}, NULL) = 0
> 
> with what I see on RHEL6:
> 
>   rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
>   select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> 
> It may be that RHEL5's gettimeofday isn't working quite right.
> 
> This discussion suggests that your problem could be due to how your virtual
> system is configured:
> 
>    http://thread.gmane.org/gmane.comp.emulators.xen.user/55086/focus=55117
> 
> Here is a related bug.  Make sure that you have that fix:
> 
>    https://bugzilla.redhat.com/show_bug.cgi?id=427588

Hi, we tried several combinations of kernel parameters and also kernel update but nothing works :-(
So I thought this bug is not Ruby related as you said before but XEN. I'm changing component to XEN then.

Comment 6 Andrew Jones 2010-10-07 13:32:11 UTC
Please describe the host and guest completely; kernel revisions for both, type of guest (pv|fv), x86_64 or i386. Also take a look at dmesg for both host and guest and 'xm dmesg' on the host. If there's anything interesting, please attach it.

Thanks,
Drew

Comment 7 Vadim Grinco 2010-10-07 13:56:51 UTC
Guest:
 * PV XEN guest
 * 2.6.18-194.17.1.el5xen
 * x86_64
 * 1 VCPU

Host: 
 * RHEL5.5
 * 2.6.18-194.11.3.el5xen
 * x86_64
 * 8 CPU

Do you need anything else?

Comment 8 Andrew Jones 2010-10-08 10:14:12 UTC
I captured some data for a variety of systems.

-------------------------------------------------------------------------------
config                mean     standard          max          min         diff
                              deviation                                 from 1
-------------------------------------------------------------------------------
dom0              0.997042     0.000655     1.005585     0.996048     0.002958
f12_bm            1.000866     0.000346     1.003212     1.000062     0.000866
rhel55-64pv       0.997907     0.000720     1.005739     0.993254     0.002093
rhel6-32pv        1.001077     0.000053     1.001859     1.000138     0.001077
rhel6-64fv        1.001599     0.000296     1.002114     1.001069     0.001599
rhel6-64pv        1.001071     0.000058     1.001999     1.000136     0.001071

The f12_bm is my f12 bare-metal notebook. dom0 is a latest rhel 5.6 candidate and was the dom0 used for all the other cases, which are xen guests.  We see that bare-metal has the closest mean to 1, but it's standard deviation isn't the best. This indicates that we have error here even without virtualization. With respect to virtualization we see that the rhel5 systems (dom0 and rhel55-64pv) are generally less than 1, as was observed when this bug was opened, but not always. They also have the worst standard deviation and difference of the mean from 1. However, the rhel6 guests are all generally higher and have differences closer to that of the bare-metal, but with even less standard deviation.

What all this tells us is that if there's a problem, and not just some expected error, then it's at least in the kernel, not the hypervisor, or it could be even higher up the stack.  I'll move this bug over to the kernel for now to get comments from them.

Comment 9 Andrew Jones 2010-10-08 10:18:44 UTC
Note, I also quickly scanned the data by eye to see if it looked the results were increasing or decreasing -- indicating some cumulative error on time accounting. I didn't see any obvious patterns.

Comment 10 Prarit Bhargava 2013-11-04 16:51:44 UTC
This Bugzilla has been reviewed by Red Hat and is not planned on being
addressed in Red Hat Enterprise Linux 5, and therefore is being closed.
If this bug is critical to production systems, please contact your Red
Hat support representative and provide a sufficient business justification
in order to re-open it.