Bug 640608 - Wrong value after counting two Time values
Summary: Wrong value after counting two Time values
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Prarit Bhargava
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-10-06 13:08 UTC by Michal Fojtik
Modified: 2013-11-04 16:51 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-04 16:51:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
strace output (7.57 KB, text/plain)
2010-10-07 09:01 UTC, Michal Fojtik
no flags Details

Description Michal Fojtik 2010-10-06 13:08:03 UTC
Description of problem:

Counting with time is broken somehow on XEN guest in Ruby, but it could be a system issue.

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Server release 5.5 (Tikanga)
ruby 1.8.7 (2009-12-24 patchlevel 248) [x86_64-linux]

How reproducible:

Run a simple Ruby script on Xen guest with RHEL5

Steps to Reproduce:

1. Install Ruby
2. Launch this Ruby script:

current_time = Time.now
sleep(1)
actual_time = Time.now
# Time should be >1
puts actual_time-current_time

3.
  
Actual results:

0.994902

Expected results:

>1

Comment 1 Jim Meyering 2010-10-07 08:30:58 UTC
Thanks for the report.
There is definitely a bug in there somewhere.
I wonder if this is ruby-specific.  I suspect Xen, not ruby.

What does this print for you:

  perl '-MTime::HiRes "time"' -le '$t=time; sleep 1; print time - $t'

If that shows the same problem, then please reassign the bug.
If it does not, please run the following and attach the resulting "log" file:

  strace -o log ruby -e 't=Time.now; sleep 1; puts Time.now - t'

Comment 2 Michal Fojtik 2010-10-07 08:57:01 UTC
(In reply to comment #1)
> Thanks for the report.
> There is definitely a bug in there somewhere.
> I wonder if this is ruby-specific.  I suspect Xen, not ruby.

That's right, but weird is, that Perl or C doesn't suffer with this kind of problem.

> What does this print for you:
> 
>   perl '-MTime::HiRes "time"' -le '$t=time; sleep 1; print time - $t'

This one works fine:

perl '-MTime::HiRes "time"' -le '$t=time; sleep 1; print time - $t'
1.00364589691162

> 
> If that shows the same problem, then please reassign the bug.
> If it does not, please run the following and attach the resulting "log" file:
> 
>   strace -o log ruby -e 't=Time.now; sleep 1; puts Time.now - t'

Log file attached.

Comment 3 Michal Fojtik 2010-10-07 09:01:51 UTC
Created attachment 452057 [details]
strace output

Comment 4 Jim Meyering 2010-10-07 10:24:23 UTC
Thanks for the info.
Note that Perl does not use gettimeofday, but rather the superior "nanosleep" function.

Contrast your output:

  gettimeofday({1286441633, 652023}, NULL) = 0
  time(NULL)                              = 1286441633
  select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
  time(NULL)                              = 1286441634
  gettimeofday({1286441634, 651329}, NULL) = 0

with what I see on RHEL6:

  rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
  select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)

It may be that RHEL5's gettimeofday isn't working quite right.

This discussion suggests that your problem could be due to how your virtual system is configured:

   http://thread.gmane.org/gmane.comp.emulators.xen.user/55086/focus=55117

Here is a related bug.  Make sure that you have that fix:

   https://bugzilla.redhat.com/show_bug.cgi?id=427588

Comment 5 Michal Fojtik 2010-10-07 12:30:43 UTC
(In reply to comment #4)
> Thanks for the info.
> Note that Perl does not use gettimeofday, but rather the superior "nanosleep"
> function.
> 
> Contrast your output:
> 
>   gettimeofday({1286441633, 652023}, NULL) = 0
>   time(NULL)                              = 1286441633
>   select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>   time(NULL)                              = 1286441634
>   gettimeofday({1286441634, 651329}, NULL) = 0
> 
> with what I see on RHEL6:
> 
>   rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
>   select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
> 
> It may be that RHEL5's gettimeofday isn't working quite right.
> 
> This discussion suggests that your problem could be due to how your virtual
> system is configured:
> 
>    http://thread.gmane.org/gmane.comp.emulators.xen.user/55086/focus=55117
> 
> Here is a related bug.  Make sure that you have that fix:
> 
>    https://bugzilla.redhat.com/show_bug.cgi?id=427588

Hi, we tried several combinations of kernel parameters and also kernel update but nothing works :-(
So I thought this bug is not Ruby related as you said before but XEN. I'm changing component to XEN then.

Comment 6 Andrew Jones 2010-10-07 13:32:11 UTC
Please describe the host and guest completely; kernel revisions for both, type of guest (pv|fv), x86_64 or i386. Also take a look at dmesg for both host and guest and 'xm dmesg' on the host. If there's anything interesting, please attach it.

Thanks,
Drew

Comment 7 Vadim Grinco 2010-10-07 13:56:51 UTC
Guest:
 * PV XEN guest
 * 2.6.18-194.17.1.el5xen
 * x86_64
 * 1 VCPU

Host: 
 * RHEL5.5
 * 2.6.18-194.11.3.el5xen
 * x86_64
 * 8 CPU

Do you need anything else?

Comment 8 Andrew Jones 2010-10-08 10:14:12 UTC
I captured some data for a variety of systems.

-------------------------------------------------------------------------------
config                mean     standard          max          min         diff
                              deviation                                 from 1
-------------------------------------------------------------------------------
dom0              0.997042     0.000655     1.005585     0.996048     0.002958
f12_bm            1.000866     0.000346     1.003212     1.000062     0.000866
rhel55-64pv       0.997907     0.000720     1.005739     0.993254     0.002093
rhel6-32pv        1.001077     0.000053     1.001859     1.000138     0.001077
rhel6-64fv        1.001599     0.000296     1.002114     1.001069     0.001599
rhel6-64pv        1.001071     0.000058     1.001999     1.000136     0.001071

The f12_bm is my f12 bare-metal notebook. dom0 is a latest rhel 5.6 candidate and was the dom0 used for all the other cases, which are xen guests.  We see that bare-metal has the closest mean to 1, but it's standard deviation isn't the best. This indicates that we have error here even without virtualization. With respect to virtualization we see that the rhel5 systems (dom0 and rhel55-64pv) are generally less than 1, as was observed when this bug was opened, but not always. They also have the worst standard deviation and difference of the mean from 1. However, the rhel6 guests are all generally higher and have differences closer to that of the bare-metal, but with even less standard deviation.

What all this tells us is that if there's a problem, and not just some expected error, then it's at least in the kernel, not the hypervisor, or it could be even higher up the stack.  I'll move this bug over to the kernel for now to get comments from them.

Comment 9 Andrew Jones 2010-10-08 10:18:44 UTC
Note, I also quickly scanned the data by eye to see if it looked the results were increasing or decreasing -- indicating some cumulative error on time accounting. I didn't see any obvious patterns.

Comment 10 Prarit Bhargava 2013-11-04 16:51:44 UTC
This Bugzilla has been reviewed by Red Hat and is not planned on being
addressed in Red Hat Enterprise Linux 5, and therefore is being closed.
If this bug is critical to production systems, please contact your Red
Hat support representative and provide a sufficient business justification
in order to re-open it.


Note You need to log in before you can comment on or make changes to this bug.