Bug 116767

Summary: usleep() call not working correctly - very high granularity compared to 2.4.18
Product: Red Hat Enterprise Linux 3 Reporter: Suman Puthana <suman>
Component: kernelAssignee: Ernie Petrides <petrides>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0CC: petrides, riel, suman
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-06-15 03:20:43 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Suman Puthana 2004-02-25 00:46:00 UTC
Description of problem:

In 2.4.18, the usleep() call worked as it should, atleast with the 
Redhat 8.0 distribution within a 5-10% error. For example, a usleep() 
request for 33 ms would return in 33-35 ms which is good enough for 
our application and I guess for most applications.

But in the 2.4.21 enterprise 3 kernel,

    for any value between ( 31-40 ms ), usleep sleeps for 50 ms
    for any value between ( 41-50 ms ), usleep sleeps for 60 ms
    for any value between ( 51-60 ms ), usleep sleeps for 70 ms

and so on...You can see the trend which is kind of strange. I see 
that some kind of granularity has been introduced into the newer 
kernels. I'm guessing that something in the stock kernel is causing 
this, but how can an enterprise level kernel do this?

Version-Release number of selected component (if applicable):


How reproducible: Everytime


Steps to Reproduce:
1. Compile the following program and run it giving the desired time 
as input.
 

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Here's the program I used to get these results :

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <sys/time.h>

#define MILLISEC_DIFF(new, old, diff)   
{                              \
        (diff) = (((new).tv_sec  - (old).tv_sec)  * 1000L ) 
+           \
                 (((new).tv_usec - (old).tv_usec) / 
1000L );             \
}

int main(int args, char ** argv) {

        int time = 30;
        struct timeval time1,time2;
        long elapsed;

        if(args < 2) {
                printf("USAGE: sleep <time in milliseconds>\n");
                exit(0);
        }

        time = atoi(argv[1]);

        while(1) {

                gettimeofday(&time1, NULL);
                usleep(time*1000);
                gettimeofday(&time2, NULL);

                MILLISEC_DIFF(time2, time1, elapsed);

                printf("time 1 = <%ld:%ld>, time 2 = <%ld:%
ld> ,elapsed =
%ld\n\n",time1.tv_sec, time1.tv_usec, time2.tv_sec, time2.tv_usec, 
elapsed);
        }
}

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

  
Actual results:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Here's the sample output of the program for 30 ms sleep request:

time 1 = <1069136424:911198>, time 2 = <1069136424:951193> ,elapsed = 
39

time 1 = <1069136424:951198>, time 2 = <1069136424:991188> ,elapsed = 
39

time 1 = <1069136424:991191>, time 2 = <1069136425:31220> ,elapsed = 
41

time 1 = <1069136425:31270>, time 2 = <1069136425:71235> ,elapsed = 39

time 1 = <1069136425:71249>, time 2 = <1069136425:111192> ,elapsed = 
39

time 1 = <1069136425:111200>, time 2 = <1069136425:151190> ,elapsed = 
39

time 1 = <1069136425:151193>, time 2 = <1069136425:191195> ,elapsed = 
40

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


 
Expected results: That elapsed time should be as close to 33 ms as 
possible.


Additional info:

Comment 2 Mark DeWandel 2004-03-17 17:41:04 UTC
The usleep() function turns into sys_nanosleep() in the kernel
which in turn calls schedule_timeout().  The implementation of
neither of these functions has really changed between 2.4.18 and
2.4.21.  The granularity of any sleep from user space is that of
a clock tick which is 10 ms on most machines (i.e., HZ=100).
This is the granularity that you are seeing from the reproducer
you included.  Are you sure that you are running this test on
the same architecture between 2.4.18 and 2.4.21?  For example,
if you run your test program on an Itanium you will see much
finer granularity than on x86.  This is because the value for
HZ is 1024 on ia64.

I am marking this as NEEDSINFO pending your reply but plan to
close as NOTABUG if I don't see any updates.

Comment 3 Suman Puthana 2004-03-17 18:13:40 UTC
I'm definitely sure that I am testing the Redhat 8.0 kernel and the 
RHEL 3.0 kernel on the same architecture. I have a full RHEL3 install 
and with the kernel which comes with this I see this issue. When I 
boot of the Redhat 2.4.18-14 kernel which comes with the Redhat 8 
install on the same system I do not see this problem and it is pretty 
clear when I run this same program on this kernel. I believe the HZ 
was changed to 512 for redhat 8.0 and was reverted back for some 
reason in future releases?

This problem exists on the kernel which came out with the Redhat 9.0 
kernel also. I think it is important that the kernel supports a 
granularity of 1 ms on 686 kernels - especially for an Enterprise 
kernel and especially when it was supported in one of the older 
kernels.

Also ,based on the granularity shouldn't the sleep be 40 ms for 
request between (31-40) instead of 50 ms, 50 ms for request between 
(41-50) instead of 60 ms etc.?

There was a brief description about this issue in the taroon-list 
last december and here is what Arjan had to say:

https://www.redhat.com/archives/taroon-list/2003-
December/msg00205.html

Can we add this high frequency timer in the next update, if possible? 
I'm pretty sure it would be very useful to a lot of users and opens 
linux up for more kinds of time-dependent applications.

This is what I get when I run the program with the redhat 8.0 kernel 
on the same system as the results I gave you above. The difference is 
obvious.

**************************************************************
./sleep 33
time 1 = <1079547099:550832>, time 2 = <1079547099:584337> ,elapsed = 
33

time 1 = <1079547099:584662>, time 2 = <1079547099:621021> ,elapsed = 
36

time 1 = <1079547099:621206>, time 2 = <1079547099:654634> ,elapsed = 
33

time 1 = <1079547099:654651>, time 2 = <1079547099:689790> ,elapsed = 
35

time 1 = <1079547099:689830>, time 2 = <1079547099:724938> ,elapsed = 
35

time 1 = <1079547099:724979>, time 2 = <1079547099:761638> ,elapsed = 
36

time 1 = <1079547099:761677>, time 2 = <1079547099:795236> ,elapsed = 
33

********************************************************

Here's the output of ver_linux on this system I ran both the tests, 
if it helps:

[root@petey scripts]# ./ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux petey 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 
i686 i386
GNU/Linux

Gnu C                  3.2.3
Gnu make               3.79.1
util-linux             2.11y
mount                  2.11y
modutils               2.4.25
e2fsprogs              1.32
jfsutils               1.1.2
reiserfsprogs          3.6.4
pcmcia-cs              3.1.31
PPP                    2.4.1
isdn4k-utils           3.1pre4
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 2.0.13
Net-tools              1.60
Kbd                    1.08
Sh-utils               4.5.3
Modules Loaded         nfs lockd sunrpc ide-cd cdrom sg parport_pc lp
parport autofs e1000 floppy microcode keybdev mousedev hid input usb-
uhci
usbcore ext3 jbd qla2300 sd_mod scsi_mod





Comment 4 Mark DeWandel 2004-04-20 16:57:53 UTC
Red Hat will only change the value of HZ if it changes upstream first.
This means that it cannot change for RHEL3.  For x86, HZ was 100,
then briefly was changed to 512 (Red Hat disto only), and then was
changed back to 100.  HZ is now 1000 in 2.6 and consequently is
expected to be 1000 in RHEL4.

Comment 6 Ernie Petrides 2005-06-15 03:20:43 UTC
The usleep() library routine is based on the nanosleep() system call.

Here's part of the explanation from the nanosleep(2) man page:

  "The current implementation of nanosleep is based on the  normal  kernel
   timer  mechanism,  which  has  a  resolution  of  1/HZ s (i.e, 10 ms on
   Linux/i386 and  1 ms  on  Linux/Alpha).   Therefore,  nanosleep  pauses
   always for at least the specified time, however it can take up to 10 ms
   longer than specified until the process becomes runnable again."

However, for real-time processes (with scheduling policy != SCHED_NORMAL),
nanosleep() syscalls for less than 2 ms will loop inside the kernel with a
higher resolution.

By the way, gettimeofday() also has a 10-ms resolution.