Bug 139324 - A simple timer implementation cause high CPU usage periodically
Summary: A simple timer implementation cause high CPU usage periodically
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ingo Molnar
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-11-15 07:18 UTC by Willy Gao
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-15 18:58:56 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
sample source(c) (903 bytes, text/plain)
2004-11-15 08:31 UTC, Willy Gao

Description Willy Gao 2004-11-15 07:18:34 UTC
Description of problem:
We use a continuously running 10 msec timer in our program. The timer is 
implemented with pthread_cond_signal()/pthread_cond_wait()/select().
The timer's callback does nothing, so the CPU usage is normally 0% 
according to top. But every 2 or 3 minutes the CPU usage rises to 10% 
and stays there for 2-3 seconds. If five or more such programs run at 
the same time, the idle time of one CPU can fall to 0% and the system 
becomes busy for that moment.

The following is the sample program:
#include <pthread.h>
#include <sys/select.h>
#include <sys/time.h>

pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
unsigned int interval = 10000;  /* timer period in microseconds (10 msec) */
struct timeval tv;

/* Timer callback thread: wakes up on every signal and does nothing. */
void *thread_func1(void *ptr)
{
    while (1) {
        pthread_mutex_lock(&mutex);
        pthread_cond_wait(&cond, &mutex);
        pthread_mutex_unlock(&mutex);
    }
    return 0;
}

/* Timer thread: sleeps for one interval in select(), then signals. */
void *thread_func2(void *ptr)
{
    while (1) {
        pthread_mutex_lock(&mutex);
        tv.tv_sec = 0;
        tv.tv_usec = interval;
        select(0, 0, 0, 0, &tv);  /* note: sleeps with the mutex held */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }
    return 0;
}

int main()
{
    pthread_t thread1, thread2;
    pthread_create(&thread1, 0, thread_func1, 0);
    pthread_create(&thread2, 0, thread_func2, 0);
    pthread_join(thread1, 0);
    pthread_join(thread2, 0);
    return 0;
}

Version-Release number of selected component (if applicable):
glibc-2.3.2-95.27

How reproducible:
Every time.

Steps to Reproduce:
1. compile the sample program
2. monitor CPU status with top, with the update interval set to 1 sec
3. start five instances of the sample program
4. watch the CPU usage and idle time

Actual results:
Every 2 or 3 minutes, the CPU usage of each process rises to 10% 
and the idle time of one CPU falls to 0%.
(This state lasts 2-3 seconds.)

Expected results:
The CPU usage should always be 0%.

Additional info:
Our test box:
CPU: two Intel(R) Xeon(TM) CPU 3.06GHz with Hyper-Threading
kernel: 2.4.21-15.0.3.ELsmp
(We cannot reproduce this problem if we use a non-SMP kernel.)
gcc 3.2.3 20030502 (Red Hat Linux 3.2.3-20)
glibc-2.3.2-95.27

Before this issue, we encountered another, related issue. When we 
used two continuously running timers (the same implementation) at the 
same time, both of them would hang after a certain time (from several 
seconds to one minute). That issue was identified as an OS defect 
specific to the NPTL implementation and was fixed in RHBA-2004:143 
for AS3.

Comment 1 Willy Gao 2004-11-15 08:31:38 UTC
Created attachment 106698
sample source(c)

Comment 2 Jakub Jelinek 2004-11-15 17:13:22 UTC
I believe this is just an artifact of kernel process time accounting.
The kernel samples at 100Hz, so if you are unlucky enough (and the interval you
are using in the test program is exactly 10msec), the process can be seen as busy
during several consecutive samples even though it sleeps for most of each 10msec
period. You can run oprofile or some other tool to confirm that the program really
is not taking much CPU.

Comment 3 Willy Gao 2005-03-08 12:08:14 UTC
Hi,
We found that it can be reproduced with just a nanosleep or select in an 
infinite loop, even when the interval is not 10ms. 

#include <sys/select.h>
int main()
{
    while (1) {
        struct timeval tv;
        tv.tv_sec = 0;
        tv.tv_usec = 50000;
        select(0, 0, 0, 0, &tv);
    }
}

Since the process is idle most of the time, oprofile also shows a reasonable 
number. But the system actually becomes busy when a lot of such processes are 
started; for example, GUI operations are affected. 

One more strange thing: even when lots of such processes are started at 
different times, the CPU usage of all of them rises at the same time.

It does seem to be related to the kernel sampling time, but we cannot 
see the problem on any AS2 machine (or on SuSE9).

Comment 4 Ingo Molnar 2005-09-15 18:58:56 UTC
All timers (even nanosleep) have a basic granularity of 10 msecs, so purely
timer-driven workloads might be under- or over-sampled by the CPU utilization
measurement code. This is standard Linux behavior.

