Bug 295071 - CFS scheduler not implementing sched_yield correctly
Summary: CFS scheduler not implementing sched_yield correctly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: i686
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-09-18 16:40 UTC by Robert Soliday
Modified: 2007-11-30 22:12 UTC
CC List: 2 users

Fixed In Version: 2.6.23*
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-27 20:56:54 UTC
Type: ---
Embargoed:



Description Robert Soliday 2007-09-18 16:40:30 UTC
Description of problem:

The CFS scheduler does not seem to implement sched_yield correctly. If one
program loops calling sched_yield() and another program prints timing
information in a loop, and both are bound to the same core with taskset, the
reported timings will be twice as long as when the programs run on different
cores. This problem was not present in 2.6.21-1.3194, but showed up in
2.6.22.4-65 and continues in the newest released kernel, 2.6.22.5-76.

Version-Release number of selected component (if applicable):

2.6.22.4-65 through 2.6.22.5-76

How reproducible:

Very

Steps to Reproduce:
compile task1
#include <sched.h>

int main() {
    while (1) {
        sched_yield();    /* yield the CPU on every iteration; does not block */
    }
    return 0;
}

and compile task2
#include <stdio.h>
#include <sys/time.h>

int main() {
    while (1) {
        volatile int i;    /* volatile keeps the busy loop from being optimized away */
        struct timeval t0, t1;
        double usec;

        gettimeofday(&t0, 0);
        for (i = 0; i < 100000000; ++i)
            ;
        gettimeofday(&t1, 0);

        /* elapsed wall-clock time of the busy loop, in microseconds */
        usec = (t1.tv_sec * 1e6 + t1.tv_usec) - (t0.tv_sec * 1e6 + t0.tv_usec);
        printf("%8.0f\n", usec);
    }
    return 0;
}
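
Both programs can be built with no special flags, for example (assuming gcc
is available):

gcc -o task1 task1.c
gcc -o task2 task2.c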

Then run:
"taskset -c 0 ./task1"
"taskset -c 0 ./task2"

You will see that each task uses about 50% of the CPU.
Then kill task2 and run:
"taskset -c 1 ./task2"

Now task2 will run twice as fast, verifying that this is not some anomaly in
the way top calculates CPU usage with sched_yield.
  
Actual results:
Tasks that call sched_yield do not yield the way they are supposed to.

Expected results:
The sched_yield task's CPU usage should drop to near 0% when another task is
running on the same CPU.

Additional info:

Comment 1 Chuck Ebbert 2007-09-18 17:03:25 UTC
Ingo has proposed a patch for this, but he (correctly) points out that
applications that rely on this behavior are fundamentally broken:

http://lkml.org/lkml/2007/9/14/157

Will put the patch in Fedora 7, possibly with the default changed so it emulates
the old scheduler.
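
For comparison, the standard fix for an application that spins on sched_yield
while waiting for another thread is to block on a real synchronization
primitive instead. A minimal sketch (a hypothetical example, not taken from
Ingo's patch or the reporter's code) using a pthread condition variable, which
sleeps without consuming any CPU until it is woken:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
static int work_available = 0;

static void *consumer(void *arg) {
    pthread_mutex_lock(&lock);
    while (!work_available)          /* sleeps in the kernel; uses no CPU while waiting */
        pthread_cond_wait(&ready, &lock);
    pthread_mutex_unlock(&lock);
    printf("consumer: got work\n");
    return 0;
}

static void *producer(void *arg) {
    pthread_mutex_lock(&lock);
    work_available = 1;
    pthread_cond_signal(&ready);     /* wakes the consumer exactly when work exists */
    pthread_mutex_unlock(&lock);
    return 0;
}

int main() {
    pthread_t c, p;
    pthread_create(&c, 0, consumer, 0);
    pthread_create(&p, 0, producer, 0);
    pthread_join(c, 0);
    pthread_join(p, 0);
    return 0;
}

Build with -lpthread, e.g. gcc -o wait wait.c -lpthread.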


Comment 2 David J. Schwartz 2007-09-19 18:47:24 UTC
POSIX says, "[t]he sched_yield() function forces the running thread to
relinquish the processor until it again becomes the head of its thread list. It
takes no arguments." CFS is perfectly implementing this behavior.

The reporter fundamentally misunderstands sched_yield. It does not block, so the
"yielding" process is always ready-to-run and is burning the CPU in a tight spin
just like the "spinning" process is.
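
This is easy to verify directly. A minimal sketch (written in the same style
as the reproducer above) that counts how many times sched_yield() returns in
roughly one second; run alone on a core, every call returns immediately
because no other runnable thread is ahead of it, so the process sits at 100%
CPU in top the whole time:

#include <sched.h>
#include <stdio.h>
#include <sys/time.h>

int main() {
    struct timeval t0, t1;
    long calls = 0;
    double usec = 0;

    gettimeofday(&t0, 0);
    while (usec < 1e6) {             /* run for roughly one second */
        sched_yield();               /* returns immediately; the caller stays runnable */
        ++calls;
        gettimeofday(&t1, 0);
        usec = (t1.tv_sec * 1e6 + t1.tv_usec) - (t0.tv_sec * 1e6 + t0.tv_usec);
    }
    printf("%ld sched_yield() calls in ~1 second\n", calls);
    return 0;
}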


Comment 3 Christopher Brown 2007-10-03 14:43:29 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

Given comments #1 and #2, I'm closing this NOTABUG; however, please re-open if
you feel I have erred, preferably with a rebuttal of the statements above.

Cheers
Chris

Comment 4 Robert Soliday 2007-10-03 15:04:36 UTC
It looks like Linus Torvalds and Ingo Molnar have worked out a solution to this
problem for future kernel releases.

http://kerneltrap.org/Linux/CFS_and_sched_yield

Comment 5 Chuck Ebbert 2007-10-03 15:55:57 UTC
That patch is queued for Fedora 7.

Comment 6 Chuck Ebbert 2007-11-27 20:56:54 UTC
Fixed in Fedora 7 kernel-2.6.23*.

Not enabled by default. Activate it with:

# sysctl kernel.sched_compat_yield=1

or

# echo "1" > /proc/sys/kernel/sched_compat_yield
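
To make the setting persist across reboots, the same knob can also be set in
/etc/sysctl.conf:

kernel.sched_compat_yield = 1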


