Bug 1978378 - high cpu usage in pthread_cond_timedwait when using very small timeout
Summary: high cpu usage in pthread_cond_timedwait when using very small timeout
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: glibc
Version: 8.4
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: beta
Assignee: glibc team
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-01 16:44 UTC by Paulo Andrade
Modified: 2023-07-18 14:30 UTC (History)
9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-09 13:51:42 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1977468 1 unspecified CLOSED Question about timerslack_ns 2021-07-09 14:29:45 UTC

Description Paulo Andrade 2021-07-01 16:44:00 UTC
Sample reproducer:

# yum install java-1.8.0-openjdk-devel
# cat Spin.java
public class Spin
{
    public static void main(String[] args)
    {
        while (true)
            {
                //java.util.concurrent.locks.LockSupport.parkNanos(50000);
                java.util.concurrent.locks.LockSupport.parkNanos(1);
            }
    }
}

# javac Spin.java
# java Spin & top

  It will show the java process at 100% cpu usage in rhel8, and 20-40% on rhel7.

  If using 50000 (50us) as argument, rhel7 and rhel8 will show roughly the same cpu usage.

  50000 is also the default timer_slack_ns, but changing /proc/$pid/timerslack_ns does not make any difference on rhel8.
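
  The effective slack can be checked from the process itself; a minimal sketch (the class and helper name are mine; /proc/<pid>/timerslack_ns exists on Linux 4.6 and later):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TimerSlack {
    // Returns the current process's timer slack in nanoseconds, or -1
    // if /proc/self/timerslack_ns is unavailable (older kernel,
    // non-Linux system, or insufficient permissions).
    static long readTimerSlackNs() {
        Path p = Path.of("/proc/self/timerslack_ns");
        try {
            return Long.parseLong(Files.readString(p).trim());
        } catch (IOException | NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("timerslack_ns = " + readTimerSlackNs());
    }
}
```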

  This is somewhat of a random guess, but the issue appears to be this chunk in glibc-2.28, pthread_cond_wait.c:__pthread_cond_wait_common()
...
		  struct timespec rt;
		  if (__clock_gettime (CLOCK_MONOTONIC, &rt) != 0)
		    __libc_fatal ("clock_gettime does not support "
				  "CLOCK_MONOTONIC\n");
		  /* Convert the absolute timeout value to a relative
		     timeout.  */
		  rt.tv_sec = abstime->tv_sec - rt.tv_sec;
		  rt.tv_nsec = abstime->tv_nsec - rt.tv_nsec;
		  if (rt.tv_nsec < 0)
		    {
		      rt.tv_nsec += 1000000000;
		      --rt.tv_sec;
		    }
		  /* Did we already time out?  */
		  if (__glibc_unlikely (rt.tv_sec < 0))
		    err = ETIMEDOUT;
...

This may prevent a context switch into the kernel: a 1-nanosecond timeout has usually already expired by the time this code runs, so the call returns immediately with ETIMEDOUT instead of blocking.

In newer glibc that chunk does not exist.
The problem does not happen in Fedora 33.
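
The behavior can be observed from Java by timing parkNanos directly; a minimal sketch (class and method names are mine, not from the report):

```java
import java.util.concurrent.locks.LockSupport;

public class ParkCost {
    // Average wall-clock nanoseconds spent per parkNanos(timeoutNanos)
    // call over the given number of iterations.
    static long nanosPerCall(long timeoutNanos, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            LockSupport.parkNanos(timeoutNanos);
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        // On an affected system parkNanos(1) returns in well under a
        // microsecond (the thread spins in user space); once the thread
        // actually blocks, each call costs at least the ~50us slack.
        System.out.println("parkNanos(1):     ~" + nanosPerCall(1, 10_000) + " ns/call");
        System.out.println("parkNanos(50000): ~" + nanosPerCall(50_000, 200) + " ns/call");
    }
}
```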

  If adding a counter, for example:
public class Spin
{
    public static void main(String[] args)
    {
        long n = 1000000;
        while (n > 0)
        {
            java.util.concurrent.locks.LockSupport.parkNanos(1);
            n = n - 1;
        }
    }
}

it finishes dramatically faster on rhel8 (the elapsed times below differ by roughly 55x), for example, on similar systems:

rhel7: /bin/time java Spin
1.22user 2.47system 0:55.70elapsed 6%CPU (0avgtext+0avgdata 22848maxresident)k
0inputs+88outputs (0major+5858minor)pagefaults 0swaps

rhel8: /bin/time java Spin
0.75user 0.26system 0:01.01elapsed 100%CPU (0avgtext+0avgdata 26436maxresident)k
0inputs+64outputs (0major+2709minor)pagefaults 0swaps

Comment 1 Florian Weimer 2021-07-02 13:29:54 UTC
Andrew, is this a valid use case of java.util.concurrent.locks.LockSupport? Is parking with a timeout supposed to de-schedule the current thread?

For the POSIX condition variable, I believe both old and new behavior are okay. But maybe Java has different requirements. Maybe it's time to dust off your old patches to use futex directly?

Comment 2 Paulo Andrade 2021-07-02 14:25:23 UTC
  Even if it is invalid usage, I believe there may be Java code in the wild that
relies on timeouts smaller than the default timerslack_ns (50us) still delaying
for at least those 50us, allowing other threads to run concurrently.
  That appears to be the issue in the support case. The fix on the Java side
would be to ensure parkNanos is always called with an argument of at least
50000, matching the default timerslack_ns.
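
That suggestion could look roughly like this; a minimal sketch, assuming the default 50us slack (the constant and helper names are mine; the actual value can be read from /proc/<pid>/timerslack_ns):

```java
import java.util.concurrent.locks.LockSupport;

public class SlackAwarePark {
    // Assumed default kernel timer slack (50us); an assumption, not a
    // value queried from the running kernel.
    static final long DEFAULT_TIMER_SLACK_NS = 50_000;

    // Round the timeout up to the timer slack so a tiny timeout still
    // de-schedules the thread instead of degenerating into a busy spin.
    static void parkNanosAtLeastSlack(long nanos) {
        LockSupport.parkNanos(Math.max(nanos, DEFAULT_TIMER_SLACK_NS));
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        parkNanosAtLeastSlack(1);  // behaves like parkNanos(50000)
        System.out.println("parked for " + (System.nanoTime() - start) + " ns");
    }
}
```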

Comment 3 Andrew Haley 2021-07-03 08:45:37 UTC
(In reply to Florian Weimer from comment #1)
> Andrew, is this a valid use case of java.util.concurrent.locks.LockSupport?
> Is parking with a timeout supposed to de-schedule the current thread?
> 
> For the POSIX condition variable, I believe both old and new behavior are
> okay. But maybe Java has different requirements. Maybe it's time to dust off
> your old patches to use futex directly?

There's no guarantee that parkNanos() will do anything at all. It might return immediately.
Returning immediately without blocking in the kernel is correct when the timer has
already timed out.

Comment 4 Florian Weimer 2021-07-03 11:13:13 UTC
(In reply to Andrew Haley from comment #3)
> There's no guarantee that parkNanos() will do anything at all. It might
> return immediately.
>
> Returning immediately without blocking in the kernel is correct when the
> timer has already timed out.

Okay, then the old and new behavior are equally correct for OpenJDK, too, not just as far as POSIX is concerned.

Paulo, I assume this means that there is nothing to fix on the system side, and the application has to be changed to use a synchronization algorithm with more predictable performance.

Comment 5 Andrew Haley 2021-07-03 13:09:40 UTC
(In reply to Florian Weimer from comment #4)
>
> Paulo, I assume this means that there is nothing to fix on the system side,
> and the application has to be changed to use a synchronization algorithm
> with more predictable performance.

Absolutely. If you need to yield, call Thread.yield(), which uses sched_yield().

Comment 6 Carlos O'Donell 2021-07-09 13:51:42 UTC
(In reply to Andrew Haley from comment #5)
> (In reply to Florian Weimer from comment #4)
> >
> > Paulo, I assume this means that there is nothing to fix on the system side,
> > and the application has to be changed to use a synchronization algorithm
> > with more predictable performance.
> 
> Absolutely. If you need to yield, call Thread.yield(), which uses
> sched_yield().

Andrew, thank you for your comments and direction.

From our perspective this is a CLOSED NOTABUG, and I'm marking it as such.

Comment 7 Paulo Andrade 2021-07-09 14:24:08 UTC
Thanks for the comments. Still waiting for feedback from the user,
but the final recommendation was to update the Java code: do not expect
a 1 ns park to sleep for at least 50us, and do not be surprised when
the system does (or attempts to do) exactly what was asked.

  Since it appears the use case was simply putting a logger thread to sleep
so it can poll from time to time, we suggested the following pseudocode:

park(delay)
{
  time1 = gettime();
  Thread.yield();
  time2 = gettime();
  if (time2 - time1 < delay)
    parkNanos(delay - (time2 - time1));
  time3 = gettime();
  if (time3 - time1 < delay)
    Thread.yield();
}
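
The pseudocode above could be written in Java roughly as follows (class and method names are mine):

```java
import java.util.concurrent.locks.LockSupport;

public class YieldingPark {
    // Yield first; only fall back to parkNanos for whatever part of
    // the delay the yields did not already consume.
    static void park(long delayNanos) {
        long start = System.nanoTime();
        Thread.yield();
        long afterYield = System.nanoTime();
        if (afterYield - start < delayNanos)
            LockSupport.parkNanos(delayNanos - (afterYield - start));
        if (System.nanoTime() - start < delayNanos)
            Thread.yield();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        park(1);          // tiny delay: mostly yields, little real sleep
        park(1_000_000);  // 1 ms delay: parkNanos covers the remainder
        System.out.println("elapsed " + (System.nanoTime() - start) + " ns");
    }
}
```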

  Beyond that, a very different approach would likely be needed, probably
using kernel-rt rather than Java code for the delay.

