Sample reproducer:

# yum install java-1.8.0-openjdk-devel
# cat Spin.java
public class Spin {
    public static void main(String[] args) {
        while (true) {
            //java.util.concurrent.locks.LockSupport.parkNanos(50000);
            java.util.concurrent.locks.LockSupport.parkNanos(1);
        }
    }
}
# javac Spin.java
# java Spin &
# top

top shows the java process at 100% CPU usage on RHEL 8, and 20-40% on RHEL 7. If 50000 (50us) is used as the argument instead, RHEL 7 and RHEL 8 show roughly the same CPU usage. 50000 is also the default timer_slack_ns, but changing /proc/$pid/timerslack_ns makes no difference on RHEL 8.

This is somewhat of a random guess, but the issue appears to be this chunk in glibc-2.28, pthread_cond_wait.c:__pthread_cond_wait_common():

...
  struct timespec rt;
  if (__clock_gettime (CLOCK_MONOTONIC, &rt) != 0)
    __libc_fatal ("clock_gettime does not support "
                  "CLOCK_MONOTONIC\n");

  /* Convert the absolute timeout value to a relative timeout.  */
  rt.tv_sec = abstime->tv_sec - rt.tv_sec;
  rt.tv_nsec = abstime->tv_nsec - rt.tv_nsec;
  if (rt.tv_nsec < 0)
    {
      rt.tv_nsec += 1000000000;
      --rt.tv_sec;
    }
  /* Did we already time out?  */
  if (__glibc_unlikely (rt.tv_sec < 0))
    err = ETIMEDOUT;
...

which may be preventing a context switch into the kernel: a 1 nanosecond timeout has normally already expired by the time this point is reached, so the call returns with ETIMEDOUT without blocking. In newer glibc this chunk no longer exists, and the problem does not happen on Fedora 33.

If a counter is added, for example:

public class Spin {
    public static void main(String[] args) {
        long n = 1000000;
        while (n > 0) {
            java.util.concurrent.locks.LockSupport.parkNanos(1);
            n = n - 1;
        }
    }
}

the loop apparently runs an order of magnitude faster on RHEL 8 (in the runs below, elapsed time drops from about 56 s to about 1 s). For example, on similar systems:

rhel7:
/bin/time java Spin
1.22user 2.47system 0:55.70elapsed 6%CPU (0avgtext+0avgdata 22848maxresident)k
0inputs+88outputs (0major+5858minor)pagefaults 0swaps

rhel8:
/bin/time java Spin
0.75user 0.26system 0:01.01elapsed 100%CPU (0avgtext+0avgdata 26436maxresident)k
0inputs+64outputs (0major+2709minor)pagefaults 0swaps
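To make the comparison concrete, here is a minimal measurement sketch (the class name is mine; it assumes only the standard library) that reports the average time one parkNanos(1) call takes. On a kernel applying the default 50us timer slack, each call should take at least roughly 50us; a call that never enters the kernel returns in well under a microsecond:

import java.util.concurrent.locks.LockSupport;

public class ParkLatency {
    public static void main(String[] args) {
        final int iterations = 100000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            LockSupport.parkNanos(1);  // same 1 ns timeout as the reproducer
        }
        long avgNanos = (System.nanoTime() - start) / iterations;
        // Around 50000 ns or more: the futex wait plus timer slack applies.
        // Well under 1000 ns: parkNanos returned without entering the kernel.
        System.out.println("average parkNanos(1) latency: " + avgNanos + " ns");
    }
}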
Andrew, is this a valid use case of java.util.concurrent.locks.LockSupport? Is parking with a timeout supposed to de-schedule the current thread?

For the POSIX condition variable, I believe both the old and the new behavior are okay. But maybe Java has different requirements. Maybe it's time to dust off your old patches to use futex directly?
Even if it is invalid usage, I believe there may be Java code out there that relies on values smaller than the default timer_slack_ns (50us) delaying for at least those 50us, allowing other threads to run concurrently. That appears to be the issue in the support case. A solution in the Java code would be to call parkNanos() with an argument of at least 50000, matching the default timer_slack_ns, as sketched below.
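A minimal sketch of that workaround, assuming the process keeps the default 50us slack (the wrapper and constant names are hypothetical):

import java.util.concurrent.locks.LockSupport;

public class SlackAwarePark {
    // Default timer_slack_ns on Linux; assumes the process has not changed
    // it via prctl(PR_SET_TIMERSLACK) or /proc/$pid/timerslack_ns.
    private static final long TIMER_SLACK_NS = 50000;

    // Clamp sub-slack timeouts up to the slack so the thread actually
    // blocks in the kernel instead of returning immediately.
    static void parkNanosClamped(long nanos) {
        LockSupport.parkNanos(Math.max(nanos, TIMER_SLACK_NS));
    }
}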
(In reply to Florian Weimer from comment #1)
> Andrew, is this a valid use case of java.util.concurrent.locks.LockSupport?
> Is parking with a timeout supposed to de-schedule the current thread?
>
> For the POSIX condition variable, I believe both old and new behavior are
> okay. But maybe Java has different requirements. Maybe it's time to dust off
> your old patches to use futex directly?

There's no guarantee that parkNanos() will do anything at all. It might return immediately.

Returning immediately without blocking in the kernel is correct when the timer has already timed out.
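For callers that genuinely need the full delay to elapse, the usual idiom is a deadline loop. A minimal sketch (the method name is mine), since a single parkNanos() call may return early, spuriously, or not block at all:

import java.util.concurrent.locks.LockSupport;

public class DeadlinePark {
    // Block until at least timeoutNanos have elapsed, tolerating early and
    // spurious returns. Deliberately ignores unpark() and interrupts for
    // simplicity.
    static void parkAtLeast(long timeoutNanos) {
        final long deadline = System.nanoTime() + timeoutNanos;
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            LockSupport.parkNanos(remaining);
        }
    }
}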
(In reply to Andrew Haley from comment #3)
> There's no guarantee that parkNanos() will do anything at all. It might
> return immediately.
>
> Returning immediately without blocking in the kernel is correct when the
> timer has already timed out.

Okay, then the old and the new behavior are equally correct for OpenJDK, too, not just as far as POSIX is concerned.

Paulo, I assume this means that there is nothing to fix on the system side, and the application has to be changed to use a synchronization algorithm with more predictable performance.
(In reply to Florian Weimer from comment #4)
> Paulo, I assume this means that there is nothing to fix on the system side,
> and the application has to be changed to use a synchronization algorithm
> with more predictable performance.

Absolutely. If you need to yield, call Thread.yield(), which uses sched_yield().
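For illustration, the counter variant of the reproducer rewritten along those lines (an assumption on my part: the goal is only to let other runnable threads in, not to sleep for a fixed interval):

public class SpinYield {
    public static void main(String[] args) {
        long n = 1000000;
        while (n > 0) {
            // sched_yield(2): de-schedules this thread only if another
            // runnable thread is waiting for the CPU; otherwise it keeps
            // running, so this still burns CPU on an idle machine.
            Thread.yield();
            n = n - 1;
        }
    }
}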
(In reply to Andrew Haley from comment #5)
> (In reply to Florian Weimer from comment #4)
> > Paulo, I assume this means that there is nothing to fix on the system
> > side, and the application has to be changed to use a synchronization
> > algorithm with more predictable performance.
>
> Absolutely. If you need to yield, call Thread.yield(), which uses
> sched_yield().

Andrew,

Thank you for your comments and direction. From our perspective this is a CLOSED NOTABUG, and I'm marking it as such.
Thanks for the comments. We are still waiting for feedback from the user, but the final recommendation was to update the Java code: don't expect a 1 ns park to sleep for at least 50us, and don't be surprised when the system does (or attempts to do) exactly what was asked.

Since it appears the goal was just to put a logger thread to sleep and poll from time to time, we suggested a helper along these lines:

static void park(long delayNanos) {
    long start = System.nanoTime();
    Thread.yield();
    long elapsed = System.nanoTime() - start;
    if (elapsed < delayNanos)
        java.util.concurrent.locks.LockSupport.parkNanos(delayNanos - elapsed);
    if (System.nanoTime() - start < delayNanos)
        Thread.yield();
}

Other than that, a very different approach would likely be needed, probably using kernel-rt rather than Java code for the delay.
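For context, a self-contained sketch of how a polling logger might use that helper (the class, the interval value, and drainQueue() are hypothetical):

import java.util.concurrent.locks.LockSupport;

public class LoggerLoop {
    static void park(long delayNanos) {  // the helper suggested above
        long start = System.nanoTime();
        Thread.yield();
        long elapsed = System.nanoTime() - start;
        if (elapsed < delayNanos)
            LockSupport.parkNanos(delayNanos - elapsed);
        if (System.nanoTime() - start < delayNanos)
            Thread.yield();
    }

    static void drainQueue() {
        // hypothetical: flush any buffered log records here
    }

    public static void main(String[] args) {
        final long pollIntervalNanos = 50000;  // match the default timer slack
        while (true) {
            drainQueue();
            park(pollIntervalNanos);  // yields/parks instead of spinning
        }
    }
}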