Bug 1358661

Summary: 10x performance regression in BitSet.valueOf(ByteBuffer) in 1.8.0_20
Product: Red Hat Enterprise Linux 7
Reporter: Andrey Loskutov <loskutov>
Component: java-1.8.0-openjdk
Assignee: Roland Westrelin <rwestrel>
Status: CLOSED ERRATA
QA Contact: Lukáš Zachar <lzachar>
Severity: high
Priority: unspecified
Version: 7.0
CC: ahughes, dbhole, jvanek, rbost, smatasar
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: java-1.8.0-openjdk-1.8.0.101-8.b13.el7
Last Closed: 2016-11-03 22:57:24 UTC
Type: Bug
Attachments:
Example application reproducing the problem (Eclipse project included)

Description Andrey Loskutov 2016-07-21 08:35:49 UTC
Created attachment 1182382 [details]
Example application reproducing the problem (Eclipse project included)

We detected a ~10x performance regression in BitSet.valueOf(ByteBuffer) in 1.8.0_20, affecting our production software running on 1.8.0_91.

The attached small application demonstrates a ~10x performance regression in Java 8
(1.8.0_20) compared to Java 7 (1.7.0_91) and earlier Java 8 (1.8.0_11) in
ByteBuffer-related code (in ByteBuffer.get(int), as used by BitSet.valueOf(ByteBuffer)).

By *disabling* tiered compilation via -XX:-TieredCompilation, this specific
regression disappears in Java 8 versions >= 1.8.0_20.

Performance in Java 1.7.0_91 and in Java 8 <= 1.8.0_11 is the same regardless of the
-XX:TieredCompilation option value.

The regression still exists in all later Java 8 releases (tested with 1.8.0_45,
1.8.0_71, 1.8.0_91, 1.8.0_102).

Unlike bugs JDK-7147227 / JDK-7147987, this is not a regression between Java 6
and Java 7, but a regression between Java 1.8.0_11 and Java 1.8.0_20, and the
performance impact seems to be much higher (not 20-50% but a whopping 1000%).

The application also demonstrates that, depending on the *order* in which the API is used,
the performance regression appears for either the DirectByteBuffer or the HeapByteBuffer path,
but never for both at the same time!

The regression shows up in this code snippet (when it is included in a bigger method,
for example BitSet.valueOf(ByteBuffer), where it was initially found):

int n = buffer.remaining();
while (n > 0 && buffer.get(n - 1) == 0) {
	n--;
}

This is the critical part causing the performance regression in BitSet.valueOf(ByteBuffer)
when that API is called alternately with DirectByteBuffer and HeapByteBuffer arguments.
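
A minimal sketch of that alternating call pattern (illustrative only; the attached project contains the actual reproducer, and the class name, buffer size and iteration count here are made up):

import java.nio.ByteBuffer;
import java.util.BitSet;

public class AlternatingBufferTest {
	public static void main(String[] args) {
		// Mostly-zero payload: only byte 0 is non-zero, so the trailing-zero
		// scan inside BitSet.valueOf(ByteBuffer) walks almost the whole buffer.
		byte[] data = new byte[1 << 20];
		data[0] = 1;

		ByteBuffer heap = ByteBuffer.wrap(data);                    // HeapByteBuffer
		ByteBuffer direct = ByteBuffer.allocateDirect(data.length); // DirectByteBuffer
		direct.put(data);
		direct.flip();

		// Alternate both buffer types through the same call site, as the
		// attached reproducer does. BitSet.valueOf does not modify the buffers.
		for (int iteration = 0; iteration < 20; iteration++) {
			long t0 = System.nanoTime();
			BitSet.valueOf(heap);
			long t1 = System.nanoTime();
			BitSet.valueOf(direct);
			long t2 = System.nanoTime();
			System.out.printf("iteration %2d: heap %d us, direct %d us%n",
					iteration, (t1 - t0) / 1000, (t2 - t1) / 1000);
		}
	}
}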

Two important points are required to reproduce the regression:
1) The application must use both DirectByteBuffer and HeapByteBuffer, alternating between them.
If only one buffer type is used during the whole application run, *no* performance regression occurs.

2) The code in question must be part of a bigger method context (see the contrasting sketch after the example below).
If the same code snippet is wrapped into a dedicated method and called indirectly,
it does *not* show the performance regression:

/* OK */ static int iterateIndirect(ByteBuffer buffer) {
	int n = buffer.remaining();
	while (n > 0 && buffer.get(n - 1) == 0) {
		n--;
	}
	return n;
}
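
For contrast, a sketch of what the problematic "direct" use could look like: the same loop embedded in a larger method body. This is only an illustration; the method name and the surrounding work are made up, and the actual variant is in the attached project.

/* regression */ static byte[] trimAndCopy(ByteBuffer buffer) {
	// The same scan loop, but embedded in a larger method that also does other
	// work, similar to BitSet.valueOf(ByteBuffer) itself (assumes position 0).
	int n = buffer.remaining();
	while (n > 0 && buffer.get(n - 1) == 0) {
		n--;
	}
	byte[] trimmed = new byte[n];
	for (int i = 0; i < n; i++) {
		trimmed[i] = buffer.get(i);   // absolute get, the buffer is not modified
	}
	return trimmed;
}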

This is neither a DirectByteBuffer nor a HeapByteBuffer implementation issue; it
must be an issue in the HotSpot compiler. The *first* type used in the
computation shows a constant performance regression after ~5 iterations.

Since the regression disappears if tiered compilation is disabled, the problem
seems to be somewhere in the area of C1/C2 compiler interactions/level transitions.

Probably, by updating the profile data for the later-compiled subtype, C2 somehow
invalidates the already compiled code for the other subtype. This would explain why
execution runs slowly for both ByteBuffer types in the first iteration (before optimization),
then improves and stays fast for both ByteBuffer types up to roughly the 5th iteration,
and then, after some compilation event, becomes slow again, but only for *one* of the
ByteBuffer types: the one that was used first (and compiled first).

The regression seems to correlate in time with this compilation event:
Iteration 5
    317   27       4       ByteBufferTest::iterateIndirect (27 bytes)   made not entrant
    317   29    b  4       ByteBufferTest::iterateIndirect (27 bytes)

After this event, the performance drops for one of the ByteBuffer types.
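
Compilation events like the one above can be captured with -XX:+PrintCompilation, for example (run.sh simply passes appended arguments to the JVM, see below):

./run.sh -XX:+PrintCompilation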

The provided test runs the application 4 times with different configurations, which
show the impact on the regression of direct vs. indirect use of the code and of the
call order (i.e. whether DirectByteBuffer or HeapByteBuffer is used first).

Run the test with:

./run.sh

It will automatically compile the code. Detailed JVM output is saved to 4 log files:
direct.log
directInv.log
indirect.log
indirectInv.log

One can use a specific JVM by exporting the JAVA_HOME variable:
JAVA_HOME=/usr/lib/jvm/java-1.7.0 ./run.sh
JAVA_HOME=/usr/lib/jvm/java-1.8.0 ./run.sh

One can specify additional JVM arguments by simply appending them to the command line:
./run.sh -XX:+TieredCompilation

Example output for JDK 7 and 8:
logs/execution_log_java7_91.txt
logs/execution_log_java8_11.txt
logs/execution_log_java8_20.txt
logs/execution_log_java8_91.txt
logs/execution_log_java8_102.txt

Comment 2 Andrey Loskutov 2016-07-21 08:46:12 UTC
I've filed https://access.redhat.com/support/cases/#/case/01671633

Comment 4 Robert Bost 2016-07-22 19:45:41 UTC
Disabling type speculation (-XX:-UseTypeSpeculation) makes the reproducer run much faster. Is this a safe workaround, or is it something that should not be messed with?

Comment 5 Roland Westrelin 2016-07-25 07:29:12 UTC
-XX:-UseTypeSpeculation should be a safe workaround. Depending on the workload, turning type speculation off can cause a performance degradation as well.
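
For example, with the attached reproducer (run.sh passes appended arguments to the JVM):

./run.sh -XX:-UseTypeSpeculation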

Comment 6 Andrey Loskutov 2016-07-25 09:14:45 UTC
(In reply to Roland Westrelin from comment #5)
> -XX:-UseTypeSpeculation should be a safe work around. Depending on the
> workload, turning type speculation off can cause a performance degradation
> as well.

The option (and feature) itself was implemented via https://bugs.openjdk.java.net/browse/JDK-8024070, but seems not to be documented anywhere.

My search of the release notes, the Oracle bug tracker and the HotSpot mailing list didn't turn up any mention of the "UseTypeSpeculation" default value change:

1) http://www.oracle.com/technetwork/java/javase/2col/8u20-bugfixes-2257730.html
2) https://search.oracle.com/search/search?search.timezone=-120&search_startnum=&search_endnum=&num=10&search_dupid=&exttimeout=false&actProfId=0&q=UseTypeSpeculation&group=bugs.java.com&sw=t&search_p_main_operator=all&search_p_atname=&adn=&search_p_op=equals&search_p_val=&search_p_atname=&adn=&search_p_op=equals&search_p_val=
3) https://www.google.com/search?as_q=&as_epq=UseTypeSpeculation&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=http%3A%2F%2Fmail.openjdk.java.net%2Fpipermail%2Fhotspot-compiler-dev%2F&as_occt=any&safe=images&as_filetype=&as_rights=#q=%22UseTypeSpeculation%22+site:http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/&newwindow=1&as_qdr=all&start=0

It would be great to have some kind of user documentation available, explaining:
1) What the UseTypeSpeculation flag does and how it affects JVM performance (https://bugs.openjdk.java.net/browse/JDK-8024070 isn't really helpful :-) ).
2) Why it was turned on in 8u20_b16 (see https://patches.linaro.org/patch/31481/).
3) Why it is safe to turn it off as proposed. If I understand it right, this flag was introduced to *improve* performance, so disabling it will probably decrease performance in areas *other* than those where we saw the regression?
4) Are there any performance numbers available about UseTypeSpeculation effects on some applications?

Comment 7 Roland Westrelin 2016-07-25 09:50:50 UTC
> 1) What the UseTypeSpeculation flag does and how it affects JVM performance
> (https://bugs.openjdk.java.net/browse/JDK-8024070 isn't really helpful :-) ).

Type speculation tries to fight profile pollution and make better use of available type profile data in general. Without type speculation, use of profile data is local: for instance, profile collected at a call is used during compilation to optimize that call and nothing else. With type speculation, use of profile data is global (in the compilation unit = root method + all inlined methods) so profile data used to optimize a call, for instance, can come from some other profile point in the compilation unit. In the test case of this bug, in iterateIndirect(), the virtual call buffer.get() is optimized using profile from the buffer.remaining() call which doesn't work well in that case because profile data at buffer.remaining() is not mature enough.
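
As a rough sketch of what that means for the test case, here is the same method again with the profile points annotated (the comments are an interpretation of the behaviour described in this bug, not verified compiler internals):

static int iterateIndirect(ByteBuffer buffer) {
	// Profile point A: the receiver types seen at this virtual call
	// (buffer.remaining()) are recorded while the method runs in the
	// interpreter or in C1-compiled code.
	int n = buffer.remaining();
	// Profile point B: receiver types are recorded here as well
	// (buffer.get(int)). Without type speculation, C2 optimizes this call
	// using only B's own profile. With type speculation, C2 may reuse the
	// type observed at A to speculate on the receiver here; if A's profile
	// is not mature (e.g. it has only ever seen one ByteBuffer subtype),
	// the speculation fails when the other subtype shows up, the compiled
	// method is discarded ("made not entrant"), and the recompiled code
	// ends up slower for one of the two buffer types.
	while (n > 0 && buffer.get(n - 1) == 0) {
		n--;
	}
	return n;
}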

> 2) Why it was turned on in 8u20_b16 (see
> https://patches.linaro.org/patch/31481/).

Most likely because it was enabled upstream around that time. It was enabled upstream as part of an effort to improve performance of dynamic languages on the JVM.

> 3) Why it is safe to turn it off as proposed. If I understand it right, this
> flag was introduced to *improve* performance, so disabling it will probably
> decrease performance in areas *other* than those where we saw the regression?

Type speculation could, in principle, improve the performance of any application but, in practice, as far as I know, it only makes a difference for implementations of dynamic languages on the JVM that take advantage of invokedynamic (e.g. JavaScript with Nashorn).

> 4) Are there any performance numbers available about UseTypeSpeculation
> effects on some applications?

There are no official numbers that I know of, but as far as I know it makes a big difference with Nashorn.

Comment 11 Roland Westrelin 2016-08-19 07:28:49 UTC
For the record, backported to upstream 8u: https://bugs.openjdk.java.net/browse/JDK-8164387

Comment 12 Andrey Loskutov 2016-08-19 08:05:31 UTC
(In reply to Roland Westrelin from comment #11)
> For the record, backported to upstream 8u:
> https://bugs.openjdk.java.net/browse/JDK-8164387

Roland, many thanks for a lightning fast fix!

Comment 15 errata-xmlrpc 2016-11-03 22:57:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2139.html