1358661 – 10x performance regression in BitSet.valueOf(ByteBuffer) in 1.8.0_20

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1358661 - 10x performance regression in BitSet.valueOf(ByteBuffer) in 1.8.0_20

Summary: 10x performance regression in BitSet.valueOf(ByteBuffer) in 1.8.0_20

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	java-1.8.0-openjdk
Sub Component:
Version:	7.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Roland Westrelin
QA Contact:	Lukáš Zachar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2016-07-21 08:35 UTC by Andrey Loskutov
Modified:	2019-11-14 08:46 UTC (History)
CC List:	5 users (show)
Fixed In Version:	java-1.8.0-openjdk-1.8.0.101-8.b13.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-03 22:57:24 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Example application reproducing the problem (Eclipse project included) (20.38 KB, application/zip) 2016-07-21 08:35 UTC, Andrey Loskutov	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2016:2139	0	normal	SHIPPED_LIVE	java-1.8.0-openjdk bug fix and enhancement update	2016-11-03 13:12:04 UTC

Description Andrey Loskutov 2016-07-21 08:35:49 UTC

Created attachment 1182382 [details]
Example application reproducing the problem (Eclipse project included)

We detected ~10x performance regression in BitSet.valueOf(ByteBuffer) in 1.8.0_20, affecting our production software using 1.8.0_91.

Attached small application demonstrates ~10x performance regression in Java 8
(1.8.0_20) compared to Java 7 (1.7.0_91) and earlier Java 8 (1.8.0_11) in the
ByteBuffer related code (in ByteBuffer.get(int), used by BitSet.valueOf(ByteBuffer)).

By *disabling* tiered compilation via -XX:-TieredCompilation this specific
regression disappears in Java 8 versions >= 1.8.0_20.

Performance in Java 1.7.0_91, Java 8 <= 1.8.0_11 is same independently of the
-XX:TieredCompilation option value.

The regression still exists in all later Java 8 releases (tested with 1.8.0_45,
1.8.0_71, 1.8.0_91, 1.8.0_102).

Unlike to bug JDK-7147227 / JDK-7147987, this is not a regression between Java 6
and Java 7, but a regression between Java 1.8.0_11 and Java 1.8.0_20, and the
performance impact seem to be much higher (it is not 20 - 50% but whooping 1000%).

The application also demonstrates that depending on the *order* of the used API,
performance regression appears on either DirectByteBuffer or HeapByteBuffer API,
but not on both at same time!

The regression shows up in this code snippet (included into a bigger method,
for example BitSet.valueOf(ByteBuffer), where it was initially found):

int n = buffer.remaining();
while (n > 0 && buffer.get(n - 1) == 0) {
	n--;
}

This is the critical part causing performance regression in the BitSet.valueOf(ByteBuffer),
if this API is called alternating with DirectByteBuffer and HeapByteBuffer arguments.

Two important points required to reproduce the regression:
1) Application should use both DirectByteBuffer and HeapByteBuffer, alternating.
If only one argument type is used during the application execution, *no* performance regression occurs.

2) The code in question should be included into a bigger method context.
If the same code snippet is wrapped into a dedicated method and called indirectly,
this does *not* show performance regression:

/* OK */ static int iterateIndirect(ByteBuffer buffer) {
	int n = buffer.remaining();
	while (n > 0 && buffer.get(n - 1) == 0) {
		n--;
	}
	return n;
}

This is neither DirectByteBuffer nor HeapByteBuffer implementation issue, this
must be the issue with the hot spot compiler. The *first* type used in the
computation will show constant performance regression after ~5 iterations.

Since the regression disappears if the tiered compilation is disabled, the problem
seem to be somewhere in the area of C1/C2 compiler interactions/level transitions.

Probably by updating the data for the later compiled subtype C2 somehow kicks out
the already compiled code for another subtype. This would explain why the
execution runs slow for both ByteBuffer types on first iteration (before optimization),
then improves and is fast up to the ~5th iteration for both ByteBuffer types and
then after some compile event it starts to run slow but only for *one* of the
ByteBuffer types, the one, which was used first (and compiled first).

The regression seem timely to correlate with this compilation event:
Iteration 5
    317   27       4       ByteBufferTest::iterateIndirect (27 bytes)   made not entrant
    317   29    b  4       ByteBufferTest::iterateIndirect (27 bytes)

After this event, the performance drops for one of the  ByteBuffer types.

The provided test runs application 4 times with different configurations, which
show the impact of the direct/indirect code use and the impact of the call order
to the regression (the order where either DirectByteBuffer or HeapByteBuffer is
used first).

Run test with:

./run.sh

It will automatically compile the code. Detailed JVM output is saved into 4 log files:
direct.log
directInv.log
indirect.log
indirectInv.log

One can use specific JVM by exporting JAVA_HOME variable:
JAVA_HOME=/usr/lib/jvm/java-1.7.0 ./run.sh
JAVA_HOME=/usr/lib/jvm/java-1.8.0 ./run.sh

One can specify JVM arguments to be used by simply appending them to the end:
./run.sh -XX:+TieredCompilation

Example output for JDK 7 and 8:
logs/execution_log_java7_91.txt
logs/execution_log_java8_11.txt
logs/execution_log_java8_20.txt
logs/execution_log_java8_91.txt
logs/execution_log_java8_102.txt

Comment 2 Andrey Loskutov 2016-07-21 08:46:12 UTC

I've filed https://access.redhat.com/support/cases/#/case/01671633

Comment 3 Roland Westrelin 2016-07-22 09:44:34 UTC

Upstream bug: https://bugs.openjdk.java.net/browse/JDK-8162384
upstream review thread: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-July/023872.html

Comment 4 Robert Bost 2016-07-22 19:45:41 UTC

Disabling type speculation (-XX:-UseTypeSpeculation) makes the reproducer run must faster. If this a safe workaround or is that something that would not be messed with?

Comment 5 Roland Westrelin 2016-07-25 07:29:12 UTC

-XX:-UseTypeSpeculation should be a safe work around. Depending on the workload, turning type speculation off can cause a performance degradation as well.

Comment 6 Andrey Loskutov 2016-07-25 09:14:45 UTC

(In reply to Roland Westrelin from comment #5)
> -XX:-UseTypeSpeculation should be a safe work around. Depending on the
> workload, turning type speculation off can cause a performance degradation
> as well.

The option (and feature) itself was implemented via https://bugs.openjdk.java.net/browse/JDK-8024070, but seems not be documented in any way.

My search in release notes, in Oracle bug tracker and hot spot mailing list didn't show any mention of "UseTypeSpeculation" value change:

1) http://www.oracle.com/technetwork/java/javase/2col/8u20-bugfixes-2257730.html
2) https://search.oracle.com/search/search?search.timezone=-120&search_startnum=&search_endnum=&num=10&search_dupid=&exttimeout=false&actProfId=0&q=UseTypeSpeculation&group=bugs.java.com&sw=t&search_p_main_operator=all&search_p_atname=&adn=&search_p_op=equals&search_p_val=&search_p_atname=&adn=&search_p_op=equals&search_p_val=
3) https://www.google.com/search?as_q=&as_epq=UseTypeSpeculation&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=http%3A%2F%2Fmail.openjdk.java.net%2Fpipermail%2Fhotspot-compiler-dev%2F&as_occt=any&safe=images&as_filetype=&as_rights=#q=%22UseTypeSpeculation%22+site:http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/&newwindow=1&as_qdr=all&start=0

It would be great if we would have some kind of user documentation available, explaining:
1) What UseTypeSpeculation flag does and how it affects JVM performance (https://bugs.openjdk.java.net/browse/JDK-8024070 isn't really helpful :-) ).
2) Why it was turned on in 8u20_b16 (see https://patches.linaro.org/patch/31481/).
3) Why it is safe to turn it off as proposed. If I understand it right, this flag was introduced to *improve* performance, so disabling it will probably decrease performance in *other areas* as those where we saw the regression? 
4) Are there any performance numbers available about UseTypeSpeculation effects on some applications?

Comment 7 Roland Westrelin 2016-07-25 09:50:50 UTC

> 1) What UseTypeSpeculation flag does and how it affects JVM performance
> (https://bugs.openjdk.java.net/browse/JDK-8024070 isn't really helpful :-) ).

Type speculation tries to fight profile pollution and make better use of available type profile data in general. Without type speculation, use of profile data is local: for instance, profile collected at a call is used during compilation to optimize that call and nothing else. With type speculation, use of profile data is global (in the compilation unit = root method + all inlined methods) so profile data used to optimize a call, for instance, can come from some other profile point in the compilation unit. In the test case of this bug, in iterateIndirect(), the virtual call buffer.get() is optimized using profile from the buffer.remaining() call which doesn't work well in that case because profile data at buffer.remaining() is not mature enough.

> 2) Why it was turned on in 8u20_b16 (see
> https://patches.linaro.org/patch/31481/).

Most likely because it was enabled upstream around that time. It was enabled upstream as part of an effort to improve performance of dynamic languages on the JVM.

> 3) Why it is safe to turn it off as proposed. If I understand it right, this
> flag was introduced to *improve* performance, so disabling it will probably
> decrease performance in *other areas* as those where we saw the regression? 

Type speculation could, in principle, improve the performance of any application but, in practice, as far as I know, it only makes a difference with implementations of dynamic languages on the JVM that take advantage of invokedynamic (i.e. javascript with nashorn).

> 4) Are there any performance numbers available about UseTypeSpeculation
> effects on some applications?

There are no official numbers that I know of but as far as I know, it makes a big difference with nashorn.

Comment 8 jiri vanek 2016-07-28 12:40:40 UTC

http://pkgs.devel.redhat.com/cgit/rpms/java-1.8.0-openjdk/commit/?h=rhel-7.3&id=b2a4d93793b3846caf5db0a7d196f786baaf6385

Comment 11 Roland Westrelin 2016-08-19 07:28:49 UTC

For the record, backported to upstream 8u: https://bugs.openjdk.java.net/browse/JDK-8164387

Comment 12 Andrey Loskutov 2016-08-19 08:05:31 UTC

(In reply to Roland Westrelin from comment #11)
> For the record, backported to upstream 8u:
> https://bugs.openjdk.java.net/browse/JDK-8164387

Roland, many thanks for a lightning fast fix!

Comment 15 errata-xmlrpc 2016-11-03 22:57:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2139.html

Note You need to log in before you can comment on or make changes to this bug.