Bug 1130447 - Insufficient indexing performance
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.3.0,6.3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: CR1
Target Release: 6.4.0
Assignee: Gustavo Fernandes
QA Contact: Martin Gencur
Depends On: 1180693
Blocks: jdg64-GA-Blockers
Reported: 2014-08-15 09:05 UTC by Radim Vansa
Modified: 2015-01-28 13:32 UTC

Doc Type: Bug Fix
Type: Bug


Links
System | ID | Priority | Status | Summary | Last Updated
JBoss Issue Tracker | ISPN-4847 | Critical | Open | Improve indexing performance | 2017-09-07 08:37:53 UTC

Description Radim Vansa 2014-08-15 09:05:25 UTC
This bug tracks progress on the multiple issues causing insufficient performance of index updates when indexing to the Infinispan directory.

Setting the property exclusive_index_use=true should provide much better performance, but in our tests it did not help. It is also not safe enough for production when successful failover is expected.

Comment 2 Radim Vansa 2014-10-02 08:04:27 UTC
With JDG 6.3.1 we can achieve modest performance when using batching (a variant of transactions) - we can upload a batch of e.g. 500 entries and it is indexed quite soon. This improvement allows us to load enough data into the cluster to run a query performance test.

However, customers want to use the cluster with good response times, not just batch-load data and execute queries. Therefore, we have to focus on reducing the response time of a single write.

I think a reasonable goal would be a response time no more than twice that with indexing disabled.

Comment 3 Gustavo Fernandes 2014-10-03 13:06:51 UTC
Hi,

What is the motivation for the goal of a response time no more than twice that with indexing disabled? Is it based on some competitor or another database?
Currently, what is the measured response time factor due to indexing?

Since the Infinispan directory maintains a full Lucene index in the grid, it needs to update at least two caches (sometimes three) to save the index data structures, so the number of extra RPCs will not stay within a factor of two.

Comment 6 Gustavo Fernandes 2014-10-10 11:59:38 UTC
We are not aiming at a numeric 'goal' at the moment, but rather at reducing the number of commits that happen during indexing.
 
In short, current indexing involves a Lucene commit on every document, which in turn generates an internal segment composed of multiple files, each file requiring roughly a couple of RPCs to persist inside the directory. Not only that, each commit also involves deleting files from the previous commit(s), which adds more overhead.

This is not optimal, and by reducing the number of commits we expect to drastically improve performance for both the sync and async indexing styles.
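To illustrate why commit frequency dominates, here is a rough cost model, not a measurement. The per-commit figures (10 files per segment, 10 file deletions) are hypothetical placeholders, and the model ignores that larger commits write larger files. Only the "couple of RPCs" per file and the batch size of 500 come from this bug.

```python
import math

def estimated_rpcs(num_docs, docs_per_commit,
                   files_per_segment=10,    # hypothetical placeholder
                   rpcs_per_file=2,         # "roughly a couple of RPCs" per file
                   deletes_per_commit=10):  # hypothetical: one RPC per deleted old file
    """Estimate the RPCs needed to index num_docs documents into the directory."""
    commits = math.ceil(num_docs / docs_per_commit)
    rpcs_per_commit = files_per_segment * rpcs_per_file + deletes_per_commit
    return commits * rpcs_per_commit

# Same 1000 documents, committed per document versus per batch of 500.
commit_per_document = estimated_rpcs(1000, docs_per_commit=1)
commit_per_batch = estimated_rpcs(1000, docs_per_commit=500)
print(commit_per_document, commit_per_batch)
```

Whatever the real per-commit cost is, in this model the total scales with the number of commits rather than with the number of documents.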

Comment 7 Radim Vansa 2014-10-10 14:28:40 UTC
My personal opinion is that even a 'drastic' improvement won't be enough to be competitive as long as the number of RPC messages for a single write is significantly larger with indexing than without it (without indexing that's usually <= 2 RPCs = 4 messages for a non-tx write and <= 4 RPCs for a tx write with 2-phase commit, assuming dist mode with 2 owners).

What average number of RPCs per write do you expect to achieve by optimizing the current design?
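The message counts above can be written down as a small calculation. This is a sketch only: it assumes every RPC is one request plus one response and that cost grows linearly with the number of RPCs, which a real cluster will not follow exactly. The function names are made up for illustration.

```python
MESSAGES_PER_RPC = 2  # assumption: one request and one response per RPC

# Upper bounds quoted above for dist mode with 2 owners, indexing disabled.
RPCS_PER_WRITE = {"non_tx": 2, "tx_2pc": 4}

def messages_per_write(mode, extra_index_rpcs=0):
    """Messages for one write, optionally including RPCs added by indexing."""
    return (RPCS_PER_WRITE[mode] + extra_index_rpcs) * MESSAGES_PER_RPC

def slowdown_factor(mode, extra_index_rpcs):
    """Relative cost of an indexed write under the linear-cost assumption."""
    base = RPCS_PER_WRITE[mode]
    return (base + extra_index_rpcs) / base

print(messages_per_write("non_tx"))   # non-indexed, non-tx write
print(slowdown_factor("non_tx", 2))   # effect of two extra RPCs from indexing
```

Under these assumptions, staying within twice the non-indexed cost leaves room for at most 2 extra RPCs on a non-tx write and 4 on a tx write.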

Comment 8 Radim Vansa 2014-10-14 14:32:31 UTC
Hi, I have benchmarked update speeds for a replicated cluster of 4 nodes without indexing, with indexing to RAM and with indexing to FS (both set up with the NRT index manager), and the results are not really impressive: with indexing the throughput is about 25% of the non-indexed version (for both RAM and FS).

Test results: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-perf-query-indexing/4/artifact/results/html/test_test.html
Configuration etc.: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-perf-query-indexing/4/artifact/results/html/index.html

Comment 9 Sanne Grinovero 2014-10-14 18:22:41 UTC
Thanks Radim, it's great to have a benchmark. That means we can now improve things, and don't worry, we won't need to keep this high number of RPCs.
Solutions, however, will be discussed on the mailing lists of the relevant projects; I have some ideas but am currently overloaded with work on other subjects.

Your "goal" is reasonable, but it should be tracked as a tentative aim for the community version: we can only backport improvements from community, and only commit to backports once we've verified feasibility in community.

Please close this BZ, especially as it is not visible to those who can fix it.

Comment 10 Radim Vansa 2014-10-15 06:24:19 UTC
Sanne: I agree that the improvements have to start in community; however, competitive indexing performance is a business requirement - and the product is driving community development.

Therefore, I'll keep this BZ open and link the JIRA to it.

Comment 13 Radim Vansa 2014-10-15 16:28:23 UTC
So, with an async worker and an execution queue of 1,000 entries we can achieve a total throughput of about 1,900 ops/s with an average latency of 50 ms for writes and 30 ms for removes. That's certainly better [1] but still far from non-indexed performance (the linked test for non-indexed writes does not reach the maximum, but the report shows > 73,000 ops/s).

It seems that there's some bug, though: on one node (the node where the updates are processed) we can achieve almost no writes. Generally, the performance is not very even across nodes (that's why the report is red; click the configuration name in the report to expand the statistics for each node).

There are no errors in the log on this node during the test execution, although the node shutdown afterwards was not very clean - there were tons of errors as the worker tried to index into a cache that was already shut down.

I did not measure the lag between writing the entry and reflecting the change in the query.

[1] https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-perf-query-indexing/8/artifact/results/html/test_test.html

Comment 15 Radim Vansa 2014-10-15 17:09:09 UTC
Just out of curiosity, I've run another asynchronous configuration with a worker thread pool of 8 threads (the default value is 1).

I achieved 2100 ops/s, which is not much different.

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-perf-query-indexing/9/artifact/results/html/test_test.html

Comment 17 Gustavo Fernandes 2014-10-17 08:31:40 UTC
The async performance you're seeing is not a surprise - as it stands, async will simply do the same commit per entity as the sync mode, but in the background. 
Async performance is being improved on https://hibernate.atlassian.net/browse/HSEARCH-1693

Comment 18 Radim Vansa 2014-10-17 11:20:01 UTC
Gustavo: OK, thanks for the info. Now, what about sync performance, as it seems that competitors are sync? You were talking about several improvements of tens of percent - what are your expectations once those improvements are finished? I welcome them as they should improve the performance of local directory access, but how will that multiply with the ~11 operations/s throughput I am getting?

Comment 19 Gustavo Fernandes 2014-10-20 07:27:41 UTC
SYNC performance is being handled in https://hibernate.atlassian.net/browse/HSEARCH-1699
The strategy will be similar to the ASYNC case (reducing the number of index commits), and given that preliminary ASYNC tests showed some great improvement (https://github.com/hibernate/hibernate-search/pull/681), we expect throughput to increase by more than tens of percent for the SYNC case as well.

Comment 21 Martin Gencur 2014-12-10 11:36:25 UTC
Moving this to ON_QA as Hibernate Search has been upgraded in ER7 to the right version.

Gustavo, please move it back if this is not ready yet. Thanks

Comment 22 Radim Vansa 2014-12-16 12:25:58 UTC
The current strategy allows increased throughput, but the latency of each request was not improved. Therefore, I have tried to find the maximum throughput by running an increased load from many parallel threads.

Regrettably, as there is a single node processing all the requests and the other nodes issue synchronous RPCs to this node, the 'remote' thread pool and later (due to the rejection policy being set to caller-runs) the OOB thread pool get depleted too. After that, the node cannot execute any more RPCs to the clustered cache which holds the indexing information, and this leads to a deadlock.

I have detected this situation with 250 concurrent threads on each node of a 4-node cluster (200 threads were OK) and 100 threads on each node of an 8-node cluster (80 threads were handled properly). The OOB thread pool was set to 500 threads; the remote thread pool was on defaults.
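These thresholds are roughly what a simple capacity check would predict. The sketch below assumes that every client thread on the other nodes can hold one synchronous RPC open against the single indexing node, and that the node stalls once those requests can occupy all OOB and remote threads. The remote pool size of 200 is an assumption borrowed from the figure quoted for 6.4.0.CR1 later in this bug; the defaults used in this test may differ.

```python
def pending_requests_at_indexing_node(nodes, threads_per_node):
    """Worst case: every thread on every other node waits on the indexing node."""
    return (nodes - 1) * threads_per_node

def pools_depleted(nodes, threads_per_node, oob_threads=500, remote_threads=200):
    """True when pending sync RPCs can occupy every OOB and remote thread."""
    capacity = oob_threads + remote_threads  # remote_threads=200 is an assumption
    return pending_requests_at_indexing_node(nodes, threads_per_node) >= capacity

# The four load levels reported above: (cluster size, threads per node).
for nodes, threads in [(4, 200), (4, 250), (8, 80), (8, 100)]:
    print(nodes, threads, pools_depleted(nodes, threads))
```

The check agrees with the four data points above, but that may be a coincidence; it does not model the thread pools or the caller-runs policy in any detail.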

I will still provide data for the latencies and throughput under manageable load.

Still, there's no period of graceful degradation; the application suddenly deadlocks.

Comment 26 Radim Vansa 2015-01-15 10:31:45 UTC
I've run JDG 6.3.2 with up to 500 concurrent threads writing to the cache and removing entries (without batching), and JDG 6.4.0.CR1 with up to 2000 threads, both on a cluster of 4 nodes.

JDG 6.3.2 offered unstable performance of 15-40 updates/s, and under certain loads (300 or 350 threads) almost no operations were executed.

JDG 6.4.0.CR1 was able to reach about 2100 updates/s, but only up to a parallelism of 800 concurrent threads. Beyond that, the thread pools (OOB=500 + Remote=200) on one node were depleted and the performance dropped to about 100 updates/s.

With 8 nodes, 1700 updates/s were achieved with 600 concurrent threads.
Mean latency of puts was about 500 ms, removes about 50-90 ms (the results were quite unstable).
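As a sanity check on these numbers, Little's law relates concurrency, throughput and mean latency. The sketch assumes the thread counts are cluster-wide totals and that every thread always has exactly one operation in flight; neither is stated above.

```python
def mean_latency_ms(concurrent_threads, throughput_ops_per_s):
    """Little's law: mean time per operation = concurrency / throughput."""
    return 1000.0 * concurrent_threads / throughput_ops_per_s

eight_nodes = mean_latency_ms(600, 1700)  # 8-node run
four_nodes = mean_latency_ms(800, 2100)   # 4-node run at its peak
print(round(eight_nodes), round(four_nodes))
```

The implied mean of roughly 350 ms for 8 nodes lies between the reported remove and put latencies, which is what a mix of both operations would give.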

Such performance can be considered viable for applications doing occasional updates and is a dramatic improvement over the previous version.

