This bug should track progress with multiple issues causing insufficient performance of index updates when indexing to Infinispan directory. Setting property exclusive_index_use=true should provide much better performance but in our tests it did not help. Also, it is not safe enough for production when successful failover is expected.
With JDG 6.3.1 we can achieve modest performance when using batching (variant of transactions) - we can upload batch of e.g. 500 entries and this is indexed quite soon. This improvement allows us to load enough data into the cluster to perform querying performance test. However, customer want to use cluster with good response times, not just batch loading data and executing queries. Therefore, we have to focus on reducing response time of single write. I think that reasonable goal would be to achieve response time not larger than two times as with indexing disabled.
Hi, What is the motivation of "response time not larger than two times as with indexing disabled" goal? Is that based on some competitor or any other database? Currently, what is the measured response time factor due to indexing? Since the infinispan directory maintain a full Lucene index in the grid, it minimally need to update two caches (sometimes three) in order to save the index data structures, so the number of extra RPCs will not be a power of two.
We are not aiming at a numeric 'goal' at the moment, but rather to reduce the number of commits that is happening during indexing. In short, current indexing involves a Lucene commit on every document, which in turn generates an internal segment composed of multiple files, each file requiring roughly a couple of RPCs to persist inside the directory. Not only that, each commit also involves deleting files from the previous commit(s), which adds more overhead. This is not optimal and by reducing the number of commits, we expect to drastically improve performance on both the sync and async indexing styles
My personal opinion is that neither 'drastic' improvement won't be enough to be competitive, as long as the number of RPC messages for single write will be significantly larger with indexing than without indexing (that's usually <= 2 RPCs = 4 messages for non-tx write and <= 4 RPCs for tx with 2-phase commit; assuming dist mode with 2 owners). What is your expectancy on average number of RPCs per write that you could achieve by optimizing current design?
Hi, I have benchmarked update speeds for replicated cluster of 4 nodes without indexing, with indexing to RAM and indexing to FS (both set with NRT indexmanager) and the results are not really impressive: with indexing the throughput is about 25% or non-indexed version (for both RAM and FS). Test results: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-perf-query-indexing/4/artifact/results/html/test_test.html Configuration etc.: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-perf-query-indexing/4/artifact/results/html/index.html
Thanks Radim, it's great to have a benchmark. That implies we can now improve things, and don't worry we won't need to keep this high amount of RPCs. Solutions however will be discussed on mailing list of relevant projects; I have some ideas but are currently overloaded with work on other subjects. Your "goal" is reasonable but it should be tracked as a temptative aim for community version: we can only backport improvements from community, and only commit on backports when we've verified feasibility in community. Please close this BZ, especially as this is not having the visibility of those who can fix it.
Sanne: I agree that the improvements have to start in community, though, competitive indexing performance is a business requirement - and the product is driving community development. Therefore, I'll keep this BZ open and link JIRA to that.
So, with async worker and execution queue of 1,000 entries we can achieve total throughput about 1,900 ops/s with average latency of 50 ms for writes and 30 ms for removes - that's certainly better [1] but still far from non-indexed performance (the linked test for non-indexed writes does not reach the maximum but it is > 73,000 ops/s that can be seen in the report) It seems that there's some bug, though, as on one node (this is the node where the updates are processed) we can't achieve almost any writes. Generally, the performance on different nodes is not very even (that's why the report is red; click the configuration name in the report to expand statistics for each node). There are no errors in the log on this node during the test execution, although the node shutdown after was not very clean - there were tons of errors as the worker tried to index into cache that was already shut down. I did not measure the lag between writing the entry and reflecting the change in the query. [1] https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-perf-query-indexing/8/artifact/results/html/test_test.html
Just out of curiosity, I've ran another asynchronous configuration with worker threadpool with 8 threads (default value is 1). I have achieved 2100 ops/s, that's not much different result. https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-perf-query-indexing/9/artifact/results/html/test_test.html
The async performance you're seeing is not a surprise - as it stands, async will simply do the same commit per entity as the sync mode, but in the background. Async performance is being improved on https://hibernate.atlassian.net/browse/HSEARCH-1693
Gustavo: OK, thanks for the info. Now, what about sync performance, as it seems that competitors are sync? You were talking about several improvements in tens of percents - what are your expectations when those improvements are finished? I welcome these as it should improve the performance of local directory access, but how will that multiply with that ~11 operations/s throughput I am getting?
SYNC performance is being handled on https://hibernate.atlassian.net/browse/HSEARCH-1699 Strategy will be similar to the ASYNC case (reducing number of index commits), and given that ASYNC preliminary tests showed some great improvement (https://github.com/hibernate/hibernate-search/pull/681), we expect throughput to increase more than tens of percent for the SYNC case as well.
Moving this to ON_QA as Hibernate Search has been upgraded in ER7 to the right version. Gustavo, please move it back if this is not ready yet. Thanks
The current strategy allows increased throughput, but the latency of each requests was not improved. Therefore, I have tried to find the maximum throughput by running increased load from many parallel threads. Regrettably, as there is single node processing all the requests and other nodes issue synchronous RPCs to this node, the 'remote' threadpool and later (due to rejection policy set to caller-runs) OOB threadpool too gets depleted. After that, the node cannot execute any more RPCs to the clustered cache which holds the indexing information and this leads to deadlock. I have detected this situation with 250 concurrent threads on each node of 4-node cluster (200 threads were OK) and 100 threads on each node of 8-node cluster (80 threads were handled properly). OOB thread pool was set to 500 threads, remote thread pool was on defaults. I will yet provide data for the latencies and throughput under manageable load. Still, there's no period of graceful degradation, the application suddenly deadlocks.
I've ran JDG 6.3.2 with up to 500 concurrent threads writing to the cache and removing entries (without batching), and JDG 6.4.0.CR1 with up to 2000 threads, both on cluster of 4 nodes. JDG 6.3.2 offered unstable performance of 15-40 updates/s, while with certain load (300 or 350 threads) almost no operations were executed. JDG 6.4.0.CR1 was able to get up to about 2100 updates/s. But only up to parallelism of 800 concurrent threads. After that, thread pools (OOB=500 + Remote=200) on one node were depleted and the performance dropped to about 100 updates/s. With 8 nodes, 1700 updates/s were achieved with 600 concurrent threads. Mean latency of puts was about 500 ms, removes about 50-90 ms (the results were quite unstable). Such performance can be considered viable for applications doing occasional updates and offers dramatic improvement compared to previous version.