Red Hat Bugzilla – Bug 1024803
Application hangs out with large datasets (150k records)
Last modified: 2014-08-06 16:12:37 EDT
Description of problem:
The application hangs indefinitely after inserting a KPI whose data comes from a SQL query over a table with 150,000 records.
Version-Release number of selected component (if applicable):
Commit from last October 24th.
JBoss Application Server 7.1.1.Final.
How reproducible:
Not sure. I don't have any other large datasets.
Steps to Reproduce:
1. Create a query over a table with a large number of records.
2. Create a new page.
3. Insert a new KPI.
4. Select the query as its data source.
5. The browser starts to "think" (the circle icon spins indefinitely).
6. The JBoss process on the server consumes a lot of CPU.
The browser waits for the server's response for minutes, and the JBoss process consumes about half of the CPU.
The only workaround found is to kill the JBoss process (surely there are better ones).
Opening a new dashbuilder session in a different browser does not work either, as the server seems to be fully occupied with the previous session.
Putting back to high. Based on 'severity' description this is more fitting.
To prevent this kind of issue, a few runtime constraints have been added to the tool. If any of these constraints is violated, the current request is aborted and a proper error message is displayed to the user. The runtime constraints that have been added are (default values in brackets):
- Maximum memory in bytes a data set load operation may consume (200 MB)
- Maximum size in bytes for a single data set instance (100 MB)
- Maximum time in milliseconds a data set load operation may last (10 s)
- Maximum time in milliseconds a data set filter operation may last (10 s)
- Maximum time in milliseconds a data set group operation may last (10 s)
- Maximum time in milliseconds a data set sort operation may last (10 s)
All these values can be changed via system properties at start-up (see https://github.com/droolsjbpm/dashboard-builder/blob/master/scripts/run.sh for an example of how to do so in Jetty mode).
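As an illustration of how such limits can be read from system properties with a fallback to the defaults listed above, here is a minimal sketch. The property names below are hypothetical placeholders, not the names dashbuilder actually uses (check run.sh for those), and 1 MB is assumed to mean 1024 * 1024 bytes.

```java
/** Sketch: resolve runtime limits from JVM system properties, falling back to defaults. */
final class RuntimeLimits {

    // Hypothetical property names, for illustration only; the real names are in run.sh.
    static final String MAX_LOAD_MEMORY = "example.dataset.maxLoadMemoryBytes";
    static final String MAX_DATASET_SIZE = "example.dataset.maxSizeBytes";
    static final String MAX_LOAD_TIME = "example.dataset.maxLoadTimeMillis";

    // Defaults taken from the list above (assuming 1 MB = 1024 * 1024 bytes).
    static final long DEFAULT_MAX_LOAD_MEMORY = 200L * 1024 * 1024;
    static final long DEFAULT_MAX_DATASET_SIZE = 100L * 1024 * 1024;
    static final long DEFAULT_MAX_LOAD_TIME = 10_000L;

    static long maxLoadMemoryBytes() {
        // Long.getLong reads the system property and returns the default if it is unset or not a number.
        return Long.getLong(MAX_LOAD_MEMORY, DEFAULT_MAX_LOAD_MEMORY);
    }

    static long maxDataSetSizeBytes() {
        return Long.getLong(MAX_DATASET_SIZE, DEFAULT_MAX_DATASET_SIZE);
    }

    static long maxLoadTimeMillis() {
        return Long.getLong(MAX_LOAD_TIME, DEFAULT_MAX_LOAD_TIME);
    }

    public static void main(String[] args) {
        System.out.println("max load memory (bytes): " + maxLoadMemoryBytes());
        System.out.println("max data set size (bytes): " + maxDataSetSizeBytes());
        System.out.println("max load time (ms): " + maxLoadTimeMillis());
    }
}
```

With this sketch, starting the JVM with -Dexample.dataset.maxLoadTimeMillis=30000 would raise the load time limit to 30 s, while leaving the property unset keeps the 10 s default.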
Note that the main purpose of this mechanism is to protect the system against CPU- or memory-intensive requests which could cause the system to hang.
Note also that dashbuilder is not intended to hold large data sets in memory. The tooling is not a database, a data management tool, or a business intelligence system. It's just a data visualization tool with the ability to gather data from different systems. Therefore users should think carefully about how the data to be displayed in a dashboard can be accessed in an optimized way.
Github commit (master): https://github.com/droolsjbpm/dashboard-builder/commit/1684ddf7783d31862552628697b3ccb7f3a27a19
Github commit (6.0.x): https://github.com/droolsjbpm/dashboard-builder/commit/82034d720c9da328be82e9511cb18b206f358982
Thanks for the clarification.
But we have continued our analysis with a 5000-record table.
It runs smoothly for all operations, such as drill-down, unless...
you add to a Filters panel a field that has about 1000 distinct values.
If the Filters panel does not include that field, it works quite fast.
But after adding that field as a filter, the response time increases tenfold...
For all other fields it works perfectly, as the number of distinct values per param is not as high.
Perhaps implementing that search as something similar to a "query" search when the user enters the value, instead of a combo box, could solve the problem for those params with a huge number of options.
As of ER5 (community 6 final), there is a system property 'org.jboss.dashboard.ui.DashboardSettings.maxEntriesInFilters' to deal with columns with a large number of distinct values. The default value is 1000, but you can change it. Only properties with fewer than 1000 distinct entries will display them in filter combos. For the others, the combo will display only an entry called '- Custom -'. This option activates an input field where you can type custom search patterns, like 'John*' (= all entries starting with 'John'). I think this is very similar to what you're suggesting.
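To make the behaviour described above concrete, here is a minimal sketch, not the actual dashbuilder code. Only the property name and the '- Custom -' label come from this comment; the class, the method names, the assumption that '- Custom -' is also offered for small columns, and the wildcard-to-regex translation are illustrative guesses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Sketch: decide which entries a filter combo shows, and match custom patterns. */
final class FilterOptions {

    static final String MAX_ENTRIES_PROPERTY =
            "org.jboss.dashboard.ui.DashboardSettings.maxEntriesInFilters";
    static final String CUSTOM_ENTRY = "- Custom -";

    /** Returns the combo entries for a column with the given distinct values. */
    static List<String> comboEntries(Set<String> distinctValues) {
        // Default of 1000 as stated above; overridable via the system property.
        int max = Integer.getInteger(MAX_ENTRIES_PROPERTY, 1000);
        List<String> entries = new ArrayList<>();
        entries.add(CUSTOM_ENTRY); // assumption: the custom entry is always offered
        if (distinctValues.size() < max) {
            entries.addAll(new TreeSet<>(distinctValues)); // sorted distinct values
        }
        return entries;
    }

    /** Matches a value against a pattern in which '*' stands for any run of characters. */
    static boolean matches(String pattern, String value) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            // Quote every literal character so only '*' acts as a wildcard.
            regex.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.matches(regex.toString(), value);
    }

    public static void main(String[] args) {
        System.out.println(matches("John*", "Johnson"));
        System.out.println(matches("John*", "Elton John"));
    }
}
```

In this sketch, 'John*' matches "Johnson" but not "Elton John", since the pattern is anchored at the start of the value.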
Note also that for any label property there is an internal index that speeds up access to distinct values. We've tested the system with medium/large datasets (around 500K rows), and filtering ran within acceptable response times.
So we recommend using the latest product version and checking whether it works for you. If not, I'd be very pleased to assist you. Just contact me on irc.freenode.net, #dashbuilder channel, or via email at firstname.lastname@example.org
Thanks a lot, David, for the quick response.
We will definitely try this. It certainly sounds quite similar to what I meant.
We have just migrated from version 22.214.171.124-RC2, running under JBoss, to yesterday's snapshot, running under Tomcat, and we have noticed that it runs slower... Queries seem to take longer.
We are going to test this latest snapshot against JBoss and compare performance on both Tomcat and JBoss.
I'll keep you informed,
Are there additional options to enhance performance under Tomcat?
Thanks in advance,
Ok, verified with BPMS 6.0.0 ER7.
I generated a database with 10^6 records and manually tried to load (subsets of) all the data when creating a data provider. The mechanism seems to work correctly: when the time limit is exceeded, the warning 'Data set load time has been exceeded = 10.0s' is displayed and loading of the data is stopped.
However, no automated tests will be written for this, because such tests would be too resource hungry and very likely also brittle.
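For readers wondering how a time limit of this kind can stop a load, here is a minimal sketch of the idea, not the actual dashbuilder implementation. The class and exception names are hypothetical; only the warning text is taken from the observation above. The start time and the current time are passed in explicitly so the check can be exercised without waiting.

```java
/** Sketch: abort a long-running load once a time limit has been exceeded. */
final class LoadTimeGuard {

    /** Hypothetical exception signalling that the limit was exceeded. */
    static final class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) {
            super(message);
        }
    }

    private final long maxMillis;
    private final long startNanos;

    LoadTimeGuard(long maxMillis, long startNanos) {
        this.maxMillis = maxMillis;
        this.startNanos = startNanos;
    }

    /** Checks the limit against the current clock. */
    void check() {
        check(System.nanoTime());
    }

    /** Throws if more than maxMillis have elapsed between startNanos and nowNanos. */
    void check(long nowNanos) {
        long elapsedMillis = (nowNanos - startNanos) / 1_000_000L;
        if (elapsedMillis > maxMillis) {
            throw new LimitExceededException(
                    "Data set load time has been exceeded = " + (maxMillis / 1000.0) + "s");
        }
    }

    public static void main(String[] args) {
        LoadTimeGuard guard = new LoadTimeGuard(10_000L, System.nanoTime());
        for (int row = 0; row < 1_000_000; row++) {
            // A real loader would read one row here; the guard is polled periodically.
            if (row % 10_000 == 0) {
                guard.check();
            }
        }
        System.out.println("load finished within the limit");
    }
}
```

Polling the guard every few thousand rows keeps the overhead low while still bounding how far past the limit a load can run.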