Bug 534571 (RHQ-1355)

Summary: Investigate performance improvements for table purges
Product: [Other] RHQ Project
Reporter: Charles Crouch <ccrouch>
Component: Performance
Assignee: Joseph Marques <jmarques>
Severity: medium
Priority: medium
Version: 1.2
CC: hbrock
Keywords: Task
Hardware: All
OS: All
URL: http://jira.rhq-project.org/browse/RHQ-1355
Doc Type: Bug Fix

Description Charles Crouch 2009-01-13 09:44:00 EST
For example, consider the 1d measurement table:
Our perf environment doesn't have a ridiculous number of resources or metrics, but it has 678k measurement schedules, which equates to almost 250m (678k * 365) rows in the 1d table after a year. We need to make sure our current purge algorithm is sufficient.
Comment 1 Joseph Marques 2009-01-25 08:50:34 EST
Deletes become more and more inefficient on large tables because they are almost always implemented as single-threaded algorithms. Selects, however, are often *highly* threaded. Knowing this, we can improve purges by issuing selects to pull data into memory, immediately followed by delete statements on those same rows. In short, because we can select with greater concurrency, we can delete more in the same amount of time (or delete the same amount in less time). Once the physical blocks containing the rows to be deleted are in memory from the select statements, the deletes (depending on the size of the db cache) should have almost no cache misses, thus speeding up the overall operation.
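A minimal sketch of the select-then-delete pattern described above, using Python's sqlite3 against an in-memory database. The table name `measurement_1d`, its schema, and the cutoff value are all hypothetical; a real RHQ purge would run through the server's persistence layer, and the cache-warming benefit depends on the database's buffer pool, which sqlite does not really model. The sketch only shows the ordering of operations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement_1d (schedule_id INTEGER, time_stamp INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO measurement_1d VALUES (?, ?, ?)",
    [(i % 10, i, float(i)) for i in range(1000)],
)

PURGE_BEFORE = 500  # hypothetical deleteRowDataOlderThan cutoff

# Step 1: select the doomed rows first. On a real database this read pulls
# the physical blocks into the buffer cache (and can be parallelized) ...
doomed = conn.execute(
    "SELECT rowid FROM measurement_1d WHERE time_stamp < ?", (PURGE_BEFORE,)
).fetchall()

# Step 2: ... so the immediately following delete hits warm cache pages.
conn.executemany("DELETE FROM measurement_1d WHERE rowid = ?", doomed)
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM measurement_1d").fetchone()[0]
print(len(doomed), remaining)  # 500 rows deleted, 500 remain
```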

Oh, and this solution is irrespective of the chunk size (the number of rows to be deleted, as determined by the deleteRowDataOlderThan timestamp for the table being purged). We could always do a little analysis ahead of time to figure out whether we should chunk the work or just try to purge the whole lot of data each time the purge job runs. If the database (or all the servers) has been down for a while, this will naturally create a lot of work for the purge job if we try to process all data older than deleteRowDataOlderThan in a single transaction. It might behoove us to figure out how many rows that is, or what timeframe it spans (from the oldest row up to deleteRowDataOlderThan), and then break the work into separate transactions that each delete a chunk (perhaps 1 hour, perhaps smaller) of data.
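The chunking idea above might be sketched like this (again sqlite3 with a hypothetical schema and timestamps): the purge walks from the oldest row up to the cutoff in fixed-size time windows, committing each window in its own transaction, so a large backlog never becomes one huge transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement_1d (schedule_id INTEGER, time_stamp INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO measurement_1d VALUES (?, ?, ?)",
    [(i % 10, i, float(i)) for i in range(10_000)],
)

CUTOFF = 7_200  # hypothetical deleteRowDataOlderThan timestamp
CHUNK = 3_600   # e.g. one hour's worth of data per transaction

# Find how far back the backlog reaches, then purge it in bounded chunks.
oldest = conn.execute("SELECT MIN(time_stamp) FROM measurement_1d").fetchone()[0]
chunks = 0
start = oldest
while start < CUTOFF:
    end = min(start + CHUNK, CUTOFF)
    conn.execute(
        "DELETE FROM measurement_1d WHERE time_stamp >= ? AND time_stamp < ?",
        (start, end),
    )
    conn.commit()  # each chunk is its own transaction
    chunks += 1
    start = end

remaining = conn.execute("SELECT COUNT(*) FROM measurement_1d").fetchone()[0]
print(chunks, remaining)  # 2 chunks purge timestamps 0..7199, leaving 2800 rows
```

Keeping each transaction bounded also limits lock hold times and undo/redo growth, which matters in the high-throughput write environments the later comments mention.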
Comment 2 Joseph Marques 2009-09-04 18:12:49 EDT
Duplicate of the JIRA issues RHQ-2372, RHQ-2376, and RHQ-1448, which were collectively resolved by using a chunking solution combined with appropriate indexes to reduce table contention in high-throughput write environments.
Comment 3 Red Hat Bugzilla 2009-11-10 15:31:12 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1355
This bug is related to RHQ-1336
This bug is related to RHQ-1119
This bug was marked DUPLICATE in the database but has no duplicate of bug id.
Comment 4 David Lawrence 2009-11-11 12:09:40 EST

*** This bug has been marked as a duplicate of bug 535704 ***