570360 – improve performance for call time data subsystem

Summary: improve performance for call time data subsystem

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Performance
Sub Component:
Version:	1.4
Hardware:	All
OS:	All
Priority:	medium
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Joseph Marques
QA Contact:	Jeff Weiss
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	jon24-perf jon-calltime

Reported:	2010-03-04 01:21 UTC by Joseph Marques
Modified:	2014-11-09 22:50 UTC
CC List:	2 users
Fixed In Version:	2.4
Clone Of:
Environment:
Last Closed:	2010-08-12 16:58:31 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	658491	0	low	CLOSED	For Oracle several call-time metrics received in one batch are assigned to the first resource in the batch	2021-02-22 00:41:40 UTC

Internal Links: 658491

Description Joseph Marques 2010-03-04 01:21:38 UTC

Customer DBA indicates that the following queries seem to contribute to locking when using call time metrics:

INSERT INTO RHQ_CALLTIME_DATA_KEY(id, schedule_id, call_destination)
SELECT RHQ_calltime_data_key_id_seq.nextval, :1, :2 FROM RHQ_numbers
WHERE i = 42 AND NOT EXISTS (SELECT * FROM RHQ_CALLTIME_DATA_KEY WHERE
schedule_id = :3 AND call_destination = :4)

DELETE FROM RHQ_CALLTIME_DATA_VALUE WHERE key_id = (SELECT id FROM
RHQ_CALLTIME_DATA_KEY WHERE schedule_id = :1 AND call_destination = :2)
AND begin_time = :3

I did some digging into the call time subsystem.  Wrote up my thoughts here -- http://www.rhq-project.org/display/RHQ/Improving+CallTimeData+Insertion+Logic

I suggest we implement parts 1 & 2 from that write-up, and then consider whether it's worth the time to try to include any of the others in the next release.

Comment 1 Charles Crouch 2010-06-18 17:34:43 UTC

I think testing and fixing other performance areas are a priority for JON2.4

Comment 2 Joseph Marques 2010-06-21 11:48:44 UTC

commit 7f912881eb1ed141a5519e62fd13522eb97a42d1
Author: Joseph Marques <joseph>
Date:   Sat Jun 19 09:42:55 2010 -0400

    BZ-570360: improve call-time data reporting performance by adjusting transactional boundaries

-----

This maps to "Part 1 - Transactional Boundaries" in the wiki link.

Comment 3 Joseph Marques 2010-06-21 11:53:11 UTC

commit a6a707cab05a9ee8cebf49e87b381b507c343285
Author: Joseph Marques <joseph>
Date:   Mon Jun 21 07:32:12 2010 -0400

    BZ-570360: eliminate routine that purges duplicate call-time data values
    
Is this really needed?  Under what circumstances are duplicates generated?  Are we presuming this is a misbehaving plugin, or a problem at the comm-layer where the same piece of data is delivered twice?  I'm tentatively removing this because it is a big hit to call-time data reporting performance.  If, however, evidence is presented that compels us to bring this functionality back, it should be implemented by getting rid of the duplicate data points that share the same key_id and begin_time.  These records can be found with:
    
   SELECT key_id, begin_time, count(id)
     FROM rhq_calltime_data_value
 GROUP BY key_id, begin_time
   HAVING count(id) > 1
    
This is functionally equivalent to the CALLTIME_VALUE_DELETE_SUPERCEDED_STATEMENT query, but takes advantage of the fact that key_id represents the already-computed pair of schedule_id/destination, thus allowing the duplicate-search to be implemented against a single table.
    
Taking this solution one step further, an appropriate delete statement can be crafted which leverages the above concept but deletes all but one of the duplicates (perhaps leaving the record with the smallest id/pk).  This purge routine can either be grouped in with the rest in DataPurgeJob, or it can be implemented as its own quartz job that runs more (or less) frequently, depending on need (i.e., how often we anticipate duplicates).

-----

This work originally would have mapped to "Part 2 - Refactoring table maintenance", but upon further reflection it was better to remove the routine altogether.
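
For reference, the "delete all but one of the duplicates" idea from the commit message might be sketched as below. This is only an illustration, not code from the RHQ source tree: it uses an in-memory SQLite database as a stand-in for the real Oracle/Postgres schema, the table is reduced to the three columns the commit message mentions, and the PURGE_DUPLICATES statement and the purge_duplicates/demo helpers are hypothetical names introduced here.

```python
import sqlite3

# Duplicate finder, as given in the commit message: (key_id, begin_time)
# pairs that are stored more than once.
FIND_DUPLICATES = """
    SELECT key_id, begin_time, COUNT(id)
      FROM rhq_calltime_data_value
  GROUP BY key_id, begin_time
    HAVING COUNT(id) > 1
"""

# Assumed purge statement (not taken from RHQ): keep the row with the
# smallest id in each (key_id, begin_time) group and delete the rest.
PURGE_DUPLICATES = """
    DELETE FROM rhq_calltime_data_value
     WHERE id NOT IN (SELECT MIN(id)
                        FROM rhq_calltime_data_value
                    GROUP BY key_id, begin_time)
"""


def purge_duplicates(conn):
    """Delete all but the lowest-id row per (key_id, begin_time).

    Returns the number of rows removed.
    """
    cur = conn.execute(PURGE_DUPLICATES)
    conn.commit()
    return cur.rowcount


def demo():
    """Build a tiny table with duplicates, purge it, and report the results."""
    conn = sqlite3.connect(":memory:")
    # Simplified table: only the columns needed for the duplicate search.
    conn.execute(
        "CREATE TABLE rhq_calltime_data_value ("
        " id INTEGER PRIMARY KEY,"
        " key_id INTEGER NOT NULL,"
        " begin_time INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO rhq_calltime_data_value (id, key_id, begin_time)"
        " VALUES (?, ?, ?)",
        [
            (1, 10, 1000),
            (2, 10, 1000),  # duplicate of id 1
            (3, 10, 2000),
            (4, 11, 1000),
            (5, 11, 1000),  # duplicate of id 4
            (6, 11, 1000),  # duplicate of id 4
        ],
    )
    before = conn.execute(FIND_DUPLICATES).fetchall()
    removed = purge_duplicates(conn)
    after = conn.execute(FIND_DUPLICATES).fetchall()
    remaining = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM rhq_calltime_data_value ORDER BY id"
        )
    ]
    conn.close()
    return before, removed, after, remaining
```

Because the purge touches only rhq_calltime_data_value, it could run outside the insertion path (for example from a scheduled job), which is the point of searching on key_id instead of joining back to the key table. On a large table the NOT IN subquery may be expensive, so the exact form would need to be checked against the target database's query plans.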

Comment 4 Joseph Marques 2010-06-21 11:57:45 UTC

Pushing this bug out of dev because we won't be doing any further incremental improvements to the call-time data subsystem.  In the future, the schema might be rewritten to yield even greater performance, but a separate BZ can be opened to track that work once it is on the release schedule.

FYI, testing for this item is almost a no-op, as no functional changes were made.  Transactional boundaries were manipulated, and unnecessary routines were bypassed.  I've already verified that there are no regressions to the insertion / reporting routine overall, but QA should feel free to test this if they need greater confidence before release.

Comment 5 Corey Welton 2010-06-21 14:31:50 UTC

QA Closing

Comment 6 Corey Welton 2010-08-12 16:58:31 UTC

Mass-closure of verified bugs against JON.
