Bug 570360

Summary:	improve performance for call time data subsystem
Product:	[Other] RHQ Project
Component:	Performance
Version:	1.4
Hardware:	All
OS:	All
Status:	CLOSED CURRENTRELEASE
Severity:	urgent
Priority:	medium
Reporter:	Joseph Marques <jmarques>
Assignee:	Joseph Marques <jmarques>
QA Contact:	Jeff Weiss <jweiss>
CC:	cwelton, dajohnso
Fixed In Version:	2.4
Doc Type:	Bug Fix
Last Closed:	2010-08-12 16:58:31 UTC
Bug Blocks:	577041, 577050

Description Joseph Marques 2010-03-04 01:21:38 UTC

Customer DBA indicates that the following queries seem to contribute to locking when using call time metrics:

INSERT INTO RHQ_CALLTIME_DATA_KEY (id, schedule_id, call_destination)
SELECT RHQ_calltime_data_key_id_seq.nextval, :1, :2
  FROM RHQ_numbers
 WHERE i = 42
   AND NOT EXISTS (SELECT *
                     FROM RHQ_CALLTIME_DATA_KEY
                    WHERE schedule_id = :3
                      AND call_destination = :4)

DELETE FROM RHQ_CALLTIME_DATA_VALUE
 WHERE key_id = (SELECT id
                   FROM RHQ_CALLTIME_DATA_KEY
                  WHERE schedule_id = :1
                    AND call_destination = :2)
   AND begin_time = :3
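
For anyone unfamiliar with the shape of the first statement, here is a minimal sketch of the same "insert only if the key is absent" pattern. It is an illustration only, not RHQ code: it runs against an in-memory SQLite database through Python's sqlite3 module, the two tables are reduced to the columns named in the queries above, an AUTOINCREMENT id stands in for the Oracle sequence, and the helper names (open_key_demo_db, insert_key_if_absent) are made up. It also assumes RHQ_numbers is a helper table of integers in which i = 42 matches exactly one row, so the SELECT yields at most one row to insert. It says nothing about the locking behavior the DBA observed.

```python
import sqlite3

# Hypothetical, cut-down versions of the two tables the query touches.
SCHEMA = """
CREATE TABLE rhq_numbers (i INTEGER PRIMARY KEY);
CREATE TABLE rhq_calltime_data_key (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    schedule_id      INTEGER NOT NULL,
    call_destination TEXT NOT NULL
);
"""

# Same structure as the reported INSERT: the single-row SELECT produces a row
# only when no key exists yet for (schedule_id, call_destination).
INSERT_KEY_IF_ABSENT = """
INSERT INTO rhq_calltime_data_key (schedule_id, call_destination)
SELECT :schedule_id, :call_destination
  FROM rhq_numbers
 WHERE i = 42
   AND NOT EXISTS (SELECT 1
                     FROM rhq_calltime_data_key
                    WHERE schedule_id = :schedule_id
                      AND call_destination = :call_destination)
"""


def open_key_demo_db():
    """Create the demo schema and fill the assumed integer helper table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO rhq_numbers (i) VALUES (?)",
                     [(n,) for n in range(100)])
    conn.commit()
    return conn


def insert_key_if_absent(conn, schedule_id, call_destination):
    """Return 1 if a new key row was inserted, 0 if it already existed."""
    params = {"schedule_id": schedule_id, "call_destination": call_destination}
    with conn:  # commit on success, roll back on error
        return conn.execute(INSERT_KEY_IF_ABSENT, params).rowcount
```

Calling insert_key_if_absent twice with the same schedule id and destination inserts a row the first time and does nothing the second time.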

I did some digging into the call-time subsystem and wrote up my thoughts here: http://www.rhq-project.org/display/RHQ/Improving+CallTimeData+Insertion+Logic

I suggest we implement parts 1 and 2 from that write-up, and then consider whether it's worth the time to include any of the others in the next release.

Comment 1 Charles Crouch 2010-06-18 17:34:43 UTC

I think testing and fixing other performance areas are a priority for JON2.4

Comment 2 Joseph Marques 2010-06-21 11:48:44 UTC

commit 7f912881eb1ed141a5519e62fd13522eb97a42d1
Author: Joseph Marques <joseph>
Date:   Sat Jun 19 09:42:55 2010 -0400

    BZ-570360: improve call-time data reporting performance by adjusting transactional boundaries

-----

This maps to "Part 1 - Transactional Boundaries" in the wiki link.

Comment 3 Joseph Marques 2010-06-21 11:53:11 UTC

commit a6a707cab05a9ee8cebf49e87b381b507c343285
Author: Joseph Marques <joseph>
Date:   Mon Jun 21 07:32:12 2010 -0400

    BZ-570360: eliminate routine that purges duplicate call-time data values
    
Is this really needed?  Under what circumstances are duplicates generated?  Are we presuming a misbehaving plugin, or a problem at the comm layer where the same piece of data is delivered twice?  I'm tentatively removing this because it is a big hit to call-time data reporting performance.  If, however, evidence is presented that compels us to bring this functionality back, it should be implemented by removing the duplicate data points that share the same key_id and begin_time.  These records can be found with:
    
   SELECT key_id, begin_time, count(id)
     FROM rhq_calltime_data_value
 GROUP BY key_id, begin_time
   HAVING count(id) > 1
    
This is functionally equivalent to the CALLTIME_VALUE_DELETE_SUPERCEDED_STATEMENT query, but takes advantage of the fact that key_id represents the already-computed pair of schedule_id/destination, thus allowing the duplicate-search to be implemented against a single table.
    
Taking this solution one step further, an appropriate delete statement can be crafted which leverages the above concept but deletes all but one of the duplicates (perhaps leaving the record with the smallest id/pk).  This purge routine can either be grouped in with the rest in DataPurgeJob, or it can be implemented as its own Quartz job that runs more (or less) frequently, depending on need (i.e., how often we anticipate duplicates).
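
A delete along those lines could look roughly like the sketch below. This is not a statement that was ever committed to RHQ. It uses Python's sqlite3 module with an in-memory table holding only the three columns mentioned here (id, key_id, begin_time), and the helper names (open_value_demo_db, purge_duplicate_values) are hypothetical. The NOT IN (SELECT MIN(id) ... GROUP BY ...) form is ordinary SQL, but how it performs on Oracle or PostgreSQL at production table sizes has not been measured.

```python
import sqlite3

# Hypothetical, simplified stand-in for rhq_calltime_data_value; the real
# table has more columns than the three used here.
SCHEMA = """
CREATE TABLE rhq_calltime_data_value (
    id         INTEGER PRIMARY KEY,
    key_id     INTEGER NOT NULL,
    begin_time INTEGER NOT NULL
)
"""

# Same grouping as the duplicate-finding query in this comment.
FIND_DUPLICATES = """
  SELECT key_id, begin_time, COUNT(id)
    FROM rhq_calltime_data_value
GROUP BY key_id, begin_time
  HAVING COUNT(id) > 1
"""

# Keep the row with the smallest id in each (key_id, begin_time) group and
# delete every other row in that group.
PURGE_DUPLICATES = """
DELETE FROM rhq_calltime_data_value
 WHERE id NOT IN (SELECT MIN(id)
                    FROM rhq_calltime_data_value
                GROUP BY key_id, begin_time)
"""


def open_value_demo_db():
    """Create the demo table with one triple, one pair, and one unique row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [(1, 10, 1000), (2, 10, 1000), (3, 10, 1000),  # three copies
            (4, 10, 2000),                                  # no duplicate
            (5, 11, 1000), (6, 11, 1000)]                   # two copies
    conn.executemany(
        "INSERT INTO rhq_calltime_data_value VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def purge_duplicate_values(conn):
    """Delete the duplicate rows and return how many were removed."""
    with conn:  # commit on success, roll back on error
        return conn.execute(PURGE_DUPLICATES).rowcount
```

Running purge_duplicate_values on the demo data removes the extra copies and leaves ids 1, 4, and 5, after which the duplicate-finding query returns no rows.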

-----

This work originally would have mapped to "Part 2 - Refactoring table maintenance", but upon further reflection it was better to remove the routine altogether.

Comment 4 Joseph Marques 2010-06-21 11:57:45 UTC

Pushing this bug out of dev because we won't be doing any further incremental improvements to the call-time data subsystem.  In the future, the schema might be rewritten to yield even greater performance, but a separate BZ can be opened to track that work once it is on the release schedule.

FYI, testing for this item is almost a no-op, as no functional changes were made.  Transactional boundaries were adjusted, and unnecessary routines were bypassed.  I've already verified that there are no regressions to the insertion / reporting routine overall, but QA should feel free to test this if they need greater confidence before release.

Comment 5 Corey Welton 2010-06-21 14:31:50 UTC

QA Closing

Comment 6 Corey Welton 2010-08-12 16:58:31 UTC

Mass-closure of verified bugs against JON.