Bug 535686 - (RHQ-2356) Oracle Database need 20h to complete DELETE statement on call-time data
Oracle Database need 20h to complete DELETE statement on call-time data
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Performance (Show other bugs)
unspecified
All All
high Severity medium (vote)
: ---
: ---
Assigned To: Heiko W. Rupp
http://jira.rhq-project.org/browse/RH...
: FutureFeature, Improvement
Depends On:
Blocks: jon-calltime
  Show dependency treegraph
 
Reported: 2009-08-14 10:50 EDT by Frank Brueseke
Modified: 2010-08-12 12:49 EDT (History)
3 users (show)

See Also:
Fixed In Version: 2.4
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
RHQ 1.2, Oracle 10g on a Linux Box; Agent on a Solaris server
Last Closed: 2010-08-12 12:49:03 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Frank Brueseke 2009-08-14 10:50:00 EDT
The database needs an impossible amount of time to clean up its call-time data tables if the call-time data tables get big (more than 500000 entries). We observed such a delete statement that took 20 hours to complete! What's more, it pratically disabled the machine using up 100% CPU and caused the RHQ server to produce all kinds of DB errors (failed inserts, deletes, timed out transactions, etc.).

Our solution is to install the following indexes:
CREATE INDEX PERMEO.CALL_DAT_END ON PERMEO.RHQ_CALLTIME_DATA_VALUE (END_TIME) COMPRESS 1 TABLESPACE BASIDX;
CREATE INDEX PERMEO.CALL_KEY_BEG ON PERMEO.RHQ_CALLTIME_DATA_VALUE (KEY_ID, BEGIN_TIME) COMPRESS 1 TABLESPACE BASIDX ONLINE;

In our case everything works like charm now (same test case but CPU <5%).
Please include this fix in the next version of RHQ!

Also note that I have seen a similar thing happening using Postgres. I suggest to include similar indexes for Postgres as well.
Comment 1 John Mazzitelli 2009-08-14 12:39:22 EDT
Frank - what was the size of your calltime data?  Row counts and all that.
Comment 2 John Mazzitelli 2009-08-14 12:40:19 EDT
and adding those indexes - did you notice any degredation of performance on insert?  if so, can you quantify that?
Comment 3 Frank Brueseke 2009-08-19 02:58:03 EDT
I think at the time we added the INDEX we had about 900.000 rows in the call-time value table. Now after some more test runs we have 4.843.626 rows.

Actually we didn't check the performance of the insert statement before and after. I remember that we noticed that the insert statement dominated the resource usage statistics in Oracle after adding the indexes. But that is no wonder as I was running a test case for data collection and my RHQ plugin basically consists of call-time data metrics. We didn't investigate further as the box was near idle and thus the performance was satisfactory for us. 

I will ask a college to have a look at the Oracle performance tool. We can give you some stats for the insert statement execution after the indexes where added.
Comment 4 Frank Brueseke 2009-08-20 05:17:18 EDT
Ok here are some DB-stats for you. Formatting in pure text is hard, but I try my best. First for the delete statement:
####
Query: DELETE FROM RHQ_CALLTIME_DATA_VALUE WHERE key_id = (SELECT id FROM RHQ_CALLTIME_DATA_KEY WHERE
schedule_id = :1 AND call_destination = :2) AND begin_time = :3

#BEFORE:

Plan:
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT              |                           |       |       |     2 (100)|       |
|   1 |  DELETE                       | RHQ_CALLTIME_DATA_VALUE   |       |       |            |       |
|   2 |   TABLE ACCESS FULL           | RHQ_CALLTIME_DATA_VALUE   |     1 |    39 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| RHQ_CALLTIME_DATA_KEY     |     1 |  2028 |     0   (0)|       |
|   4 |     INDEX UNIQUE SCAN         | RHQ_CALLTIME_DATA_KEY_IDX |     1 |       |     0   (0)|       |
-----------------------------------------------------------------------------------------------------------
Please note that this involves a full table scan!

Stats, i.e Resource usage for the query:
				Total		Per Execution	Per Row
Executions			3			1			0.00
CPU Time (sec)	7066.83		2355.61			1.79
Buffer Gets		377280523	125760174.33	95538.24
Disk Reads		1815		605.00			0.46
Direct Writes		0			0.00			0.00
Rows			3949		1316.33			1
Fetches			0			0.00			0.00

#AFTER

Plan:
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT              |                           |       |       |     4 (100)|       |
|   1 |  DELETE                       | RHQ_CALLTIME_DATA_VALUE   |       |       |            |       |
|   2 |   INDEX RANGE SCAN            | CALL_KEY_BEG              |     1 |    29 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| RHQ_CALLTIME_DATA_KEY     |     1 |    80 |     1   (0)| 00:00:01 |
|   4 |     INDEX UNIQUE SCAN         | RHQ_CALLTIME_DATA_KEY_IDX |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Stats:
				Total		Per Execution	Per Row
Executions		104			1				3.85
CPU Time (sec)	13.08		0.13			0.48
Buffer Gets		2697502	25937.52		99907.48
Disk Reads		19			0.18			0.70
Direct Writes		0			0.00			0.00
Rows			27			0.26			1
Fetches			0			0.00			0.00
###
As you can see adding the indexes yielded in a lot less CPU time and a lot less buffer get operations. We are talking about at least an order of magnitude!

For the insert statement I have only stats confirming the performance *after* adding the indexes is satisfactory. But as you will see the execution plan for the query does not involve the new indexes. So I think the indexes will not affect the speed of the insert query as such, but overall DB peformance. Her come the stats:

###
Query: INSERT INTO RHQ_CALLTIME_DATA_VALUE(id, key_id, begin_time, end_time, minimum, maximum,
total, count) SELECT RHQ_calltime_data_value_id_seq.nextval, key.id, :1, :2, :3, :4, :5, :6
FROM RHQ_numbers num, RHQ_calltime_data_key key WHERE num.i = 42 AND key.id = (SELECT id FROM
RHQ_CALLTIME_DATA_KEY WHERE schedule_id = :7 AND call_destination = :8)

Plan:
------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                           | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT               |                                |       |       |     1 (100)|
|   1 |  SEQUENCE                      | RHQ_CALLTIME_DATA_VALUE_ID_SEQ |       |       |            |
|   2 |   NESTED LOOPS                 |                                |     1 |    15 |     0   (0)|
|   3 |    INDEX UNIQUE SCAN           | SYS_C0029575                   |     1 |     2 |     0   (0)|
|   4 |    INDEX UNIQUE SCAN           | SYS_C0029545                   |     1 |    13 |     0   (0)|
|   5 |     TABLE ACCESS BY INDEX ROWID| RHQ_CALLTIME_DATA_KEY          |     1 |  2028 |     0   (0)|
|   6 |      INDEX UNIQUE SCAN         | RHQ_CALLTIME_DATA_KEY_IDX      |     1 |       |     0   (0)|
------------------------------------------------------------------------------------------------------

Stats:
				Total	Per Execution	Per Row
Executions		21		1				0.00
CPU Time (sec)	4.63	0.22			0.00
Buffer Gets		633018	30143.71		20.75
Disk Reads		123		5.86			0.00
Direct Writes		0		0.00			0.00
Rows			30503	1452.52			1
Fetches			0		0.00			0.00
###
Comment 5 Charles Crouch 2009-08-20 12:17:09 EDT
Target at 1.3 for triage
Comment 6 Charles Crouch 2009-09-01 10:52:31 EDT
Moving features/improvements to 1.4
Comment 7 Red Hat Bugzilla 2009-11-10 16:02:48 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2356
Comment 8 wes hayutin 2010-02-16 12:09:58 EST
mass add of key word FutureFeature to help track
Comment 9 Heiko W. Rupp 2010-06-29 14:36:49 EDT
Index on end_time added in 1d53a6b on release-3.0.0

I've seen on PG 8.4, OS X Laptop the insert rate dropping from ~1million/h to ~600000/h which is still blasting fast. Also Postgres 'explain' figures show much better values for delete after the addition on the index:

Postgres explain functionlity suggests that just putting an index on
end_time will reduce delete time from the 
0... 317232 units range to the 0...60 range. For removal of ~1300 old rows
0... 317232  to 0...61926 for 1.5 million rows (data older than 12h)
0... 317232 to 249341 for 6 million rows (data older than 6h)

for 10.8 million rows (data older than 4h) PG switches to direct table access 

In reality with an insert rate of e.g. 100.000/h one can expect that the purge also deletes 100.000 entries each hour. With 31 days of data retention this would be less than 1% of the data, so just using the right data from the index is favorable.
Comment 10 Heiko W. Rupp 2010-06-30 02:22:35 EDT
After end_time index addition :

08:00:27,997 INFO  [DataPurgeJob] Calltime purged [1156544] - completed in [25926]ms

with around 13 million rows before purge.
Comment 11 Corey Welton 2010-07-19 14:22:11 EDT
QA Closing - perf env code change.
Comment 12 Corey Welton 2010-08-12 12:49:03 EDT
Mass-closure of verified bugs against JON.

Note You need to log in before you can comment on or make changes to this bug.