The database needs an impossibly long time to clean up its call-time data tables once they get big (more than 500,000 entries). We observed a delete statement that took 20 hours to complete! What's more, it practically disabled the machine, using up 100% CPU, and caused the RHQ server to produce all kinds of DB errors (failed inserts, deletes, timed-out transactions, etc.). Our solution is to install the following indexes:

CREATE INDEX PERMEO.CALL_DAT_END ON PERMEO.RHQ_CALLTIME_DATA_VALUE (END_TIME) COMPRESS 1 TABLESPACE BASIDX;
CREATE INDEX PERMEO.CALL_KEY_BEG ON PERMEO.RHQ_CALLTIME_DATA_VALUE (KEY_ID, BEGIN_TIME) COMPRESS 1 TABLESPACE BASIDX ONLINE;

In our case everything works like a charm now (same test case, but CPU < 5%). Please include this fix in the next version of RHQ! Also note that I have seen a similar thing happen using Postgres. I suggest including similar indexes for Postgres as well.
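For reference, a minimal Postgres sketch of equivalent indexes (the index names here are made up, and the Oracle-specific COMPRESS/TABLESPACE clauses are dropped; CONCURRENTLY is the rough Postgres analogue of Oracle's ONLINE, avoiding long write locks while the index builds):

```sql
-- Hypothetical Postgres equivalents of the Oracle indexes above.
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY rhq_calltime_data_value_end_idx
    ON rhq_calltime_data_value (end_time);

CREATE INDEX CONCURRENTLY rhq_calltime_data_value_key_beg_idx
    ON rhq_calltime_data_value (key_id, begin_time);
```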
Frank - what was the size of your calltime data? Row counts and all that.
and adding those indexes - did you notice any degradation of insert performance? If so, can you quantify that?
I think at the time we added the indexes we had about 900,000 rows in the call-time value table. Now, after some more test runs, we have 4,843,626 rows. Actually, we didn't check the performance of the insert statement before and after. I remember that we noticed the insert statement dominated the resource usage statistics in Oracle after adding the indexes. But that is no wonder, as I was running a test case for data collection and my RHQ plugin basically consists of call-time data metrics. We didn't investigate further, as the box was near idle and thus the performance was satisfactory for us. I will ask a colleague to have a look at the Oracle performance tool. We can give you some stats for the insert statement execution after the indexes were added.
Ok, here are some DB stats for you. Formatting in pure text is hard, but I'll try my best.

First, the delete statement:

#### Query:

DELETE FROM RHQ_CALLTIME_DATA_VALUE
 WHERE key_id = (SELECT id FROM RHQ_CALLTIME_DATA_KEY
                  WHERE schedule_id = :1 AND call_destination = :2)
   AND begin_time = :3

#BEFORE:

Plan:
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT             |                           |       |       |     2 (100)|          |
|   1 |  DELETE                      | RHQ_CALLTIME_DATA_VALUE   |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | RHQ_CALLTIME_DATA_VALUE   |     1 |    39 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS BY INDEX ROWID| RHQ_CALLTIME_DATA_KEY     |     1 |  2028 |     0   (0)|          |
|   4 |    INDEX UNIQUE SCAN         | RHQ_CALLTIME_DATA_KEY_IDX |     1 |       |     0   (0)|          |
-----------------------------------------------------------------------------------------------------------

Please note that this involves a full table scan!
Stats, i.e. resource usage for the query:

                      Total         Per Execution     Per Row
Executions                3                  1           0.00
CPU Time (sec)      7066.83            2355.61           1.79
Buffer Gets       377280523       125760174.33       95538.24
Disk Reads             1815             605.00           0.46
Direct Writes             0               0.00           0.00
Rows                   3949            1316.33           1
Fetches                   0               0.00           0.00

#AFTER

Plan:
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT             |                           |       |       |     4 (100)|          |
|   1 |  DELETE                      | RHQ_CALLTIME_DATA_VALUE   |       |       |            |          |
|   2 |   INDEX RANGE SCAN           | CALL_KEY_BEG              |     1 |    29 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS BY INDEX ROWID| RHQ_CALLTIME_DATA_KEY     |     1 |    80 |     1   (0)| 00:00:01 |
|   4 |    INDEX UNIQUE SCAN         | RHQ_CALLTIME_DATA_KEY_IDX |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Stats:

                      Total         Per Execution     Per Row
Executions              104                  1           3.85
CPU Time (sec)        13.08               0.13           0.48
Buffer Gets         2697502           25937.52       99907.48
Disk Reads               19               0.18           0.70
Direct Writes             0               0.00           0.00
Rows                     27               0.26           1
Fetches                   0               0.00           0.00

### As you can see, adding the indexes yielded a lot less CPU time and far fewer buffer get operations. We are talking about at least an order of magnitude! For the insert statement I only have stats confirming that the performance *after* adding the indexes is satisfactory. But as you will see, the execution plan for the query does not involve the new indexes. So I think the indexes will not affect the speed of the insert query as such, but rather overall DB performance.
Here come the stats:

### Query:

INSERT INTO RHQ_CALLTIME_DATA_VALUE(id, key_id, begin_time, end_time, minimum, maximum, total, count)
SELECT RHQ_calltime_data_value_id_seq.nextval, key.id, :1, :2, :3, :4, :5, :6
  FROM RHQ_numbers num, RHQ_calltime_data_key key
 WHERE num.i = 42
   AND key.id = (SELECT id FROM RHQ_CALLTIME_DATA_KEY
                  WHERE schedule_id = :7 AND call_destination = :8)

Plan:
------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                           | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT             |                                |       |       |     1 (100)|
|   1 |  SEQUENCE                    | RHQ_CALLTIME_DATA_VALUE_ID_SEQ |       |       |            |
|   2 |   NESTED LOOPS               |                                |     1 |    15 |     0   (0)|
|   3 |    INDEX UNIQUE SCAN         | SYS_C0029575                   |     1 |     2 |     0   (0)|
|   4 |    INDEX UNIQUE SCAN         | SYS_C0029545                   |     1 |    13 |     0   (0)|
|   5 |   TABLE ACCESS BY INDEX ROWID| RHQ_CALLTIME_DATA_KEY          |     1 |  2028 |     0   (0)|
|   6 |    INDEX UNIQUE SCAN         | RHQ_CALLTIME_DATA_KEY_IDX      |     1 |       |     0   (0)|
------------------------------------------------------------------------------------------------------

Stats:

                      Total         Per Execution     Per Row
Executions               21                  1           0.00
CPU Time (sec)         4.63               0.22           0.00
Buffer Gets          633018           30143.71          20.75
Disk Reads              123               5.86           0.00
Direct Writes             0               0.00           0.00
Rows                  30503            1452.52           1
Fetches                   0               0.00           0.00

###
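For anyone wanting to reproduce plans like the ones above on their own Oracle instance, a minimal sketch (the bind values are placeholders; assumes the session has the usual EXPLAIN PLAN privileges and a PLAN_TABLE):

```sql
-- Capture the optimizer plan for the purge delete, then display it.
EXPLAIN PLAN FOR
  DELETE FROM RHQ_CALLTIME_DATA_VALUE
   WHERE key_id = (SELECT id FROM RHQ_CALLTIME_DATA_KEY
                    WHERE schedule_id = 1 AND call_destination = 'dummy')
     AND begin_time = 0;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```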
Target at 1.3 for triage
Moving features/improvements to 1.4
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2356
mass add of key word FutureFeature to help track
Index on end_time added in 1d53a6b on release-3.0.0.

On PG 8.4 (OS X laptop) I've seen the insert rate drop from ~1,000,000/h to ~600,000/h, which is still blazingly fast. Postgres 'explain' figures also show much better values for the delete after the addition of the index. Postgres' explain functionality suggests that just putting an index on end_time will reduce the delete cost:

from 0...317232 units to 0...60 for removal of ~1300 old rows
from 0...317232 to 0...61926 for 1.5 million rows (data older than 12h)
from 0...317232 to 249341 for 6 million rows (data older than 6h)
for 10.8 million rows (data older than 4h) PG switches to direct table access

In reality, with an insert rate of e.g. 100,000/h one can expect that the purge also deletes 100,000 entries each hour. With 31 days of data retention this would be less than 1% of the data, so just using the right data from the index is favorable.
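The Postgres cost figures above can be obtained along these lines (a sketch; the cutoff timestamp is a placeholder for "now minus the retention period" in epoch milliseconds):

```sql
-- With an index on end_time, EXPLAIN should show an Index Scan
-- for the purge delete instead of a Seq Scan over the whole table.
EXPLAIN
DELETE FROM rhq_calltime_data_value
 WHERE end_time < 1265000000000;  -- placeholder retention cutoff
```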
After the end_time index addition: 08:00:27,997 INFO [DataPurgeJob] Calltime purged [1156544] - completed in [25926]ms, with around 13 million rows before the purge.
QA Closing - perf env code change.
Mass-closure of verified bugs against JON.