642799 – Unable to delete systems or probes with large # of rows (e.g., 180 million) in time_series table

Bug 642799 - Unable to delete systems or probes with large # of rows (e.g., 180 million) in time_series table

Summary: Unable to delete systems or probes with large # of rows (e.g., 180 million) i...

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Red Hat Satellite 5
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	530
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Milan Zázrivec
QA Contact:	Jan Hutař
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	462714
TreeView+	depends on / blocked

Reported:	2010-10-13 20:33 UTC by Xixi
Modified:	2018-11-27 21:51 UTC (History)
CC List:	6 users (show)
Fixed In Version:	spacewalk-schema-1.7.55-31 spacewalk-java-1.7.54-90
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	651383 (view as bug list)
Environment:
Last Closed:	2015-07-13 09:17:56 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Xixi 2010-10-13 20:33:38 UTC

Description of problem:
On a Satellite with Monitoring usage and large amounts of monitoring data (e.g., 180 million rows in time_series table for this customer case), deleting a Monitoring enabled system or a Probe becomes virtually impossible as it would take hours and hours if it actually ever completes. Customer has a couple hundred systems and over 2 thousand probes and make extensive use of Monitoring functionality.

Per discussion with Engineering (Milan), the root of the issue is the sheer number of rows in time_series table. The function-based index which was created for satellite 5.3 as fix for Bug 494064 won't help here at all - in fact, while it speeds things up for general use cases, when in the specific situation of deleting very many rows it's actually slightly slower than before the fix (see https://bugzilla.redhat.com/show_bug.cgi?id=494064#c32). This is because when deleting large portion of data from the time_series of table, there's a high probability the data to delete is scattered all around the table, so you're modifying pretty much every db block - and with the index in place, you not only have to scan through all blocks of the table but also through all blocks of the index.

This BZ is for the "real", long-term solution, which is to be decided. For example, making deletion asynchronous since speed is the primary concern? To have monitoring data deletion (or system deletion as a whole) performed asynchronously in the background as a taskomatic or other daemon task.

Version-Release number of selected component (if applicable):
Red Hat Network (RHN) Satellite 5.3.0

How reproducible:
Always.

Steps to Reproduce:
1. Have a satellite with ~180 million rows in time_series (customer db dump available).
2. Truncate snapshots (since that's also a factor in this particular case, so just getting it out of the picture).
3. Try to delete a Monitoring-enabled system profile;
4. Try to delete a Monitoring probe.

Actual results:
Neither 3 or 4 completes anytime in foreseeable future (from 10 minutes to 14 hrs if ever).

Expected results:
3 and 4 completes in reasonable time - like at least under a minute.

Additional info:
Related bugs for reference -
Bug 461704 - Monitoring probes data in time_series table are not cleaned up when probes/system profiles are deleted
Bug 494064 - Satellite: performance issue in procedure remove_server_entitlement (seen on satellite with lots of monitoring data)

Comment 5 Milan Zázrivec 2012-02-15 09:39:31 UTC

FYI, in current Spacewalk nightly (soon to be Spacewalk 1.7) we implemented
fully asynchronous deletion of monitoring data.

In particular, we re-structured time_series table to be normalized (foreign
keys, no function based indexes) while allowing fully asynchronous deletion
of monitoring data.

This would mean that for example probe or server deletion would not block
until all the monitoring data from time_series would be all cleared up, but
the monitoring data would be left in place and be cleared later (and
automatically) by a dedicated taskomatic job.

Note You need to log in before you can comment on or make changes to this bug.