Red Hat Bugzilla – Bug 642799
Unable to delete systems or probes with large # of rows (e.g., 180 million) in time_series table
Last modified: 2015-07-13 05:17:56 EDT
Description of problem:
On a Satellite with Monitoring usage and large amounts of monitoring data (e.g., 180 million rows in time_series table for this customer case), deleting a Monitoring enabled system or a Probe becomes virtually impossible as it would take hours and hours if it actually ever completes. Customer has a couple hundred systems and over 2 thousand probes and make extensive use of Monitoring functionality.
Per discussion with Engineering (Milan), the root of the issue is the sheer number of rows in time_series table. The function-based index which was created for satellite 5.3 as fix for Bug 494064 won't help here at all - in fact, while it speeds things up for general use cases, when in the specific situation of deleting very many rows it's actually slightly slower than before the fix (see https://bugzilla.redhat.com/show_bug.cgi?id=494064#c32). This is because when deleting large portion of data from the time_series of table, there's a high probability the data to delete is scattered all around the table, so you're modifying pretty much every db block - and with the index in place, you not only have to scan through all blocks of the table but also through all blocks of the index.
This BZ is for the "real", long-term solution, which is to be decided. For example, making deletion asynchronous since speed is the primary concern? To have monitoring data deletion (or system deletion as a whole) performed asynchronously in the background as a taskomatic or other daemon task.
Version-Release number of selected component (if applicable):
Red Hat Network (RHN) Satellite 5.3.0
Steps to Reproduce:
1. Have a satellite with ~180 million rows in time_series (customer db dump available).
2. Truncate snapshots (since that's also a factor in this particular case, so just getting it out of the picture).
3. Try to delete a Monitoring-enabled system profile;
4. Try to delete a Monitoring probe.
Neither 3 or 4 completes anytime in foreseeable future (from 10 minutes to 14 hrs if ever).
3 and 4 completes in reasonable time - like at least under a minute.
Related bugs for reference -
Bug 461704 - Monitoring probes data in time_series table are not cleaned up when probes/system profiles are deleted
Bug 494064 - Satellite: performance issue in procedure remove_server_entitlement (seen on satellite with lots of monitoring data)
FYI, in current Spacewalk nightly (soon to be Spacewalk 1.7) we implemented
fully asynchronous deletion of monitoring data.
In particular, we re-structured time_series table to be normalized (foreign
keys, no function based indexes) while allowing fully asynchronous deletion
of monitoring data.
This would mean that for example probe or server deletion would not block
until all the monitoring data from time_series would be all cleared up, but
the monitoring data would be left in place and be cleared later (and
automatically) by a dedicated taskomatic job.