Bug 1554307 - [RFE] Spacewalk needs a way for the Spacewalk Administrator to manage Snapshots
Summary: [RFE] Spacewalk needs a way for the Spacewalk Administrator to manage Snapshots
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server
Version: 580
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Grant Gainey
QA Contact: Radovan Drazny
URL:
Whiteboard:
Depends On: 1537766
Blocks: sat58-errata
 
Reported: 2018-03-12 12:07 UTC by Tomáš Kašpárek
Modified: 2021-09-09 13:24 UTC
CC List: 8 users

Fixed In Version: spacewalk-utils-2.5.1-31-sat
Doc Type: Enhancement
Doc Text:
Feature: Add tooling to give the Satellite Administrator more control over Snapshots.
Reason: If enable_snapshots is set for a server, entries are made in the RHNSNAPSHOT* tables with every change to every server. Over time, these tables grow without bound, with increasing impact on the space and performance of the Satellite instance. The current tool for managing snapshots, sw-system-snapshot, is designed for a system administrator to manage the snapshots associated with their own systems. It has no way to 'see' ALL the snapshots, its use of the public API means it must be invoked with a login/password, and it works only with absolute timestamps, all of which make it less than useful as a tool for the Satellite Administrator.
Result: This RFE introduces a new tool, spacewalk-manage-snapshots, that addresses these concerns. See 'man spacewalk-manage-snapshots' for details.
Clone Of: 1537766
Environment:
Last Closed: 2018-05-15 21:46:38 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 1537766 0 None None None 2018-03-12 12:07:45 UTC
Red Hat Product Errata RHEA-2018:1565 0 None None None 2018-05-15 21:47:03 UTC

Comment 3 Radovan Drazny 2018-04-24 13:42:34 UTC
Tested on spacewalk-utils-2.5.1-30.el6sat.noarch.

I created a large number of host registrations, and then forced the creation of many snapshots by repeatedly removing and re-adding the provisioning entitlement, and by repeatedly installing and removing a small package on all registered hosts.

Displaying information about the number of snapshots works well, but I encountered an issue while deleting old snapshots. The reproducer is basically this:

1. Have a constant stream of operations creating new snapshots on the Satellite server, so that new snapshots are created every second.

2. Come back the next day and run a report with a 1-day interval to make sure there are snapshots older than 1 day:

$ spacewalk-manage-snapshots -r -i 1

                Table name :         rows
               rhnsnapshot :        80294
        rhnsnapshotchannel :        80294
  rhnsnapshotconfigchannel :            0
 rhnsnapshotconfigrevision :            0
  rhnsnapshotinvalidreason :            6
        rhnsnapshotpackage :     22727553
    rhnsnapshotservergroup :       160586
            rhnsnapshottag :            0

:    Snapshot info, 1-day interval     :
: age(days) :   systems :    snapshots :
:    1-1    :      2147 :        73121 :
:    2-2    :      1775 :         7173 :

If you run the same report again after a few seconds or minutes, you will see that while the total number of snapshots is the same (assuming no snapshot-creating operation is running, of course), the info for the 1-day interval has changed: your 1-day cut-off line has moved forward in time with you, and there are now more snapshots older than one day:

$ spacewalk-manage-snapshots -r -i 1

                Table name :         rows
               rhnsnapshot :        80294
        rhnsnapshotchannel :        80294
  rhnsnapshotconfigchannel :            0
 rhnsnapshotconfigrevision :            0
  rhnsnapshotinvalidreason :            6
        rhnsnapshotpackage :     22727553
    rhnsnapshotservergroup :       160586
            rhnsnapshottag :            0

:    Snapshot info, 1-day interval     :
: age(days) :   systems :    snapshots :
:    1-1    :      2147 :        71715 :
:    2-2    :      1775 :         8579 :

So far so good. Now try to delete snapshots older than one day:

$ spacewalk-manage-snapshots -d 1 -b 1000
Deleting snapshots older than 1 days
       80294 snapshots currently
       11042 snapshots to be deleted, 1000 per commit
...       11042 snapshots left to purge
...       10050 snapshots left to purge
...        9057 snapshots left to purge
...        8063 snapshots left to purge
...        7069 snapshots left to purge
...        6076 snapshots left to purge
...        5081 snapshots left to purge
...        4087 snapshots left to purge
...        3094 snapshots left to purge
...        2104 snapshots left to purge
...        1110 snapshots left to purge
...         119 snapshots left to purge
...           3 snapshots left to purge
...           3 snapshots left to purge
...           3 snapshots left to purge
...           2 snapshots left to purge
...           2 snapshots left to purge
...           4 snapshots left to purge
...           2 snapshots left to purge
...           3 snapshots left to purge
...           3 snapshots left to purge
...           2 snapshots left to purge
...           4 snapshots left to purge
...           1 snapshots left to purge
...           4 snapshots left to purge
...           2 snapshots left to purge
...           3 snapshots left to purge
...           3 snapshots left to purge
...           3 snapshots left to purge
...           2 snapshots left to purge
...           2 snapshots left to purge
...           3 snapshots left to purge
...           2 snapshots left to purge
...           4 snapshots left to purge
...           1 snapshots left to purge
...           2 snapshots left to purge
...           2 snapshots left to purge
<...>

...and so on; the command has to be stopped manually.

My wild guess is that the tool counts the number of snapshots at the beginning, then starts deleting them in batches of the size specified by the "-b" option, as expected. After finishing each batch, it checks for snapshots older than the specified age and gathers them into the next batch to be deleted. The problem is that the time used to determine the age of snapshots is the time at the end of processing the previous batch, not the time used for the initial summary at the start of the run. And because there was a constant trickle of new snapshots, there are always a few snapshots that crossed the one-day boundary during the processing of the previous batch and should be deleted; so we delete them, and while we delete them, a few more cross the one-day boundary, so a new batch gets created, etc. You get the idea.

Of course, if there were a long enough break in the stream of snapshots to process the last batch without new snapshots crossing the cut-off boundary, the command would exit successfully, so this is a rather extreme corner case. But on a busy Satellite server, or when a cut-off line is accidentally chosen in the middle of a busy patching day, this could cause problems, or at least an unpleasant surprise for the admin.
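The loop I am guessing at above can be reproduced with a toy model (all names, batch sizes, and timings here are illustrative, not the tool's actual code): when the "older than 1 day" cutoff is recomputed after every batch, a dense backlog of nearly-day-old snapshots keeps crossing the line and the purge never converges, while pinning the cutoff once at the start lets it finish.

```python
def purge(timestamps, now, batch_size, dt, fixed_cutoff, max_batches=40):
    """Toy model of the batch purge. `timestamps` are snapshot creation
    times in days; each committed batch takes `dt` days of wall-clock
    time, during which existing snapshots keep ageing past the one-day
    line. Returns the number of batches run."""
    timestamps = sorted(timestamps)
    cutoff_at_start = now - 1.0
    for batches in range(max_batches):
        # the suspected bug: the cutoff slides forward with the clock
        cutoff = cutoff_at_start if fixed_cutoff else now - 1.0
        eligible = [t for t in timestamps if t < cutoff]
        if not eligible:
            return batches                  # converged cleanly
        # delete one batch of the oldest eligible snapshots
        timestamps = eligible[batch_size:] + [t for t in timestamps if t >= cutoff]
        now += dt                           # clock moves on while the batch commits
    return max_batches                      # safety limit hit: never converged

# Two days of snapshots, one every ~43 s (0.0005 days): 2000 are over a day old.
ts = [i * 0.0005 for i in range(4000)]
print(purge(ts, now=2.0, batch_size=100, dt=0.001, fixed_cutoff=True))   # 20
print(purge(ts, now=2.0, batch_size=100, dt=0.001, fixed_cutoff=False))  # 40
```

With the fixed cutoff the 2000 eligible snapshots are gone in 20 batches; with the sliding cutoff a couple of snapshots cross the boundary during every batch, so the loop only stops at the safety limit, matching the endless "2-4 snapshots left to purge" tail above.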

Comment 5 Grant Gainey 2018-04-25 15:08:41 UTC
spacewalk.github:
439bbadb0ef583a6ff917c897997002750b2355d

Comment 8 Radovan Drazny 2018-04-27 15:29:18 UTC
When running a report and then a delete right after it, there is quite a discrepancy between the number of snapshots older than 1 day in the report and the number of snapshots older than 1 day to be deleted. Shouldn't these two numbers be the same? See the two following examples:

[root@host-8-248-201 rpms]# spacewalk-manage-snapshots -r -i 1

                Table name :         rows
               rhnsnapshot :        95750
        rhnsnapshotchannel :       191500
  rhnsnapshotconfigchannel :        95750
 rhnsnapshotconfigrevision :        95750
  rhnsnapshotinvalidreason :            6
        rhnsnapshotpackage :     27623850
    rhnsnapshotservergroup :       191500
            rhnsnapshottag :            0

:    Snapshot info, 1-day interval     :
: age(days) :   systems :    snapshots :
:    1-1    :       100 :        84657 :
:    2-2    :       100 :        11093 :
[root@host-8-248-201 rpms]# spacewalk-manage-snapshots -d 1 -b 1000
Deleting snapshots older than 1 days
       95750 snapshots currently
       12353 snapshots to be deleted, 1000 per commit
...       12353 snapshots left to purge
...       11353 snapshots left to purge
...       10353 snapshots left to purge
...        9353 snapshots left to purge
...        8353 snapshots left to purge
...        7353 snapshots left to purge
...        6353 snapshots left to purge
...        5353 snapshots left to purge
...        4353 snapshots left to purge
...        3353 snapshots left to purge
...        2353 snapshots left to purge
...        1353 snapshots left to purge
...         353 snapshots left to purge
       83397 snapshots remain

-----------------------------------------------------------------------
[root@host-8-248-201 rpms]# spacewalk-manage-snapshots -r -i 1

                Table name :         rows
               rhnsnapshot :        98651
        rhnsnapshotchannel :       197302
  rhnsnapshotconfigchannel :        98651
 rhnsnapshotconfigrevision :        98651
  rhnsnapshotinvalidreason :            6
        rhnsnapshotpackage :     28460788
    rhnsnapshotservergroup :       197302
            rhnsnapshottag :            0

:    Snapshot info, 1-day interval     :
: age(days) :   systems :    snapshots :
:    1-1    :       100 :        96600 :
:    2-2    :       100 :         2051 :
[root@host-8-248-201 rpms]# spacewalk-manage-snapshots -d 1 -b 100
Deleting snapshots older than 1 days
       98651 snapshots currently
        2901 snapshots to be deleted, 100 per commit
...        2901 snapshots left to purge
...        2801 snapshots left to purge
...        2701 snapshots left to purge
...        2601 snapshots left to purge
...        2501 snapshots left to purge
...        2401 snapshots left to purge
...        2301 snapshots left to purge
...        2201 snapshots left to purge
...        2101 snapshots left to purge
...        2001 snapshots left to purge
...        1901 snapshots left to purge
...        1801 snapshots left to purge
...        1701 snapshots left to purge
...        1601 snapshots left to purge
...        1501 snapshots left to purge
...        1401 snapshots left to purge
...        1301 snapshots left to purge
...        1201 snapshots left to purge
...        1101 snapshots left to purge
...        1001 snapshots left to purge
...         901 snapshots left to purge
...         801 snapshots left to purge
...         701 snapshots left to purge
...         601 snapshots left to purge
...         501 snapshots left to purge
...         401 snapshots left to purge
...         301 snapshots left to purge
...         201 snapshots left to purge
...         101 snapshots left to purge
...           1 snapshots left to purge
       95750 snapshots remain

Comment 9 Grant Gainey 2018-04-27 18:26:19 UTC
The difference is because you were creating a lot of snapshots very quickly, and 'in the last day' is governed by "within the last 24 hours' worth of milliseconds" - so the "purge older than" window slides forward between "-r -i 1" and "-d 1 -b 100", and several hundred more snapshots become eligible.

On the reproducing system, you can see this if you just keep re-running the "how many should I delete boss?" query against the DB:

rhnschema=# select count(ss.id) from rhnsnapshot ss where ss.created < (current_timestamp - numtodsinterval(1,  'day'));
 count 
-------
 19646
(1 row)

rhnschema=# select count(ss.id) from rhnsnapshot ss where ss.created < (current_timestamp - numtodsinterval(1,  'day'));
 count 
-------
 19669
(1 row)

rhnschema=# select count(ss.id) from rhnsnapshot ss where ss.created < (current_timestamp - numtodsinterval(1,  'day'));
 count 
-------
 19677
(1 row)

rhnschema=# select count(ss.id) from rhnsnapshot ss where ss.created < (current_timestamp - numtodsinterval(1,  'day'));
 count 
-------
 19697
(1 row)


So just hitting up-arrow/enter as fast as I could made the count change from 19646 to 19697.

Working as intended, I believe.
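The sliding window above can be demonstrated without a Satellite instance. This is a minimal sqlite3 stand-in (the table and column names are borrowed from the queries above; the epoch-second timestamps and counts are illustrative, and the real schema runs on Oracle/PostgreSQL, not sqlite): recomputing the cutoff, as current_timestamp does in each statement, gives a growing count, while a cutoff computed once and passed as a bind variable stays stable.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("create table rhnsnapshot (id integer primary key, created real)")
now = time.time()
# ten snapshots just under a day old, 50 ms apart: they age past the
# one-day line within a fraction of a second
conn.executemany("insert into rhnsnapshot (created) values (?)",
                 [(now - 86400 + 0.05 * i,) for i in range(10)])

def older_than_one_day(cutoff):
    row = conn.execute("select count(*) from rhnsnapshot where created < ?",
                       (cutoff,)).fetchone()
    return row[0]

pinned = time.time() - 86400                       # computed once, like a bind variable
before = older_than_one_day(pinned)
time.sleep(0.3)                                    # wall clock moves on
sliding = older_than_one_day(time.time() - 86400)  # recomputed, like current_timestamp
stable = older_than_one_day(pinned)                # pinned cutoff: unchanged answer

print(before, sliding, stable)                     # sliding > before; stable == before
```

Re-running the sliding query is exactly the up-arrow/enter experiment above: each evaluation of the cutoff admits the snapshots that aged past the line in the meantime.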

Comment 10 Radovan Drazny 2018-04-30 10:17:10 UTC
Ok then. Verified on spacewalk-utils-2.5.1-31.

Comment 13 errata-xmlrpc 2018-05-15 21:46:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1565

