Tested on spacewalk-utils-2.5.1-30.el6sat.noarch.

I created a large number of host registrations and then forced the creation of many snapshots by repeatedly removing and re-adding the provisioning entitlement, and by repeatedly installing and removing a small package on all registered hosts. Displaying the number of snapshots works well, but I encountered an issue while deleting old snapshots. The reproducer is basically this:

1. Have a constant stream of operations creating new snapshots on the Satellite server, so that a new snapshot (or several) is created every second.

2. Come back the next day and run a report with a 1-day interval to make sure there are snapshots older than 1 day:

$ spacewalk-manage-snapshots -r -i 1
               Table name :     rows
              rhnsnapshot :    80294
       rhnsnapshotchannel :    80294
 rhnsnapshotconfigchannel :        0
rhnsnapshotconfigrevision :        0
 rhnsnapshotinvalidreason :        6
       rhnsnapshotpackage : 22727553
   rhnsnapshotservergroup :   160586
           rhnsnapshottag :        0
: Snapshot info, 1-day interval :
: age(days) : systems : snapshots :
:       1-1 :    2147 :     73121 :
:       2-2 :    1775 :      7173 :

If you run the same report again after a few seconds or minutes, you will see that while the total number of snapshots is the same (provided no snapshot-creating operation is running, of course), the figures for the 1-day interval have changed: the 1-day cut-off line has moved forward in time with you, and more snapshots are now older than one day:

$ spacewalk-manage-snapshots -r -i 1
               Table name :     rows
              rhnsnapshot :    80294
       rhnsnapshotchannel :    80294
 rhnsnapshotconfigchannel :        0
rhnsnapshotconfigrevision :        0
 rhnsnapshotinvalidreason :        6
       rhnsnapshotpackage : 22727553
   rhnsnapshotservergroup :   160586
           rhnsnapshottag :        0
: Snapshot info, 1-day interval :
: age(days) : systems : snapshots :
:       1-1 :    2147 :     71715 :
:       2-2 :    1775 :      8579 :

So far so good.
Now try to delete snapshots older than one day:

$ spacewalk-manage-snapshots -d 1 -b 1000
Deleting snapshots older than 1 days
80294 snapshots currently
11042 snapshots to be deleted, 1000 per commit
... 11042 snapshots left to purge
... 10050 snapshots left to purge
... 9057 snapshots left to purge
... 8063 snapshots left to purge
... 7069 snapshots left to purge
... 6076 snapshots left to purge
... 5081 snapshots left to purge
... 4087 snapshots left to purge
... 3094 snapshots left to purge
... 2104 snapshots left to purge
... 1110 snapshots left to purge
... 119 snapshots left to purge
... 3 snapshots left to purge
... 3 snapshots left to purge
... 3 snapshots left to purge
... 2 snapshots left to purge
... 2 snapshots left to purge
... 4 snapshots left to purge
... 2 snapshots left to purge
... 3 snapshots left to purge
... 3 snapshots left to purge
... 2 snapshots left to purge
... 4 snapshots left to purge
... 1 snapshots left to purge
... 4 snapshots left to purge
... 2 snapshots left to purge
... 3 snapshots left to purge
... 3 snapshots left to purge
... 3 snapshots left to purge
... 2 snapshots left to purge
... 2 snapshots left to purge
... 3 snapshots left to purge
... 2 snapshots left to purge
... 4 snapshots left to purge
... 1 snapshots left to purge
... 2 snapshots left to purge
... 2 snapshots left to purge
<...>

...and so on; you have to stop the command manually.

My wild guess is that the tool counts the number of snapshots at the beginning and then starts deleting them in batches of the size given by the "-b" option, as expected. After finishing each batch, it checks again for snapshots older than the specified age and gathers them into the next batch to be deleted. The problem is that the time used to determine the age of the snapshots is the time at the end of processing the previous batch, not the time that was used for the initial summary at the start of the run.
And because there was a constant trickle of new snapshots, there are always a few that crossed the one-day boundary during the processing of the previous batch and should therefore be deleted; so we delete them, and while we delete them, a few more cross the one-day boundary, so a new batch gets created, and so on... you get the idea. Of course, if there were a long enough break in the stream of snapshots for the last batch to be processed without new snapshots crossing the cut-off boundary, the command would exit successfully, so this is a rather extreme corner case. But on a busy Satellite server, or when a cut-off line is accidentally chosen in the middle of a busy patching day, this could cause problems, or at least an unpleasant surprise for the admin.
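If that guess is right, the fix would be to evaluate the cut-off timestamp once, at the start of the run, and reuse it for every batch. The actual spacewalk-manage-snapshots code is not reproduced here; the following is a minimal sketch of both behaviours, using sqlite3, a virtual clock, and a made-up snapshot rate (one new snapshot per tick), to show that a re-evaluated cut-off can never drain the backlog while a pinned one terminates:

```python
import sqlite3

MAX_AGE = 100   # delete snapshots older than this many "ticks"
BATCH = 50

def make_db():
    """One snapshot per tick for the first 1000 ticks of virtual time."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rhnsnapshot (id INTEGER PRIMARY KEY, created INTEGER)")
    conn.executemany("INSERT INTO rhnsnapshot (created) VALUES (?)",
                     [(t,) for t in range(1000)])
    return conn

def purge(conn, pin_cutoff, max_batches=200):
    """Batched purge. pin_cutoff=True evaluates the cut-off once up front;
    False re-evaluates it after every batch, which is the behaviour
    hypothesised above. Returns the number of batches run."""
    now = 1000
    cutoff = now - MAX_AGE
    for batches in range(max_batches):
        if not pin_cutoff:
            cutoff = now - MAX_AGE        # the cut-off line moves with us
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM rhnsnapshot WHERE created < ? LIMIT ?",
            (cutoff, BATCH))]
        if not ids:
            return batches                # backlog drained: done
        conn.executemany("DELETE FROM rhnsnapshot WHERE id = ?",
                         [(i,) for i in ids])
        conn.commit()
        # constant trickle: one new snapshot arrives per tick while
        # the batch was being deleted
        for _ in range(len(ids)):
            now += 1
            conn.execute("INSERT INTO rhnsnapshot (created) VALUES (?)", (now,))
    return max_batches                    # safety limit hit: runaway

print(purge(make_db(), pin_cutoff=True))   # 18 batches, then terminates
print(purge(make_db(), pin_cutoff=False))  # 200: hits the safety limit
```

With the moving cut-off, each 50-row batch deletes exactly as many rows as newly cross the boundary while it runs, so the eligible set never shrinks; pinning the cut-off makes the eligible set finite and the loop terminates.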
spacewalk.github: 439bbadb0ef583a6ff917c897997002750b2355d
When running a report and then a delete right after it, there is quite a discrepancy between the number of snapshots older than 1 day in the report and, later, the number of snapshots older than 1 day to be deleted. Shouldn't these two numbers be the same? See the two following examples:

[root@host-8-248-201 rpms]# spacewalk-manage-snapshots -r -i 1
               Table name :     rows
              rhnsnapshot :    95750
       rhnsnapshotchannel :   191500
 rhnsnapshotconfigchannel :    95750
rhnsnapshotconfigrevision :    95750
 rhnsnapshotinvalidreason :        6
       rhnsnapshotpackage : 27623850
   rhnsnapshotservergroup :   191500
           rhnsnapshottag :        0
: Snapshot info, 1-day interval :
: age(days) : systems : snapshots :
:       1-1 :     100 :     84657 :
:       2-2 :     100 :     11093 :

[root@host-8-248-201 rpms]# spacewalk-manage-snapshots -d 1 -b 1000
Deleting snapshots older than 1 days
95750 snapshots currently
12353 snapshots to be deleted, 1000 per commit
... 12353 snapshots left to purge
... 11353 snapshots left to purge
... 10353 snapshots left to purge
... 9353 snapshots left to purge
... 8353 snapshots left to purge
... 7353 snapshots left to purge
... 6353 snapshots left to purge
... 5353 snapshots left to purge
... 4353 snapshots left to purge
... 3353 snapshots left to purge
... 2353 snapshots left to purge
... 1353 snapshots left to purge
... 353 snapshots left to purge
83397 snapshots remain

-----------------------------------------------------------------------

[root@host-8-248-201 rpms]# spacewalk-manage-snapshots -r -i 1
               Table name :     rows
              rhnsnapshot :    98651
       rhnsnapshotchannel :   197302
 rhnsnapshotconfigchannel :    98651
rhnsnapshotconfigrevision :    98651
 rhnsnapshotinvalidreason :        6
       rhnsnapshotpackage : 28460788
   rhnsnapshotservergroup :   197302
           rhnsnapshottag :        0
: Snapshot info, 1-day interval :
: age(days) : systems : snapshots :
:       1-1 :     100 :     96600 :
:       2-2 :     100 :      2051 :

[root@host-8-248-201 rpms]# spacewalk-manage-snapshots -d 1 -b 100
Deleting snapshots older than 1 days
98651 snapshots currently
2901 snapshots to be deleted, 100 per commit
... 2901 snapshots left to purge
... 2801 snapshots left to purge
... 2701 snapshots left to purge
... 2601 snapshots left to purge
... 2501 snapshots left to purge
... 2401 snapshots left to purge
... 2301 snapshots left to purge
... 2201 snapshots left to purge
... 2101 snapshots left to purge
... 2001 snapshots left to purge
... 1901 snapshots left to purge
... 1801 snapshots left to purge
... 1701 snapshots left to purge
... 1601 snapshots left to purge
... 1501 snapshots left to purge
... 1401 snapshots left to purge
... 1301 snapshots left to purge
... 1201 snapshots left to purge
... 1101 snapshots left to purge
... 1001 snapshots left to purge
... 901 snapshots left to purge
... 801 snapshots left to purge
... 701 snapshots left to purge
... 601 snapshots left to purge
... 501 snapshots left to purge
... 401 snapshots left to purge
... 301 snapshots left to purge
... 201 snapshots left to purge
... 101 snapshots left to purge
... 1 snapshots left to purge
95750 snapshots remain
The difference is because you were creating a lot of snapshots very, very quickly, and "in the last day" is governed by "within the last 24 hours' worth of milliseconds", so the "purge older than" window slides forward between "-r -i 1" and "-d 1 -b 100", and several hundred more snapshots become eligible. On the reproducing system, you can see this if you just keep re-running the "how many should I delete, boss?" query against the DB:

rhnschema=# select count(ss.id) from rhnsnapshot ss where ss.created < (current_timestamp - numtodsinterval(1, 'day'));
 count
-------
 19646
(1 row)

rhnschema=# select count(ss.id) from rhnsnapshot ss where ss.created < (current_timestamp - numtodsinterval(1, 'day'));
 count
-------
 19669
(1 row)

rhnschema=# select count(ss.id) from rhnsnapshot ss where ss.created < (current_timestamp - numtodsinterval(1, 'day'));
 count
-------
 19677
(1 row)

rhnschema=# select count(ss.id) from rhnsnapshot ss where ss.created < (current_timestamp - numtodsinterval(1, 'day'));
 count
-------
 19697
(1 row)

So just hitting up-arrow/enter as fast as I could made the count change from 19646 to 19697. Working as intended, I believe.
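The sliding window is easy to demonstrate outside of Satellite as well. A minimal sketch using sqlite3 and a virtual clock (the table and column names are borrowed from the query above; the one-snapshot-per-minute rate is made up): every re-evaluation of "older than one day" against a table that was being written to continuously sees a few more rows, just like the repeated psql counts above.

```python
import sqlite3

DAY = 86400  # seconds

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rhnsnapshot (id INTEGER PRIMARY KEY, created INTEGER)")
# one snapshot per minute over the last two days of virtual time
conn.executemany("INSERT INTO rhnsnapshot (created) VALUES (?)",
                 [(t,) for t in range(0, 2 * DAY, 60)])

def older_than_one_day(conn, now):
    """Count snapshots older than one day relative to `now` -- the moral
    equivalent of created < current_timestamp - numtodsinterval(1, 'day')."""
    (n,) = conn.execute("SELECT count(id) FROM rhnsnapshot WHERE created < ?",
                        (now - DAY,)).fetchone()
    return n

now = 2 * DAY
counts = []
for _ in range(4):
    counts.append(older_than_one_day(conn, now))
    now += 60  # a minute passes between up-arrow/enter presses
print(counts)  # [1440, 1441, 1442, 1443] -- one more row per re-run
```

Nothing was inserted between the four counts; the count grows only because the cut-off line itself moved forward one minute per re-run.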
Ok then. Verified on spacewalk-utils-2.5.1-31.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1565