Bug 1609928 - Pulp monthly maintenance not being ran
Summary: Pulp monthly maintenance not being ran
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: jcallaha
URL:
Whiteboard:
Depends On:
Blocks: 1628782
Reported: 2018-07-30 20:42 UTC by Mike McCune
Modified: 2021-04-06 17:48 UTC (History)

Fixed In Version: pulp-2.13.4.11-2,pulp-2.13.4.12-1
Doc Text:
Clone Of:
: 1612964 1628782 (view as bug list)
Environment:
Last Closed: 2018-08-22 20:07:12 UTC
Target Upstream Version:


Attachments
rpm -qa output from customer environment (10.00 KB, application/x-gzip)
2018-08-01 16:08 UTC, Mike McCune


Links
System ID Private Priority Status Summary Last Updated
Pulp Redmine 3887 0 High CLOSED - CURRENTRELEASE Restarting pulp_celerybeat weekly or so causes pulp.server.maintenance.monthly to not get scheduled 2018-09-18 18:32:01 UTC
Pulp Redmine 3893 0 High CLOSED - CURRENTRELEASE pulp.server.maintenance.monthly fails if applicability collection is too large 2018-10-12 07:01:26 UTC
Red Hat Product Errata RHBA-2018:2550 0 None None None 2018-08-22 20:07:18 UTC

Description Mike McCune 2018-07-30 20:42:29 UTC
During the investigation of https://bugzilla.redhat.com/show_bug.cgi?id=1573892, we noticed a large amount of orphaned applicability profile data (in some cases, up to 90% of all applicability profile data was orphaned).

Log analysis showed that no Satellite we had access to with 30+ days of logs had a single journal entry indicating that the pulp.server.maintenance.monthly.monthly_maintenance call was being fired.
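
A minimal sketch of that kind of log check, assuming the journal has been exported to plain text and that the entries follow the celery.worker.strategy format quoted in Comment 12. The function name `find_maintenance_runs` and the second sample line are illustrative only and are not part of Pulp.

```python
import re

# Matches the "Received task" entry a worker logs when the monthly
# maintenance task is dispatched, and captures the task UUID.
# It deliberately does not match queue_monthly_maintenance entries.
RECEIVED = re.compile(
    r"Received task: pulp\.server\.maintenance\.monthly\."
    r"monthly_maintenance\[([0-9a-f-]{36})\]"
)


def find_maintenance_runs(lines):
    """Return the task UUID of every monthly_maintenance dispatch in lines."""
    found = []
    for line in lines:
        match = RECEIVED.search(line)
        if match:
            found.append(match.group(1))
    return found


sample = [
    # Real entry, copied from Comment 12 of this bug.
    "Jul 30 13:42:34 sat-r220-07.lab.eng.rdu2.redhat.com pulp[24857]: "
    "celery.worker.strategy:INFO: Received task: "
    "pulp.server.maintenance.monthly.monthly_maintenance"
    "[486008fb-dac9-4698-bb50-6b79b510dba1]",
    # Synthetic, unrelated line used only to show non-matches are skipped.
    "pulp: some unrelated entry (synthetic)",
]

runs = find_maintenance_runs(sample)
print(len(runs), "run(s) found:", runs)
```

An empty result over 30+ days of exported logs would correspond to the symptom described here.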

We need to ensure not only that this maintenance routine is being executed, but also that we have the ability to run it more often than every 30 days.

If you examine celery, nothing is listed as scheduled, which I'd expect to see (though I may not be inspecting the right thing):

 celery -A pulp.server.async.app  inspect scheduled
-> reserved_resource_worker-7@sat-r220-07.lab.eng.rdu2.redhat.com: OK
    - empty -
-> resource_manager@sat-r220-07.lab.eng.rdu2.redhat.com: OK
    - empty -
-> reserved_resource_worker-1@sat-r220-07.lab.eng.rdu2.redhat.com: OK
    - empty -
-> reserved_resource_worker-3@sat-r220-07.lab.eng.rdu2.redhat.com: OK
    - empty -
-> reserved_resource_worker-5@sat-r220-07.lab.eng.rdu2.redhat.com: OK
    - empty -
-> reserved_resource_worker-4@sat-r220-07.lab.eng.rdu2.redhat.com: OK
    - empty -
-> reserved_resource_worker-2@sat-r220-07.lab.eng.rdu2.redhat.com: OK
    - empty -
-> reserved_resource_worker-0@sat-r220-07.lab.eng.rdu2.redhat.com: OK
    - empty -
-> reserved_resource_worker-6@sat-r220-07.lab.eng.rdu2.redhat.com: OK
    - empty -

Comment 1 Mike McCune 2018-07-30 20:46:50 UTC
You can kick off the call from your terminal:


# celery -A pulp.server.async.app call pulp.server.maintenance.monthly.queue_monthly_maintenance

Note: this can be very slow on large Satellites with millions of rows in db.repo_profile_applicability.

Comment 3 pulp-infra@redhat.com 2018-07-31 08:33:21 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

Comment 4 pulp-infra@redhat.com 2018-07-31 08:33:23 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 5 Mike McCune 2018-07-31 13:54:58 UTC
On a Pulp database with about 60k consumers we had nearly 1.6 million entries in db.repo_profile_applicability:

> db.repo_profile_applicability.find().size()

1,586,607

> db.consumers.find().size()

59,248


In this state, running Bulk RegenerateApplicability on one of the larger repos took about 66 minutes to complete.

I ran the monthly job via:

# celery -A pulp.server.async.app call pulp.server.maintenance.monthly.queue_monthly_maintenance

This cleanup took 9.5 hours and brought repo_profile_applicability down to 56088 items.

After this cleanup, I re-ran the RegenerateApplicability and it completed in 23 minutes. 

This is a huge improvement and will make a big difference for customers who have large numbers of repos and consumers when we need to run multiple RegenerateApplicability calls.
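
To put the figures in this comment side by side, here is a small worked calculation. It uses only the numbers reported above (the variable names are illustrative) and says nothing about how other environments would behave.

```python
# Figures reported in this comment.
rows_before = 1586607    # repo_profile_applicability entries before cleanup
rows_after = 56088       # entries after the maintenance task finished
minutes_before = 66      # RegenerateApplicability duration before cleanup
minutes_after = 23       # duration after cleanup

rows_removed = rows_before - rows_after
row_reduction = 1 - rows_after / rows_before
speedup = minutes_before / minutes_after

print(f"rows removed: {rows_removed}")                 # rows removed: 1530519
print(f"row reduction: {row_reduction:.1%}")           # row reduction: 96.5%
print(f"regeneration speedup: {speedup:.2f}x")         # regeneration speedup: 2.87x
```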

Comment 7 pulp-infra@redhat.com 2018-07-31 15:03:40 UTC
The Pulp upstream bug priority is at High. Updating the external tracker on this bug.

Comment 8 Brian Bouterse 2018-08-01 15:53:48 UTC
As for the symptom that the maintenance task is not being run, it is likely not an easy fix because the issue lies in the integration code between Celery and pulp_celerybeat. It would be helpful to have `rpm -qa` output from one of these environments.

Regarding the 16 MB cap error in Comment 6, that is an issue with the task code itself, so it has a different root cause entirely.

Comment 9 Mike McCune 2018-08-01 16:04:21 UTC
Brian, any reason we don't just set up a weekly (or monthly) OS-level cron job and skip Celery scheduling of this?

Will get an 'rpm -qa' as well.

Comment 10 Mike McCune 2018-08-01 16:08:59 UTC
Created attachment 1472150 [details]
rpm -qa output from customer environment

Comment 11 Brian Bouterse 2018-08-01 16:12:23 UTC
That sounds like a great idea, but the issue is that you have to dispatch that specific task to the tasking system and there is no API endpoint that Pulp has that can do that.

A variation on that idea, though, is to write a script that imports the tasking code and calls apply_async_with_reservation on it, which would cause the dispatch to occur. Cron could call that script.

Comment 12 Mike McCune 2018-08-01 16:19:07 UTC
So calling this from the shell:

# celery -A pulp.server.async.app call pulp.server.maintenance.monthly.queue_monthly_maintenance

is not sufficient? It definitely seemed to initiate the job: I watched the journal and saw various collections in the database shrink while it was running:

Jul 30 13:42:34 sat-r220-07.lab.eng.rdu2.redhat.com pulp[24857]: celery.worker.strategy:INFO: Received task: pulp.server.maintenance.monthly.monthly_maintenance[486008fb-dac9-4698-bb50-6b79b510dba1]


> db.repo_profile_applicability.count()
5735477
> db.repo_profile_applicability.count()
4984406
> db.repo_profile_applicability.count()
4845218
> db.repo_profile_applicability.count()
4767190
> db.repo_profile_applicability.count()
4762954
> db.repo_profile_applicability.count()
2683751
> db.repo_profile_applicability.count()
2271209
> db.repo_profile_applicability.count()
2105499
> 

...
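
The manual polling above could be scripted along these lines. This is a sketch only: `watch_count` is a hypothetical helper, and the replay uses the first three counts from this comment rather than a live database.

```python
import time


def watch_count(count_fn, samples, interval_s=60.0, sleep=time.sleep):
    """Call count_fn() `samples` times and return (count, delta) pairs.

    delta is the change from the previous sample (0 for the first one).
    """
    readings = []
    previous = None
    for i in range(samples):
        current = count_fn()
        delta = 0 if previous is None else current - previous
        readings.append((current, delta))
        previous = current
        if i < samples - 1:
            sleep(interval_s)  # wait between samples, not after the last
    return readings


# Replay the first three counts reported above without sleeping.
reported = iter([5735477, 4984406, 4845218])
readings = watch_count(lambda: next(reported), samples=3, sleep=lambda s: None)
for count, delta in readings:
    print(count, delta)

# Against a live server, count_fn could wrap pymongo. This is an assumption:
# it requires pymongo to be installed and the Pulp database to be named
# "pulp_database", neither of which is confirmed by this bug.
#   coll = pymongo.MongoClient()["pulp_database"]["repo_profile_applicability"]
#   watch_count(coll.estimated_document_count, samples=10)
```

Passing the sleep function as a parameter keeps the helper easy to exercise without waiting between samples.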

Comment 13 Brian Bouterse 2018-08-01 17:57:15 UTC
Mike, you're right! It is that simple because those task types don't go through the scheduler with apply_async_with_reservation() so you can generically dispatch them. +1 to your great workaround!

Comment 14 pulp-infra@redhat.com 2018-08-02 07:17:29 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 15 pulp-infra@redhat.com 2018-08-02 07:17:32 UTC
The Pulp upstream bug priority is at High. Updating the external tracker on this bug.

Comment 16 pulp-infra@redhat.com 2018-08-02 15:34:02 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 25 pulp-infra@redhat.com 2018-08-08 21:01:32 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

Comment 26 pulp-infra@redhat.com 2018-08-08 21:31:29 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 27 pulp-infra@redhat.com 2018-08-09 18:03:51 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 28 pulp-infra@redhat.com 2018-08-09 18:04:01 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 33 jcallaha 2018-08-16 14:13:42 UTC
The current solution is to create a cron job to run the maintenance task at whatever interval you think is best. Simply create a new Python script with the following two lines, then run it via cron.

from pulp.server.maintenance.monthly import queue_monthly_maintenance
queue_monthly_maintenance.apply_async()



-bash-4.2# cat test1609928.py 
from pulp.server.maintenance.monthly import queue_monthly_maintenance
print ("adding monthly maintenance job")
queue_monthly_maintenance.apply_async()
print ("done")

-bash-4.2# python test1609928.py && echo "--------------------------" && tail -f /var/log/messages | grep monthly_maintenance
adding monthly maintenance job
done
--------------------------
Aug 14 09:54:35 hp-ml350egen8-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.maintenance.monthly.queue_monthly_maintenance[c642f21e-cc5a-43a9-a09b-b85a469aa944]
Aug 14 09:54:35 hp-ml350egen8-01 pulp: celery.worker.strategy:INFO: Received task: pulp.server.maintenance.monthly.monthly_maintenance[3470fe62-dd1c-4cca-9154-c5beb7ac840d]
Aug 14 09:54:35 hp-ml350egen8-01 pulp: celery.worker.job:INFO: Task pulp.server.maintenance.monthly.queue_monthly_maintenance[c642f21e-cc5a-43a9-a09b-b85a469aa944] succeeded in 0.0409811839927s: None
Aug 14 09:54:35 hp-ml350egen8-01 pulp: celery.worker.job:INFO: Task pulp.server.maintenance.monthly.monthly_maintenance[3470fe62-dd1c-4cca-9154-c5beb7ac840d] succeeded in 0.0406332570128s: None

Comment 34 Brian Bouterse 2018-08-16 15:08:24 UTC
This is the command you can run from the shell as an alternative to the python approach. I believe this is what the rpm will install in cron.

celery -A pulp.server.async.app call pulp.server.maintenance.monthly.queue_monthly_maintenance

Comment 35 jcallaha 2018-08-16 15:55:22 UTC
Verified in Satellite 6.3.3 Snap 3.

The additional fix, for larger systems, was previously applied and tested on a customer system. With no issues encountered, we are marking this as verified.

Comment 36 Mike McCune 2018-08-20 16:45:35 UTC
We are delivering a new subpackage with this bug called pulp-maintenance which needs to be included in our composes as well as pulled in as a dep by either the 'satellite' meta RPM or some other package.

As of right now, it is not installing when upgrading to the latest Satellite 6.3.3 build. 

# rpm -q satellite
satellite-6.3.3-1.el7sat.noarch

# yum install pulp-maintenance
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Sat6-CI_Red_Hat_Satellite_6_3_Composes_Satellite_6_3_RHEL7                           | 2.5 kB  00:00:00
Sat6-CI_Red_Hat_Satellite_Puppet_4_6_3_Composes_Satellite_Puppet_4_6_3_RHEL7_x86_64  | 2.1 kB  00:00:00
Sat6-CI_Red_Hat_Satellite_Tools_6_3_Composes_Satellite_Tools                         | 2.1 kB  00:00:00
qemu-firmware-jenkins                                                                | 2.9 kB  00:00:00
No package pulp-maintenance available.
Error: Nothing to do

Comment 37 pulp-infra@redhat.com 2018-08-20 17:59:51 UTC
Requesting needsinfo from upstream developer ttereshc@redhat.com because the 'FailedQA' flag is set.

Comment 38 pulp-infra@redhat.com 2018-08-20 18:00:02 UTC
Requesting needsinfo from upstream developer bbouters@redhat.com because the 'FailedQA' flag is set.

Comment 39 Mike McCune 2018-08-20 18:15:31 UTC
Clearing needinfo; this is being resolved by pcreech.

Comment 40 Patrick Creech 2018-08-20 18:29:40 UTC
Added to satellite package to be auto-installed.

Comment 41 Mike McCune 2018-08-20 21:39:07 UTC
updated to satellite.noarch 0:6.3.3-1.el7sat which pulled in the dep:

Installing for dependencies:
 pulp-maintenance                            noarch       2.13.4.12-1.el7sat         Sat6-CI_Red_Hat_Satellite_6_3_Composes_Satellite_6_3_RHEL7          60 k

..

Check the cron config:


# file /etc/cron.weekly/pulp-maintenance 
/etc/cron.weekly/pulp-maintenance: POSIX shell script, ASCII text executable


Run cron:

# /etc/cron.weekly/pulp-maintenance
a372f3d9-5ace-422e-a626-6537041e8fa3

Check journal:

Aug 20 17:26:42 sat-r220-07.lab.eng.rdu2.redhat.com pulp[8563]: celery.worker.job:INFO: Task pulp.server.maintenance.monthly.queue_monthly_maintenance[a372f3d9-5ace-422e-a626-6537041e8fa3] succeeded in 0.0403493009508s: None
Aug 20 17:26:42 sat-r220-07.lab.eng.rdu2.redhat.com pulp[15514]: pulp.server.managers.consumer.applicability:INFO: [4b14b8a0] Orphaned consumer profiles to process: 0
Aug 20 17:26:42 sat-r220-07.lab.eng.rdu2.redhat.com pulp[8560]: celery.worker.job:INFO: Task pulp.server.maintenance.monthly.monthly_maintenance[4b14b8a0-c593-4116-9c07-2570348b2401] succeeded in 0.0408377237618s: None

looks good to me.

Comment 42 jcallaha 2018-08-21 14:54:44 UTC
Verified in Satellite 6.3.3 Snap 4

The pulp-maintenance package is now installed as part of satellite. It creates the cron job in /etc/cron.weekly.



-bash-4.2# rpm -q pulp-maintenance
pulp-maintenance-2.13.4.12-1.el7sat.noarch

   
-bash-4.2# file /etc/cron.weekly/pulp-maintenance 
/etc/cron.weekly/pulp-maintenance: POSIX shell script, ASCII text executable


-bash-4.2# cat /etc/cron.weekly/pulp-maintenance 
#!/bin/sh
celery -A pulp.server.async.app call pulp.server.maintenance.monthly.queue_monthly_maintenance


-bash-4.2# bash /etc/cron.weekly/pulp-maintenance 
7d5e2483-cc51-41f2-865f-c9148820b0db


-bash-4.2# tail -50 /var/log/messages 
...
Aug 21 10:52:00 intel-canoepass-12 pulp: celery.worker.job:INFO: Task pulp.server.maintenance.monthly.queue_monthly_maintenance[7d5e2483-cc51-41f2-865f-c9148820b0db] succeeded in 0.0490260770002s: None
Aug 21 10:52:00 intel-canoepass-12 pulp: pulp.server.managers.consumer.applicability:INFO: [a15c899d] Orphaned consumer profiles to process: 0
Aug 21 10:52:00 intel-canoepass-12 pulp: celery.worker.job:INFO: Task pulp.server.maintenance.monthly.monthly_maintenance[a15c899d-9d90-40cc-bd7d-92d96b34ff05] succeeded in 0.0570463659997s: None
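
The manual verification above (run the cron script, note the UUID it prints, find the matching "succeeded" entry in the log) could be automated roughly as follows. This is a sketch under assumptions: `parse_task_id` and `task_succeeded` are hypothetical helpers, and the log format is taken from the entries quoted in this comment.

```python
import re
import uuid

# The command shipped in /etc/cron.weekly/pulp-maintenance, as shown above.
CELERY_CALL = [
    "celery", "-A", "pulp.server.async.app", "call",
    "pulp.server.maintenance.monthly.queue_monthly_maintenance",
]


def parse_task_id(stdout):
    """Validate and return the task UUID printed by the celery call."""
    text = stdout.strip()
    uuid.UUID(text)  # raises ValueError if the output is not a UUID
    return text


def task_succeeded(task_id, log_lines):
    """Return True if any log line reports success for this task UUID."""
    pattern = re.compile(r"Task \S+\[" + re.escape(task_id) + r"\] succeeded")
    return any(pattern.search(line) for line in log_lines)


# On a Satellite host the UUID would come from running the command, e.g.:
#   import subprocess
#   out = subprocess.run(CELERY_CALL, capture_output=True, text=True,
#                        check=True).stdout
# Here the output and log entry from this comment are replayed instead.
task_id = parse_task_id("7d5e2483-cc51-41f2-865f-c9148820b0db\n")
log = [
    "Aug 21 10:52:00 intel-canoepass-12 pulp: celery.worker.job:INFO: Task "
    "pulp.server.maintenance.monthly.queue_monthly_maintenance"
    "[7d5e2483-cc51-41f2-865f-c9148820b0db] succeeded in 0.0490260770002s: None",
]
print(task_succeeded(task_id, log))
```

Note that the UUID printed by the command identifies the queue_monthly_maintenance task; the monthly_maintenance task it dispatches is logged under a different UUID, as the entries above show.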

Comment 43 errata-xmlrpc 2018-08-22 20:07:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2550

Comment 44 pulp-infra@redhat.com 2018-09-18 18:32:02 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 45 pulp-infra@redhat.com 2018-10-12 07:01:27 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

