Description of problem:
Currently, the system registration and unregistration processes create tasks, and then wait for those tasks to complete before returning to subscription-manager.
This usually works fine, but if the satellite is heavily loaded, registration requests may hold passenger workers for a long time while tasks are processed. Eventually the satellite can run out of passenger workers and tip over. Additionally, if HTTP requests get backed up and take all queue slots, the Pulp API will become unresponsive. Many tasks (including system registration) require Pulp to accept API requests, so this can result in the satellite getting wound around the axle.
If registrations and unregistrations are done without blocking on tasks, this issue is avoided. It's technically not a problem if unregistrations still occur via a task, but if a register happens without a task, we would want the unregister to complete in the same time frame to ensure the subscription is freed before it's consumed elsewhere.
Version-Release number of selected component (if applicable): 6.2.12
How reproducible: every time
Steps to Reproduce:
** note ** these steps describe how to make the satellite slow but not fall over from registrations, so the issue is more apparent. Once it falls over, there is not much to observe :)
1. set up a new satellite and configure it to have 40 passenger workers. The number isn't important as long as it's higher than the number in step 2
2. set up 15 clients to register to the Satellite in a loop, using subscription-manager register --force
3. on the satellite, run: watch "passenger-status --show=requests | grep -e 'uri\|connected'"
   This will show the response times for the registration requests as they come in.
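For anyone reproducing this, the client-side loop from step 2 could be scripted roughly as below. This is only a sketch: the names REGISTER_CMD, ITERATIONS and register_loop are made up for illustration and are not from the original report, and the credential or activation key arguments that subscription-manager needs in order to run non-interactively are left out on purpose.

```shell
#!/bin/sh
# Hypothetical reproduction helper for step 2; run it on each of the 15 clients.
# REGISTER_CMD and ITERATIONS are illustrative names, not part of the report.
# Append the credential/org/activation key options your environment requires,
# otherwise subscription-manager will prompt for them.
REGISTER_CMD="${REGISTER_CMD:-subscription-manager register --force}"
ITERATIONS="${ITERATIONS:-100}"

register_loop() {
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        start=$(date +%s)
        # Intentionally unquoted so the command string splits into words.
        $REGISTER_CMD
        status=$?
        end=$(date +%s)
        # One line per attempt: wall-clock seconds as seen from the client.
        echo "iteration=$i exit=$status seconds=$((end - start))"
        i=$((i + 1))
    done
}
```

Source the file and call register_loop on each client. The per-iteration seconds give a client-side view to compare against the passenger-status output from step 3.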
Actual results: many registration requests take over 10 seconds, some 20-30 seconds or more. Lots of running/pending tasks in the task list.
Expected results: registrations should get processed within a few seconds so as not to hold up passenger workers
*** Bug 1438945 has been marked as a duplicate of this bug. ***
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/21703 has been resolved.
*** Bug 1523254 has been marked as a duplicate of this bug. ***
Verified in Satellite 6.2.15 Snap 2
I loosely followed the steps outlined in the description.
I increased the number of passenger workers to 40.
Then ran a loop of 150 hosts creating, registering, and destroying themselves in batches of 15.
See the attached gif for a recording of the grep'd passenger-status.
Created attachment 1427796 [details]
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.