When synchronizing 20+ repositories, users will see one or more of the synchronization tasks never complete, with a step showing "waiting for Pulp to start the task" when, in fact, Pulp has completed the task. This is resolved upstream by https://github.com/Dynflow/dynflow/pull/362/. Errors in the log include:

PersistenceError in executor caused by Sequel::PoolTimeout: timeout: 5.0, elapsed: 5.000096781004686 (Dynflow::Errors::PersistenceError)

This is a regression from 6.7.
Upstream fix was merged, moving to POST
Upstream release 1.4.7 containing the fix for this BZ is out; moving to MODIFIED.
Hello. Even with tfm-rubygem-dynflow-1.4.7-1.fm2_1.el7sat.noarch (upgraded via `yum upgrade; satellite-installer --scenario satellite`), I'm still getting lots of these when registering 62 hosts in parallel:

2020-09-07T09:51:20 [E|kat|6d773629] ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.003 seconds); all pooled connections were in use

(30 of the registrations passed, 32 failed.) Looking at our monitoring, we now top out at 59 connections to PostgreSQL (47 for foreman, 12 for candlepin) where we had 48 before (39 for foreman, 9 for candlepin). So there probably is some improvement, but either there is still an issue, or I'm missing something?
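A quick way to cross-check those monitoring numbers directly (a sketch; assumes the PostgreSQL instance local to the Satellite server):

```
# Count current PostgreSQL connections per database
su - postgres -c 'psql -c "SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname ORDER BY count(*) DESC;"'
```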
Could you confirm that all dynflow-sidekiq@* services were restarted during that upgrade?
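One quick way to confirm that (a sketch; systemctl glob matching applies to loaded units):

```
# Show when each dynflow-sidekiq instance last entered the active state
systemctl show 'dynflow-sidekiq@*' --property=Id,ActiveEnterTimestamp
```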
I have tried increasing the "pool" value from 5 to 10 and then 20 in /etc/foreman/database.yml, and that helped partially (~30 and ~40 passed, respectively), but I still do not see a corresponding jump in DB connections in our monitoring (max was 78; 63 of them for foreman). Next I tried changing the "concurrency" value from 5 to 10 in /etc/foreman/dynflow/worker.yml (with "pool" back to 5 in /etc/foreman/database.yml) and got 25 passes (out of these 62 concurrent registrations).
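For reference, these are the two knobs being tuned here (a sketch; stock Satellite file locations, keys as named in the comment above):

```
# /etc/foreman/database.yml        production -> pool: <N>   # ActiveRecord connections per process
# /etc/foreman/dynflow/worker.yml  :concurrency: <N>         # sidekiq threads per worker instance
# Neither change takes effect until the services are restarted:
foreman-maintain service restart
```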
(In reply to Adam Ruzicka from comment #4)
> Could you confirm that all dynflow-sidekiq@* services were restarted during
> that upgrade?

Yep, I have run `foreman-maintain service restart` multiple times, and the process age for sidekiq confirms it:

```
[root@f03-h29-000-r620 ~]# ps axf | grep side
28735 pts/2    S+     0:00  |   \_ grep --color=auto side
24692 ?        Ssl    1:29 sidekiq 5.2.7  [0 of 1 busy]
25507 ?        Ssl    1:19 sidekiq 5.2.7  [0 of 10 busy]
25518 ?        Ssl    1:52 sidekiq 5.2.7  [0 of 5 busy]
[root@f03-h29-000-r620 ~]# ps -p 24692,25507,25518 -o lstart
                 STARTED
Mon Sep  7 10:38:49 2020
Mon Sep  7 10:40:38 2020
Mon Sep  7 10:40:38 2020
[root@f03-h29-000-r620 ~]# date
Mon Sep  7 10:52:02 UTC 2020
```

I'm rebooting to be 100% sure everything is fresh.
The log line from #3 leads me to believe the pool ran out of connections in one of the puma workers, while this BZ focused purely on the dynflow-sidekiq workers. Pool depletion in puma is tracked upstream as https://projects.theforeman.org/issues/30789/. Maybe it should have its own BZ; the symptoms are almost the same, but it happens in a different process and the fix is completely different.
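A rough way to tell the two apart (a sketch; paths are the Satellite defaults):

```
# Pool timeouts hit while serving web/API requests land in production.log
# (puma); the dynflow-sidekiq workers log to the journal instead.
grep -c 'ConnectionTimeoutError' /var/log/foreman/production.log
journalctl -u 'dynflow-sidekiq@*' | grep -c 'PoolTimeout'
```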
Once we switch back to Passenger in snap 16, we can re-test this, as it affects how DB pooling is utilized.
Moving back to ON_QA for a re-test now that we are running Passenger again.
Hello, I have tried syncing 20+ repositories and they synced successfully. Tested 10, 20, and 30 repositories with iso, yum, and docker repo types. This bug has been verified on a new snap of Satellite 6.8. Thank you.
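For reproducibility, a parallel sync of this kind can be scripted with hammer (a sketch; `--organization-id 1` is an assumed test org, adjust as needed):

```
# Kick off an async sync of every repository in the organization; the syncs
# continue as Satellite tasks after the hammer invocations return.
for id in $(hammer --csv repository list --organization-id 1 --fields Id | tail -n +2); do
  hammer repository synchronize --id "$id" --async &
done
wait
```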
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Satellite 6.8 release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:4366