Description of problem:
When only one Capsule is available for REX (or when only one Capsule can be used for REX to some hosts), a "Could not use any Capsule" error can occur after a fresh restart of the smart_proxy_dynflow_core (spdc) service, followed by any time delay and then a bulk of REX jobs.

Per aruzicka++, it happens because:
- restarting or reloading the service does not by itself load the DB schema or apply DB migrations
- that is done during the very first REX job / probe of this Capsule's "liveness"
- if multiple REX jobs are scheduled for the same time and multiple "is this Capsule running?" queries are raised concurrently, a race condition, followed by an SQL error in the spdc logs, can cause the probe to fail
- if this Capsule is the only one available for some host, the whole REX job for that host fails with the "Could not use any Capsule" error

Version-Release number of selected component (if applicable):
any (incl. Sat 6.2.14 and 6.3.0)

How reproducible:
100% (within a few attempts at worst)

Steps to Reproduce:
0) Have a Satellite without any other Capsule, with REX working to some host (1 is enough)
1) restart the spdc service:
    service smart_proxy_dynflow_core restart
2) invoke multiple REX jobs in the near future:
    echo "date" > /tmp/rex-date
    # update --start-at to some soon-in-future time, and update "MYHOST" to some host you have; optionally update --job-template-id to the SSH REX one
    for i in $(seq 1 100); do echo "job-invocation create --job-template-id 94 --input-files command=/tmp/rex-date --search-query \"name ~ MYHOST\" --async --start-at \"2018-03-19T13:50:00\""; done | hammer -u admin -p redhat shell
3) observe whether all jobs succeeded

Actual results:
3) the very first few jobs (usually two) fail; this is not guaranteed, as it depends on how fast they trigger the probe to spdc

Expected results:
3) no such errors

Additional info:
workaround: run a dummy REX job against that Capsule after any spdc service reload and/or restart
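The workaround from the description could be scripted roughly as follows. This is only a sketch under assumptions: it reuses the template id (94), the credentials, and the MYHOST search query from the reproducer, and it assumes that omitting --async makes hammer wait for the job, so that the first probe runs alone. DRY_RUN, TEMPLATE_ID, HOST_QUERY, and the helper names are hypothetical, introduced here for illustration; with the default DRY_RUN=1 the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of the workaround: after restarting spdc, run ONE dummy REX job
# on its own before any bulk of jobs is scheduled.
# Assumptions: template id 94, admin/redhat credentials and the MYHOST
# query come from the reproducer; adjust them to your environment.
DRY_RUN="${DRY_RUN:-1}"            # hypothetical switch: 1 = print only
TEMPLATE_ID="${TEMPLATE_ID:-94}"
HOST_QUERY="${HOST_QUERY:-name ~ MYHOST}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

warm_up_spdc() {
    # Input file holding the command the dummy job executes on the host.
    echo "date" > /tmp/rex-date
    run service smart_proxy_dynflow_core restart
    # No --async here (assumption: hammer then waits for the job), so the
    # first liveness probe is not raced by other concurrent probes.
    run hammer -u admin -p redhat job-invocation create \
        --job-template-id "$TEMPLATE_ID" \
        --input-files command=/tmp/rex-date \
        --search-query "$HOST_QUERY"
}

warm_up_spdc
```

Run it with DRY_RUN=0 to actually restart the service and submit the dummy job; bulk jobs can be scheduled once it returns.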
Created redmine issue http://projects.theforeman.org/issues/22935 from this bug
Upstream bug assigned to aruzicka
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/22935 has been resolved.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1950