Bug 1558069 - First bulk REX after smart_proxy_dynflow_core can raise "Could not use any Capsule" error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Remote Execution
Version: 6.3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: Adam Ruzicka
QA Contact: Harshad More
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-19 14:38 UTC by Pavel Moravec
Modified: 2021-12-10 15:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1572292 (view as bug list)
Environment:
Last Closed: 2018-06-19 20:17:00 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 22935 | 0 | Normal | Closed | First bulk REX after smart_proxy_dynflow_core can raise "Could not use any Capsule" error | 2020-06-10 09:08:50 UTC
Red Hat Knowledge Base (Solution) | 3385741 | 0 | None | None | None | 2018-03-19 15:37:02 UTC
Red Hat Product Errata | RHBA-2018:1950 | 0 | None | None | None | 2018-06-19 20:17:33 UTC

Description Pavel Moravec 2018-03-19 14:38:39 UTC
Description of problem:
When there is only one Capsule for REX (or only one Capsule can be used for REX to some hosts), a "Could not use any Capsule" error can occur when the smart_proxy_dynflow_core (spdc) service is freshly restarted and, after any delay, a bulk of REX jobs is run.

Per aruzicka++, it happens because:
- restarting or reloading this service does not by itself load the DB schema or apply DB migrations
- that is done during the very first REX job / probe of this Capsule's "liveness"
- if multiple REX jobs are scheduled for the same time and multiple such "is this Capsule running?" queries are raised concurrently, a race condition, followed by some SQL error in the spdc logs, can cause the probe to fail
- if this Capsule is the only one available for some host, the whole REX job for that host fails with the "Could not use any Capsule" error
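
To make the race above concrete, here is a small shell model of it. This is only an illustration and not the upstream fix or spdc code: run_setup stands in for the one-time DB schema load, probe stands in for the "is this Capsule running?" query, and all names (STATE_DIR, run_setup, ensure_setup, probe) are made up for the example. It shows one common way to keep concurrent callers from running a lazy one-time setup more than once, using an atomic mkdir as a lock.

```shell
#!/bin/bash
# Illustrative model only; none of these names exist in spdc.
STATE_DIR=$(mktemp -d)

run_setup() {
    # Record every time the setup body really runs.
    echo "setup" >> "$STATE_DIR/setup.log"
    sleep 0.2                     # simulate slow migrations (widens the race window)
    touch "$STATE_DIR/ready"
}

ensure_setup() {
    # mkdir is atomic: exactly one caller creates the lock and runs the setup.
    if mkdir "$STATE_DIR/lock" 2>/dev/null; then
        run_setup
    else
        # Every other caller waits until the winner has finished.
        while [ ! -e "$STATE_DIR/ready" ]; do sleep 0.05; done
    fi
}

probe() {
    ensure_setup
    echo "probe $1 ok"
}

# Fire several probes at once, as a bulk of REX jobs would.
for i in 1 2 3 4 5; do probe "$i" & done
wait

echo "setup ran $(wc -l < "$STATE_DIR/setup.log" | tr -d ' ') time(s)"
```

Without the mkdir guard (calling run_setup directly from every probe), the setup body would run once per concurrent probe, which is the kind of overlap that the SQL errors in the spdc logs point to.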



Version-Release number of selected component (if applicable):
any (incl. Sat 6.2.14 and 6.3.0)


How reproducible:
100% (within a few attempts at worst)


Steps to Reproduce:
0) Have a Satellite with no additional Capsule and with REX working to some host (one host is enough)

1) restart spdc service:
service smart_proxy_dynflow_core restart

2) schedule multiple REX jobs for the near future:
echo "date" > /tmp/rex-date

# set --start-at to a time in the near future, replace "MYHOST" with one of your hosts, and optionally set --job-template-id to the ID of the SSH REX template

for i in $(seq 1 100); do echo "job-invocation create --job-template-id 94 --input-files command=/tmp/rex-date --search-query \"name ~ MYHOST\" --async --start-at \"2018-03-19T13:50:00\""; done | hammer -u admin -p redhat shell

3) check whether all jobs succeeded
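
For step 2, the --start-at value can be computed instead of edited by hand. A minimal sketch, assuming GNU date is available; start_at_in is a hypothetical helper name, and the output uses the same format as the timestamp in the reproducer. How the timestamp is interpreted with respect to time zones is not covered here and should be checked on your setup.

```shell
# Hypothetical helper: print a timestamp N minutes from now (default 5)
# in the YYYY-MM-DDTHH:MM:SS format used by --start-at above.
# Requires GNU date for the -d option.
start_at_in() {
    minutes=${1:-5}
    date -d "+${minutes} minutes" +"%Y-%m-%dT%H:%M:%S"
}

START_AT=$(start_at_in 5)
echo "$START_AT"
```

The resulting $START_AT can then be substituted for the hard-coded time in the for loop from step 2.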


Actual results:
3) the first few jobs (usually two) fail; this is not guaranteed and depends on how quickly they trigger the probe to spdc


Expected results:
3) no such errors


Additional info:
workaround: run a dummy REX job against that Capsule after any spdc service reload and/or restart
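
The workaround could be scripted roughly as below. This is a sketch under assumptions, not a tested procedure: it reuses the service command, hammer credentials, template ID 94 and /tmp/rex-date input file from the reproducer, MYHOST must be replaced with a host that is reached through the Capsule in question, and DRY_RUN, run and warm_up_capsule are hypothetical names. --async is left out with the intent that the warm-up job finishes before real jobs are scheduled. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the workaround: restart spdc, then run one throwaway REX job
# so the one-time DB setup happens before any real jobs arrive.
# TEMPLATE_ID and WARMUP_HOST are placeholders; DRY_RUN is a hypothetical switch.
DRY_RUN=${DRY_RUN:-1}
TEMPLATE_ID=${TEMPLATE_ID:-94}
WARMUP_HOST=${WARMUP_HOST:-MYHOST}

run() {
    # In dry-run mode only print the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

warm_up_capsule() {
    # Same trivial command input file as in the reproducer.
    echo "date" > /tmp/rex-date
    run service smart_proxy_dynflow_core restart
    run hammer -u admin -p redhat job-invocation create \
        --job-template-id "$TEMPLATE_ID" \
        --input-files command=/tmp/rex-date \
        --search-query "name ~ $WARMUP_HOST"
}

warm_up_capsule
```

Set DRY_RUN=0 to actually execute the commands.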

Comment 1 Adam Ruzicka 2018-03-19 14:55:19 UTC
Created redmine issue http://projects.theforeman.org/issues/22935 from this bug

Comment 3 Satellite Program 2018-03-19 16:14:01 UTC
Upstream bug assigned to aruzicka

Comment 4 Satellite Program 2018-03-19 16:14:04 UTC
Upstream bug assigned to aruzicka

Comment 5 Satellite Program 2018-03-20 14:14:03 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/22935 has been resolved.

Comment 9 errata-xmlrpc 2018-06-19 20:17:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1950

