Bug 1558069 - First bulk REX after smart_proxy_dynflow_core can raise "Could not use any Capsule" error
Status: CLOSED ERRATA
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Remote Execution
Version: 6.3.0
Hardware: x86_64 Linux
Priority: medium Severity: medium
Target Milestone: 6.3.2
Target Release: Unused
Assigned To: Adam Ruzicka
QA Contact: Harshad More
Keywords: Triaged
Depends On:
Blocks:
Reported: 2018-03-19 10:38 EDT by Pavel Moravec
Modified: 2018-06-19 16:17 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1572292
Environment:
Last Closed: 2018-06-19 16:17:00 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker                            ID              Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  3385741         None      None    None     2018-03-19 11:37 EDT
Foreman Issue Tracker              22935           None      None    None     2018-03-19 10:55 EDT
Red Hat Product Errata             RHBA-2018:1950  None      None    None     2018-06-19 16:17 EDT

Description Pavel Moravec 2018-03-19 10:38:39 EDT
Description of problem:
When there is only one Capsule for REX (or only one Capsule can be used for REX to some hosts), a "Could not use any Capsule" error can occur when the smart_proxy_dynflow_core (spdc) service is freshly restarted and, after any delay, a bulk of REX jobs is run.

Per aruzicka++, it happens because:
- restarting or reloading this service does not by itself load the DB schema or apply DB migrations
- that is done during the very first REX job / probe of this Capsule's "liveness"
- if multiple REX jobs are scheduled for the same time and several "is this Capsule running?" queries are raised concurrently, a race condition followed by an SQL error in the spdc logs can cause the probe to fail
- if this Capsule is the only one available for some host, the whole REX job for that host fails with the "Could not use any Capsule" error
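
To illustrate the race described in the list above, here is a minimal shell sketch of a one-time initialization that stays safe under concurrent callers. It is only an analogy: ensure_migrated, the lock directory and the log file are hypothetical names invented for this example, and this is not the spdc code or the upstream fix.

```shell
#!/bin/sh
# Hypothetical illustration only; none of this is spdc code.
state_dir=$(mktemp -d)

# Run the one-time "migration" exactly once, even with concurrent callers.
ensure_migrated() {
    # mkdir is atomic: only one caller succeeds in creating the lock directory.
    if mkdir "$state_dir/lock" 2>/dev/null; then
        echo "migrated" >> "$state_dir/migrations.log"   # stand-in for the schema load
        touch "$state_dir/ready"
    else
        # The other callers wait for the winner instead of failing the probe.
        while [ ! -e "$state_dir/ready" ]; do sleep 0.1; done
    fi
}

# Simulate five concurrent liveness probes against a freshly restarted service.
for i in 1 2 3 4 5; do
    ensure_migrated &
done
wait

migration_count=$(wc -l < "$state_dir/migrations.log" | tr -d ' ')
echo "migrations run: $migration_count"
rm -rf "$state_dir"
```

Without the lock, each of the five callers would attempt the initialization itself, which corresponds to the concurrent probes colliding on the first schema load.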



Version-Release number of selected component (if applicable):
any (incl. Sat 6.2.14 and 6.3.0)


How reproducible:
100% (within a few attempts at worst)


Steps to Reproduce:
0) Have a Satellite with no additional Capsule and with REX working to some host (1 host is enough)

1) restart spdc service:
service smart_proxy_dynflow_core restart

2) schedule multiple REX jobs for the near future:
echo "date" > /tmp/rex-date

# set --start-at to a time in the near future and replace "MYHOST" with a host you have; optionally set --job-template-id to the ID of an SSH REX template

for i in $(seq 1 100); do echo "job-invocation create --job-template-id 94 --input-files command=/tmp/rex-date --search-query \"name ~ MYHOST\" --async --start-at \"2018-03-19T13:50:00\""; done | hammer -u admin -p redhat shell

3) observe if all jobs succeeded
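
The --start-at value in step 2 can be computed instead of typed by hand. The sketch below assumes GNU date (as shipped on RHEL); the 5-minute offset and the count of 3 are arbitrary, and MYHOST and template ID 94 are the same placeholders as in step 2. It only prints the hammer shell commands so they can be reviewed before being piped to hammer.

```shell
#!/bin/sh
# Assumes GNU date; the +5 minutes offset is an arbitrary choice.
start_at=$(date -d '+5 minutes' '+%Y-%m-%dT%H:%M:%S')

# Print the commands for review; pipe the loop into "hammer shell" to run them.
for i in $(seq 1 3); do
    echo "job-invocation create --job-template-id 94 --input-files command=/tmp/rex-date --search-query \"name ~ MYHOST\" --async --start-at \"$start_at\""
done
```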


Actual results:
3) the first few jobs (usually two) fail; this is not guaranteed and depends on how quickly they trigger the probe to spdc


Expected results:
3) no such errors


Additional info:
workaround: run a dummy REX job against that Capsule after any spdc service reload and/or restart
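
A rough sketch of that workaround as a script follows. The run and warm_up_spdc helpers and the DRY_RUN switch are hypothetical names for this example; MYHOST, template ID 94, /tmp/rex-date and the hammer credentials are the placeholders from the reproducer. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch of the workaround: restart spdc, then warm it up with one dummy job.
# DRY_RUN=1 only prints the commands; set DRY_RUN=0 to really execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

warm_up_spdc() {
    host=$1
    run service smart_proxy_dynflow_core restart
    # A single job triggers the lazy schema load before any bulk run starts.
    # --async is left out on the assumption that hammer then waits for the job.
    run hammer -u admin -p redhat job-invocation create \
        --job-template-id 94 \
        --input-files command=/tmp/rex-date \
        --search-query "name ~ $host"
}

warm_up_output=$(warm_up_spdc MYHOST)
echo "$warm_up_output"
```

Bulk jobs should be scheduled only after the dummy job has finished, so that no concurrent probes hit the Capsule before its first probe completes.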
Comment 1 Adam Ruzicka 2018-03-19 10:55:19 EDT
Created redmine issue http://projects.theforeman.org/issues/22935 from this bug
Comment 3 pm-sat@redhat.com 2018-03-19 12:14:01 EDT
Upstream bug assigned to aruzicka@redhat.com
Comment 4 pm-sat@redhat.com 2018-03-19 12:14:04 EDT
Upstream bug assigned to aruzicka@redhat.com
Comment 5 pm-sat@redhat.com 2018-03-20 10:14:03 EDT
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/22935 has been resolved.
Comment 9 errata-xmlrpc 2018-06-19 16:17:00 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1950
