Bug 1622802 - Running Ansible role fails with: Actions::ProxyAction::ProxyActionMissing: Proxy task gone missing from the capsule
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Remote Execution
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: Released
Assignee: Adam Ruzicka
QA Contact: Peter Ondrejka
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-28 04:16 UTC by sbadhwar
Modified: 2019-10-07 17:13 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-14 12:37:48 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2019:1222 None None None 2019-05-14 12:37:57 UTC
Foreman Issue Tracker 24909 None None None 2018-09-12 08:30:34 UTC

Comment 3 Ivan Necas 2018-08-29 16:02:29 UTC
A few notes from the investigation so far:

This seems to be behaviour introduced by https://projects.theforeman.org/issues/23017: we periodically (every 10 minutes) check the task details on the smart proxy to see if any tasks are in a wrong state (or are missing), and we update the corresponding jobs on the Satellite side. This prevents situations where a task gets stuck forever, for example when the smart_proxy_dynflow_core service is restarted.
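Schematically, that reconciliation does something like the sketch below. This is a self-contained illustration only: the hashes, the state names, and helpers stand in for Foreman's actual code, which fetches the task states from the smart proxy over HTTP.

```ruby
# Tasks the Satellite side believes are still running on the proxy.
satellite_tasks = { "t1" => :running, "t2" => :running, "t3" => :running }

# States actually reported by the proxy; "t3" has gone missing,
# e.g. because smart_proxy_dynflow_core was restarted.
proxy_states = { "t1" => :running, "t2" => :stopped }

# Collect tasks that are missing or in a wrong state on the proxy.
missing_or_wrong = satellite_tasks.keys.select do |id|
  state = proxy_states[id]
  state.nil? || state == :stopped
end

# Mark the corresponding jobs as failed so they do not hang forever.
missing_or_wrong.each { |id| satellite_tasks[id] = :failed }

puts missing_or_wrong.inspect
```

The bug at hand is the case where this check wrongly concludes a task is missing even though it is still present on the proxy.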

For a reason still unknown (I am working on a patch to give us more insight into the behaviour), on some occasions we failed to get the results from the smart proxy, or tasks were missing from the results even though they were actually present on the proxy.

There are two additional issues observed:

* [2018-08-27 14:11:39.782 #26629] ERROR -- dynflow: timeout: 5.0, elapsed: 5.914493396994658 (Sequel::PoolTimeout) in /var/log/foreman-proxy/smart_proxy_dynflow_core.log - this is probably related to the default max_connections limit in the Sequel library, which can be exceeded under heavy load. Dynflow has a rescue mechanism that retries the operation in this case, so it should not directly affect anything, but there is still a chance it has some influence on this bug as well. Either way, I will file a separate BZ to track this behaviour.

* the smart_proxy_dynflow_core.log not being updated after log rotation - I will file yet another issue for this. For this BZ it means we are missing some important log messages for further investigation, so I suggest that during the next test attempt (after I provide the additional logging patch) we first restart smart_proxy_dynflow_core on all the capsules involved in the testing.
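On the log-rotation point: a daemon keeps writing to the old (renamed) file descriptor unless it reopens its log file, which is the usual reason a log goes quiet after rotation. A hypothetical logrotate stanza working around this with copytruncate (illustrative only, not necessarily what the package ships):

```
# Hypothetical /etc/logrotate.d/smart_proxy_dynflow_core stanza.
# copytruncate copies the log and truncates it in place, so the daemon
# keeps its open file descriptor and logging continues after rotation.
/var/log/foreman-proxy/smart_proxy_dynflow_core.log {
    weekly
    rotate 4
    compress
    copytruncate
}
```

The alternative is a postrotate script that tells the service to reopen or restarts it, which is what the restart suggested above achieves manually.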
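On the Sequel::PoolTimeout point: if I recall correctly, Sequel's connection pool holds at most max_connections connections (4 by default) and a checkout waits pool_timeout seconds (5 by default, which matches the "timeout: 5.0" in the log) before raising. A toy pool in plain Ruby (not Sequel's actual implementation; Timeout::Error stands in for Sequel::PoolTimeout) shows why heavy load triggers it:

```ruby
require "timeout"

# Toy connection pool: a queue holding a fixed number of "connections".
POOL_SIZE = 2
pool = Queue.new
POOL_SIZE.times { |i| pool << "conn-#{i}" }

# Check a connection out, waiting at most `wait` seconds, then return it.
def with_connection(pool, wait)
  conn = Timeout.timeout(wait) { pool.pop }
  yield conn
ensure
  pool << conn if conn
end

# Under "heavy load" every connection is already checked out ...
held = Array.new(POOL_SIZE) { pool.pop }

# ... so the next caller exhausts the wait and times out.
timed_out = false
begin
  with_connection(pool, 0.2) { |c| c }
rescue Timeout::Error
  timed_out = true
end

held.each { |c| pool << c }
puts "timed out: #{timed_out}"
```

Raising the pool limit or the timeout in the database configuration would reduce the frequency of these errors, though as noted above Dynflow already retries on them.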

Comment 11 Adam Ruzicka 2018-09-12 08:30:31 UTC
Created redmine issue http://projects.theforeman.org/issues/24909 from this bug

Comment 12 pm-sat@redhat.com 2018-09-12 10:03:00 UTC
Upstream bug assigned to aruzicka@redhat.com

Comment 13 pm-sat@redhat.com 2018-09-12 10:03:03 UTC
Upstream bug assigned to aruzicka@redhat.com

Comment 15 Peter Ondrejka 2018-11-06 12:15:11 UTC
Verified on Satellite 6.5 snap 2 using steps from comment 14.

Comment 19 errata-xmlrpc 2019-05-14 12:37:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222

