Bug 1622802

Summary: Running Ansible role fails with: Actions::ProxyAction::ProxyActionMissing: Proxy task gone missing from the capsule
Product: Red Hat Satellite
Reporter: sbadhwar
Component: Remote Execution
Assignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA
QA Contact: Peter Ondrejka <pondrejk>
Severity: medium
Priority: unspecified
Version: 6.4
CC: aruzicka, inecas, jhutar, pcreech, psuriset
Target Milestone: 6.5.0
Keywords: Performance, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-05-14 12:37:48 UTC
Type: Bug

Comment 3 Ivan Necas 2018-08-29 16:02:29 UTC
A few notes from the investigation so far:

This seems to be behaviour introduced by https://projects.theforeman.org/issues/23017: we periodically (every 10 minutes) check the task details on the smart proxy to see whether any tasks are in a wrong state or are missing, and we update the jobs on the Satellite side accordingly. This prevents situations where a task gets stuck forever, for example when the smart_proxy_dynflow_core service is restarted.

For a reason that is still unknown (I'm working on a patch that will give us more insight into the behaviour), we occasionally either failed to get the results from the smart proxy, or the tasks were missing from the results although they were actually present on the proxy.
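To make the periodic check easier to reason about, here is a minimal Python sketch of the comparison it performs. It is not the Foreman or Dynflow implementation: the function name, the state names and the reason labels are hypothetical, and treating a failed fetch as "unknown" rather than "missing" is an assumption of this sketch, not a description of the actual fix.

```python
from typing import Dict, Iterable, Optional

# The check described above runs every 10 minutes.
CHECK_INTERVAL_SECONDS = 10 * 60

# Hypothetical set of states in which a tracked task is still considered healthy.
HEALTHY_STATES = {"pending", "running"}


def find_suspect_tasks(
    tracked_ids: Iterable[str],
    proxy_tasks: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Compare the tasks tracked on the Satellite side with the proxy's report.

    proxy_tasks maps task id -> state as reported by the proxy, or is None
    when the report could not be fetched at all.
    Returns {task_id: reason} for tasks that look missing or in a wrong state.
    """
    if proxy_tasks is None:
        # Assumption of this sketch: a failed fetch tells us nothing,
        # so no task is flagged on this round.
        return {}

    suspects: Dict[str, str] = {}
    for task_id in tracked_ids:
        state = proxy_tasks.get(task_id)
        if state is None:
            suspects[task_id] = "missing"
        elif state not in HEALTHY_STATES:
            suspects[task_id] = "unexpected_state"
    return suspects
```

With this shape, a report that lacks a task flags it as missing, while a failed fetch flags nothing. Whether the observed failures fall into the first or the second case is exactly what is still unknown here.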

There are two additional issues observed:

* [2018-08-27 14:11:39.782 #26629] ERROR -- dynflow: timeout: 5.0, elapsed: 5.914493396994658 (Sequel::PoolTimeout) in /var/log/foreman-proxy/smart_proxy_dynflow_core.log. This is probably related to the default max_connections limit of the Sequel library, which we can exceed under heavy load. Dynflow has a rescue mechanism that retries the operation in this case, so it should not directly affect anything, but there is still a chance that it has some influence on this bug as well. In any case, I will file a separate BZ to track this behaviour.

* The smart_proxy_dynflow_core.log is not being updated after log rotation. I will file yet another issue for this. For this BZ, it means that we're missing some important log messages for further investigation. I therefore suggest that, during the next test attempt (after I provide the additional logging patch), we first restart smart_proxy_dynflow_core on all the capsules involved in the testing.
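For the first note, the retry-on-pool-timeout idea can be sketched in Python as follows. This is only an illustration of the general pattern: PoolTimeout is a hypothetical stand-in for Sequel::PoolTimeout, with_retry is a made-up helper, and the attempt count and delay are arbitrary values, not Dynflow's actual settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PoolTimeout(Exception):
    """Hypothetical stand-in for Sequel::PoolTimeout."""


def with_retry(operation: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Run operation, retrying when the connection pool times out.

    Re-raises the last PoolTimeout once all attempts are used up.
    Any other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except PoolTimeout:
            if attempt == attempts:
                raise
            # Give other threads a chance to return connections to the pool.
            time.sleep(delay)
    raise AssertionError("unreachable")
```

A retry like this hides short bursts of pool exhaustion, which is consistent with the expectation that the timeout should not directly affect the result, but under sustained load every attempt can still time out.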

Comment 11 Adam Ruzicka 2018-09-12 08:30:31 UTC
Created redmine issue http://projects.theforeman.org/issues/24909 from this bug

Comment 12 Satellite Program 2018-09-12 10:03:00 UTC
Upstream bug assigned to aruzicka

Comment 15 Peter Ondrejka 2018-11-06 12:15:11 UTC
Verified on Satellite 6.5 snap 2 using steps from comment 14.

Comment 19 errata-xmlrpc 2019-05-14 12:37:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222