A few notes from the investigation so far:
This looks like behaviour introduced by https://projects.theforeman.org/issues/23017: we periodically (every 10 minutes) check the task details on the smart proxy to see whether any tasks are in a wrong state or missing, and we update the jobs on the Satellite side accordingly. This prevents a task from getting stuck forever, for example when the smart_proxy_dynflow_core service is restarted.
For a reason not yet known (I'm working on a patch that gives us more insight into the behaviour), on some occasions we either failed to get the results from the smart proxy, or the tasks were missing from the results although they were actually present on the proxy.
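To make the suspected failure mode concrete, here is a minimal sketch of such a periodic check. It is not the actual Foreman/Dynflow code (which is Ruby): the function name, the state names, and the choice to skip the check when the proxy gives no answer are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical state name; the real implementation may use different states.
FINISHED_STATES = {"stopped"}


def find_lost_tasks(tracked: Set[str],
                    proxy_report: Optional[Dict[str, str]]) -> Set[str]:
    """Return the tracked task IDs that the periodic check would treat as lost.

    tracked      -- IDs of proxy tasks the Satellite side still waits for
    proxy_report -- task ID -> state as reported by the proxy,
                    or None if the proxy could not be reached
    """
    if proxy_report is None:
        # Assumption: without an answer nothing can be concluded,
        # so no task is flagged in this round.
        return set()

    lost = set()
    for task_id in tracked:
        state = proxy_report.get(task_id)
        # A task is considered lost when the proxy does not know it at all,
        # or when the proxy says it has already stopped.
        if state is None or state in FINISHED_STATES:
            lost.add(task_id)
    return lost
```

With a check of this shape, a report that is merely incomplete (for example `{"a": "running"}` while `b` and `c` are still running on the proxy) would flag `b` and `c` as lost, which matches the symptom described above.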
There are two additional issues observed:
* [2018-08-27 14:11:39.782 #26629] ERROR -- dynflow: timeout: 5.0, elapsed: 5.914493396994658 (Sequel::PoolTimeout) in /var/log/foreman-proxy/smart_proxy_dynflow_core.log: this is probably related to the default max_connections limit in the sequel library, which we can exceed under heavy load. Dynflow has a rescue mechanism that retries the operation in this case, so it should not directly affect anything, but there is still a chance that it has some influence on this bug as well. I will file a separate BZ to track this behaviour.
* smart_proxy_dynflow_core.log is not updated after log rotation. I will file yet another issue for this. For this BZ it means that we're missing some important log messages for further investigation. I therefore suggest that, during the next test attempt (after I provide the additional logging patch), smart_proxy_dynflow_core is first restarted on all the capsules involved in the testing.
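For reference, the retry-on-pool-timeout idea from the first item can be sketched as follows. This is only an illustration in Python, not Dynflow's actual rescue code: `PoolTimeout` is a stand-in for `Sequel::PoolTimeout`, and `with_retry` and its parameters are hypothetical.

```python
import time


class PoolTimeout(Exception):
    """Stand-in for Sequel::PoolTimeout (hypothetical Python analogue)."""


def with_retry(operation, attempts=3, delay=0.0):
    """Call operation(), retrying when the connection pool times out.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except PoolTimeout:
            if attempt == attempts:
                raise
            # Give other threads a chance to return connections to the pool.
            time.sleep(delay)
```

A retry like this hides short bursts of pool exhaustion, but each failed attempt still costs the full pool timeout (5 seconds in the log line above), so operations can be noticeably delayed under sustained load.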
Created redmine issue http://projects.theforeman.org/issues/24909 from this bug
Upstream bug assigned to email@example.com
Verified on Satellite 6.5 snap 2 using steps from comment 14.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.