Bug 2251014

Summary: Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Product: Red Hat Satellite
Component: Tasks Plugin
Version: 6.15.0
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Lukáš Hellebrandt <lhellebr>
Assignee: satellite6-bugs <satellite6-bugs>
QA Contact: Lukáš Hellebrandt <lhellebr>
CC: ahumbe, aruzicka, pcreech, rlavi
Target Milestone: 6.15.0
Target Release: Unused
Keywords: BetaBlocker, Regression, Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: rubygem-foreman-tasks-8.3.3
Type: Bug
Last Closed: 2024-04-23 17:15:49 UTC

Description Lukáš Hellebrandt 2023-11-22 10:41:28 UTC
Description of problem:
When the dynflow-sidekiq service is restarted while a job is running on multiple hosts sequentially, the run on the next host in line after the restart fails with: "Error loading data from Capsule: RestClient::NotFound - 404 Not Found"

Version-Release number of selected component (if applicable):
Reproduced on Sat Stream 6.15 snap 37.0.
I couldn't reproduce it on Sat 6.14, so this is a regression.

How reproducible:
Deterministic

Steps to Reproduce:
1. Have a Satellite with two hosts registered
2. Create a Job Template useful for debugging; I used the following contents:
echo $(date) >> /root/test-<%= @host %>; sleep 120; echo slept-$(date) >> /root/test-<%= @host %>
3. Monitor -> Jobs -> Run Job
4. Select that template
5. Set filter to match the two hosts
6. Set concurrency level to one
7. Submit
8. On Satellite:
# systemctl stop dynflow-sidekiq@*
9. Wait a few seconds...
10. # systemctl start dynflow-sidekiq dynflow-sidekiq dynflow-sidekiq
11. Wait...
12. In the WebUI, watch the job run details. The run on the first host ends one way or another, depending on the phase in which the daemon was killed: it either fails after ~15 minutes or succeeds, and which of the two doesn't matter. The run on the second host should then start.
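
Steps 8 to 10 can also be scripted. The following is only a minimal sketch, assuming bash and that the instances to bring back are exactly the dynflow-sidekiq@ units that systemctl reports as running before the stop. The names SYSTEMCTL, PAUSE_SECONDS, running_dynflow_units and restart_dynflow are hypothetical helpers for this sketch and are not part of Satellite.

```shell
#!/usr/bin/env bash
# Sketch only: stop the running dynflow-sidekiq instances, wait, start them again.
# SYSTEMCTL and PAUSE_SECONDS are hypothetical knobs introduced for this sketch.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
PAUSE_SECONDS="${PAUSE_SECONDS:-5}"

# Print the names of the currently running dynflow-sidekiq@ units, one per line.
running_dynflow_units() {
    "$SYSTEMCTL" list-units --type=service --state=running \
        --plain --no-legend 'dynflow-sidekiq@*' | awk '{print $1}'
}

# Remember which instances are running, stop them, pause, then start the same set.
restart_dynflow() {
    local units
    mapfile -t units < <(running_dynflow_units)
    if [ "${#units[@]}" -eq 0 ]; then
        echo "no running dynflow-sidekiq instances found" >&2
        return 1
    fi
    "$SYSTEMCTL" stop "${units[@]}"
    sleep "$PAUSE_SECONDS"
    "$SYSTEMCTL" start "${units[@]}"
}
```

After sourcing the file as root on the Satellite while the job is running on the first host, call restart_dynflow. Recording the unit names before the stop avoids hard-coding instance names, which may differ between installations.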

Actual results:
The run never finishes and its output shows the following error repeated indefinitely: "Error loading data from Capsule: RestClient::NotFound - 404 Not Found". No further hosts will ever run the job.

Expected results:
The run on the second host should pass, and any remaining hosts should get their turn afterwards.

Comment 1 Lukáš Hellebrandt 2024-01-11 16:48:18 UTC
Verified on Sat 6.15.0 snap 5.0 with 3 hosts running RHEL 8.9 and 9.2, using the reproducer from the original description.

Comment 4 errata-xmlrpc 2024-04-23 17:15:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010