Bug 2090271
| Summary: | Manifest refresh randomly fails with "No such file or directory" when having multiple dynflow workers | |||
|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> | |
| Component: | Subscription Management | Assignee: | Adam Ruzicka <aruzicka> | |
| Status: | CLOSED ERRATA | QA Contact: | Cole Higgins <chiggins> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 6.10.5 | CC: | aruzicka, egolov, osousa, pmendezh, sraut | |
| Target Milestone: | 6.11.1 | Keywords: | EasyFix, Triaged | |
| Target Release: | Unused | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | tfm-rubygem-katello-4.3.0.43-1 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2093408 2106092 (view as bug list) | Environment: | ||
| Last Closed: | 2022-07-27 17:27:09 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
We could either use a different temporary directory (~foreman/tmp maybe?) or make all the workers run in the same mount namespace using JoinsNamespaceOf [1] in the service definition. Depending on which we choose, the fix will need to go to either katello or foreman; either way, I'm not sure about the right component.

[1] - https://www.freedesktop.org/software/systemd/man/systemd.unit.html#JoinsNamespaceOf=

Created redmine issue https://projects.theforeman.org/issues/34957 from this bug

I guess the "correct" solution depends on which part of this we consider a bug ;-)

If the general answer is "dynflow workers should be able to exchange data via the filesystem", then they either need to be in the same namespace (JoinsNamespaceOf above) or explicitly have a way to say "store this data for sharing" (in Rails.root/tmp, or somewhere else).

If the general answer is "dynflow workers should be as isolated as possible, but this specific katello workflow needs it", then this workflow should write to Rails.root/tmp or similar.

Upstream bug assigned to aruzicka

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/34957 has been resolved.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Satellite 6.11.1 Async Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5742
Description of problem:
Manifest refresh randomly fails on a Satellite with multiple dynflow workers with error:

Error: No such file or directory @ rb_sysopen - /tmp/0.7851943882678857.zip

The reason is *tricky*:
- ManifestRefresh task determines the filename for the new manifest file as /tmp/#{rand}.zip
- UpstreamExport dynflow step is asked to export the new manifest to that file
- subsequent Import dynflow step is asked to read the file and process the update further

The dynflow steps can be processed by different dynflow workers, which run as different systemd services. And sadly for us, each service uses its own private temp directory, like:

/tmp/systemd-private-4f8b157ce7c040f4b27e7ecbba68aa22-dynflow-sidekiq/tmp/

So, when the UpstreamExport step is executed by one dynflow worker, it puts the zip file into its own private temp. And if we are unlucky, the Import step is picked up by another worker, which does not find the file in its own private temp /o\ .

This means that with 3 dynflow workers, there is just a 1/3 probability that the manifest refresh succeeds.

We need to use a static/shared tmp file instead.

Version-Release number of selected component (if applicable):
Sat 6.10.5

How reproducible:
2/3 when having 3 dynflow workers

Steps to Reproduce:
1. Set up Satellite with 3 dynflow workers, e.g. per https://access.redhat.com/solutions/5695311
2. Import a manifest
3. Repeatedly refresh it:
hammer subscription refresh-manifest --organization-id=1

Actual results:
Step 3 randomly fails with error:

Error: No such file or directory @ rb_sysopen - /tmp/0.7851943882678857.zip

In such a case, the zip file can be found under the private temp dir of one worker's service, like:

/tmp/systemd-private-4f8b157ce7c040f4b27e7ecbba68aa22-dynflow-sidekiq/tmp/0.7851943882678857.zip

Expected results:
manifest refresh to always succeed

Additional info:
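The failure mode described above can be reproduced in miniature without Satellite. The following is only a rough simulation, not Katello or dynflow code: `Worker`, `refresh` and `simulate` are hypothetical names, each worker's private temp is modelled as a plain directory, and the "shared" directory merely stands in for a location such as Rails.root/tmp. It assumes that only files placed directly in /tmp are affected by the private temp remapping.

```ruby
require "tmpdir"
require "fileutils"
require "securerandom"

# Hypothetical stand-in for a dynflow worker whose service has a private /tmp.
class Worker
  def initialize(private_tmp)
    @private_tmp = private_tmp
  end

  # Simplification: a file directly in /tmp is redirected to this worker's
  # private directory; any other path is visible to all workers unchanged.
  def resolve(path)
    File.dirname(path) == "/tmp" ? File.join(@private_tmp, File.basename(path)) : path
  end

  def export(path)
    File.write(resolve(path), "manifest")
  end

  def import(path)
    File.read(resolve(path))
  end
end

# Export on one worker, import on another; report whether the file was found.
def refresh(exporter, importer, path)
  exporter.export(path)
  importer.import(path)
  :ok
rescue Errno::ENOENT
  :missing
end

# Try every (exporter, importer) pair of workers. path_for receives the
# shared directory and returns the file name to use for one refresh.
def simulate(worker_count, path_for)
  Dir.mktmpdir do |root|
    shared = File.join(root, "shared")
    FileUtils.mkdir_p(shared)
    workers = Array.new(worker_count) do |i|
      dir = File.join(root, "private-#{i}")
      FileUtils.mkdir_p(dir)
      Worker.new(dir)
    end
    workers.product(workers).map do |exporter, importer|
      refresh(exporter, importer, path_for.call(shared))
    end
  end
end

# File name chosen as in the report: only same-worker pairs succeed.
private_tmp = simulate(3, ->(_shared) { "/tmp/#{rand}.zip" })
puts "private /tmp: #{private_tmp.count(:ok)} of #{private_tmp.size} succeed"

# File placed in a directory all workers can see: every pair succeeds.
shared_dir = simulate(3, ->(shared) { File.join(shared, "#{SecureRandom.hex(8)}.zip") })
puts "shared dir:   #{shared_dir.count(:ok)} of #{shared_dir.size} succeed"
```

With 3 workers, only the 3 pairs where the same worker exports and imports succeed out of 9, which matches the 1/3 success rate reported; with the shared directory all 9 pairs succeed.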