Bug 2090271 - Manifest refresh randomly fails with "No such file or directory" when having multiple dynflow workers
Summary: Manifest refresh randomly fails with "No such file or directory" when having ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Subscription Management
Version: 6.10.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: 6.11.1
Assignee: Adam Ruzicka
QA Contact: Cole Higgins
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-25 13:12 UTC by Pavel Moravec
Modified: 2023-02-05 03:06 UTC (History)

Fixed In Version: tfm-rubygem-katello-4.3.0.43-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2093408 2106092
Environment:
Last Closed: 2022-07-27 17:27:09 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 34957 0 Normal Closed Manifest refresh randomly fails with "No such file or directory" when having multiple dynflow workers 2022-06-02 13:22:55 UTC
Red Hat Issue Tracker SAT-10607 0 None None None 2023-01-26 17:48:59 UTC
Red Hat Product Errata RHBA-2022:5742 0 None None None 2022-07-27 17:27:19 UTC

Description Pavel Moravec 2022-05-25 13:12:46 UTC
Description of problem:
Manifest refresh randomly fails on a Satellite with multiple dynflow workers with error:

Error: No such file or directory @ rb_sysopen - /tmp/0.7851943882678857.zip

The reason is *tricky* :
- ManifestRefresh task determines filename for the new manifest file as /tmp/#{rand}.zip
- UpstreamExport dynflow step is asked to export the new manifest to that file
- subsequent Import dynflow step is asked to read the file and process the update further

The dynflow steps can be processed by different dynflow workers, which are run as different systemd services. And sadly for us, the services use their own private temp directory like:

/tmp/systemd-private-4f8b157ce7c040f4b27e7ecbba68aa22-dynflow-sidekiq/tmp/

So, when the UpstreamExport step is executed by one dynflow worker, it puts the zip file into its own private temp directory. And if we are unlucky, the Import step is picked up by another worker, which cannot find the file in its own private temp directory /o\ .

This means that with 3 dynflow workers, there is only a 1/3 probability that the manifest refresh succeeds.
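
The mechanism and the 1/3 figure can be modeled in a few lines of Ruby. This is only a sketch, not Katello code: the `Worker` class, its private directory, and the `export`/`import` methods are hypothetical stand-ins for the dynflow worker services and the UpstreamExport/Import steps. It also assumes that each step is equally likely to land on any worker, so it simply enumerates every (export worker, import worker) pairing.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for a dynflow worker whose service has a private /tmp:
# every worker resolves the same "/tmp/..." path inside its own directory.
class Worker
  def initialize
    @private_tmp = Dir.mktmpdir('systemd-private-demo-')
  end

  # Maps the path a step sees (/tmp/x.zip) to where it really lands on disk.
  def resolve(path)
    File.join(@private_tmp, path)
  end

  def export(path)
    real = resolve(path)
    FileUtils.mkdir_p(File.dirname(real))
    File.write(real, 'manifest')
  end

  def import(path)
    File.read(resolve(path)) # raises Errno::ENOENT if another worker wrote it
  end

  def cleanup
    FileUtils.remove_entry(@private_tmp)
  end
end

# One refresh: export on one worker, import on a possibly different one.
def refresh_succeeds?(exporter, importer)
  path = "/tmp/#{rand}.zip"
  exporter.export(path)
  importer.import(path)
  true
rescue Errno::ENOENT
  false
end

# Fraction of (export worker, import worker) pairings that succeed.
def success_ratio(worker_count)
  workers = Array.new(worker_count) { Worker.new }
  outcomes = workers.product(workers).map { |e, i| refresh_succeeds?(e, i) }
  Rational(outcomes.count(true), outcomes.size)
ensure
  workers&.each(&:cleanup)
end

puts "3 workers: refresh succeeds in #{success_ratio(3)} of the pairings"
```

Only the pairings where the same worker handles both steps succeed, which is N out of N*N, i.e. 1/N.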


We need to use a static/shared tmp file location instead.


Version-Release number of selected component (if applicable):
Sat 6.10.5


How reproducible:
2/3 when having 3 dynflow workers


Steps to Reproduce:
1. Set up Satellite with 3 dynflow workers, e.g. per https://access.redhat.com/solutions/5695311
2. Import a manifest
3. Repeatedly refresh it:
hammer subscription refresh-manifest  --organization-id=1


Actual results:
3. randomly fails with error:
Error: No such file or directory @ rb_sysopen - /tmp/0.7851943882678857.zip

In such a case, the zip file can be found under the private temp dir of a worker's service, like:
/tmp/systemd-private-4f8b157ce7c040f4b27e7ecbba68aa22-dynflow-sidekiq/tmp/0.7851943882678857.zip
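
To check whether a system has hit this, one could list zip files stranded in the workers' private temp directories. A minimal Ruby sketch follows; `stranded_manifests` is a hypothetical helper, and the glob pattern is an assumption derived only from the single example path above, so it may need adjusting on a real system. Reading these directories typically requires root.

```ruby
# Hypothetical helper: lists zip files left in dynflow workers' private temp
# dirs. tmp_root is a parameter so the sketch can be tried on a scratch dir.
# The directory pattern is an assumption based on the example path above.
def stranded_manifests(tmp_root = '/tmp')
  pattern = File.join(tmp_root, 'systemd-private-*-dynflow-sidekiq*', 'tmp', '*.zip')
  Dir.glob(pattern).sort
end

stranded_manifests.each { |path| puts path }
```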


Expected results:
manifest refresh to always succeed


Additional info:

Comment 1 Adam Ruzicka 2022-05-25 14:20:07 UTC
We could either use a different temporary directory (~foreman/tmp maybe?) or make all the workers run in the same mount namespace using JoinsNamespaceOf[1] in the service definition. Depending on this, the fix will need to go into either katello or foreman; either way, I'm not sure about the right component.

[1] - https://www.freedesktop.org/software/systemd/man/systemd.unit.html#JoinsNamespaceOf=

Comment 2 Adam Ruzicka 2022-05-25 14:27:32 UTC
Created redmine issue https://projects.theforeman.org/issues/34957 from this bug

Comment 3 Evgeni Golov 2022-05-25 14:33:19 UTC
I guess the "correct" solution depends on which part of this we consider a bug ;-)

If the general answer is "dynflow workers should be able to exchange data via the filesystem", then they need to either be in the same namespace (JoinsNamespaceOf above) or explicitly have a way to say "store this data for sharing" (in Rails.root/tmp, or somewhere else).

If the general answer is "dynflow workers should be as isolated as possible, but this specific katello workflow needs it", then this workflow should write to Rails.root/tmp or similar.
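
The second option, writing to a directory that every worker can see, could look roughly like the Ruby sketch below. This illustrates the idea only and is not the fix that was actually merged: `shared_manifest_path` is a hypothetical helper, and the use of `SecureRandom.uuid` for the file name is an assumption. In a Rails application the caller might pass something like `Rails.root.join('tmp').to_s`; the demo uses a scratch directory so it runs standalone.

```ruby
require 'securerandom'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: builds the manifest path under a directory that is
# visible to all workers, instead of the per-service private /tmp.
def shared_manifest_path(shared_dir)
  FileUtils.mkdir_p(shared_dir)
  File.join(shared_dir, "#{SecureRandom.uuid}.zip")
end

# Demo with a scratch directory standing in for the shared location.
Dir.mktmpdir('shared-demo-') do |dir|
  path = shared_manifest_path(dir)
  File.write(path, 'manifest') # export step, run by any worker
  puts File.read(path)         # import step, run by any other worker
  File.delete(path)            # clean up once the import is done
end
```

Because the shared directory is not cleaned up by systemd the way a private temp directory is, the workflow would have to delete the file itself once the import finishes.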

Comment 4 Bryan Kearney 2022-05-26 16:04:46 UTC
Upstream bug assigned to aruzicka

Comment 5 Bryan Kearney 2022-05-26 16:04:49 UTC
Upstream bug assigned to aruzicka

Comment 6 Bryan Kearney 2022-05-27 16:04:44 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/34957 has been resolved.

Comment 15 errata-xmlrpc 2022-07-27 17:27:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite 6.11.1 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5742

