Under high UI and API load, about 0.5% of requests fail with: No such file or directory @ rb_sysopen - /usr/share/foreman/tmp/cache/C3B/630/.permissions_check.224680.136744.49172
Description of problem:
I'm testing a new script that simulates high UI and API load, and about 0.5% of requests fail with:
No such file or directory @ rb_sysopen - /usr/share/foreman/tmp/cache/C3B/630/.permissions_check.224680.136744.49172
Version-Release number of selected component (if applicable):
satellite-6.12.0-2.el8sat.noarch
How reproducible:
Always
Steps to Reproduce:
1. Install Satellite, sync content, register hosts
2. Run the script (you need to `pip install locust`)
Actual results:
Errors like:
=====
<div class="alert alert-danger "><span class="pficon pficon-error-circle-o "></span> <strong>Oops, we're sorry but something went wrong </strong><span class="text">No such file or directory @ rb_sysopen - /usr/share/foreman/tmp/cache/C3B/630/.permissions_check.224680.136744.49172</span><div class="alert-actions"><hr><a class="btn btn-default" href="/">Back</a></div></div>
<p id="message">
If you feel this is an error with Satellite itself, please open a new issue with
<a rel="external" href="https://access.redhat.com/support/cases/#/case/new">Satellite ticketing system</a>,
Please include in your report the full error log that can be acquired by running:
<strong> foreman-rake errors:fetch_log request_id=5ffb6961</strong>
and it is highly recommended to also attach the sosreport output.
</p>
=====
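Each failed page carries two useful pieces of information: the error message, and a request ID for `foreman-rake errors:fetch_log`. Below is a minimal Python sketch for collecting both from failed responses during a load test. It assumes every failure renders the same markup as the page above: a `<span class="text">` holding the message, and a `request_id=` token. `ErrorPageParser` and `parse_error_page` are hypothetical helpers, not part of Satellite or Locust.

```python
import re
from html.parser import HTMLParser


class ErrorPageParser(HTMLParser):
    """Hypothetical helper: collects the text inside <span class="text">."""

    def __init__(self):
        super().__init__()
        self._in_message = False
        self.message = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "text") in attrs:
            self._in_message = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_message = False

    def handle_data(self, data):
        if self._in_message:
            self.message += data


# Assumes the page always prints the rake command followed by request_id=<id>.
REQUEST_ID_RE = re.compile(r"errors:fetch_log request_id=([\w-]+)")


def parse_error_page(body):
    """Return (message, request_id); either is None if not found."""
    parser = ErrorPageParser()
    parser.feed(body)
    parser.close()
    match = REQUEST_ID_RE.search(body)
    message = parser.message.strip() or None
    return message, (match.group(1) if match else None)


if __name__ == "__main__":
    sample = ('<span class="text">No such file or directory</span>'
              "<strong> foreman-rake errors:fetch_log request_id=5ffb6961</strong>")
    print(parse_error_page(sample))
```

Grouping failures by the returned message is a quick way to confirm they are all the same error. The collected IDs can then be passed one at a time to `foreman-rake errors:fetch_log request_id=...`.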
Expected results:
Requests should not be failing
Additional info:
# python aaa.py --satellite-password changeme --locust-host https://localhost --locust-num-clients 10 --test-duration 300
[...]
request                                    count   fail ratio   med resp time   total RPS
---------------------------------------- ------- ------------ --------------- -----------
GET users_login_get                           10        0.000        1000.000       0.033
GET locations                                262        0.004         340.000       0.871
GET smart_proxies                            277        0.004         340.000       0.921
GET hostgroups                               275        0.007         340.000       0.915
GET organizations                            272        0.000         340.000       0.905
GET overview                                 304        0.007         420.000       1.011
GET foreman_tasks_tasks                      293        0.003         290.000       0.975
GET audits_page_per_page_search              258        0.000         790.000       0.858
GET templates_provisioning_templates         280        0.007         510.000       0.931
GET job_invocations                          260        0.008         410.000       0.865
GET hosts                                    275        0.007         830.000       0.915
GET domains                                  296        0.010         310.000       0.985
GET katello_api_v2_content_views_nondef…     291        0.000         320.000       0.968
GET audits                                   255        0.016         290.000       0.848
GET foreman_tasks_api_tasks_include_per…     283        0.000         650.000       0.941
GET katello_api_v2_products_organizatio…     259        0.004        1400.000       0.861
GET katello_api_v2_packages_organizatio…     274        0.000        1800.000       0.911
SUMMARY                                     4424        0.005         581.361      14.715
Errors encountered:
name                                    method   error                                      occurrences
--------------------------------------- -------- ---------------------------------------- -------------
katello_api_v2_products_organization_id GET      CatchResponseError('Got wrong response')             1
overview                                GET      CatchResponseError('Got wrong response')             2
hosts                                   GET      CatchResponseError('Got wrong response')             2
hostgroups                              GET      CatchResponseError('Got wrong response')             2
smart_proxies                           GET      CatchResponseError('Got wrong response')             1
job_invocations                         GET      CatchResponseError('Got wrong response')             2
audits                                  GET      CatchResponseError('Got wrong response')             4
locations                               GET      CatchResponseError('Got wrong response')             1
foreman_tasks_tasks                     GET      CatchResponseError('Got wrong response')             1
domains                                 GET      CatchResponseError('Got wrong response')             3
templates_provisioning_templates        GET      CatchResponseError('Got wrong response')             2
The "Got wrong response" error means that a very basic content sanity check failed. The check usually just looks for the page title or another sufficiently unique string.
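To make the check concrete, here is a minimal sketch. It assumes the script compares each response body with a single expected string per page. `check_response` and the marker values are hypothetical, since the script itself is not included in this report. The commented lines show how such a check is typically wired into Locust using `catch_response=True` and `response.failure()`. Locust records such a failure as a `CatchResponseError` carrying the given message.

```python
def check_response(body, expected_marker):
    """Hypothetical sanity check: return a failure reason, or None if the
    body contains the marker (e.g. the page title) of the expected page."""
    if expected_marker in body:
        return None
    return "Got wrong response"


# In a Locust task it would typically be used like this (not executed here):
#     with self.client.get(path, catch_response=True) as response:
#         reason = check_response(response.text, marker)
#         if reason is not None:
#             response.failure(reason)  # reported as CatchResponseError(reason)

if __name__ == "__main__":
    ok_page = "<html><head><title>Hosts</title></head></html>"
    error_page = "<strong>Oops, we're sorry but something went wrong </strong>"
    print(check_response(ok_page, "<title>Hosts</title>"))
    print(check_response(error_page, "<title>Hosts</title>"))
```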
In this run, 21 of 4424 requests failed (about 0.47%). I quickly checked the output, and they all appear to be the same error.
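These figures can be cross-checked against each other:

- The occurrences column of the errors table sums to 21.
- 21 / 4424 ≈ 0.0047, which is the 0.005 in the SUMMARY row.
- Dividing each endpoint's failures by its request count reproduces the fail ratio column.

The short Python sketch below does this arithmetic. It assumes that the truncated `katello_api_v2_products_organizatio…` row in the summary table is the same endpoint as `katello_api_v2_products_organization_id` in the errors table. `summarize` is a hypothetical helper.

```python
# (failed requests, total requests) per endpoint, copied from the two tables
# above; endpoints with zero failures are omitted.
RESULTS = {
    "katello_api_v2_products_organization_id": (1, 259),
    "overview": (2, 304),
    "hosts": (2, 275),
    "hostgroups": (2, 275),
    "smart_proxies": (1, 277),
    "job_invocations": (2, 260),
    "audits": (4, 255),
    "locations": (1, 262),
    "foreman_tasks_tasks": (1, 293),
    "domains": (3, 296),
    "templates_provisioning_templates": (2, 280),
}
TOTAL_REQUESTS = 4424  # SUMMARY row, all endpoints


def summarize(results, total_requests):
    """Return total failures, the overall fail ratio, and per-endpoint
    ratios, formatted like the "fail ratio" column (three decimals)."""
    total_failures = sum(failed for failed, _ in results.values())
    per_endpoint = {
        name: f"{failed / count:.3f}" for name, (failed, count) in results.items()
    }
    return total_failures, f"{total_failures / total_requests:.3f}", per_endpoint


if __name__ == "__main__":
    total_failures, overall, per_endpoint = summarize(RESULTS, TOTAL_REQUESTS)
    print(total_failures, overall)
    for name, ratio in per_endpoint.items():
        print(f"{name:40} {ratio}")
```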
Comment 4 by Ewoud Kohl van Wijngaarden
2022-09-20 13:41:12 UTC
It is my theory that we're hitting the limits of the file-based cache that we use. Quoting https://guides.rubyonrails.org/caching_with_rails.html#activesupport-cache-filestore:
> With this cache store, multiple server processes on the same host can share a cache. This cache store is appropriate for low to medium traffic sites that are served off one or two hosts. Server processes running on different hosts could share a cache by using a shared file system, but that setup is not recommended.
> As the cache will grow until the disk is full, it is recommended to periodically clear out old entries.
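The file name in the error, `.permissions_check.224680.136744.49172`, appears to match the probe file that ActiveSupport's `File.atomic_write` creates in a directory to learn its default permissions before writing a new cache entry. One interleaving would produce exactly this ENOENT. It is offered here as an unconfirmed hypothesis, not a diagnosis:

1. Worker A creates the entry's subdirectory (`C3B/630`).
2. Worker B deletes a cache entry and removes the directories that are now empty.
3. Worker A tries to create its probe file in the missing directory.

The Python sketch below replays that interleaving deterministically in a temporary directory. `write_entry` and `prune_empty_dirs` are hypothetical stand-ins, not the Rails code.

```python
import os
import tempfile


def write_entry(cache_root, key_dir, probe_name, between_steps=None):
    """Hypothetical model of a file-cache write: make sure the entry's
    directory exists, then create (and remove) a probe file inside it."""
    path = os.path.join(cache_root, key_dir)
    os.makedirs(path, exist_ok=True)
    if between_steps is not None:
        between_steps()  # stands in for another worker running at this moment
    probe = os.path.join(path, probe_name)
    with open(probe, "w"):  # raises FileNotFoundError if `path` vanished
        pass
    os.remove(probe)


def prune_empty_dirs(cache_root, key_dir):
    """Hypothetical model of a concurrent cleanup that removes empty
    directories from the entry's directory up to (not including) the root."""
    path = os.path.join(cache_root, key_dir)
    while path != cache_root:
        try:
            os.rmdir(path)  # only succeeds on an empty directory
        except OSError:
            break
        path = os.path.dirname(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        key_dir = os.path.join("C3B", "630")
        try:
            write_entry(root, key_dir, ".permissions_check.224680.136744.49172",
                        between_steps=lambda: prune_empty_dirs(root, key_dir))
        except FileNotFoundError as exc:
            print("reproduced:", exc)
```

Without the `between_steps` hook, the same write succeeds. That would be consistent with the failures being rare and appearing only under load.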
Rails also supports Redis caching (https://guides.rubyonrails.org/caching_with_rails.html#activesupport-cache-rediscachestore), and so does our installer (https://github.com/theforeman/puppet-foreman#rails-cache-support). This is untested, but I think the following should work:
--foreman-rails-cache-store type:redis
There are some considerations we (as the platform team) should take into account. For example, our current Redis instance is tuned for persistence (because Dynflow and Pulp data must survive a Redis restart), but a cache doesn't need that. Tuning applies to a whole instance, so we may want to run two Redis instances. On the other hand, the amount of data we cache is rather small, so perhaps this isn't really an issue.
The Rails guide also mentions hiredis as a faster client library, which we could package as well.
Jan: is this something you could test? I'd be happy to work with you offline to see if we can make this happen.