Description of problem:
I'm testing a new script to simulate high UI and API load, and about 0.5% of requests fail with this error:

No such file or directory @ rb_sysopen - /usr/share/foreman/tmp/cache/C3B/630/.permissions_check.224680.136744.49172

Version-Release number of selected component (if applicable):
satellite-6.12.0-2.el8sat.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install Satellite, sync content, register hosts
2. Run the script (you need to `pip install locust`)

Actual results:
Errors like:

=====
<div class="alert alert-danger "><span class="pficon pficon-error-circle-o "></span> <strong>Oops, we're sorry but something went wrong </strong><span class="text">No such file or directory @ rb_sysopen - /usr/share/foreman/tmp/cache/C3B/630/.permissions_check.224680.136744.49172</span><div class="alert-actions"><hr><a class="btn btn-default" href="/">Back</a></div></div>
<p id="message"> If you feel this is an error with Satellite itself, please open a new issue with <a rel="external" href="https://access.redhat.com/support/cases/#/case/new">Satellite ticketing system</a>, Please include in your report the full error log that can be acquired by running: <strong> foreman-rake errors:fetch_log request_id=5ffb6961</strong> and it is highly recommended to also attach the sosreport output. </p>
=====

Expected results:
Requests should not be failing

Additional info:
# python aaa.py --satellite-password changeme --locust-host https://localhost --locust-num-clients 10 --test-duration 300
[...]
request                                     count    fail ratio    med resp time    total RPS
----------------------------------------  -------  ------------  ---------------  -----------
GET users_login_get                            10         0.000         1000.000        0.033
GET locations                                 262         0.004          340.000        0.871
GET smart_proxies                             277         0.004          340.000        0.921
GET hostgroups                                275         0.007          340.000        0.915
GET organizations                             272         0.000          340.000        0.905
GET overview                                  304         0.007          420.000        1.011
GET foreman_tasks_tasks                       293         0.003          290.000        0.975
GET audits_page_per_page_search               258         0.000          790.000        0.858
GET templates_provisioning_templates          280         0.007          510.000        0.931
GET job_invocations                           260         0.008          410.000        0.865
GET hosts                                     275         0.007          830.000        0.915
GET domains                                   296         0.010          310.000        0.985
GET katello_api_v2_content_views_nondef…      291         0.000          320.000        0.968
GET audits                                    255         0.016          290.000        0.848
GET foreman_tasks_api_tasks_include_per…      283         0.000          650.000        0.941
GET katello_api_v2_products_organizatio…      259         0.004         1400.000        0.861
GET katello_api_v2_packages_organizatio…      274         0.000         1800.000        0.911
SUMMARY                                      4424         0.005          581.361       14.715

Errors encountered:

name                                     method    error                                       occurrences
---------------------------------------  --------  ----------------------------------------  -------------
katello_api_v2_products_organization_id  GET       CatchResponseError('Got wrong response')              1
overview                                 GET       CatchResponseError('Got wrong response')              2
hosts                                    GET       CatchResponseError('Got wrong response')              2
hostgroups                               GET       CatchResponseError('Got wrong response')              2
smart_proxies                            GET       CatchResponseError('Got wrong response')              1
job_invocations                          GET       CatchResponseError('Got wrong response')              2
audits                                   GET       CatchResponseError('Got wrong response')              4
locations                                GET       CatchResponseError('Got wrong response')              1
foreman_tasks_tasks                      GET       CatchResponseError('Got wrong response')              1
domains                                  GET       CatchResponseError('Got wrong response')              3
templates_provisioning_templates         GET       CatchResponseError('Got wrong response')              2

Error "Got wrong response" means some very basic check on content sanity (usually just checking for page title or other unique-enough string) failed.
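For illustration only, the kind of content sanity check described above could look roughly like the sketch below. The helper name `looks_sane` and the marker strings are hypothetical and not taken from the actual script; the error body is shortened from the HTML quoted under "Actual results".

```python
def looks_sane(body: str, marker: str) -> bool:
    """Return True if the response body contains the expected marker string.

    `marker` is a page title or another unique-enough string for the endpoint.
    """
    return marker in body


# Hypothetical marker for the "hosts" page; the real script may use another one.
HOSTS_MARKER = "<title>Hosts</title>"

# Shortened version of the error page quoted in this report.
ERROR_BODY = (
    '<div class="alert alert-danger ">'
    "<strong>Oops, we're sorry but something went wrong </strong>"
    '<span class="text">No such file or directory @ rb_sysopen - ...</span>'
    "</div>"
)

if __name__ == "__main__":
    print(looks_sane("<html><title>Hosts</title></html>", HOSTS_MARKER))
    print(looks_sane(ERROR_BODY, HOSTS_MARKER))
```

In a Locust task, a helper like this would typically be called inside `with self.client.get(path, name="hosts", catch_response=True) as response:` and a miss reported with `response.failure("Got wrong response")`, which is what shows up as `CatchResponseError('Got wrong response')` in the table. That wiring is left out here so the sketch runs without Locust installed.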
In this run, 21 requests out of 4424 failed. I quickly checked the output, and it looks like they all failed with the same error.
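As a quick cross-check of these figures, the per-endpoint occurrences from the error table can be summed and compared with the total request count from the SUMMARY row. The numbers below are copied from the tables in this report; the variable names are only illustrative.

```python
# Occurrences per endpoint, copied from the "Errors encountered" table.
occurrences = {
    "katello_api_v2_products_organization_id": 1,
    "overview": 2,
    "hosts": 2,
    "hostgroups": 2,
    "smart_proxies": 1,
    "job_invocations": 2,
    "audits": 4,
    "locations": 1,
    "foreman_tasks_tasks": 1,
    "domains": 3,
    "templates_provisioning_templates": 2,
}

total_requests = 4424  # "count" column of the SUMMARY row

failed = sum(occurrences.values())
fail_ratio = failed / total_requests

if __name__ == "__main__":
    print(f"{failed} failed out of {total_requests} ({fail_ratio:.3%})")
```

The sum is 21 and the ratio is a little under 0.5%, which matches the SUMMARY fail ratio of 0.005 after rounding.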
It is my theory that we're hitting the limits of the file-based cache that we use. Quoting https://guides.rubyonrails.org/caching_with_rails.html#activesupport-cache-filestore

> With this cache store, multiple server processes on the same host can share a cache. This cache store is appropriate for low to medium traffic sites that are served off one or two hosts. Server processes running on different hosts could share a cache by using a shared file system, but that setup is not recommended.

> As the cache will grow until the disk is full, it is recommended to periodically clear out old entries.

Rails also has support for Redis caching (https://guides.rubyonrails.org/caching_with_rails.html#activesupport-cache-rediscachestore) and so does our installer (https://github.com/theforeman/puppet-foreman#rails-cache-support). Untested, but I think this should work:

--foreman-rails-cache-store:type redis

There are considerations we (as the platform team) should make. For example, our current Redis is tuned for persistence (because Dynflow and Pulp need to survive a Redis restart), but for caching you don't need persistence. You can only tune a whole instance, so we may want to run two Redis instances. On the other hand, the amount we cache is rather small, so perhaps it's not really an issue. It also mentions hiredis as a faster library, which we could also package.

Jan: is this something you could test? I'd be happy to work with you offline to see if we can make this happen.
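To make the theory a bit more concrete, here is a minimal Python sketch of one way a file-based cache could produce "No such file or directory" under concurrency: a writer creates a temporary file next to the cache entry while something else has just removed the (empty) shard directory. This is an assumption for illustration, not a confirmed root cause and not the Rails implementation; the function names and the `.tmp_write.` prefix are hypothetical, and the race is simulated sequentially so the outcome is deterministic.

```python
import os
import shutil
import tempfile


def write_cache_entry(cache_dir: str, name: str, payload: bytes) -> str:
    """Write payload to a temp file in cache_dir, then rename it into place."""
    # The temp file is created inside cache_dir, so this call fails if the
    # directory disappeared after the caller resolved its path.
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp_write.", dir=cache_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    final_path = os.path.join(cache_dir, name)
    os.replace(tmp_path, final_path)  # atomic rename within one filesystem
    return final_path


def simulate_race() -> str:
    """Return the name of the exception raised when the shard dir vanishes."""
    root = tempfile.mkdtemp(prefix="cache_demo_")
    try:
        shard = os.path.join(root, "C3B", "630")
        os.makedirs(shard)
        # "Process B" prunes the empty shard directory just before
        # "process A" writes into it.
        shutil.rmtree(os.path.join(root, "C3B"))
        try:
            write_cache_entry(shard, "entry", b"cached value")
        except FileNotFoundError as exc:
            return type(exc).__name__
        return "ok"
    finally:
        shutil.rmtree(root, ignore_errors=True)


if __name__ == "__main__":
    print(simulate_race())
```

If something like this is what happens, a cache store that does not depend on per-entry directories on disk (such as Redis) would not have this particular failure mode, which is consistent with the suggestion above.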
Sure, happy to test since I have an appropriate setup available. Pinging you on GChat.
We found out that in Satellite the foreman-redis package is not in the repositories, so as a user you're unable to use it today.
*** This bug has been marked as a duplicate of bug 2063717 ***